
steel51

Clients
  • Posts: 112
  • Joined
  • Last visited
  • Days Won: 1

Reputation Activity

  1. Thanks
    steel51 reacted to Matt for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many of the pages on the internet as possible. There are an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look at and store every page, so it needs to decide what to keep and how long to spend indexing pages on your site.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and the filters that create faceted pages, and we remove many redirect links to reduce crawl depth and the volume of thin, low-value content. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem and the solution, and finally look at a before-and-after analysis. Note: I use the terms "Google" and "search engines" interchangeably. I know there are many wonderful search engines available, but most people understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to reach a page. The fewer links needed to get to a page, the better. Generally speaking, Google indexes fewer pages that sit more than 5 to 6 clicks deep.
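    Crawl depth is the shortest number of link hops from the home page, which a breadth-first search computes directly. A minimal sketch, assuming a tiny hypothetical link graph (the URLs are illustrative, not real Invision Community routes), showing how a redirect link adds an extra hop:

```python
from collections import deque


def crawl_depths(links: dict[str, list[str]], start: str) -> dict[str, int]:
    """Return the minimum number of link hops from `start` to each reachable URL.

    `links` maps a URL to the URLs it links to. Breadth-first search visits
    pages in order of distance, so the first time a URL is reached is via a
    shortest path.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical graph where the topic is only reachable through a redirect link.
with_redirect = {
    "/": ["/forum/"],
    "/forum/": ["/topic/1/?do=getNewComment"],
    "/topic/1/?do=getNewComment": ["/topic/1/"],  # the redirect costs one hop
}
# The same site with a direct link to the topic instead.
direct = {
    "/": ["/forum/"],
    "/forum/": ["/topic/1/"],
}

print(crawl_depths(with_redirect, "/")["/topic/1/"])  # 3
print(crawl_depths(direct, "/")["/topic/1/"])  # 2
```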
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects that scroll to new posts in a topic, are very useful for logged-in members but less so for spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links, because unread tracking is not available until they log in. Although these links offer no clear advantage to guests and search engines, they are prolific, and following them results in a redirect, which increases the crawl depth for content such as topics.
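    The check described above can be expressed as a small URL test. A minimal sketch, assuming the `do` query parameter carries these actions as in the URLs quoted above; only the two named actions are covered, and the example.com URLs are hypothetical:

```python
from urllib.parse import parse_qs, urlsplit

# The two convenience actions named in the article; others may exist.
REDIRECT_ACTIONS = {"getNewComment", "getLastComment"}


def is_convenience_redirect(url: str) -> bool:
    """Return True if the URL's `do` parameter is one of the redirecting actions."""
    params = parse_qs(urlsplit(url).query)
    return any(value in REDIRECT_ACTIONS for value in params.get("do", []))


print(is_convenience_redirect("https://example.com/topic/5-hello/?do=getNewComment"))  # True
print(is_convenience_redirect("https://example.com/topic/5-hello/"))  # False
```

    A check like this could, for example, decide whether to render such a link at all when the viewer is a guest.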
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages, and a single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links (25 profiles × 150 links each) that Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of it eaten up by links that add nothing new to the search index. These links also sit very deep in the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool for sorting lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines because they create faceted navigation: many URLs that show the same content in a different order or subset, which search engines treat as duplicate pages.
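    To see why filters multiply pages, strip the sorting and filtering parameters from each URL and compare what is left. A minimal sketch, assuming hypothetical parameter names `sortby`, `sortdirection` and `filter` (check the names your own community actually emits); three different-looking URLs collapse to one page of content:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical facet parameters: they reorder or filter a list without adding content.
FACET_PARAMS = {"sortby", "sortdirection", "filter"}


def canonical_url(url: str) -> str:
    """Drop facet parameters (and the fragment) so every sorted view maps to one URL."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in FACET_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


views = [
    "https://example.com/forum/2-general/",
    "https://example.com/forum/2-general/?sortby=replies&sortdirection=desc",
    "https://example.com/forum/2-general/?sortby=start_date",
]
print({canonical_url(u) for u in views})
# {'https://example.com/forum/2-general/'}
```

    Pagination parameters such as `page` are deliberately kept, since each page of a list holds different content.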

    The solution
    There is a straightforward solution to all of the problems outlined above. We can ask Google to avoid indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored, and by reducing the number of links needed to reach the content. We have used "noindex" in the past, but this still eats up the crawl budget, as Google has to crawl the page to learn we do not want it stored in the index.
    Fortunately, Google supports a hint called "nofollow", which you can add as rel="nofollow" to the <a href> tag that wraps a link. This sends a strong hint that the link should not be followed at all. However, Google may choose to follow it anyway, which means we also need a special file that contains firm instructions for Google on what it may crawl.
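    In HTML, nofollow is a value of the link's `rel` attribute. A minimal sketch of rendering such a link, assuming a hypothetical `render_link` helper rather than Invision Community's own template code:

```python
from html import escape


def render_link(href: str, text: str, nofollow: bool = False) -> str:
    """Render an <a> tag, adding rel="nofollow" when crawlers should not follow it."""
    rel = ' rel="nofollow"' if nofollow else ""
    # escape() also encodes "&" in query strings as "&amp;", which is valid in HTML attributes.
    return f'<a href="{escape(href)}"{rel}>{escape(text)}</a>'


print(render_link("/profile/1-matt/", "Matt", nofollow=True))
# <a href="/profile/1-matt/" rel="nofollow">Matt</a>
```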
    The special file Google reads for these crawl instructions is called robots.txt. We can use it to write rules that ensure search engines don't waste their valuable time on links that have no valuable content, links that create faceted navigation issues, and links that lead to a redirect.
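    Google reads robots.txt rules as path patterns where `*` matches any sequence of characters and a trailing `$` anchors the end; when several rules match, the most specific (longest) one wins, with Allow winning a tie, as standardised in RFC 9309. A simplified sketch of that matching, assuming a single group of rules and hypothetical paths, and ignoring percent-encoding normalisation:

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern (`*` wildcard, optional trailing `$`) to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` (path plus query string) may be crawled.

    `rules` holds ("allow" | "disallow", pattern) pairs from one user-agent group.
    The longest matching pattern wins, Allow wins a tie, and no match means allowed.
    """
    best_length, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty rule blocks nothing
            continue
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            is_allow = kind == "allow"
            if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
                best_length, allowed = len(pattern), is_allow
    return allowed


rules = [
    ("allow", "/topic/"),
    ("disallow", "/*do=getNewComment"),
    ("disallow", "/profile/"),
]
print(is_allowed(rules, "/topic/5-hello/"))  # True
print(is_allowed(rules, "/topic/5-hello/?do=getNewComment"))  # False
print(is_allowed(rules, "/profile/1-matt/"))  # False
```

    The second call shows the longest-match rule at work: both `/topic/` and `/*do=getNewComment` match, but the longer Disallow pattern wins.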
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.

    The new robots.txt generator in Invision Community
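    A generated robots.txt is, at its simplest, a list of rules assembled from settings. A minimal sketch, assuming hypothetical path prefixes and a hypothetical sitemap URL (this is not Invision Community's generator), checked with Python's standard `urllib.robotparser`. That parser only does prefix matching and does not understand Google's `*` wildcards, so the sketch sticks to plain prefixes:

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_prefixes: list[str], sitemap_url: Optional[str] = None) -> str:
    """Build a robots.txt body that blocks the given path prefixes for every crawler."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in disallowed_prefixes]
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


robots = build_robots_txt(
    ["/profile/", "/leaderboard/", "/online/"],  # hypothetical thin-content sections
    sitemap_url="https://example.com/sitemap.xml",  # hypothetical
)
print(robots)

# Verify the rules the way a simple prefix-matching crawler would.
parser = RobotFileParser()
parser.parse(robots.splitlines())
print(parser.can_fetch("Googlebot", "https://example.com/profile/1-matt/"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/topic/5-hello/"))  # True
```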
    Analysis: Before and after
    Using a popular SEO site audit tool, I ran a benchmark crawl of my test community, which has 50 members and around 20,000 posts. Most of the posts were populated from RSS feeds, so they contain real content, including links. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the before and after data.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled, with only 0.4% being redirection links. This is a dramatic reduction in both crawl budget and crawl depth. Simply by guiding Google away from thin content like profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.
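    The headline figures can be sanity-checked with a few lines of arithmetic. The redirect counts below are approximate because the article rounds the percentages, and the 100-links-per-day budget is purely illustrative (it echoes the low-budget example in the crawl budget section); the audit tool's crawl is not Google's actual crawl:

```python
import math

before_links, before_redirect_share = 176_175, 0.23  # "nearly 23%"
after_links, after_redirect_share = 6_389, 0.004

reduction = 1 - after_links / before_links
print(f"Links crawled fell by {reduction:.1%}")  # Links crawled fell by 96.4%
print(round(before_links * before_redirect_share))  # 40520 (an upper bound, since "nearly 23%")
print(round(after_links * after_redirect_share))  # 26

# Illustrative only: days needed to cover every link at 100 links per day.
daily_budget = 100
print(math.ceil(before_links / daily_budget), math.ceil(after_links / daily_budget))  # 1762 64
```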

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite using a robots.txt for the first time. The count here also includes sharer images and other external links, which are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

    As we can see in the 'before' chart, crawl depth has a small peak between 5 and 7 levels deep and a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the after, which shows a much more ordered crawl, with all content discoverable as expected without any red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as this blog shows, without focusing on crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is never discovered and added to the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  2. Like
    steel51 reacted to Stuart Silvester for a blog entry, 4.5: One More Thing...   
    Almost ten years ago we launched the Marketplace, a place to connect Invision Community owners with talented developers creating new functionality.
    Over the decade, the Marketplace has grown to hold thousands of applications, large and small. For many Invision Community owners, the Marketplace has become an essential resource.
    Our aim was always to have the Marketplace available inside your Admin Control Panel to make it even easier to purchase and install extra functionality.

    I'm pleased to say that as of Invision Community 4.5, this is now a reality. You can browse the Marketplace and install new add-ons without leaving the Admin Control Panel.

    Obtaining Resources
    Paid resources can be purchased directly from the Marketplace and are available to install immediately after the payment is complete. You no longer need to download and install the files yourself.
    You may also notice some additional information in the resource listing. We'll be introducing a new 'tab' for Marketplace resources that allows authors to provide more useful information, such as answers to frequently asked questions or configuration instructions.

    The video below takes you through the purchase and installation of a Marketplace application.
    marketplace-install.mp4
    Installing an Application
    Updates
    Some of the eagle-eyed among you may have noticed in the first screenshot that there are more 'bubbles' showing in the menu on the left. These are supported for Applications, Plugins, Themes and Languages.
    In Invision Community 4.5, every resource available via the AdminCP is automatically versioned, so you will see update notifications for everything you have installed (previously, you would only see update notices if the resource author supported them).
    Installing an update is as simple as clicking on the update notice, then clicking 'update' on the Marketplace listing.

    Installing Updates
    Downloads Changes
    Our Marketplace is built on our Downloads application, and during development of this feature we needed to add new functionality. We have included as many of these improvements as possible in our software for the benefit of our customers. Some of these are:
    • Custom Fields can now be set to only show to members that have purchased a file.
    • Files can now be set to accept a single file upload instead of multiple.
    • New file versions can now be moderated without hiding the current version from view.
    • Downloads REST API:
        • Performance improvements
        • New /download endpoint that counts the download
        • Added more data to the /downloads/file/{id} response
        • Ability to sort file results by last updated date
    We hope you're as excited about this feature as we are.
  3. Like
    steel51 reacted to Charles for a blog entry, Invision Community 4.3   
    We are happy to announce the new Invision Community 4.3 is available!
    Some highlights in Invision Community 4.3 include...
    Improved Search
    We now support Elasticsearch for scalable and accurate searching that MySQL alone cannot provide. There are also enhancements to the overall search interfaces based on your feedback.

     
    Emoji
    Express yourself with native emoji support in all editors. You can also keep your custom emoticons as you have now.

     
    Member Management
    The AdminCP interface for managing your members is all new, giving you easier control and management of your membership.

     
    Automatic Community Moderation
    As the administrator, you set up rules that define how many unique member reports a piece of content needs to receive before it is automatically hidden from view and moderators are notified.
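    The rule can be pictured as a threshold on distinct reporters. A minimal sketch, assuming a hypothetical `ContentItem` model and `notify` callback (this is not Invision Community's implementation); repeat reports from the same member count once:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ContentItem:
    """Hypothetical stand-in for a reportable post or topic."""
    item_id: int
    hidden: bool = False
    reporters: set[int] = field(default_factory=set)


def report(item: ContentItem, member_id: int, threshold: int,
           notify: Callable[[ContentItem], None]) -> None:
    """Record a report; hide the item and notify moderators once `threshold` unique members report it."""
    item.reporters.add(member_id)  # a set, so one member reporting twice counts once
    if not item.hidden and len(item.reporters) >= threshold:
        item.hidden = True
        notify(item)


notified = []
post = ContentItem(item_id=42)
for member in (7, 7, 8, 9):  # member 7 reports twice
    report(post, member, threshold=3, notify=notified.append)
print(post.hidden, len(notified))  # True 1
```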

     
    Clubs
    The new Clubs feature has been a huge hit with Invision Community users and we are expanding it to include invite-only options, notifications, exposure on the main community pages, paid memberships, and more.
    Custom Email Footers
    Your community generates a lot of email, and you can now include dynamic content in the footer to help drive engagement and content discovery.
    New Gallery Interface
    We have reworked our Gallery system with a simplified upload process and more streamlined image viewing.
     
    The full list follows. Enjoy!
    Content Discovery
    • We now support Elasticsearch, a search utility that allows for much faster and more reliable searching.
    • The REST API now supports search functions.
    • Both MySQL and Elasticsearch have new settings that let the admin set search defaults and default content weighting to better customize search logic for your community.
    • Visitors can now search for Content Pages and Commerce Products.
    • When entering a search term, members now see a clearer interface, so they know which areas they are searching in and the method of search.
    Member Engagement
    • Commerce can now send a customizable account welcome email after checkout.
    • You can whitelist emails in the spam service to stop false positives.
    • The REST API has many enhancements to manage members.
    • Ability to join any OAuth service for login management.
    • Invision Community can now be an OAuth endpoint.
    • WordPress OAuth login method built in.
    • Support for Google's Invisible reCAPTCHA.
    • Groups can be excluded from the Leaderboard (such as admins or bot groups).
    • All emails generated by Invision Community can now contain admin-defined extra promotional text in the footer, such as Our Picks and Social Links.
    • Admins can now define the order of Complete Your Profile to better control the user experience.
    Clubs
    • Option to make a Club visible but invite-only.
    • Admins can set an option so any Club a member is part of will also show in the parent application. So if you are in a Club that has a Gallery tab, those images will show both in the Club and in the main Gallery section of the community.
    • Club members can now follow an entire Club rather than just each content section.
    • There is a new option on the Club directory page for a list view, which is useful for communities with many Clubs.
    • If you have Commerce, you can now enable paid memberships to Clubs.
    • Admins can set limits on the number of Clubs per group.
    • If a group has delete permission in their Club, they can now delete empty containers as well.
    • Members can ignore invitations.
    Moderation and Administration
    • Unrestricted moderator or administrator permission sets in the AdminCP are visually flagged. This prevents confusion when administrators cannot do something, as they can quickly see whether their account has restrictions.
    • You can choose to be notified when a new Club is created.
    • Moderators can now reply to any content item with a hidden reply.
    • Download screenshots/watermarks can now be rebuilt if you change settings.
    • Support for Facebook Pixel to easily track visitors.
    • Moderators can now delete Gallery albums.
    • Automatic moderation tools with rules to define when content should auto-hide based on user reports.
    • Totally new member management view in the AdminCP.
    • More areas, such as comments and AdminCP functions, are now mass-selectable for easier management.
    New Features
    • Commerce now has full Stripe support, including fraud tools, Apple Pay, and other Stripe features.
    • Commerce packages can now have various custom email events configured (expiring soon, purchased, expired).
    • Full emoji support in the editor.
    • Complete overhaul of the Gallery upload and image views.
    • Announcements system overhaul. Announcements are now global on all pages (not via a widget), with new modes including dismissible announcements and a top-header floating bar option.
    • Many new reports on traffic and engagement in the AdminCP.
    • Blog has new view modes that offer options for a traditional site blog or a community multi-member blog platform.
    • The content starter can now leave one reply to Reviews on their item.
    • Commerce now makes it much easier to do basic account subscriptions when there is no product attached.
    Useful Improvements
    • Forums has a new widget where you can filter by tags.
    • If tags are not required, the tag input box now indicates this so the member knows they do not have to add tags.
    • Member cover photos can now be clicked to see the full image.
    • Any item with a poll now has a symbol in the list view.
    • Twitch.tv embed support.
    • You can now update/overwrite media in the Pages Media Manager.
    • Mapbox as an additional map provider to Google Maps.
    Technical Changes
    • Direct support for SparkPost has been removed. Anyone currently using SparkPost will automatically have their settings converted to SparkPost SMTP mode, so your email will still work.
    • Your cache engines (like Redis) will be checked on upgrade and in the support tool to ensure they are reachable.
    • Third-party applications will now be visually labeled to distinguish them from official Invision Community applications.
    • The queued tasks list in the AdminCP is now collapsed by default, as queued tasks are not something people need to pay much attention to during normal operations.
    • When upgrading from the version 3 series, you must convert your database to UTF8, and the system saves your original data in tables prefixed with orig. The AdminCP now alerts you if these are still present and allows you to remove them to reclaim storage space.
    • On new installs there are now reasonable defaults for upload limits to keep people from eating up storage space.
    • Categories in all apps (forums, gallery albums, databases, etc.) no longer allow HTML in their titles. This has been a concern in terms of both security and usability, so we were forced to restrict it.
    • Large improvements to the Redis cache engine, including its use for sessions.
    • The login with HTTPS option has been removed, and those who were using it will be given instructions to convert their entire community to HTTPS.
    • Images loaded through the proxy system now honor the image limits for normal uploads.
    • We now consider BBCode deprecated. We are not removing support but will not fix any future issues that may come up.
     
    There's a lot to talk about here, so we are going to lock this entry to comments to avoid confusion. Feel free to comment on upcoming feature-specific entries or start a topic in our Feedback forum.
     
  4. Like
    steel51 reacted to Charles for a blog entry, IPS Community Suite 4.2 Coming Soon   
    We are well into development on IPS Community Suite 4.2 and are excited to start announcing all the new features and improvements.
    Our next big release is focused on engagement with your members. You will see enhancements to our Reputation system, new ways to encourage people to register on your community, and enhancements to existing features to make them more interactive. There are also entirely new capabilities we cannot wait to show you ranging from new ways to organize content to tools to help promote your community.
    Version 4.2 also features a refreshed AdminCP and default front-end design. Theme changes in 4.2 are mostly in the CSS framework, so your existing themes will either work without issue or require minor changes to work in the new version.
    Over the next several weeks we will be posting news entries with previews of upcoming features fairly often. Be sure to follow our News section, our Facebook, or Twitter to stay up to date.
    We expect IPS Community Suite 4.2 to be out in mid-2017 with a public preview available sooner.
    Everyone at IPS has worked very hard on this update and we think you will love it!