
Matt

Management
  • Posts: 69,477
  • Joined
  • Last visited
  • Days Won: 564

Reputation Activity

  1. Thanks
    Matt got a reaction from Ricsca for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many of the pages on the internet as possible. There are an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look at and store every page, so it needs to decide what to keep and how long it will spend on your site indexing pages.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and the filters that create faceted pages, and we remove a lot of redirect links, which reduces both the crawl depth and the volume of thin, low-value content. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem, the solution and, finally, a before and after analysis. Note: I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available, but most people understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to reach and index a page. The fewer links it takes to get to a page, the better. Generally speaking, Google will reduce its indexing of links more than 5 to 6 clicks deep.
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects that scroll to new posts in a topic, are very useful for logged-in members but less so for spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in. Although they offer no clear advantage to guests and search engines, they are prolific, and following them results in a redirect, which increases the crawl depth for content such as topics.
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of them should be stored. Even sites with a healthy crawl budget will see a lot of that budget eaten up by links that add nothing new to the search index. These links are also very deep within the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.

    The solution
    There is a straightforward solution to all of the problems outlined above. We can ask Google to avoid indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored, and by reducing the number of links it takes to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget, as Google has to crawl the page to learn that we do not want it stored in the index.
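    To see why, note that a noindex hint lives inside the page itself, so the page must still be downloaded before the hint can be read. A typical example looks like this:

        <!-- The crawler has to fetch the page before it can see this tag, so crawl budget is still spent -->
        <meta name="robots" content="noindex">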
    Fortunately, Google supports a hint called "nofollow", which you apply via the rel attribute of the <a> tag that wraps a link. This sends a strong hint that the link should not be followed at all. However, Google may choose to follow it anyway, which means we also need a special file containing firm instructions for Google on what to follow and index.
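    For example, one of the convenience links mentioned earlier might carry the hint like this (the URL is purely illustrative):

        <!-- rel="nofollow" hints that crawlers should not follow this redirect link -->
        <a href="/topic/123-example-topic/?do=getNewComment" rel="nofollow">Jump to the newest comment</a>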
    This file is called robots.txt. We can use it to write rules that ensure search engines don't waste their valuable time looking at links that do not lead to valuable content, links that create faceted navigation issues and links that lead to a redirect.
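    To give a flavour of what those rules look like, here is a hand-written sketch. The paths are illustrative only; the generated file described below may use different rules:

        # Illustrative example only; actual paths and rules may differ
        User-agent: *
        # Keep crawlers away from profile pages, which are full of redirect links
        Disallow: /profile/
        # Skip convenience redirects such as "jump to the newest comment"
        Disallow: /*?do=getNewComment
        Disallow: /*?do=getLastComment
        # Skip filtered listing pages that create faceted, duplicate content
        Disallow: /*?sortby=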
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.

    The new robots.txt generator in Invision Community
    Analysis: Before and after
    Using a popular SEO site audit tool, I took a benchmark crawl of my test community, which has 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they contain real content, including links. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled (a drop of roughly 96%), with only 0.4% being redirect links. This is a dramatic reduction in both crawl budget and crawl depth. Simply by guiding Google away from thin content like profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite us using a robots.txt file for the first time. The figure here also includes sharer images and other external links which are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

    As we can see in the 'before' crawl, the crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the after, which shows a much more ordered crawl, with all content discoverable as expected without any red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as we can see in this blog, without focusing on crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is not discovered and added to the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  2. Like
    Matt got a reaction from uA_Y_C_A for a blog entry, Your members don't want you to grow (and what to do about it)   
    Every time I checked in with a newly launched running community, it seemed like there were more and more new people posting.
    As a result, I found it harder to find my friends' latest run write-ups and even harder to reply directly to them. Speaking with other early adopters, they felt the same way, and we all eventually drifted out of the community's orbit. 
    It's natural to want your community to grow; indeed, a lot of community management strategies are based on increasing registrations and scaling upwards.
    However, your early adopters may feel very different about growth as they watch their close friendship circles dissolve as more members join and begin posting.
    A small and tightly connected community is very different from a large, sprawling community, and often our business goals as community managers can be at odds with our members' goals.
    Let's take a look at the problem and then the solution.

     
    A new community is small and personal. Your early adopters will make friends fast by sharing their experiences and stories. They start to learn about each other and actively look forward to new posts and content. It's easy to keep track of the conversations and people in those early days when memberships are still in their infancy.
    Before themes and topics drive your community, the primary reason your members return is to strengthen burgeoning bonds.
    As your thriving community grows, more names appear, generating more posts and content. It can become harder to keep track of those personal conversations and friends. For those early adopters, it becomes overwhelming, and the feel of the community changes.
    The key to growth is to do it with consideration and understanding by allowing your members to retain smaller friendship circles within the larger community. Think of these small circles as a secure basecamp your members will use to explore more of the community together.
    How you structure your community can heavily influence member behaviour, so let's ensure you are set up for success.
    Forum structure
    Deciding how many forums to have largely depends on the size of your community. Generally, fewer is better; however, adding more when activity increases is recommended. Using the example of a running community: when you have few members, a single topic can be used to keep track of workouts; as membership increases, however, a dedicated forum where members post and maintain their own workout log topics makes it easier for others to find a specific member's log rather than trawling through one long, busy topic.
    If you're in doubt, asking your community is always a great way to draw out real, honest feedback and guidance on how to improve.

    Nerd Fitness forums allow each member to maintain their own training log in their busy forum
    Clubs
    Creating a sub-community is a big decision. On the one hand, you syphon off discussion to areas outside the main community, but this can be an advantage if you want members to retain their smaller friendship circles. On the other hand, you may find an appetite for more niche discussion within your topic area. For example, while your site may be based around road running, you may have a small group specifically interested in mountain running. A club allows them to follow that passion without altering the core purpose of your community.

    Even though our own community is here to serve our clients, we have a health club where members can discuss health and fitness away from the community's primary aim
    Follow
    Using the robust follow and notification tools is an efficient way to let members know when a favoured member posts something new or a loved topic gets a reply. Make sure your members know how to set up notifications and the different ways to receive them, such as via mobile, email or the community's notification bell.
    Your members need never miss a friend's update again.

    We have a very comprehensive follow system
    Discover
    Activity streams allow members to personalise their first point of discovery. In addition, the flexibility of streams enables members to choose which members' content to see and which forums' content to include in a single news feed style stream.
    Giving your members the ability to customise which content they see when they first visit the community allows them to check in with their favourite areas before exploring the rest of the community.

    NerdFitness use streams to show content for each 'guild'
    Growing a community from a handful of people to tens of thousands takes a lot of planning. Unfortunately, it's easy to focus on just numbers and forget about the people behind them. However, aligning your business goals with your members' goals is critical when growing beyond your early adopters.
    Setting up your community for success using our built-in tools will help your members feel comfortable as you grow.

     
  3. Like
    Matt got a reaction from Unienc for a blog entry, Your members don't want you to grow (and what to do about it)   
  4. Like
    Matt got a reaction from BomAle for a blog entry, Your members don't want you to grow (and what to do about it)   
  5. Like
    Matt got a reaction from sobrenome for a blog entry, Your members don't want you to grow (and what to do about it)   
  6. Like
    Matt got a reaction from Chris Anderson for a blog entry, Your members don't want you to grow (and what to do about it)   
  7. Like
    Matt got a reaction from Charles for a blog entry, Your members don't want you to grow (and what to do about it)   
  8. Like
    Matt reacted to Jordan Miller for a blog entry, Achievements just became even more powerful   
    Earlier this year, Invision Community launched a native gamification system called Achievements. We added significant improvements to Achievements in our new release, 4.6.8, out now!  🎉 
    Achievements allows community leaders to reward members with points, badges and ranks for their outstanding contributions. We listened to your feedback and implemented some very exciting changes.
    In this post, you'll get a crash course on the new updates included in your Admin Control Panel (ACP) upon updating your community to 4.6.8. Once you're familiar with these concepts, you can take action to elevate your community.
     
    New! Married group promotions with Achievements. 
    New! Added metrics to better understand how Achievements functions within your community. 
    New! Implemented additional rules to further empower your members. 
    New! Updated email notifications to let your members know when they've earned a badge.
    New! Download member lists based on Achievements criteria.
     
    Before we expand on the new features, here's a recap of Achievements to refresh your memory:
     
    Related: Want to know more about Achievements? Read our original blog post.

    Now that you’re up to speed, let’s take a look at the new metrics and rules.
     
    Group promotions


     
    Group promotions lay out various user journeys.
    Based on actions a member takes in a community, for example commenting 100 times, having a high reputation score or having joined the community a year ago, the platform will automatically place them in a group (based on the rules you previously set up). This is useful when creating a hierarchy in your community. The more your members are engaged, the more access and privileges they receive.
    Now, community leaders can also automatically place members in specific groups based on what badges they've earned in the community.
    A couple of examples:
    A moderator manually awards a member the 'Helpful Superstar' badge. In this scenario, that badge can only be earned if a moderator chooses to give it. Once someone earns that badge, they're automatically placed in the 'Helpful Superstars' group. This group may have the ability to create clubs (whereas the other groups can't).
    A member earns the 'Engaged' badge. 'Engaged' badges are earned when a member has replied 100 times since joining. Once they've posted 100 replies, the system automatically places them in a new group with other contributing members.
    Related: Learn more about Group Promotions
     
    Metrics
    Metrics reports are essential for understanding what's working in your community, and what needs improving. 


     
    Badges Earned: Track what badges were earned during a defined period of time. This is especially useful for tracking member engagement as well as identifying how often your community moderators are awarding badges manually.
    Badges earned by member group: See how many members in each group earned a badge. Track this when quantifying which groups are most engaged with your community. Understanding which groups earn the most badges helps you better tend to the groups that might be less engaged; it might be a good idea to show them some extra attention.
    Badges by member: Search a time-based list of all members with an earned badge total. Easily discover who your VIP members are and reward / thank them for being active contributors. 
     
    Related: Maximize community growth with our new reporting metrics
     
    Rules
    Set up rules based on various criteria. These rules will automatically take a specific action once the criteria have been met.
     


    Member downloads a file: Members may earn a badge for downloading a specific file. This could be useful if your company wanted to share new policies or an announcement; track which members took the time to download the information and publicly recognize them for staying on top of things. 
    Member purchases a package or product: Members may also earn a badge for purchasing either a package or a specific product. For example, you could create a rule for members to earn a coveted product badge for opting to purchase a physical product (like a t-shirt). Only members who've purchased an item from your community would receive this type of recognition. 
     
    Outreach
    Jump into your members' inboxes with tailor-made good news. 


     
    Email notifications: New notification emails let your members know when they've received a coveted rank.
     
    Segment
    Download a list of members based on a number of Achievements criteria, including points, ranks and badges.


     
    In theory, you can upload this list of members elsewhere to target this specific audience (for example, sending an exclusive email drip campaign in Mailchimp).
    Several examples include downloading a list of members who've:
    Earned 500 or more points
    Earned a specific badge
    Reached a specific rank
    Achievements is a robust feature to engage your VIP members and spark the fuse of inspiration for newcomers. There’s a lot of power at your fingertips.
    Unsure where to start with implementing Achievements? Check out our original post and determine what behaviors you want to reward within your community. Sometimes just logging in is a good place to start. Reward them for that. 🙂
    Ready to take Achievements to the next level? Check out the new Group Promotions and Achievements Metrics now available in 4.6.8.

    Where are you in your journey with Achievements? Drop us a line in the comments. We’d love to hear from you!
  9. Like
    Matt got a reaction from DreamOn for a blog entry, SEO: Improving crawling efficiency   
  10. Like
    Matt got a reaction from media for a blog entry, Community Buzz: November.2021.1   
    🟢 Scaling your community requires overcoming many barriers and learning new ways of working with your community. Rosie explores this in her blog: How we are at the small scale is who we are at the large scale.
    "In community, we often say to do things that don't scale. To start small. To get the foundations right. To trust that how we are and what we do is what the community becomes, on a larger scale. Our behaviour, our intentions, our alignment, and our goals all influence what the community can become."
    🧠 What we think: There is no right or wrong way to scale your community from its humble beginnings, and it can be a lot of hard work, but that doesn't mean we should change our core values or how we approach helping others.
     
     
    🟢 Should you respond to questions before your members? It's a question explored by Richard at Feverbee.
    "If you (the community manager) respond to a question in a community, other members are less likely to respond. This makes it harder for top members to earn points and feel a sense of influence.
    But if you don’t respond to a question in a community, it can linger and look bad. It also means the person asking a question is waiting for a response and becoming increasingly frustrated."
    🧠 What we think: There are certain areas where you need your team to lead. Right here on this forum, we want to provide the best service for our customers, so our support team are active and quick to reply to all questions. There are other community-led sections that definitely benefit from allowing time for other members to reply and share their knowledge. It's a good feeling helping others.
     
     
    🟢 CMX explores how to move your community online. Much of this is great advice for anyone considering moving platforms (to Invision Community, right?).
    "Christiana recommends viewing community migration as a process that requires patiences, “this is not a race meant to be run fast. We are changing the mindset of the people in our ecosystem”. "
      🧠 What we think: Patience is definitely key when moving platforms. The sooner you start engaging with your own community and explaining the reasons for the move and the benefits it'll bring, the easier it will be.
     
     
    🟢 Michelle can't find the bathroom at a party, which inspires a blog on 5 secrets to community onboarding.
    "Walking into a party without your host can feel confusing, alienating, and frustrating. And for your customers, joining a new community without onboarding is just as bad."
       🧠 What we think: Onboarding is critical to your community's success. New members can often feel lost and unsure where to start. It can be intimidating in real life to enter a room full of people that know each other, and this is true in the online space too.
     
     
    🎧 Podcast: What makes a community a home? Patrick explores this by interviewing members of his own community, which opened 20 years ago and is still going strong.
    🧠 What we think: We love hearing about long-established communities that are still thriving and how those early online relationships shaped people's lives.
     
     
  11. Like
    Matt got a reaction from Markus Jung for a blog entry, Community Buzz: November.2021.1   
  12. Like
    Matt got a reaction from IPCommerceFan for a blog entry, Community Buzz: November.2021.1   
  13. Like
    Matt got a reaction from BomAle for a blog entry, Community Buzz: November.2021.1   
  14. Like
    Matt got a reaction from Sonya* for a blog entry, Community Buzz: November.2021.1   
  15. Like
    Matt got a reaction from sobrenome for a blog entry, Community Buzz: November.2021.1   
  16. Like
    Matt got a reaction from Jalal arefen for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
     Search engines need to look at and store as many of the pages on the internet as possible. There are an estimated 4.5 billion active web pages today. That's a lot of work for Google.
     It cannot look at and store every page, so it needs to decide what to keep and how long it will spend indexing pages on your site.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
     This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and the filters that create faceted pages, and we remove many redirect links. This reduces crawl depth and cuts down on thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
     Let's now take a deep dive into what crawl budget is, the current problems and the solution, and then finish with a before-and-after analysis. Note: I use the terms "Google" and "search engines" interchangeably. I know there are many wonderful search engines available, but most people understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
     Crawl depth is essentially how many links Google has to follow to reach and index a page. The fewer links it takes to get to a page, the better. Generally speaking, Google will reduce indexing of links more than five to six clicks deep.
    The current problem #1: Crawl depth
     A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects that scroll to new posts in a topic, are very useful for logged-in members but less so for spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in. Although they offer no clear advantage to guests and search engines, they are prolific, and following them results in a redirect, which increases the crawl depth for content such as topics.
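     To make this concrete, here is a rough sketch of what a spider sees when it follows one of these convenience links. The URLs and the redirect target below are illustrative examples only, not the exact paths Invision Community generates:

         GET /topic/123-example-topic/?do=getNewComment
             -> HTTP redirect ->
         GET /topic/123-example-topic/        (the canonical topic URL)

     Every topic discovered through one of these links therefore sits at least one hop deeper than it needs to be, and the redirect itself is one more URL for the spider to fetch.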
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.

    The solution
     There is a straightforward solution to all of the problems outlined above. We can ask Google to avoid indexing certain pages, using a mix of hints and directives to ensure pages without valuable content are ignored, and we can reduce the number of links it takes to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget because Google has to crawl the page to learn we do not want it stored in the index.
     Fortunately, there is a hint called "nofollow", which you can apply as a rel attribute on the <a href> tag that wraps a link. This sends a strong hint that the link should not be read at all. However, Google may choose to follow it anyway, which means we also need a special file that contains firm instructions for Google on what to follow and index.
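     As a simple illustration of the difference between the two approaches (generic HTML for illustration, not the exact markup Invision Community outputs):

         <!-- Page-level hint: the spider must still fetch the page to see this, so it costs crawl budget -->
         <meta name="robots" content="noindex">

         <!-- Link-level hint: asks spiders not to follow this convenience link in the first place -->
         <a href="/topic/123-example-topic/?do=getLastComment" rel="nofollow">Jump to latest comment</a>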
     This file is called robots.txt. We can use it to write rules that stop search engines wasting their valuable time on links that do not lead to valuable content, that create faceted navigation issues, or that simply lead to a redirect.
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.
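     For a sense of what those rules can look like, here is an illustrative sketch. The dynamically generated file is tailored to your community, so the real paths and entries will differ:

         # Illustrative robots.txt rules only - not the exact output of the generator
         User-agent: *
         Disallow: /profile/               # user profiles: thin pages full of redirect links
         Disallow: /leaderboard/           # leaderboards and online lists add nothing new to the index
         Disallow: /*do=getNewComment      # convenience links that resolve to a redirect
         Disallow: /*do=getLastComment
         Disallow: /*sortby=               # filtered table views create faceted, duplicate pages

     Custom rules can be layered on top if your community has other areas you would prefer search engines to skip.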

    The new robots.txt generator in Invision Community
    Analysis: Before and after
     Using a popular SEO site audit tool, I took a benchmark crawl of my test community, which has 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they contain real content, including links. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

     Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled, with only 0.4% being redirect links. This is a dramatic reduction in both wasted crawl budget and crawl depth. Simply by guiding Google away from thin content such as profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.

     Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite using a robots.txt file for the first time. The calculation here also includes sharer images and other external links, which are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

     As we can see in the 'before' crawl, the crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the after, which shows a much more ordered crawl, with all content discoverable as expected without any red dots indicating redirects.

    Conclusion
     SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as we can see in this blog, without focusing on crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is never discovered and added to the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  17. Like
    Matt got a reaction from lordi for a blog entry, SEO: Improving crawling efficiency   
  18. Like
    Matt got a reaction from levsha for a blog entry, SEO: Improving crawling efficiency   
  19. Like
    Matt got a reaction from Charles for a blog entry, Community Buzz: November.2021.1   
  20. Like
    Matt got a reaction from stu_m for a blog entry, SEO: Improving crawling efficiency   
  21. Thanks
    Matt got a reaction from InvisionHQ for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many pages that exist on the internet as possible. There are currently an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look and store every page, so it needs to decide what to keep and how long it will spend on your site indexing pages.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and filters that create faceted pages and remove a lot of redirect links to reduce the crawl depth and reduce the volume of thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem, the solution and finally look at a before and after analysis. Note, I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available but most understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to index the page. The fewer links to get to a page, is better. Generally speaking, Google will reduce indexing links more than 5 to 6 clicks deep.
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects to scroll to new posts in a topic, are very useful for logged in members but less so to spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in.  Although they offer no clear advantage to guests and search engines, they are prolific, and following the links results in a redirect which increases the crawl depth for content such as topics.
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.

    The solution
    There is a straightforward solution to solve all of the problems outlined above.  We can ask that Google avoids indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored and by reducing the number of links to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget as Google has to crawl the page to learn we do not want it stored in the index.
    Fortunately, Google has a hint directive called "nofollow", which you can apply in the <a href> code that wraps a link. This sends a strong hint that this link should not be read at all. However, Google may wish to follow it anyway, which means that we need to use a special file that contains firm instructions for Google on what to follow and index.
    This file is called robots.txt. We can use this file to write rules to ensure search engines don't waste their valuable time looking at links that do not have valuable content; that create faceted navigational issues and links that lead to a redirect.
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.

    The new robots.txt generator in Invision Community
    Analysis: Before and after
    I took a benchmark crawl using a popular SEO site audit tool of my test community with 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they have actual content, including links, etc. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled, with only 0.4% being redirection links. This is a dramatic reduction in both crawl budget and crawl depth. Simply by guiding Google away from thin content like profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite this being the first crawl to use a robots.txt file. The figure also includes sharer images and other external links that are blocked by those sites' own robots.txt files. I added nofollow to the external links for the 'after' crawl, so they were not fetched and then blocked externally.

    As we can see in the 'before' chart, the crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+ levels.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the after, which shows a much more ordered crawl, with all content discoverable as expected without any red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as this blog shows, without attention to crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is never discovered and added to the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  22. Like
    Matt got a reaction from SoloInter for a blog entry, SEO: Improving crawling efficiency   
  23. Like
    Matt got a reaction from Daniel F for a blog entry, Community Buzz: November.2021.1   
    🟢 Scaling your community requires overcoming many barriers and learning new ways of working with your community. Rosie explores this in her blog: How we are at the small scale is who we are at the large scale.
    "In community, we often say to do things that don't scale. To start small. To get the foundations right. To trust that how we are and what we do is what the community becomes, on a larger scale. Our behaviour, our intentions, our alignment, and our goals all influence what the community can become."
    🧠 What we think: There is no right or wrong way to scale your community from its humble beginnings, and it can be a lot of hard work, but that doesn't mean we should change our core values or how we approach helping others.
     
     
     🟢 Should you respond to questions before your members? It's a question explored by Richard at Feverbee.
    "If you (the community manager) respond to a question in a community, other members are less likely to respond. This makes it harder for top members to earn points and feel a sense of influence.
    But if you don’t respond to a question in a community, it can linger and look bad. It also means the person asking a question is waiting for a response and becoming increasingly frustrated."
     🧠 What we think: There are certain areas where you need your team to lead. Right here on this forum, we want to provide the best service for our customers, so our support team are active and quick to reply to all questions. There are other community-led sections that definitely benefit from allowing time for other members to reply and share their knowledge. It's a good feeling, helping others.
     
     
     🟢 CMX explores how to move your community online. Much of this is great advice for anyone considering moving platforms (to Invision Community, right?). 
    "Christiana recommends viewing community migration as a process that requires patience, “this is not a race meant to be run fast. We are changing the mindset of the people in our ecosystem”."
      🧠 What we think: Patience is definitely key when moving platforms. The sooner you start engaging with your own community and explaining the reasons for the move and the benefits it'll bring, the easier it will be.
     
     
     🟢 Michelle can't find the bathroom at a party, which inspires a blog on 5 secrets to community onboarding.
    "Walking into a party without your host can feel confusing, alienating, and frustrating. And for your customers, joining a new community without onboarding is just as bad."
       🧠 What we think: Onboarding is critical to your community's success. New members can often feel lost and unsure where to start. It can be intimidating in real life to enter a room full of people that know each other, and this is true in the online space too.
     
     
    🎧 Podcast: What makes a community a home? Patrick explores this by interviewing members of his own community, which opened 20 years ago and is still going strong.
        🧠 What we think: We love hearing about long established communities that are still thriving and hearing how those early online relationships shaped people's lives.
     
     
  24. Thanks
    Matt got a reaction from usmf for a blog entry, SEO: Improving crawling efficiency   
  25. Like
    Matt got a reaction from Silnei L Andrade for a blog entry, SEO: Improving crawling efficiency   
  26. Like
    Matt got a reaction from Darek_Hugo for a blog entry, SEO: Improving crawling efficiency   
  27. Like
    Matt got a reaction from Ocean West for a blog entry, SEO: Improving crawling efficiency   
  28. Like
    Matt got a reaction from 4joys for a blog entry, SEO: Improving crawling efficiency   
  29. Like
    Matt got a reaction from sobrenome for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many pages that exist on the internet as possible. There are currently an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look and store every page, so it needs to decide what to keep and how long it will spend on your site indexing pages.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and filters that create faceted pages and remove a lot of redirect links to reduce the crawl depth and reduce the volume of thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem, the solution and finally look at a before and after analysis. Note, I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available but most understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to index the page. The fewer links to get to a page, is better. Generally speaking, Google will reduce indexing links more than 5 to 6 clicks deep.
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects to scroll to new posts in a topic, are very useful for logged in members but less so to spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in.  Although they offer no clear advantage to guests and search engines, they are prolific, and following the links results in a redirect which increases the crawl depth for content such as topics.
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool for sorting lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or by when the topic was created. Unfortunately, these filters are a problem for search engines because they create faceted navigation, which produces many near-duplicate pages.
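    As an illustration of how faceting multiplies URLs, the same forum listing can be reachable at several addresses that differ only by filter parameters (the parameter names below are assumptions for illustration):

        https://example.com/forum/5-example-forum/
        https://example.com/forum/5-example-forum/?sortby=replies
        https://example.com/forum/5-example-forum/?sortby=start_date

    Each variation shows broadly the same list of topics, so a crawler that follows them all spends budget on pages that add nothing new to the index.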

    The solution
    There is a straightforward solution to all of the problems outlined above. We can ask Google to avoid indexing certain pages, using a mix of hints and directives to ensure pages without valuable content are ignored, and we can reduce the number of links it takes to reach the content. We have used "noindex" in the past, but this still eats up the crawl budget because Google has to crawl a page before it discovers that we do not want it stored in the index.
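    For context, "noindex" is a page-level signal, usually delivered as a meta tag in the page's <head> (or as an X-Robots-Tag response header). A generic example, not the exact markup Invision Community emits, looks like this:

        <meta name="robots" content="noindex">

    Because the signal lives inside the page itself, a crawler has to fetch the page before it can see it, which is why this approach alone saves no crawl budget.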
    Fortunately, Google supports a "nofollow" hint, which you can apply as a rel attribute on the <a> tag that wraps a link. This sends a strong hint that the link should not be followed or read at all. However, Google may choose to follow it anyway, which means we also need a special file containing firm instructions for Google on what to follow and index.
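    As a minimal sketch of that markup pattern, with a placeholder URL and link text rather than Invision Community's actual output:

        <a href="https://example.com/profile/123-example-member/" rel="nofollow">Example Member</a>

    Unlike "noindex", this hint is visible on the linking page, so a crawler can decide to skip the target without ever fetching it.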
    This file is called robots.txt. We can use it to write rules that stop search engines from wasting their valuable time on links that lead to thin content, faceted navigation or redirects.
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.
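    To give a flavour of what such rules look like, here is a hand-written sketch. The paths and parameters are assumptions chosen for illustration and are not the exact rules the generator produces:

        User-agent: *
        # Keep crawlers away from thin or redirect-only URLs (illustrative paths)
        Disallow: /profile/
        Disallow: /leaderboard/
        Disallow: /online/
        Disallow: /*do=getNewComment
        Disallow: /*do=getLastComment
        # Skip faceted topic-list filters (illustrative parameter)
        Disallow: /*sortby=

    Google treats * as a wildcard in these patterns, so a rule such as Disallow: /*do=getNewComment matches the convenience links described earlier wherever the parameter appears in the URL.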

    The new robots.txt generator in Invision Community
    Analysis: Before and after
    I took a benchmark crawl of my test community using a popular SEO site audit tool. The community has 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they contain real content, including links. Approximately 5,000 topics are visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirects. After, just 6,389 links were crawled, with only 0.4% being redirects. This is a dramatic reduction in both wasted crawl budget and crawl depth. Simply by guiding Google away from thin content like profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite using a robots.txt file for the first time. The count here also includes sharer images and other external links, which are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

    As we can see in the 'before' crawl, the crawl depth distribution has a small peak between 5 and 7 levels deep and a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the 'after' visualisation, which shows a much more ordered crawl, with all content discoverable as expected and no red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as this blog shows, without attention to crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is never discovered and added to the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  30. Like
    Matt got a reaction from kyriazhs1975 for a blog entry, SEO: Improving crawling efficiency   
  31. Like
    Matt got a reaction from Ioannis D for a blog entry, SEO: Improving crawling efficiency   
  32. Agree
    Matt got a reaction from COLONER for a blog entry, SEO: Improving crawling efficiency   
  33. Thanks
    Matt got a reaction from evcom for a blog entry, SEO: Improving crawling efficiency   
  34. Thanks
    Matt got a reaction from steel51 for a blog entry, SEO: Improving crawling efficiency   
  35. Thanks
    Matt got a reaction from Unienc for a blog entry, SEO: Improving crawling efficiency   
  36. Like
    Matt got a reaction from BomAle for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many pages that exist on the internet as possible. There are currently an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look and store every page, so it needs to decide what to keep and how long it will spend on your site indexing pages.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and filters that create faceted pages and remove a lot of redirect links to reduce the crawl depth and reduce the volume of thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem, the solution and finally look at a before and after analysis. Note, I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available but most understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to index the page. The fewer links to get to a page, is better. Generally speaking, Google will reduce indexing links more than 5 to 6 clicks deep.
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects to scroll to new posts in a topic, are very useful for logged in members but less so to spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in.  Although they offer no clear advantage to guests and search engines, they are prolific, and following the links results in a redirect which increases the crawl depth for content such as topics.
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.

    The solution
    There is a straightforward solution to solve all of the problems outlined above.  We can ask that Google avoids indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored and by reducing the number of links to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget as Google has to crawl the page to learn we do not want it stored in the index.
    Fortunately, Google supports a hint called "nofollow", which you can apply via the rel attribute in the <a href> code that wraps a link. This sends a strong hint that the link should not be followed at all. However, Google may still choose to follow it, which means we also need a special file that contains firm instructions for Google on what to follow and index.
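    For illustration, the hint sits on the anchor itself; the profile URL below is a hypothetical example, not a real link.

    <!-- A profile link carrying the "nofollow" hint via the rel attribute; the URL is hypothetical. -->
    <a href="https://example.com/profile/1-example-member/" rel="nofollow">Example Member</a>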
    This file is called robots.txt. We can use it to write rules that stop search engines from wasting their valuable time on links that lead to no valuable content, links that create faceted-navigation issues, and links that lead to a redirect.
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.
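    By way of illustration, a generated file might contain rules along these lines. The exact paths and patterns below are my assumption of what a typical optimised install could block, not the literal output of the generator.

    # Illustrative rules only; the real file is generated dynamically for each community
    # and the paths below are assumptions, not the generator's literal output.
    User-agent: *
    Disallow: /profile/
    Disallow: /leaderboard/
    Disallow: /online/
    Disallow: /*do=getNewComment
    Disallow: /*do=getLastComment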

    The new robots.txt generator in Invision Community
    Analysis: Before and after
    I took a benchmark crawl of my test community using a popular SEO site-audit tool. The community has 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they contain real content, including links. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled, with only 0.4% being redirection links. This is a dramatic reduction in both crawl budget and crawl depth. Simply by guiding Google away from thin content like profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.
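    In absolute terms, a quick back-of-the-envelope calculation using the figures above gives roughly the following (both percentages are approximate, so the results are rounded):

    # Figures from the before and after crawls above (percentages are approximate).
    before_links, before_redirect_share = 176_175, 0.23
    after_links, after_redirect_share = 6_389, 0.004

    print(round(before_links * before_redirect_share))  # roughly 40,500 redirect links crawled before
    print(round(after_links * after_redirect_share))    # roughly 26 redirect links crawled after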

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite using a robots.txt file for the first time. The figure also counts sharer images and other external links that are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

    As we can see in the 'before' chart, the crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the after, which shows a much more ordered crawl, with all content discoverable as expected without any red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as we can see in this blog, without focusing on crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is never discovered and added to the search index.
    These simple changes will make a considerable difference to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  37. Thanks
    Matt got a reaction from Gabriel Torres for a blog entry, SEO: Improving crawling efficiency   
  38. Like
    Matt got a reaction from 403 - Forbiddeen for a blog entry, SEO: Improving crawling efficiency   
  39. Like
    Matt got a reaction from Markus Jung for a blog entry, SEO: Improving crawling efficiency   
  40. Thanks
    Matt got a reaction from SeNioR- for a blog entry, SEO: Improving crawling efficiency   
  41. Like
    Matt got a reaction from The Old Man for a blog entry, SEO: Improving crawling efficiency   
  42. Like
    Matt got a reaction from Clover13 for a blog entry, SEO: Improving crawling efficiency   
  43. Like
    Matt got a reaction from AlexJ for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many pages that exist on the internet as possible. There are currently an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look and store every page, so it needs to decide what to keep and how long it will spend on your site indexing pages.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and filters that create faceted pages and remove a lot of redirect links to reduce the crawl depth and reduce the volume of thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem, the solution and finally look at a before and after analysis. Note, I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available but most understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to index the page. The fewer links to get to a page, is better. Generally speaking, Google will reduce indexing links more than 5 to 6 clicks deep.
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects to scroll to new posts in a topic, are very useful for logged in members but less so to spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in.  Although they offer no clear advantage to guests and search engines, they are prolific, and following the links results in a redirect which increases the crawl depth for content such as topics.
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.

    The solution
    There is a straightforward solution to solve all of the problems outlined above.  We can ask that Google avoids indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored and by reducing the number of links to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget as Google has to crawl the page to learn we do not want it stored in the index.
    Fortunately, Google has a hint directive called "nofollow", which you can apply in the <a href> code that wraps a link. This sends a strong hint that this link should not be read at all. However, Google may wish to follow it anyway, which means that we need to use a special file that contains firm instructions for Google on what to follow and index.
    This file is called robots.txt. We can use it to write rules that stop search engines from wasting their valuable time on links that lead to no valuable content, links that create faceted-navigation issues, and links that lead to a redirect.
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.
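    As a rough sketch, rules of this kind look like the snippet below, which blocks the thin, faceted and redirecting URLs discussed in this article. The paths and parameters here are illustrative assumptions; the file actually generated by Invision Community 4.6.8 may use different rules:

        User-agent: *
        # Illustrative rules only - block thin profile, leaderboard and online-list pages
        Disallow: /profile/
        Disallow: /leaderboard/
        Disallow: /online/
        # Block convenience links that only lead to redirects, and faceted sort filters
        Disallow: /*do=getNewComment
        Disallow: /*do=getLastComment
        Disallow: /*?sortby=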

    The new robots.txt generator in Invision Community
    Analysis: Before and after
    I took a benchmark crawl of my test community using a popular SEO site audit tool. The community has 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they contain real content, including links. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled, with only 0.4% being redirect links. This is a dramatic reduction in both wasted crawl budget and crawl depth. Simply by guiding Google away from thin content such as profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.
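    Working from the rounded percentages quoted above, the approximate number of redirect links crawled works out to:

        Before: 176,175 crawled links x ~23%  = roughly 40,500 redirect links followed
        After:    6,389 crawled links x ~0.4% = roughly 26 redirect links followed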

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite using a robots.txt file for the first time. The calculation here also includes sharer images and other external links, which are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

    As we can see in the 'before' chart, the crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the 'after' visualisation, which shows a much more ordered crawl, with all content discoverable as expected and no red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as we can see in this blog, without focusing on crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is not discovered and added to the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  44. Like
    Matt got a reaction from Adlago for a blog entry, SEO: Improving crawling efficiency   
  45. Like
    Matt got a reaction from sudo for a blog entry, SEO: Improving crawling efficiency   
  46. Thanks
    Matt got a reaction from Ibai for a blog entry, SEO: Improving crawling efficiency   
  47. Like
    Matt got a reaction from ASTRAPI for a blog entry, SEO: Improving crawling efficiency   
  48. Thanks
    Matt got a reaction from IPCommerceFan for a blog entry, SEO: Improving crawling efficiency   
  49. Like
    Matt got a reaction from OptimusBain for a blog entry, SEO: Improving crawling efficiency   
  50. Like
    Matt got a reaction from Lance... for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many pages that exist on the internet as possible. There are currently an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look and store every page, so it needs to decide what to keep and how long it will spend on your site indexing pages.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and filters that create faceted pages and remove a lot of redirect links to reduce the crawl depth and reduce the volume of thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem, the solution and finally look at a before and after analysis. Note, I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available but most understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to index the page. The fewer links to get to a page, is better. Generally speaking, Google will reduce indexing links more than 5 to 6 clicks deep.
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects to scroll to new posts in a topic, are very useful for logged in members but less so to spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in.  Although they offer no clear advantage to guests and search engines, they are prolific, and following the links results in a redirect which increases the crawl depth for content such as topics.
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.

    The solution
    There is a straightforward solution to solve all of the problems outlined above.  We can ask that Google avoids indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored and by reducing the number of links to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget as Google has to crawl the page to learn we do not want it stored in the index.
    Fortunately, Google has a hint directive called "nofollow", which you can apply in the <a href> code that wraps a link. This sends a strong hint that this link should not be read at all. However, Google may wish to follow it anyway, which means that we need to use a special file that contains firm instructions for Google on what to follow and index.
    This file is called robots.txt. We can use this file to write rules to ensure search engines don't waste their valuable time looking at links that do not have valuable content; that create faceted navigational issues and links that lead to a redirect.
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.

    The new robots.txt generator in Invision Community
    Analysis: Before and after
    Using a popular SEO site audit tool, I took a benchmark crawl of my test community, which has 50 members and around 20,000 posts, most of them populated from RSS feeds so they contain real content, including links and so on. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirects. After, just 6,389 links were crawled, with only 0.4% being redirects. This is a dramatic reduction in both wasted crawl budget and crawl depth. Simply by guiding Google away from thin content like profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.
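    In absolute terms, that works out at roughly 40,000 redirect URLs fetched in the 'before' crawl (just under 23% of 176,175) against roughly 25 in the 'after' crawl (0.4% of 6,389).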

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite this being the first crawl to use a robots.txt file. The calculation also includes sharer images and other external links, which are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

    As we can see in the 'before' crawl, the crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the 'after' visualisation, which shows a much more ordered crawl, with all content discoverable as expected and no red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as this blog shows, without focusing on crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is never discovered and added to the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  51. Like
    Matt got a reaction from Miss_B for a blog entry, SEO: Improving crawling efficiency   
  52. Like
    Matt got a reaction from Rikki for a blog entry, SEO: Improving crawling efficiency   
  53. Like
    Matt got a reaction from Jim M for a blog entry, SEO: Improving crawling efficiency   
  54. Like
    Matt got a reaction from DawPi for a blog entry, SEO: Improving crawling efficiency   
  55. Thanks
    Matt got a reaction from Maxxius for a blog entry, SEO: Improving crawling efficiency   
  56. Like
    Matt got a reaction from Charles for a blog entry, SEO: Improving crawling efficiency   
  57. Like
    Matt got a reaction from Marc Stridgen for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many pages that exist on the internet as possible. There are currently an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look and store every page, so it needs to decide what to keep and how long it will spend on your site indexing pages.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and filters that create faceted pages and remove a lot of redirect links to reduce the crawl depth and reduce the volume of thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem, the solution and finally look at a before and after analysis. Note, I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available but most understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to index the page. The fewer links to get to a page, is better. Generally speaking, Google will reduce indexing links more than 5 to 6 clicks deep.
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects to scroll to new posts in a topic, are very useful for logged in members but less so to spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in.  Although they offer no clear advantage to guests and search engines, they are prolific, and following the links results in a redirect which increases the crawl depth for content such as topics.
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.

    The solution
    There is a straightforward solution to solve all of the problems outlined above.  We can ask that Google avoids indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored and by reducing the number of links to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget as Google has to crawl the page to learn we do not want it stored in the index.
    Fortunately, Google has a hint directive called "nofollow", which you can apply in the <a href> code that wraps a link. This sends a strong hint that this link should not be read at all. However, Google may wish to follow it anyway, which means that we need to use a special file that contains firm instructions for Google on what to follow and index.
    This file is called robots.txt. We can use this file to write rules to ensure search engines don't waste their valuable time looking at links that do not have valuable content; that create faceted navigational issues and links that lead to a redirect.
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.

    The new robots.txt generator in Invision Community
    Analysis: Before and after
    I took a benchmark crawl using a popular SEO site audit tool of my test community with 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they have actual content, including links, etc. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled, with only 0.4% being redirect links. This is a dramatic reduction in both wasted crawl budget and crawl depth. Simply by guiding Google away from thin content such as profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite us using a robots.txt file for the first time. The calculation here also includes sharer images and other external links which are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

    As we can see in the 'before' chart, crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the 'after' visualisation, which shows a much more ordered crawl, with all content discoverable as expected and without any red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as this blog shows, without focusing on crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is never discovered and added to the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  58. Thanks
    Matt got a reaction from Sonya* for a blog entry, SEO: Improving crawling efficiency   
  59. Thanks
    Matt got a reaction from LaCollision for a blog entry, SEO: Improving crawling efficiency   
  60. Like
    Matt got a reaction from PPlanet for a blog entry, SEO: Improving crawling efficiency   
  61. Like
    Matt got a reaction from sound for a blog entry, SEO: Improving crawling efficiency   
  62. Like
    Matt got a reaction from aXenDev for a blog entry, SEO: Improving crawling efficiency   
  63. Like
    Matt got a reaction from Real Hal9000 for a blog entry, SEO: Improving crawling efficiency   