Jump to content

sound

Clients
  • Posts

    1,377
  • Joined

Reputation Activity

  1. Like
    sound reacted to Matt for a blog entry, SEO: Improving crawling efficiency   
    No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
    Search engines need to look at and store as many pages that exist on the internet as possible. There are currently an estimated 4.5 billion web pages active today. That's a lot of work for Google.
    It cannot look and store every page, so it needs to decide what to keep and how long it will spend on your site indexing pages.
    Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.

    The short version
    This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and filters that create faceted pages and remove a lot of redirect links to reduce the crawl depth and reduce the volume of thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
    Let's now take a deep dive into what crawl budget is, the current problem, the solution and finally look at a before and after analysis. Note, I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available but most understand what Google is and does.
    Crawl depth and budget
    In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
    Crawl depth is essentially how many links Google has to follow to index the page. The fewer links to get to a page, is better. Generally speaking, Google will reduce indexing links more than 5 to 6 clicks deep.
    The current problem #1: Crawl depth
    A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects to scroll to new posts in a topic, are very useful for logged in members but less so to spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in.  Although they offer no clear advantage to guests and search engines, they are prolific, and following the links results in a redirect which increases the crawl depth for content such as topics.
    The current problem #2: Crawl budget and faceted content
    A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
    Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.

    The solution
    There is a straightforward solution to solve all of the problems outlined above.  We can ask that Google avoids indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored and by reducing the number of links to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget as Google has to crawl the page to learn we do not want it stored in the index.
    Fortunately, Google has a hint directive called "nofollow", which you can apply in the <a href> code that wraps a link. This sends a strong hint that this link should not be read at all. However, Google may wish to follow it anyway, which means that we need to use a special file that contains firm instructions for Google on what to follow and index.
    This file is called robots.txt. We can use this file to write rules to ensure search engines don't waste their valuable time looking at links that do not have valuable content; that create faceted navigational issues and links that lead to a redirect.
    Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.

    The new robots.txt generator in Invision Community
    Analysis: Before and after
    I took a benchmark crawl using a popular SEO site audit tool of my test community with 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they have actual content, including links, etc. There are approximately 5,000 topics visible to guests.
    Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
    Let's compare the data from the before and after.
    First up, the raw numbers show a stark difference.

    Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled, with only 0.4% being redirection links. This is a dramatic reduction in both crawl budget and crawl depth. Simply by guiding Google away from thin content like profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.

    Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite using a robots.txt for the first time. The calculation here also includes sharer images and other external links which are blocked by those sites robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.

    As we can see in this before, the crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+.

    After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
    Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.

    Compare that with the after, which shows a much more ordered crawl, with all content discoverable as expected without any red dots indicating redirects.

    Conclusion
    SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as we can see in this blog that without focusing on the crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is not discovered and added into the search index.
    These simple changes will offer considerable advantages to how Google and other search engines spider your site.
    The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.
  2. Like
    sound reacted to Matt for a blog entry, GDPR updates for Invision Community 4.3.3   
    Unless you've been living under a rock, or forgot to opt-in to the memo, GDPR is just around the corner.
    Last week we wrote a blog answering your questions on becoming GDPR compliant with Invision Community.
    We took away a few good points from that discussion and have the following updates coming up for Invision Community 4.3.3 due early next week.
    Downloading Personal Data
    Invision Community already has a method of downloading member data via the member export feature that produces a CSV.
    However, we wanted Invision Community to be more helpful, so we've added a feature that downloads personal data (such as name, email address, known IP addresses, known devices, opt in details and customer data from Nexus if you're using that) in a handy XML format which is very portable and machine readable.
     

    You can access this feature via the ACP member view
    The download itself is in a standard XML format.

    A sample export
    Pruning IP Addresses
    While there is much debate about whether IP addresses are personal information or not, a good number of our customers requested a way to remove IP addresses from older content.
    There are legitimate reasons to store IP addresses for purchase transactions (so fraud can be detected), for security logs (to prevent hackers gaining access) and to prevent spammers registering. However, under the bullet point of not storing information for longer than is required, we have added this feature to remove IP addresses from posted content (reviews, comments, posts, personal messages, etc) after a threshold.

    The default is 'Never', so don't worry. Post upgrade you won't see IP addresses removed unless you enter a value.

    This new setting is under Posting
    Deleting Members
    Invision Community has always had a way to delete a member and retain their content under a "Guest" name.
    We've cleaned this up in 4.3.3. When you delete a member, but want to retain their content, you are offered an option to anonymise this. Choosing this option attributes all posted content to 'Guest' and removes any stored IP addresses.

    Deleting a member
    Privacy Policy
    We've added a neat little feature to automatically list third parties you use on your privacy policy. If you enable Google Analytics, or Facebook Pixel, etc, these are added for you.

    The new setting

     
    Finding Settings Easily
    To make life a little easier, we've added "GDPR" as a live search keyword for the ACP. Simply tap that into the large search bar and Invision Community will list the relevant settings you may want to change.

     
    These changes show our ongoing commitment to helping you with your GDPR compliance. We'll be watching how GDPR in practise unfolds next month and will continue to adapt where required.

    Invision Community 4.3.3 is due out early next week.
  3. Like
    sound reacted to bfarber for a blog entry, New: Statistics   
    This is an entry about our IPS Community Suite 4.2 release
    Statistics can be an important part of monitoring your site and ensuring it grows and responds to your marketing and promotion efforts effectively, and several new statistic tools have been added to the 4.2 Community Suite which we know you will be excited to learn about!
    A simple tool has been added that will allow you to look up and list all member accounts that have last visited the site within a specified time period.

    Look up members who have visited within a set time period
    Additionally, online user (both logged in user and guest) counts are now tracked every 15 minutes and graphed in the AdminCP for you to reference. You can view online user trends over a specified date period, view just guest counts or just member counts (or both), and view the graph in multiple different modes (such as an area chart or as a column chart). By default, the data is retained for 180 days, however you can control how often to prune this statistical data in the AdminCP.

    Online user trends graphed
    You can also view a graph of member activity on the site. Member activity is defined as any "activity" beyond simply browsing, such as submitting a new post, reacting to any content item or comment, or following any content item or node.

    Activity information about your member base
    You will also be able to define keywords that you would like to monitor and then see both a graph of usage of those keywords, as well as a table listing all usages of those keywords. You can use this to track usage of competitor names, find out if hash tags you define are trending, or learn if promotional materials are making an impact on your membership.

    Keyword tracking can help you closely monitor your community
    Along with these additions, we've cleaned up the menu and wording for the rest of the existing statistic options to make their functions more clear.
    We hope these additions help you better track and control your community, making the most of your time and money.
     
    Note for developers: A new chart class has been added which allows you to populate dynamic charts using callbacks, in addition to the standard methods that already exist for pulling data directly from a specified database table.
×
×
  • Create New...