By All Astronauts
- Searchlight: navigable highlighted search terms on the content items themselves
- Select-a-Search: highlight-text-to-search feature on commenting content
- More precise search result snippets
- Faster search and stream results
- Customize Gallery stream results
- Truncate stream descriptions
Highlights searched-for terms on search result content pages. When your users click a search result link, the content page that loads will have their search terms highlighted! SSSR has some major improvements over the earlier standalone Searchlight plugin. Marked terms can be navigated via a side-page navigation element that also notes how many marks there are on the page and which mark is currently active. Admins can configure not only the mark styling but also create separate styling for the "active" mark on a page. Version 4 of SSSR incorporates major improvements into the marking routine.
With 4.0.0 I've reworked the Searchlight functionality to effectively remove any double marks that would break the Searchlight highlighting and navigation, added more formatting options for the search mark navigator, and made a few other improvements. SSSR 6 now includes a toggle to turn the marks on/off, and it will no longer mark common, inconsequential stop words (the, a, an, etc.) when marking words separately. They are still marked when the full query term is marked out.
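As a rough illustration of that stop-word behaviour, here is a Python sketch (the function name is mine, not the plugin's; the stop list shown is the default Elasticsearch English list the plugin draws from):

```python
# Default Elasticsearch English stop-word list
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by",
              "for", "if", "in", "into", "is", "it", "no", "not", "of",
              "on", "or", "such", "that", "the", "their", "then",
              "there", "these", "they", "this", "to", "was", "will", "with"}

def terms_to_mark(query: str) -> list[str]:
    """Return the full phrase plus each individual non-stop-word term."""
    words = query.lower().split()
    # The full query term is always marked as-is...
    marks = [query.lower()]
    # ...but individual words are skipped if they are stop words.
    marks += [w for w in words if w not in STOP_WORDS]
    return marks

print(terms_to_mark("the quick search"))  # ['the quick search', 'quick', 'search']
```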
In comment content (any commenting area in the Invision Community suite, most notably Forums), IPS provides a feature where you can select text and are then presented with the option to add that selection as a quote in a new comment/response. As of 4.0.0, SSSR piggybacks on that feature to let you search the selected text instead! Admins can enable a single option to search the selection as individual terms or as an exact phrase (quoted). Yes, expanding this to other non-comment areas is on the punch list, but it will probably require adding a third-party tooltip library. Not the end of the world; it just means I'll take some time before doing it.
Better / Faster Search Results Snippets
Prior to Invision Community 4.5, IPS pushed out the ENTIRE content item for every search result. That was... not ideal. So I had a Truncate Search and Stream Results plugin that lightened that load, and I also incorporated it into an early version of this plugin. IPS got religion with 4.5 and now natively truncates every search and stream result to 600 characters, which naturally removes the need for my truncating plugin. The hitch, though, is that it is always the first 600 characters of the item. If you expect search results to have some of your searched-for terms highlighted in the snippet, that won't happen unless there's a hit in the first 600 characters. For streams this is nothing; there is nothing to mark out. For search results it is a problem. I went out of my way with my earlier plugin to ensure at least something highlightable was returned in the search result snippets, and for 4.5 this is, as you can see, still needed. SSSR skips common English stop words (the Elasticsearch list, covering things like 'a', 'the', and so on), looks for the first instance of a search term in the content item, grabs some text before and after it, and pushes those 600 characters out as the search result snippet, ensuring you will usually have something highlighted in each result returned. It works spectacularly for quoted search terms; see the screenshots for the rest. It's pretty solid, though I do have an upgrade in the hopper that will mark out the longest of your search terms instead of just the first. Either way, this is a nice search upgrade.
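A minimal sketch of that snippet strategy in Python (the function name and window math are illustrative; the plugin itself does this server-side): find the first occurrence of any search term, then return a roughly 600-character window around it so the snippet contains something highlightable.

```python
def snippet(content: str, terms: list[str], width: int = 600) -> str:
    """Return a window of `content` around the earliest search-term hit."""
    lowered = content.lower()
    hit = -1
    for term in terms:
        pos = lowered.find(term.lower())
        if pos != -1 and (hit == -1 or pos < hit):
            hit = pos
    if hit == -1:
        # No match anywhere: fall back to the plain first 600 characters.
        return content[:width]
    # Grab some text before the hit so it isn't pinned to the snippet start.
    start = max(0, hit - width // 3)
    return content[start:start + width]
```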
Invision Community 4.6: IPS pulled the truncation of search and stream results they had added in 4.5. SSSR was already truncating search results to present better-matching snippets, and with SSSR 6 I've re-inserted the truncation on stream results. They are, of course, quite perky now, without needing your browser to truncate all that content text with JS.
When users make custom streams, all of their content choices appear in the description underneath the stream name. That can be a rather large amount of text. SSSR provides an option to truncate it all down, with whatever remains available under a 'read more' click. See screenshot.

Gallery stream results can be quite heavy, oftentimes loading 15 or more images per Gallery item. That's a lot of weight for a page. SSSR lets you truncate that down to whatever level you are comfortable with.

When users are on the advanced search / search results page, they probably do not need to be told in the page title that they can 'Search the community'. SSSR provides an option to kill that and replace it with the searched-for terms, styled the same as the page title.
Make sure you toggle the Searchlight feature ON in the plugin settings. Same for Select-a-Search. Searchlight should work most places in most apps, but Gallery has been specifically excepted OUT due to the post-pageload image popup modal. I might come back to that later. IMPORTANT! Select-a-Search piggybacks on the built-in IPS quote-highlight function. That feature ONLY fires if a user is actually able to quote something in a comment area. That means it will not fire for guests if you do not allow guests to post, nor in any forums/topics where the member is not allowed to post, locked topics, and so on. Yes, there is still more to come with this mod. Search junkie? Check out Social Search. It's slightly paused since IPS now natively tracks search results, but my tracking is more precise, and there are some neat social features around search. If we get some more interest there, I'd love to get back at it...
Any questions just pop into the support topic.
By All Astronauts
This leads to long page load times, snapping-screen behavior as the JS routines truncate the text after the page has loaded, and so on.
This plugin truncates these results BEFORE they hit your user's browsers, giving them faster load times and better engagement.
Speedy Search and Stream Results (SSSR) is a roll-up plugin that includes my free Truncate Stream Items plugin and adds: a truncate-search-results feature; a setting to select the number of images pushed out to stream results when the stream item is a Gallery album update (new images posted); and a setting to truncate stream descriptions, which can get ridiculously long when a user has selected many forums and so on for their custom streams.
See the screenshot for settings.
This is the support topic for:
1. It would be nice if closed clubs had an option to hide the homepage or sidebars from non-members.
2. Make the custom block title optional.
Here are the steps to replicate the issue here on the IPS forum:
1. Go to this page: https://invisioncommunity.com/forums/forum/492-community/
2. Enter a term in the quick search field. Let's use question for this example.
3. Select Search In This Forum and search.

You'll get a page with 3118 results and 125 pages with this URL: https://invisioncommunity.com/search/?q=question&quick=1&type=forums_topic&nodes=492
Here are a couple of problems:
Now look at the url for page 2 in the pagination area and you'll see it's:
https://invisioncommunity.com/search/?q=question&type=forums_topic&nodes=492&updated_after=any&sortby=relevancy&page=2

If you open this link in a new browser tab, however, you'll get a page with 0 results. This is caused by the &quick=1 parameter missing from the pagination link.
If you instead click on page 2 and let AJAX load the results (rather than opening a new tab), you'll get this URL:
https://invisioncommunity.com/search/?&q=question&type=forums_topic&page=2&quick=1&nodes=492&search_and_or=or&sortby=relevancy

The loaded results still number 3118, but the URL has different values in it. Specifically, updated_after=any is removed and search_and_or=or is added instead.
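A quick way to see the parameter discrepancy between the two URLs is to parse and diff their query strings. This standalone sketch uses only Python's standard library; the variable names are mine:

```python
from urllib.parse import urlsplit, parse_qs

# The pagination link as rendered in the page
pagination = ("https://invisioncommunity.com/search/?q=question"
              "&type=forums_topic&nodes=492&updated_after=any"
              "&sortby=relevancy&page=2")
# The URL shown after AJAX loads page 2
ajax = ("https://invisioncommunity.com/search/?&q=question"
        "&type=forums_topic&page=2&quick=1&nodes=492"
        "&search_and_or=or&sortby=relevancy")

a = parse_qs(urlsplit(pagination).query)
b = parse_qs(urlsplit(ajax).query)

print(sorted(b.keys() - a.keys()))  # ['quick', 'search_and_or']
print(sorted(a.keys() - b.keys()))  # ['updated_after']
```

The missing `quick` parameter in the first set is what breaks the page-2 link when it is opened directly.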
(The site I originally noticed the issue on goes from ~7k results to ~10k with more pages, but I can't replicate this results count change here on IPS's forum.)
No matter how good your content is, how accurate your keywords are or how precise your microdata is, inefficient crawling reduces the number of pages Google will read and store from your site.
Search engines need to look at and store as many of the pages that exist on the internet as possible. There are currently an estimated 4.5 billion active web pages. That's a lot of work for Google.
It cannot look at and store every page, so it needs to decide what to keep and how long to spend on your site indexing pages.
Right now, Invision Community is not very good at helping Google understand what is important and how to get there quickly. This blog article runs through the changes we've made to improve crawling efficiency dramatically, starting with Invision Community 4.6.8, our November release.
The short version
This entry will get a little technical. The short version is that we remove a lot of pages from Google's view, including user profiles and the filters that create faceted pages, and we remove a lot of redirect links, reducing the crawl depth and the volume of thin content of little value. Instead, we want Google to focus wholly on topics, posts and other key user-generated content.
Let's now take a deep dive into what crawl budget is, the current problem, the solution, and finally a before-and-after analysis. Note that I use the terms "Google" and "search engines" interchangeably. I know that there are many wonderful search engines available, but most people understand what Google is and does.
Crawl depth and budget
In terms of crawl efficiency, there are two metrics to think about: crawl depth and crawl budget. The crawl budget is the number of links Google (and other search engines) will spider per day. The time spent on your site and the number of links examined depend on multiple factors, including site age, site freshness and more. For example, Google may choose to look at fewer than 100 links per day from your site, whereas Twitter may see hundreds of thousands of links indexed per day.
Crawl depth is essentially how many links Google has to follow to index a page. The fewer links needed to reach a page, the better. Generally speaking, Google will reduce the indexing of links more than 5 to 6 clicks deep.
The current problem #1: Crawl depth
A community generates a lot of linked content. Many of these links, such as permalinks to specific posts and redirects to scroll to new posts in a topic, are very useful for logged in members but less so to spiders. These links are easy to spot; just look for "&do=getNewComment" or "&do=getLastComment" in the URL. Indeed, even guests would struggle to use these convenience links given the lack of unread tracking until logged in. Although they offer no clear advantage to guests and search engines, they are prolific, and following the links results in a redirect which increases the crawl depth for content such as topics.
The current problem #2: Crawl budget and faceted content
A single user profile page can have around 150 redirect links to existing content. User profiles are linked from many pages. A single page of a topic will have around 25 links to user profiles. That's potentially 3,750 links Google has to crawl before deciding if any of it should be stored. Even sites with a healthy crawl budget will see a lot of their budget eaten up by links that add nothing new to the search index. These links are also very deep into the site, adding to the overall average crawl depth, which can signal search engines to reduce your crawl budget.
Filters are a valuable tool to sort lists of data in particular ways. For example, when viewing a list of topics, you can filter by the number of replies or when the topic was created. Unfortunately, these filters are a problem for search engines as they create faceted navigation, which creates duplicate pages.
There is a straightforward solution to solve all of the problems outlined above. We can ask that Google avoids indexing certain pages. We can help by using a mix of hints and directives to ensure pages without valuable content are ignored and by reducing the number of links to get to the content. We have used "noindex" in the past, but this still eats up the crawl budget as Google has to crawl the page to learn we do not want it stored in the index.
Fortunately, Google supports a link-level hint called "nofollow", which you can apply via the rel attribute of the <a href> code that wraps a link. This sends a strong hint that this link should not be followed at all. However, Google may wish to follow it anyway, which means that we also need a special file containing firm instructions for Google on what to crawl and index.
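For example, one of the redirect-style convenience links mentioned earlier could be flagged like this (the URL here is illustrative, not an exact Invision Community path):

```
<!-- A convenience link hinted away from spiders via rel="nofollow" -->
<a href="/topic/123-example/?do=getNewComment" rel="nofollow">Jump to newest post</a>
```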
This file is called robots.txt. We can write rules in it to ensure search engines don't waste their valuable time looking at links that do not lead to valuable content, that create faceted navigational issues, or that lead to a redirect.
Invision Community will now create a dynamic robots.txt file with rules optimised for your community, or you can create custom rules if you prefer.
The new robots.txt generator in Invision Community
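The generated rules are tailored to each community, but a hand-written equivalent might look something like this (the paths below are illustrative examples, not the exact generator output):

```
User-agent: *
# Thin content: profiles, leaderboards, online lists
Disallow: /profile/
Disallow: /leaderboard/
Disallow: /online/
# Convenience links that only lead to redirects
Disallow: /*?do=getNewComment
Disallow: /*?do=getLastComment
# Faceted filter pages that duplicate topic lists
Disallow: /*?sortby=
```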
Analysis: Before and after
I took a benchmark crawl using a popular SEO site audit tool of my test community with 50 members and around 20,000 posts, most of which were populated from RSS feeds, so they have actual content, including links, etc. There are approximately 5,000 topics visible to guests.
Once I had implemented the "nofollow" changes, removed a lot of the redirect links for guests and added an optimised robots.txt file, I completed another crawl.
Let's compare the data from the before and after.
First up, the raw numbers show a stark difference.
Before our changes, the audit tool crawled 176,175 links, of which nearly 23% were redirect links. After, just 6,389 links were crawled, with only 0.4% being redirection links. This is a dramatic reduction in both crawl budget and crawl depth. Simply by guiding Google away from thin content like profiles, leaderboards, online lists and redirect links, we can ask it to focus on content such as topics and posts.
Note: You may notice a large drop in "Blocked by Robots.txt" in the 'after' crawl despite using a robots.txt for the first time. The calculation here also includes sharer images and other external links which are blocked by those sites' robots.txt files. I added nofollow to the external links for the 'after' crawl so they were not fetched and then blocked externally.
As we can see in the 'before' crawl, the crawl depth has a low peak between 5 and 7 levels deep, with a strong peak at 10+.
After, the peak crawl depth is just 3. This will send a strong signal to Google that your site is optimised and worth crawling more often.
Let's look at a crawl visualisation before we made these changes. It's easy to see how most content was found via table filters, which led to a redirect (the red dots), dramatically increasing crawl depth and reducing crawl efficiency.
Compare that with the after, which shows a much more ordered crawl, with all content discoverable as expected without any red dots indicating redirects.
SEO is a multi-faceted discipline. In the past, we have focused on ensuring we send the correct headers, use the correct microdata such as JSON-LD, and optimise meta tags. These are all vital parts of ensuring your site is optimised for crawling. However, as we can see in this blog, without attention to crawl budget and crawl efficiency, even the most accurately presented content is wasted if it is not discovered and added to the search index.
These simple changes will offer considerable advantages to how Google and other search engines spider your site.
The features and changes outlined in this blog will be available in our November release, which will be Invision Community 4.6.8.