What is a robots.txt file?
When Google or another search engine visits your site to read your content and store it in its search index, it looks for a special file called robots.txt. This file is a set of instructions that tells search engines which areas they may crawl and which they may not. We can use these rules to ensure that search engines don't waste their time on links that have no valuable content, and that they avoid links that produce faceted content.
Why is this important?
Search engines need to look at and store as many of the pages on the internet as possible. An estimated 4.5 billion web pages are active today. That's a lot of work for Google.
It cannot look at and store every single page, so it needs to decide what to keep and how long it will spend indexing pages on your site. This allowance is called a crawl budget.
How many pages a day Google will index depends on many factors, including how fresh the site is, how much content you have and how popular your site is. On some websites, Google indexes as few as 30 links a day. We want every link to count and not waste Google's time.
What does the suggested robots.txt file do?
What is the suggested robots.txt file?
Here is the content of the suggested robots.txt file. Depending on your configuration, Invision Community can serve it automatically. If your community is inside a directory, you will need to apply it to the root of your site manually. For example, if your community is at /home/site/public_html/community/, you would need to create this robots.txt file and add it to /home/site/public_html. The Admin CP will guide you through this.
# Rules for Invision Community (https://invisioncommunity.com)
User-Agent: *
# Block pages with no unique content
Disallow: /startTopic/
Disallow: /discover/unread/
Disallow: /markallread/
Disallow: /staff/
Disallow: /online/
Disallow: /discover/
Disallow: /leaderboard/
Disallow: /search/
Disallow: /*?advancedSearchForm=
Disallow: /register/
Disallow: /lostpassword/
Disallow: /login/
# Block faceted pages and 301 redirect pages
Disallow: /*?sortby=
Disallow: /*?filter=
Disallow: /*?tab=
Disallow: /*?do=
Disallow: /*ref=
Disallow: /*?forumId*
# Block profile pages as these have little unique value, consume a lot of crawl time and contain hundreds of 301 links
Disallow: /profile/
# Sitemap URL
Sitemap: http://domain.tld/sitemap.php
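To get a feel for which URLs these rules would block, here is a minimal Python sketch of a pattern matcher. It assumes the wildcard behaviour that major search engines document for robots.txt (a `*` matches any run of characters, and a trailing `$` pins the end of the URL); it is not how any particular crawler is implemented. The function names and the sample URL paths are made up for illustration, and only a handful of the rules from the suggested file are included.

```python
import re

# A few Disallow patterns copied from the suggested file.
DISALLOW = [
    "/profile/",
    "/search/",
    "/*?sortby=",
    "/*?do=",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL. Everything else is a literal prefix match.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_blocked(url_path: str, rules=DISALLOW) -> bool:
    """Return True if any Disallow pattern matches the path (plus query string)."""
    return any(pattern_to_regex(rule).match(url_path) for rule in rules)


# Hypothetical URL paths, for illustration only.
print(is_blocked("/profile/123-name/"))                # True
print(is_blocked("/forum/5-general/?sortby=title"))    # True
print(is_blocked("/topic/42-hello/"))                  # False
print(is_blocked("/forum/5-general/?page=2&sortby=x")) # False
```

The last call shows a limit of the literal pattern: `/*?sortby=` only matches when `sortby` is the first query parameter, because the `?` is part of the pattern.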
*Note: if you are copying the robots.txt file above, you may need to add your community's path name to the rules and correct the sitemap URL.
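As a rough sketch of that adjustment, the Python snippet below prefixes each Disallow path with a directory name and swaps in a new sitemap URL. The function name `prefix_rules`, the `community` directory and the sitemap location are all assumptions made for illustration; check what the Admin CP suggests for your own install before relying on the output.

```python
def prefix_rules(robots_txt: str, directory: str, sitemap_url: str) -> str:
    """Prefix every Disallow path with a directory and replace the Sitemap URL.

    Hypothetical helper: comments and other lines are passed through unchanged.
    """
    prefix = "/" + directory.strip("/")
    out = []
    for line in robots_txt.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("disallow:"):
            path = stripped.split(":", 1)[1].strip()
            # An empty Disallow means "allow everything", so leave it alone.
            out.append(f"Disallow: {prefix}{path}" if path else line)
        elif lowered.startswith("sitemap:"):
            out.append(f"Sitemap: {sitemap_url}")
        else:
            out.append(line)
    return "\n".join(out)


# A shortened copy of the suggested rules, for illustration.
sample = """User-Agent: *
Disallow: /profile/
Disallow: /*?sortby=
Sitemap: http://domain.tld/sitemap.php"""

# Assumed sitemap location for a community installed in /community/.
print(prefix_rules(sample, "community", "http://domain.tld/community/sitemap.php"))
```

With these inputs, `Disallow: /profile/` becomes `Disallow: /community/profile/`, and the wildcard rule becomes `Disallow: /community/*?sortby=`.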