sadams101 Posted December 2, 2022 (edited)

I am hoping to get some input on my site's robots.txt file. Some of the items in it are likely outdated by now, and some may be overkill and possibly hurting my site's crawlability, so I am hoping to start a discussion about it here. Most of it was put together long before IPB offered a standard version of the robots.txt file. Some of what I include was borrowed from past posts here, and some of it may even be poorly or incorrectly formatted, which is why I wanted to get some input on it. I've changed my site a bit regarding profiles, and have a custom plugin that adds a noindex to profiles without "About Us" info but allows indexing for those that have info there.

Here is my file:

# START Default Rules for Invision Community (https://invisioncommunity.com)
User-Agent: *
# Block pages with no unique content
Disallow: /startTopic/
Disallow: /discover/unread/
Disallow: /markallread/
#Disallow: /staff/
Disallow: /online/
Disallow: /discover/
Disallow: /leaderboard/
Disallow: /search/
Disallow: /*?advancedSearchForm=
Disallow: /register/
Disallow: /lostpassword/
Disallow: /login/
#
# Block faceted pages and 301 redirect pages
Disallow: /*?sortby=
Disallow: /*?filter=
Disallow: /*?tab=
Disallow: /*?do=
Disallow: /*ref=
Disallow: /*?forumId*
#
# Block profile pages as these have little unique value, consume a lot of crawl time and contain hundreds of 301 links
#Disallow: /profile/
#
# END Default Rules for Invision Community (https://invisioncommunity.com)
#
#
# START CUSTOM RULES
Disallow: /tags/
Disallow: /notifications/
#Disallow: /applications/
Disallow: /announcement/
Disallow: /*?*sortby=
Disallow: /*?*sort=
Disallow: /*?sort=
Disallow: /*?*sortdirection=
Disallow: /*?sortdirection=
Disallow: /*?set_template=mobile*
Disallow: /*&section=notifications*
Disallow: /*&do=topContributors*
Disallow: /*&do=askAQuestion*
Disallow: /*?app=core*
Disallow: /*?act=calendar*
Disallow: /*?act=rssout*
#Disallow: /articles/*/*/*/Page1.html/addfav
#Disallow: /articles/*/*/*/Page1.html/addread
#Disallow: /articles/*/*/*/Page1.html/print
Disallow: /profile/*/?do=*
Disallow: /profile/*/content/
Disallow: /profile/*/followers/
Disallow: /profile/*/reputation*
Disallow: /profile/0-Guest/*
Disallow: /blogs/*?view=grid*
Disallow: /blogs/*?view=list*
Disallow: /blogs/submit/*
Disallow: /calendar/*/week/
Disallow: /calendar/*/submit/
Disallow: /calendar/submit/*
Disallow: /clubs/*?view=grid*
Disallow: /clubs/*?view=list*
Disallow: /clubs/index.php?app=core*
# added to stop social share links
Disallow: /submit?url=*
#
# Custom Plugin (DP47) Bad Link Fixer for Bots
Disallow: /*&do=retrieveUrl*
Disallow: /*&do=retrieveUrl*
Disallow: /*?app=dp47badlinksfixer*
# Sitemaps
Sitemap: https://www.celiac.com/sitemap.php

Edited December 2, 2022 by sadams101
Jim M Posted December 2, 2022

We recommend and support only the default robots.txt that is included in the software. Anything else is something you can discuss with the community at large, so I will move this to the proper forum. If you want to suggest things to be added to or removed from the default robots.txt, you can certainly post them in our Feature Suggestion forum for further evaluation.
SeNioR- Posted December 2, 2022 (edited)

45 minutes ago, sadams101 said:
Disallow: /tags/

The tag pages already have the "noindex" meta tag added, so if you block them in robots.txt, bots will not be able to fetch the pages and read that tag. Bad practice.

Edited December 2, 2022 by SeNioR-
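To illustrate the point about the quoted rule, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a two-line excerpt of the file (`RULES`) and two made-up URLs on the site; neither URL is taken from the thread. It only shows that a crawler honoring `Disallow: /tags/` never requests a tag page, so it never gets to see any meta tag on it.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a two-line excerpt of the robots.txt discussed above.
RULES = """\
User-agent: *
Disallow: /tags/
"""


def is_crawlable(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if a robots.txt-respecting crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs, used only for illustration.
tag_url = "https://www.celiac.com/tags/example/"
topic_url = "https://www.celiac.com/forums/topic/1-example/"

print(is_crawlable(RULES, tag_url))    # blocked by the /tags/ prefix rule
print(is_crawlable(RULES, topic_url))  # no rule matches this path
```

Note that `urllib.robotparser` does plain prefix matching and does not implement the `*` wildcard patterns used elsewhere in the file, so this sketch is only suitable for simple path rules such as `/tags/`.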
sadams101 (Author) Posted December 2, 2022

I am fairly sure that I added the tags there before there was noindex code on them. As I recall, my reason for doing this was that I kept getting flagged by Google for the search throttle delay. But thank you, this is exactly why I'm sharing this. The question is: will Google start to crawl the tags again and slow my site down (lots of searches slow things down)?
sadams101 (Author) Posted December 2, 2022

I did some research on having both the meta noindex AND the robots.txt rule. If a page is not disallowed in robots.txt, then Google WILL access it, meaning they will follow those links, which I don't want to happen. I found this here:

https://developers.google.com/search/blog/2007/03/using-robots-meta-tag

"If you allow a page with robots.txt but block it from being indexed using a meta tag, Googlebot will access the page, read the meta tag, and subsequently not index it."

The proper meta tag to stop Google from crawling would be nofollow. It seems like the /tags/ pages should include nofollow, rather than noindex, if the goal is to stop them from being crawled by the bots.
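The behavior in the Google quote above can be sketched as follows: once a page is allowed by robots.txt, the crawler downloads it and reads the robots meta tag. This is a minimal sketch with Python's standard `html.parser`; the `SAMPLE` markup and the helper names are assumptions made for illustration, not the markup Invision Community actually emits. In Google's documentation, `noindex` controls indexing and `nofollow` asks crawlers not to follow the links found on the page, while whether a URL is fetched at all is decided by robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page markup, not copied from any real tag page.
SAMPLE = (
    '<html><head><meta name="robots" content="noindex, follow">'
    "</head><body></body></html>"
)

directives = robots_directives(SAMPLE)
print("noindex" in directives)   # the page asks not to be indexed
print("nofollow" in directives)  # no nofollow directive on this sample page
```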