sadams101 Posted August 17, 2018 I am seeing a huge increase in errors from my site's tag pages. The errors are 429s, and in Webmaster Tools it says: Quote Googlebot couldn't access the contents of this URL because the server had an internal error when trying to process the request. These errors tend to be with the server itself, not with the request. I believe these are related to the flood control setting in the admin area, so Google's bot is being blocked from running the tag searches. My question is this--the links listed all actually work; I tested them. Since such a spike in errors could have a negative impact on Google search, what is the best way to handle this? A "nofollow" on the tags? Any idea how to do that?
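To be clear, what I have in mind is adding nofollow to the rel attribute of each tag link, something roughly like this (the URL and markup here are just illustrative, not the real template):

<a href="https://www.example.com/tags/example-tag/" class='ipsTag' rel="tag nofollow">example tag</a>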
sadams101 (Author) Posted August 17, 2018 PS - Another option is robots.txt with Disallow: /tags/, but do we really want to keep these out of the index? It seems like this would be good search engine content. A better option might be rel="nofollow" or rel="nofollow noindex", but that would likely have the same effect--the links would be excluded from the index.
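To be concrete, the robots.txt rule I'm considering would be something like this (assuming all of the tag URLs live under /tags/):

User-agent: *
Disallow: /tags/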
sadams101 (Author) Posted August 22, 2018 I thought of another possible approach, but I doubt this is allowed in robots.txt. Could you put a crawl-delay on a specific directory? For example:

Allow: /tags/
Crawl-delay: 60

Ever hear of this?
bfarber Posted August 23, 2018 No, those options won't work. You can either prevent bots from crawling the content at all (via robots.txt), slow down their crawling globally (you do this in Webmaster Tools, not via a crawl-delay directive), accept that Google will get some 429 responses, which is okay (it doesn't hurt SEO, it just limits how quickly they can crawl those pages), or remove the search flood control.
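For context, a 429 from the search flood control is just a standard "Too Many Requests" HTTP response, along these lines (the exact headers, such as whether a Retry-After is sent, will vary by setup):

HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: text/html;charset=UTF-8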
Apfelstrudel Posted August 24, 2018 At the moment, tag pages are not indexed by default. The IPB default setting is "noindex". Just a hint.
sadams101 (Author) Posted August 24, 2018 I am using the latest version of IPB, and in the Pages app my tag links do not include noindex:

<a href="https://www.url.com" class='ipsTag' title="Find other content tagged with 'tag text'" rel="tag">

I also don't see this as a setting option in the ACP.
sadams101 (Author) Posted August 24, 2018 So I found the tag template:

<a href="{url="app=core&module=search&controller=search&tags={$urlEncodedTag}" seoTemplate="tags"}" class='ipsTag' title="{lang="find_tagged_content" sprintf="$tag"}" rel="tag">

and changed it to:

<a href="{url="app=core&module=search&controller=search&tags={$urlEncodedTag}" seoTemplate="tags"}" class='ipsTag' title="{lang="find_tagged_content" sprintf="$tag"}" rel="tag noindex">

I guess this would be the best solution to avoid having spiders potentially running many searches per minute/second, and to stop the Google errors.
sadams101 (Author) Posted August 26, 2018 I wanted to make a correction here--"noindex" is a directive for the robots meta tag, not a valid value for a link's rel attribute, so I ended up with a simple Disallow: /tags/ in my robots.txt. In a perfect world I could simply turn off the flood control, but then Google and other spiders would likely be running these searches constantly, which would affect performance.
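In other words, noindex belongs in a page-level meta tag (or an X-Robots-Tag response header), not on individual links, e.g. something like:

<meta name="robots" content="noindex, follow">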
Apfelstrudel Posted August 27, 2018 As I mentioned above, all tag pages are already marked with the "noindex" robots meta tag. Just open a tag page and check out the source code. But this meta tag has nothing to do with how often Google visits those pages.
sadams101 (Author) Posted August 30, 2018 I did look for the "noindex" attribute, and on my page I do not see it. In any case, noindex may still, at least to Google, mean it's ok to crawl, just not ok to index. If it really is noindex, then people should still add the block in robots.txt to stop Google from crawling it.