
Big increase in 429 Google errors for tag searches


sadams101


I am seeing a huge increase in errors from my site's tag links. The errors are 429s, and in Google Webmaster Tools it says:


Googlebot couldn't access the contents of this URL because the server had an internal error when trying to process the request. These errors tend to be with the server itself, not with the request.

I believe these are related to the flood control setting in the admin area: Googlebot is being blocked from running the tag searches because it trips the flood control limit.

My question is this: the links listed all actually work (I tested them). Since such a spike in errors could have a negative impact on Google search, what is the best way to handle this? A "nofollow" on the tags? Any idea how to do that?


PS - Another option is robots.txt with
Disallow: /tags/
but do we really want these pages excluded from the index? It seems like this would be good search engine content.
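
If we did go that route, a minimal robots.txt would be something like this (assuming all the tag search URLs resolve under /tags/, which is what the friendly URL template seems to produce):

    User-agent: *
    Disallow: /tags/

That would stop compliant crawlers from requesting those URLs at all, so the flood control would never come into play for them.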

A better option might be rel="nofollow" or rel="nofollow noindex", but that would likely cause the same thing: the linked pages would be excluded from the index.


No, those options won't work. You can either prevent bots from crawling the content at all (via robots.txt); slow down their crawling globally (you do this in Webmaster Tools, not via a crawl-delay directive); accept that Google will get some 429 responses, which is fine (it doesn't hurt SEO, it just limits how quickly those pages get crawled); or remove the search flood control.
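
On the point about accepting 429s: 429 is the standard "too many requests" status, and well-behaved crawlers treat it as a signal to back off and retry later. Ideally the response would also carry a Retry-After header; I have not checked whether the suite sends one, but the raw response would look roughly like this:

    HTTP/1.1 429 Too Many Requests
    Retry-After: 60
    Content-Type: text/html; charset=UTF-8

So, as noted above, the main cost is just that those pages get crawled more slowly.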


So I found the tag template:

	<a href="{url="app=core&module=search&controller=search&tags={$urlEncodedTag}" seoTemplate="tags"}" class='ipsTag' title="{lang="find_tagged_content" sprintf="$tag"}" rel="tag">

and changed it to:

	<a href="{url="app=core&module=search&controller=search&tags={$urlEncodedTag}" seoTemplate="tags"}" class='ipsTag' title="{lang="find_tagged_content" sprintf="$tag"}" rel="tag noindex">

 

I guess this would be the best solution to keep spiders from potentially running many searches per minute, and to stop the Google errors.


I wanted to make a correction here: "noindex" is only valid as a robots meta tag (or HTTP header), not as an attribute on a link, so I ended up with a simple Disallow: /tags/ in my robots.txt.
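
For anyone who wants the meta tag route instead, it is the standard robots meta tag placed in the <head> of the tag search pages (how to inject it into the search template is a separate question I have not worked out):

    <meta name="robots" content="noindex">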

In a perfect world I could simply turn off the flood control, but then Google and other spiders would likely be running these searches constantly, which would affect performance.


I did look for "noindex" as an attribute, and in my page I do not see it. In any case, noindex may still, at least to Google, mean OK to crawl, just not OK to index. Even if a page really is noindexed, people should still have the block in robots.txt to stop Google from crawling it.
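
As a side note, if someone wants noindex without touching the theme templates, it can also be sent as an HTTP header from the web server. A rough sketch for Apache with mod_headers, assuming the tag searches all resolve under /tags/ (the path and the setup are assumptions on my part, not something the suite does out of the box):

    <LocationMatch "^/tags/">
        Header set X-Robots-Tag "noindex"
    </LocationMatch>

Just keep in mind that if /tags/ is also disallowed in robots.txt, Google never fetches the page and never sees the noindex; it is the robots.txt block itself that stops the crawling.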


