
robots.txt file


marklcfc


Do we still use these? If so, is there a recommended setup?

My host has recommended one, as I'm getting hit by a lot of these bots, coupled with 503 Service Unavailable errors:

Googlebot/2.1; +http://www.google.com/bot.html
YandexImages/3.0; +http://yandex.com/bots
AhrefsBot/6.1; +http://ahrefs.com/robot/
MJ12bot/v1.4.8; http://mj12bot.com/
bingbot/2.0; +http://www.bing.com/bingbot.htm
MojeekBot/0.6; +https://www.mojeek.com/bot.html
BrandVerity/1.0 (http://www.brandverity.com/why-is-brandverity-visiting-me)
SemrushBot/3~bl; +http://www.semrush.com/bot.html


Those are spider identification strings, but adding them to robots.txt on their own wouldn't actually do anything.

What exactly is your host recommending? For instance, if you need to throttle Googlebot because it is hitting your community too much, you would need to do that in your Webmaster Tools account.
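To illustrate why the raw identification strings do nothing by themselves: robots.txt only has an effect through `User-agent` / `Disallow` rules. Below is a minimal sketch using Python's standard `urllib.robotparser` to check what a given set of rules would mean for a few of the bots listed. The rules, the choice of which bots to block, and the `example.com` URL are assumptions for illustration, not a recommended setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks two crawlers to stay away entirely and
# leaves every other crawler (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: BrandVerity
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str = "https://example.com/forums/") -> bool:
    """Return True if these rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Pass the bot's product token (e.g. "Googlebot/2.1"), not the full
    # "Mozilla/5.0 (compatible; ...)" header, since the parser matches on
    # the part before the first "/".
    for agent in ("BrandVerity/1.0", "MJ12bot/v1.4.8", "Googlebot/2.1"):
        print(agent, is_allowed(agent))
```

With these example rules, the two named bots are refused and Googlebot is allowed. Keep in mind this only models what a well-behaved crawler would do after reading the file; nothing here is enforced on the server.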


15 minutes ago, bfarber said:

Those are spider identification strings, but adding them to robots.txt on their own wouldn't actually do anything.

What exactly is your host recommending? For instance, if you need to throttle Googlebot because it is hitting your community too much, you would need to do that in your Webmaster Tools account.

I've been trying to find out why I keep getting 503 Service Unavailable errors; it's been happening for the past month. My host seems to suggest it happened when the site was busy, but it wasn't, and I'd expect the site to be much busier at other times than it was when these errors came up. They are suggesting it was a lot of hits from BrandVerity.



That's entirely possible - sometimes rogue bots can consume a LOT of server resources.
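One way to check whether a particular bot really is responsible is to count requests per user agent in the web server's access log. The following is a rough sketch, assuming the log uses the common Apache/nginx "combined" format, where the user agent is the last double-quoted field on each line. The sample lines, IP addresses, and paths are made up for illustration.

```python
import re
from collections import Counter

# In the "combined" log format the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per user-agent string; lines without one are skipped."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines (documentation-range IPs), not real traffic.
BRANDVERITY = "BrandVerity/1.0 (http://www.brandverity.com/why-is-brandverity-visiting-me)"
SAMPLE_LOG = [
    f'203.0.113.5 - - [22/Jun/2019:11:40:01 +0000] "GET /topic/1 HTTP/1.1" 200 512 "-" "{BRANDVERITY}"',
    f'203.0.113.5 - - [22/Jun/2019:11:40:02 +0000] "GET /topic/2 HTTP/1.1" 200 498 "-" "{BRANDVERITY}"',
    f'203.0.113.5 - - [22/Jun/2019:11:40:02 +0000] "GET /topic/3 HTTP/1.1" 503 0 "-" "{BRANDVERITY}"',
    '198.51.100.7 - - [22/Jun/2019:11:40:03 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

On a real server you would pass an open log file instead of `SAMPLE_LOG`, for example `count_user_agents(open(path))`, where `path` is wherever your host keeps the access log.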

Generally speaking, if that's the case, robots.txt is not the right tool for blocking those bots, because you're relying on the bot to actually honor robots.txt, which isn't guaranteed. Instead, I would suggest blocking the bot at the firewall level (or, if that's not possible, using .htaccess to block any IP addresses associated with that bot).
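For the .htaccess route, here is a small sketch that turns a list of addresses into deny rules. It assumes Apache 2.4 (the `Require` directives from mod_authz_core) and a host that permits these directives in .htaccess. The function name and the addresses are placeholders; the real addresses would have to come from your own logs or from the bot operator.

```python
import ipaddress


def htaccess_block_rules(addresses):
    """Build an Apache 2.4 <RequireAll> block denying the given IPs/CIDR ranges.

    Each entry is validated with the ipaddress module, so a typo raises
    ValueError instead of ending up in the config.
    """
    lines = ["<RequireAll>", "    Require all granted"]
    for raw in addresses:
        # strict=False accepts a host address with a prefix, e.g. 198.51.100.7/24
        network = ipaddress.ip_network(raw.strip(), strict=False)
        lines.append(f"    Require not ip {network}")
    lines.append("</RequireAll>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder addresses from the documentation ranges, not real bot IPs.
    print(htaccess_block_rules(["203.0.113.5", "198.51.100.0/24"]))
```

The output can be pasted into .htaccess after reviewing it. A firewall rule is still preferable where available, since the request is then dropped before it reaches the web server at all.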

Now, your host included Googlebot and many others in the list, which I would definitely NOT recommend blocking unless you want to basically delist your site from all search engines. In other words, be careful. Some bots don't really matter, but some (like Googlebot) definitely do.


Archived

This topic is now archived and is closed to further replies.
