ref_san_atom Posted December 25, 2012
Hello, what can be done to block spam bots besides uploading robots.txt?
Aiwa Posted December 25, 2012
robots.txt doesn't stop spam bots... It is for search engine crawlers... A combination of a good Question and Answer Challenge, the IPS Spam Service, and 3rd party hooks such as http://community.invisionpower.com/files/file/5143-stop-spammer-registration/ are the first steps to take...
srpurdy Posted December 25, 2012
Yeah, ditto to Aiwa. StopForumSpam is a great way. I've used SFS even on sites that are not forums and it works great. One thing I normally do if the board is busy is require new users to get at least their first post moderated before they can post freely. I find this tends to keep the odd human spammer away, as they can't be bothered, and the other measures stop the non-human ones. :smile:
It's always best to use 3rd party tools. There are lists of IPs that are known offenders, but these take up space in iptables, which slows down web performance. It's better to just check these users on registration.
robots.txt is basically useless, as most of these robots don't obey that file at all. It's best to have a proper firewall in place to weed out bad user agents. I like having multiple lines of defense, because even if the registration would fail anyway, the attempt is still a waste of server resources. A lot of these bots send either spoofed headers or a malformed user-agent line, so filtering on that is an easy way to null those connections completely.
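To make the "weed out bad user agents" idea concrete, here is a minimal sketch of the kind of check a firewall rule or front controller might apply. Everything in it is an assumption for illustration: the function name, the deny-list tokens, and the heuristics are hypothetical and are not taken from any IP.Board hook or firewall product.

```python
import re

# Hypothetical deny-list: illustrative tokens only, not a vetted list.
BAD_TOKENS = ("libwww-perl", "python-requests", "curl")

# A well-formed User-Agent value begins with a product token,
# optionally followed by "/version" (for example "Mozilla/5.0").
PRODUCT_RE = re.compile(r"^[A-Za-z0-9!#$%&'*+.^_`|~-]+(/\S+)?")


def is_suspicious_user_agent(user_agent):
    """Return True if the User-Agent value looks missing, malformed, or deny-listed."""
    if user_agent is None:
        return True  # header not sent at all
    ua = user_agent.strip()
    if not ua:
        return True  # header sent but empty
    if ua.lower().startswith("user-agent:"):
        return True  # header name repeated inside the value (assumed bot bug)
    if not PRODUCT_RE.match(ua):
        return True  # does not start with a product token
    lowered = ua.lower()
    return any(token in lowered for token in BAD_TOKENS)
```

A check like this only catches careless bots. A bot that sends a copied browser string passes, which is why it is one line of defense alongside registration checks rather than a replacement for them.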
Aiwa Posted December 25, 2012
robots.txt is not useless... Legit crawlers DO obey it... And it can be used to SLOW the crawlers down so they don't eat resources on your board... Yes, there are crawlers that don't obey it... Baidu for example. But they are the exception and DO need to be handled via .htaccess if you don't want them crawling your board like mad...
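As a rough illustration of how a well-behaved crawler reads a robots.txt that slows it down, the sketch below uses Python's standard `urllib.robotparser`. The file contents, the `ExampleBot` name, and the `forum.example` URLs are made up for the example. Note that `Crawl-delay` is a non-standard directive, so support varies between crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every crawler to wait 10 seconds
# between requests and to stay out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""


def load_rules(text):
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Delay (in seconds) the crawler is asked to honor.
delay = rules.crawl_delay("ExampleBot")

# Topic pages stay crawlable; the admin area does not.
allowed_topic = rules.can_fetch("ExampleBot", "http://forum.example/topic/1-hello/")
allowed_admin = rules.can_fetch("ExampleBot", "http://forum.example/admin/")
```

Nothing here is enforced by the server. The file is only a request, which is why crawlers that ignore it have to be handled some other way, such as .htaccess.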
Aiwa Posted December 25, 2012
There is a key distinction here.. Spam bots = bots that register and post... Crawlers = search engines that crawl your site as a guest to get your content.. robots.txt is for CRAWLERS...
ref_san_atom (Author) Posted December 25, 2012
Hello, actually I was talking about crawlers. I have uploaded robots.txt but it is not blocking bots. Is there a way to make the forum invisible to search engine bots? If there are more steps that can be taken to stop search engine bots, please mention them.
Aiwa Posted December 25, 2012
Don't allow guests to view your content. Just so you're aware, if you block bots entirely you won't show up in search engine results.
Dmacleo Posted December 25, 2012
Remember the default one is also commented out. It's basically a template and needs some edits to actually work.
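The effect of a commented-out file can be checked with Python's standard `urllib.robotparser`. This is a sketch only: the two file bodies below are illustrative stand-ins, not the actual default file shipped with the software, and `ExampleBot` and the URL are made up.

```python
from urllib.robotparser import RobotFileParser

# Illustrative stand-in for a template where every rule is commented out.
COMMENTED_OUT = """\
# User-agent: *
# Disallow: /
"""

# The same rules with the comment markers removed.
ACTIVE = """\
User-agent: *
Disallow: /
"""


def compliant_crawler_may_fetch(robots_txt, url, agent="ExampleBot"):
    """Return True if a crawler that obeys robots.txt would fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://forum.example/topic/1-hello/"

# Commented-out rules are ignored, so the crawler is free to fetch.
with_template = compliant_crawler_may_fetch(COMMENTED_OUT, url)

# With the rules active, a compliant crawler stays away from the whole site.
with_active_rules = compliant_crawler_may_fetch(ACTIVE, url)
```

With `Disallow: /` active, crawlers that honor the file stop fetching pages, which also means the forum drops out of their search results, as Aiwa notes above.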
srpurdy Posted December 28, 2012
Aiwa said: "robots.txt is not useless... Legit crawlers DO obey it... And it can be used to SLOW the crawlers down so they don't eat resources on your board... Yes, there are crawlers that don't obey it... Baidu for example. But they are the exception and DO need to be handled via .htaccess if you don't want them crawling your board like mad..."
I didn't mean in general. I meant that for robots that don't obey it, a robots.txt is obviously useless. But I see now the OP is asking about crawlers anyway, so it's kind of irrelevant now. :smile:
This topic is now archived and is closed to further replies.