Apparelyzed_merged Posted December 31, 2012

Hi all,

I'm trying to block both the Yandex and Baidu bots from crawling my site. In my HTML root folder, I've added the following to my .htaccess file:

order allow,deny
deny from 199.21.99.110
deny from 180.76.5.
deny from 180.76.6.
allow from all

Now, 199.21.99.110 is the IP address for Yandex, and according to my server logs, it's being blocked. However, every 6th attempt is getting through, even from the blocked IP address.

199.21.99.110 - - [31/Dec/2012:04:10:29 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:31 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:33 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:35 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:37 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:55 -0600] "GET /forums/topic/12516-profuse-sweating/page__pid__123668 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:57 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:59 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:01 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:03 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:05 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:23 -0600] "GET /forums/topic/22967-mans-miracle-recovery-from-paralyzed-to-helping-others/page__p__282839 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:25 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:27 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:29 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:31 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:33 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:51 -0600] "GET /forums/topic/21131-depression/page__p__254637 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:53 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:55 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:57 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:59 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:12:01 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:12:19 -0600] "GET /forums/topic/12192-dublin-is-not-in-the-uk/page__pid__118633 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

Baidu was blocked as well, but it is still getting through from the 180.76.5. and 180.76.6. IP ranges. Below are the logs from when Baidu broke through.

180.76.5.88 - - [30/Dec/2012:17:22:38 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.91 - - [30/Dec/2012:17:22:42 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.187 - - [30/Dec/2012:17:23:31 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221587 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.194 - - [30/Dec/2012:17:23:31 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.113 - - [30/Dec/2012:17:23:32 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.168 - - [30/Dec/2012:17:23:32 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.136 - - [30/Dec/2012:17:23:33 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.145 - - [30/Dec/2012:17:23:33 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.95 - - [30/Dec/2012:17:24:25 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221583 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.51 - - [30/Dec/2012:17:25:20 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221581 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.232 - - [30/Dec/2012:17:26:15 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221394 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.161 - - [30/Dec/2012:17:27:24 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221064 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.20 - - [30/Dec/2012:17:28:18 -0600] "GET /forums/topic/22746-cardio/page__pid__287823 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.146 - - [30/Dec/2012:17:29:13 -0600] "GET /forums/topic/22588-where-to-start/page__pid__276734__settingNewSkin__1 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.20 - - [30/Dec/2012:17:30:07 -0600] "GET /forums/topic/22558-t-e-d-hose/page__fromsearch__1 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.169 - - [30/Dec/2012:17:31:02 -0600] "GET /forums/tags/forums/wheelchair+accessories/ HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.211 - - [30/Dec/2012:17:31:56 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=220995 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.189 - - [30/Dec/2012:17:32:51 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=220174 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.20 - - [30/Dec/2012:17:33:45 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=219900 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.153 - - [30/Dec/2012:17:34:40 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=219534 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.213 - - [30/Dec/2012:17:35:34 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=219322 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

Any ideas on why Baidu is still getting through, and why every 6th attempt by Yandex is also getting through?

Thanks
Simon
Nuclear General Posted December 31, 2012

Have you tried contacting your web hosting company about the blocked bot crawlers still getting through? They may be able to figure out why the crawlers are breaking through the .htaccess block.

-Don :)
Analogged Posted December 31, 2012

I'd suggest blocking them at the kernel level, before the request even makes it to your forum. Deny them using iptables if you're able to. :thumbsup:
Royzee Posted January 1, 2013

Analogged is correct. You should block them with iptables if you're on a dedicated server or a VPS. If not, and you have cPanel, you can use the IP Deny Manager listed under "Security."
Apparelyzed_merged (Author) Posted January 5, 2013

OK, before I look into iptables, here's what I have in my .htaccess file. Is this correct for blocking Yandex and Baidu?

RewriteEngine on
RewriteBase /
RewriteRule ^phpodp(.*) http://www.apparelyzed.com/disability-equipment/ [R=301,L]
RewriteRule ^jokes-guestbook/jokes(.*) http://www.apparelyzed.com/ [R=301,L]
RewriteRule ^dermatone.html(.*) http://www.apparelyzed.com/dermatome.html [R=301,L]
RewriteRule ^myo-dermatones.html(.*) http://www.apparelyzed.com/myo-dermatomes.html [R=301,L]
RewriteRule ^portfolio/wmd.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/stem.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/rage.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/deaf.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/player.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/legsale.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^index.php(.*) http://www.apparelyzed.com/index.html [R=301,L]
RewriteRule ^crossdomain.xml(.*) http://www.apparelyzed.com/ [R=301,L]
RewriteRule ^disability-equipment/index.php(.*) http://www.apparelyzed.com/forums/forum/30-disability-classifieds-equipment-secondhand-wheelchairs-for-sale-wanted/ [R=301,L]
RewriteRule ^cauda-equina-syndtome.html(.*) http://www.apparelyzed.com/cauda-equina-syndrome.html [R=301,L]
RewriteRule ^spinal-cord-injury-awareness.html(.*) http://www.apparelyzed.com/spinal-cord-injury-awareness-days.html [R=301,L]

## SITE REFERRER BANNING
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} baidu.com [NC,OR]
RewriteCond %{HTTP_REFERER} baidu. [NC,OR]
RewriteCond %{HTTP_REFERER} sub.baidu.com [NC,OR]
RewriteCond %{HTTP_REFERER} yandex.com [NC]
RewriteRule .* - [F]

<Files .htaccess>
order allow,deny
deny from all
</Files>

ErrorDocument 404 http://www.apparelyzed.com/404.html
ErrorDocument 400 http://www.apparelyzed.com/400.shtml
ErrorDocument 403 http://www.apparelyzed.com/403.shtml
ErrorDocument 404 http://www.apparelyzed.com/404.shtml
ErrorDocument 406 http://www.apparelyzed.com/406.shtml
ErrorDocument 501 http://www.apparelyzed.com/501.shtml

order allow,deny
deny from 174.132.133.58
deny from 128.109.70.90
deny from 195.182.194.223
deny from 74.208.180.102
deny from 72.199.226.151
deny from 199.21.99.110
deny from 180.76.5.0-255
deny from 180.76.6.0-255
deny from baidu.com
deny from yandex.com
allow from all

Yandex is IP 199.21.99.110
Baidu is IP 180.76.

Thanks
Simon
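A hedged note on the configuration in this post, offered as a sketch rather than a confirmed diagnosis. Every request in the posted logs, including the forum URLs, is answered with status 302, which looks consistent with the ErrorDocument lines using full URLs: Apache then answers a denied request with a redirect to the error page instead of a 403, and a blocked crawler that follows the redirect is denied again on /403.shtml. Separately, "180.76.5.0-255" is not one of the documented address forms for "deny from", and the crawlers send no Referer ("-" in the logs), so the HTTP_REFERER conditions would not be expected to match them. The fragment below assumes Apache 2.2-style directives as used in the thread; the /24 ranges are an assumed translation of the intended ranges.

```apache
# Sketch only: Apache 2.2-style access control (mod_authz_host), matching the
# directives used in this thread. The /24 ranges are an assumed translation of
# "180.76.5.0-255" and "180.76.6.0-255".

# Local path instead of a full URL, so a denied request gets a real 403
# rather than a 302 redirect to the error page.
ErrorDocument 403 /403.shtml

# Allow everyone to fetch the error page itself, including denied clients.
<Files 403.shtml>
    Order allow,deny
    Allow from all
</Files>

# Deny the crawler addresses named in the thread; allow everyone else.
Order allow,deny
Allow from all
Deny from 199.21.99.110
Deny from 180.76.5.0/24
Deny from 180.76.6.0/24

# Match on the User-Agent header instead of the Referer header,
# and skip the error page so it can still be served.
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/403\.shtml$
RewriteCond %{HTTP_USER_AGENT} (Baiduspider|YandexBot) [NC]
RewriteRule .* - [F]
```

If this reading is right, blocked requests should show up in the access log with status 403 instead of 302, and the repeated /403.shtml requests should stop.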
Dmacleo Posted January 5, 2013

Do you have root access? If so, CSF for WHM/cPanel works.
Apparelyzed_merged (Author) Posted January 5, 2013

Hi,

Yes, I have WHM access, but I don't venture in there too often. It's scary! I also have cPanel. What's CSF?

Thanks
Simon
Dmacleo Posted January 5, 2013

http://configserver.com/cp/csf.html

It's pretty easy to install. You need shell access, but once it's installed, it's easy to use.
idtng Posted January 16, 2013

This is what I use to block the Baidu bot:

BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
Order Deny,Allow
Deny from env=bad_bot
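The same User-Agent approach could be extended to cover Yandex as well. The fragment below is a minimal sketch, assuming mod_setenvif is loaded and Apache 2.2-style access directives; "bad_bot" is just an arbitrary environment variable name, and "YandexBot" is taken from the User-Agent strings in the logs posted earlier in the thread. Because BrowserMatchNoCase does an unanchored, case-insensitive regex match, a single "Baiduspider" pattern should already cover "Baiduspider/2.0".

```apache
# Sketch only: flag requests whose User-Agent contains either crawler name.
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase YandexBot bad_bot

# Default is allow; deny any request where bad_bot was set above.
Order Deny,Allow
Deny from env=bad_bot
```

Note that only one Order directive takes effect per context, so this would replace, rather than sit alongside, an existing "order allow,deny" block in the same .htaccess file.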
tkheadcase Posted January 19, 2013

Blocking them with iptables or some other firewall is going to be the best way, since that would stop the requests from even hitting Apache.