Blocking Yandex & Baidu Bots Via htaccess?


Posted

Hi all,

I'm trying to block both Yandex and Baidu bots from crawling my site.

I've added the following to the .htaccess file in my HTML root folder:

order allow,deny
deny from 199.21.99.110
deny from 180.76.5.
deny from 180.76.6.
allow from all

Now, 199.21.99.110 is the IP address for Yandex, and according to my server logs, it's being blocked. However, every 6th attempt is getting through, even from the blocked IP address.

199.21.99.110 - - [31/Dec/2012:04:10:29 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:31 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:33 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:35 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:37 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:55 -0600] "GET /forums/topic/12516-profuse-sweating/page__pid__123668 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:57 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:10:59 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:01 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:03 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:05 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:23 -0600] "GET /forums/topic/22967-mans-miracle-recovery-from-paralyzed-to-helping-others/page__p__282839 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:25 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:27 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:29 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:31 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:33 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:51 -0600] "GET /forums/topic/21131-depression/page__p__254637 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:53 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:55 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:57 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:11:59 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:12:01 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.110 - - [31/Dec/2012:04:12:19 -0600] "GET /forums/topic/12192-dublin-is-not-in-the-uk/page__pid__118633 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

Baidu is blocked as well, but it is still getting through from the 180.76.5. and 180.76.6. IP ranges. Below are the logs from when Baidu broke through.

180.76.5.88 - - [30/Dec/2012:17:22:38 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.91 - - [30/Dec/2012:17:22:42 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.187 - - [30/Dec/2012:17:23:31 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221587 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.194 - - [30/Dec/2012:17:23:31 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.113 - - [30/Dec/2012:17:23:32 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.168 - - [30/Dec/2012:17:23:32 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.136 - - [30/Dec/2012:17:23:33 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.145 - - [30/Dec/2012:17:23:33 -0600] "GET /403.shtml HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.95 - - [30/Dec/2012:17:24:25 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221583 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.51 - - [30/Dec/2012:17:25:20 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221581 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.232 - - [30/Dec/2012:17:26:15 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221394 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.161 - - [30/Dec/2012:17:27:24 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=221064 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.20 - - [30/Dec/2012:17:28:18 -0600] "GET /forums/topic/22746-cardio/page__pid__287823 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.146 - - [30/Dec/2012:17:29:13 -0600] "GET /forums/topic/22588-where-to-start/page__pid__276734__settingNewSkin__1 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.20 - - [30/Dec/2012:17:30:07 -0600] "GET /forums/topic/22558-t-e-d-hose/page__fromsearch__1 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.169 - - [30/Dec/2012:17:31:02 -0600] "GET /forums/tags/forums/wheelchair+accessories/ HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.211 - - [30/Dec/2012:17:31:56 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=220995 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.189 - - [30/Dec/2012:17:32:51 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=220174 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.20 - - [30/Dec/2012:17:33:45 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=219900 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.5.153 - - [30/Dec/2012:17:34:40 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=219534 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.6.213 - - [30/Dec/2012:17:35:34 -0600] "GET /forums/index.php?app=forums&module=forums&section=findpost&pid=219322 HTTP/1.1" 302 362 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

Any ideas on why Baidu is still getting through, and why every 6th attempt by Yandex is getting through as well?

Thanks

Simon

Posted

Analogged is correct. You should block them with iptables if you're on a dedicated server or a VPS. If not, and you have cPanel, you can use the IP Deny Manager listed under "Security."

Posted

OK, before I look into iptables, here's what I have in my .htaccess file. Is this correct for blocking Yandex and Baidu?

RewriteEngine on
RewriteBase /
RewriteRule ^phpodp(.*) http://www.apparelyzed.com/disability-equipment/ [R=301,L]
RewriteRule ^jokes-guestbook/jokes(.*) http://www.apparelyzed.com/ [R=301,L]
RewriteRule ^dermatone.html(.*) http://www.apparelyzed.com/dermatome.html [R=301,L]
RewriteRule ^myo-dermatones.html(.*) http://www.apparelyzed.com/myo-dermatomes.html [R=301,L]
RewriteRule ^portfolio/wmd.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/stem.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/rage.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/deaf.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/player.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^portfolio/legsale.html(.*) http://www.apparelyzed.com/portfolio.html [R=301,L]
RewriteRule ^index.php(.*) http://www.apparelyzed.com/index.html [R=301,L]
RewriteRule ^crossdomain.xml(.*) http://www.apparelyzed.com/ [R=301,L]
RewriteRule ^disability-equipment/index.php(.*) http://www.apparelyzed.com/forums/forum/30-disability-classifieds-equipment-secondhand-wheelchairs-for-sale-wanted/  [R=301,L]
RewriteRule ^cauda-equina-syndtome.html(.*) http://www.apparelyzed.com/cauda-equina-syndrome.html [R=301,L]
RewriteRule ^spinal-cord-injury-awareness.html(.*) http://www.apparelyzed.com/spinal-cord-injury-awareness-days.html [R=301,L]


## SITE REFERRER BANNING
RewriteEngine on
# Options +FollowSymlinks

RewriteCond %{HTTP_REFERER} baidu.com [NC,OR]
RewriteCond %{HTTP_REFERER} baidu. [NC,OR]
RewriteCond %{HTTP_REFERER} sub.baidu.com [NC,OR]
RewriteCond %{HTTP_REFERER} yandex.com [NC]
RewriteRule .* - [F]


<Files .htaccess>
order allow,deny
deny from all
</Files>
	

ErrorDocument 404 http://www.apparelyzed.com/404.html
ErrorDocument 400 http://www.apparelyzed.com/400.shtml
ErrorDocument 403 http://www.apparelyzed.com/403.shtml
ErrorDocument 404 http://www.apparelyzed.com/404.shtml
ErrorDocument 406 http://www.apparelyzed.com/406.shtml
ErrorDocument 501 http://www.apparelyzed.com/501.shtml


order allow,deny
deny from 174.132.133.58
deny from 128.109.70.90
deny from 195.182.194.223
deny from 74.208.180.102
deny from 72.199.226.151
deny from 199.21.99.110
deny from 180.76.5.0-255
deny from 180.76.6.0-255
deny from baidu.com
deny from yandex.com
allow from all


Yandex is IP 199.21.99.110

Baidu is IP 180.76.

Thanks

Simon
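
Two details in the file above may explain the logs. Every logged request, including the ones for /403.shtml, has status 302. That would be consistent with the ErrorDocument lines using full URLs: when an ErrorDocument target starts with http://, Apache sends the client a redirect instead of the error status. The bot then requests /403.shtml, is denied again, and is redirected again until it gives up and moves on to the next page. Also, "180.76.5.0-255" is not one of the address forms that Allow/Deny accept (full IP, partial IP, network/netmask, or CIDR), so it would not be expected to match as a range. A minimal sketch of how the access section might look instead, assuming Apache 2.2-style directives as used in this thread and that 403.shtml sits in the document root:

```apache
# Sketch only; assumes Apache 2.2-style access control (Order/Allow/Deny).

# A local path makes Apache answer with a real 403 instead of a 302 redirect.
ErrorDocument 403 /403.shtml

# Let denied clients still receive the error page itself.
<Files 403.shtml>
    Order Allow,Deny
    Allow from all
</Files>

# With "Order Allow,Deny", Deny is evaluated last and overrides Allow.
Order Allow,Deny
Allow from all
Deny from 199.21.99.110
# CIDR notation: each /24 covers addresses .0 through .255.
Deny from 180.76.5.0/24
Deny from 180.76.6.0/24
```

The hostname rules ("deny from baidu.com") are left out of this sketch because they require reverse DNS lookups on each request; whether the listed addresses cover everything these crawlers use is not something the thread confirms.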

  • 2 weeks later...
Posted

This is what I use to block Baidu bot:

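# Flag any request whose User-Agent contains "Baiduspider" (case-insensitive).
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot

# Allow by default; deny only requests flagged as bad_bot.
Order Deny,Allow
Deny from env=bad_bot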
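
The same environment-variable approach could be extended to cover Yandex. The logs earlier in the thread show both crawlers identifying themselves in the User-Agent header ("YandexBot/3.0" and "Baiduspider/2.0") while sending an empty referer ("-"), which is why rules based on HTTP_REFERER would not be expected to catch them. A minimal sketch, assuming Apache 2.2-style directives with mod_setenvif enabled; the variable name bad_bot is taken from the post above:

```apache
# Sketch only: flag both crawlers by User-Agent substring (case-insensitive).
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "YandexBot" bad_bot

# Allow everyone, then deny flagged requests (Deny is evaluated last).
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

User-Agent matching only stops clients that identify themselves honestly, so it complements IP-based blocking and does not replace it. On Apache 2.4 without mod_access_compat, the Order/Allow/Deny lines would need to be rewritten with Require directives (for example "Require all granted" and "Require not env bad_bot" inside a RequireAll block).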
 
