
Aggressive bot crawling increasing S3 costs (DataTransfer-Out-Bytes)


I host my media on S3 with Cloudflare in front of it, but over the last month or so I've noticed Bingbot has become very aggressive in crawling historical data (I suspect it's missing the CF cache and pulling from S3 directly). Anyone else experiencing the same, or have any idea what may be going on? I've been on S3 for a few months now, and from what I can tell in the AWS access logs I'm querying via Athena, this spike only started last month.
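For anyone wanting to do the same breakdown, a query along these lines works against S3 server access logs in Athena (a sketch: it assumes the standard table layout from the AWS docs, and `s3_access_logs_db.mybucket_logs` is a placeholder for your own database/table):

```sql
-- Daily GetObject counts and bytes served, grouped by bot user agent.
-- try_cast covers DDLs that store bytessent as a string.
SELECT
  date(parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')) AS day,
  CASE
    WHEN useragent LIKE '%bingbot%'   THEN 'Bingbot'
    WHEN useragent LIKE '%Amazonbot%' THEN 'Amazonbot'
    WHEN useragent LIKE '%Googlebot%' THEN 'Googlebot'
    ELSE 'other'
  END AS bot,
  count(*) AS get_requests,
  round(sum(try_cast(bytessent AS bigint)) / 1e9, 2) AS gb_sent
FROM s3_access_logs_db.mybucket_logs
WHERE operation = 'REST.GET.OBJECT'
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
```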

Bingbot is generating 43,186 S3 GetObject requests per day, totaling 4-9 GB per day.
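To put a rough dollar figure on that, here's a back-of-envelope sketch (it assumes the standard ~$0.09/GB DataTransfer-Out rate for the first 10 TB to the internet; check current pricing for your region):

```python
# Rough monthly cost of bot-driven DataTransfer-Out-Bytes.
# RATE_PER_GB is an assumption, not a quoted price.
RATE_PER_GB = 0.09

def monthly_transfer_cost(gb_per_day: float, days: int = 30) -> float:
    """Estimated USD cost for a given daily egress volume."""
    return gb_per_day * days * RATE_PER_GB

# Worst-case daily volumes observed above:
print(f"Bingbot:   ${monthly_transfer_cost(9):.2f}/mo")
print(f"Amazonbot: ${monthly_transfer_cost(6):.2f}/mo")
```

So even at the worst-case 9 GB/day it's tens of dollars a month, not hundreds, which matches my "not insanely high, but noticeable" read.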

Amazonbot has also started ramping up recently (as I understand it, it crawls for Alexa, so blocking it may not hurt SEO much). It's generating 6,187 GetObject requests per day, totaling 2-6 GB per day.

Meanwhile Googlebot has held steady for months at 7,915 GetObject requests per day, always under 1 GB, with the exception of one day.

Ultimately the costs aren't insanely high, but Bingbot at ~40K GetObject requests a day versus Google at ~8K is significant. I'm not sure whether this will slow down as Bing's indexing improves, or whether the move to S3 is what triggered this wholesale reindexing in the first place. I tried adjusting the crawl delay/rate and de-prioritizing the bot both in robots.txt and in Bing Webmaster Tools, but it had zero effect.
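For reference, the robots.txt directives I'm talking about look roughly like this (a sketch; the delay value is illustrative, Bing documents Crawl-delay support but isn't guaranteed to honor it quickly, and I'd verify Amazonbot's handling before relying on the Disallow):

```
# Throttle Bingbot (seconds between requests)
User-agent: bingbot
Crawl-delay: 10

# Block Amazonbot entirely
User-agent: Amazonbot
Disallow: /
```

One caveat worth checking: the bots have to actually fetch this robots.txt from the hostname they're crawling, so if they're hitting the S3 bucket endpoint directly rather than the Cloudflare-fronted domain, a robots.txt served only through Cloudflare may never reach them.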
