Posted (edited)

Hello,

More than 80% of my traffic comes from robots, which overloads the database and the bandwidth.

Could you add a feature to block bad robots under CP-Admin -> Ban Settings: Bots?

  • AhrefsBot, Konqueror, MJ12bot, Zeus, ...

Something like this code in index.php:

include 'Badbots.php';

And the second file, Badbots.php:

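<?php
// Block requests whose User-Agent matches a known unwanted bot.
if (isset($_SERVER['HTTP_USER_AGENT'])) {
    $httpUserAgent = $_SERVER['HTTP_USER_AGENT'];

    // Substrings to look for in the User-Agent header (matched case-insensitively).
    $unwanted = array(
        "ahrefsbot",
        "cmscrawler",
        "mj12bot",
        "memorybot",
        "zbot",
    );

    foreach ($unwanted as $val) {
        // stripos() returns 0 for a match at the start of the string,
        // so compare strictly against false.
        if (stripos($httpUserAgent, $val) !== false) {
            header("HTTP/1.0 451 Unavailable For Legal Reasons");
            readfile('451.shtml');
            exit;
        }
    }
}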

The PHP script above runs the check on every page load, which slows down the server. The idea would be to run the check only once: if the robot is on the blacklist, its IP address is blocked and the robot is told through an HTTP error message that it is unwanted. If the robot persists, the temporary block of 24 to 72 hours is extended, or even made permanent if the administrator validates it manually. This would also make it possible to update the list of problem IPs on the Invision Power subscription.
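To make the idea concrete, here is a minimal sketch of the check-once logic, not a finished implementation. The function names, the `$blocks` array layout, and the escalation steps (24, 48, then 72 hours) are my own assumptions for illustration; nothing here is an existing Invision Community API. A real version would keep `$blocks` in a persistent store rather than in a PHP array, and the permanent ban would stay a manual step for the administrator.

```php
<?php
// Hypothetical sketch: these names do not exist in Invision Community.

/** Returns true if the User-Agent contains any blacklisted substring. */
function isUnwantedBot(string $userAgent, array $unwanted): bool
{
    foreach ($unwanted as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Decides whether a request is blocked and updates the block list.
 * $blocks maps IP => ['until' => unix timestamp, 'strikes' => int].
 */
function checkRequest(array &$blocks, string $ip, string $userAgent, array $unwanted, int $now): bool
{
    if (isset($blocks[$ip]) && $blocks[$ip]['until'] > $now) {
        // IP is already blocked: skip the User-Agent scan entirely.
        // A repeat visit counts as insisting, so extend the block (capped at 72 hours).
        $blocks[$ip]['strikes']++;
        $hours = min(24 * $blocks[$ip]['strikes'], 72);
        $blocks[$ip]['until'] = $now + $hours * 3600;
        return true;
    }

    if (isUnwantedBot($userAgent, $unwanted)) {
        // First offence (or an earlier block has expired): block for 24 hours.
        $blocks[$ip] = ['until' => $now + 24 * 3600, 'strikes' => 1];
        return true;
    }

    return false;
}
```

Once an IP is in the list, later requests only cost an array lookup instead of a scan of the whole blacklist. The caller would send the HTTP error whenever `checkRequest()` returns true.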

Edited by MEVi
Posted (edited)

Would that not be better served at the DNS level if you had a proxy DNS provider (Cloudflare, for example)? Or, if that does not interest you, perhaps at the server level through rewrite rules? The more you rely on your forum to execute this, the heavier the load would be. While I do believe your solution would be useful in denying access, you would still be asking your site to run the check on each individual page load.

Edited by Linux-Is-Best
clarification
Posted

Cloudflare raises some data privacy problems for me. It would be perfectly possible to check once whether the robot is in the list and then block the IP for 72 hours or more.

Posted

I would recommend blocking unwanted traffic at the server level. By the time the request makes it to PHP, a lot of unnecessary overhead has already occurred if your intention is to block the traffic. You can already ban by IP address if you wish, and most legitimate spiders will advertise the IP address ranges used by their bots if you wish to do that, but again...this is better served at the server/firewall level.

Posted

Indeed, it is much more efficient to block this at the server/firewall level.

But it would be convenient to have the "Online Users" statistics broken down into: Robots, Guests, Members (and HTTP Errors).

Posted
13 hours ago, MEVi said:

Indeed, it is much more efficient to block this at the server/firewall level.

But it would be convenient to have the "Online Users" statistics broken down into: Robots, Guests, Members (and HTTP Errors).

From what I gather, there will be cache changes in an upcoming update that would make this not feasible 🙏 

Posted

Identifying whether a visitor is a robot in the statistics should not be a cache problem. It is just like the HTTP error that is logged when someone tries to view a page that is inaccessible to non-members.

Posted

Caching changes in an upcoming release mean that when a page is generated for a guest, that page will continue to be served to all guest visits for a period of time. That means a user could cause the page to be generated, and then 100 bots visit the page and see the cached copy. Or a bot could cause a page to be generated and 100 regular guest visitors then view the cached page.

Guest session tracking is not reliable due to caching and performance optimizations, and all bots are treated as guests, so what you are after simply isn't reliably tracked.
