CP-Admin Ban Settings: Unwanted Bots


MEVi

Hello,

More than 80% of the traffic comes from robots, which overload the database or the bandwidth.

Can you add a feature to block bad robots in CP-Admin -> Ban Settings: Bots?

  • AhrefsBot, Konqueror, MJ12bot, Zeus, ...

Something like this code in index.php:

include 'Badbots.php';

And a second file, Badbots.php:

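<?php
// Badbots.php: reject requests whose User-Agent matches a blacklisted bot.
$httpUserAgent = $_SERVER['HTTP_USER_AGENT'] ?? null;

if ($httpUserAgent !== null) {
    // Lowercase substrings, matched case-insensitively against the User-Agent.
    $unwanted = array(
        "ahrefsbot",
        "cmscrawler",
        "mj12bot",
        "memorybot",
        "zbot",
    );

    foreach ($unwanted as $val) {
        // stripos() returns 0 for a match at the start, so compare strictly with false.
        if (stripos($httpUserAgent, $val) !== false) {
            header("HTTP/1.0 451 Unavailable For Legal Reasons");
            readfile('451.shtml');
            exit;
        }
    }
}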

The PHP script above runs the check on every page, which slows down the server. The idea would be to do the check only once: if the robot is on the blacklist, its IP address is blocked, and the robot is told through an HTTP error message that it is unwanted. If the robot persists, the IP address, which was first blocked temporarily for 24 to 72 hours, is blocked for a longer time, or even permanently if the administrator validates it manually. This would also make it possible to update the list of problem IPs through the Invision Power subscription.
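The escalation idea described above can be sketched roughly as follows. This is only an illustration under assumptions: the function names (`isUnwantedBot`, `isBlocked`, `escalateBlock`, `handleRequest`), the in-memory `$blocks` array, and the 30-day third step are all hypothetical and are not part of Invision Community. "Persists" is read here as the robot coming back after its temporary block has expired.

```php
<?php
// Hypothetical sketch; none of these names are Invision Community API.

// Block durations per strike: 24 h, 72 h, then 30 days (the 30 days is an assumption).
const BLOCK_STEPS = [24 * 3600, 72 * 3600, 30 * 24 * 3600];

// Case-insensitive substring match of the User-Agent against the blacklist.
function isUnwantedBot(?string $userAgent, array $unwanted): bool
{
    if ($userAgent === null || $userAgent === '') {
        return false;
    }
    foreach ($unwanted as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// A record blocks the IP while its expiry lies in the future or it is marked permanent.
function isBlocked(?array $record, int $now): bool
{
    if ($record === null) {
        return false;
    }
    return $record['permanent'] || $record['until'] > $now;
}

// Add one strike and pick the matching duration, capped at the last step.
// The 'permanent' flag is never set here; only an administrator would set it.
function escalateBlock(?array $record, int $now): array
{
    $strikes = ($record['strikes'] ?? 0) + 1;
    $duration = BLOCK_STEPS[min($strikes, count(BLOCK_STEPS)) - 1];
    return [
        'strikes'   => $strikes,
        'until'     => $now + $duration,
        'permanent' => $record['permanent'] ?? false,
    ];
}

// Returns true when the request should be denied.
// A blocked IP is denied before the User-Agent list is consulted at all.
function handleRequest(array &$blocks, string $ip, ?string $userAgent, array $unwanted, int $now): bool
{
    $record = $blocks[$ip] ?? null;
    if (isBlocked($record, $now)) {
        return true;
    }
    if (isUnwantedBot($userAgent, $unwanted)) {
        $blocks[$ip] = escalateBlock($record, $now);
        return true;
    }
    return false;
}
```

In this sketch the User-Agent list is only consulted for IPs that are not already blocked, and the resulting `$blocks` records are the kind of data that could be exported to a firewall or shared as a list of problem IPs.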

Edited by MEVi

Would that not be better served at the DNS level if you had a proxy DNS provider (Cloudflare, for example)? Or, if that does not interest you, perhaps at the server level through rewrite rules? The more you rely on your forum to do this, the heavier the load will be. While I do believe your solution would be useful in denying access, you would still be asking your site to run the check on every individual page load.

Edited by Linux-Is-Best
clarification

I would recommend blocking unwanted traffic at the server level. By the time the request makes it to PHP, a lot of unnecessary overhead has already occurred if your intention is to block the traffic. You can already ban by IP address if you wish, and most legitimate spiders will advertise the IP address ranges used by their bots if you wish to do that, but again...this is better served at the server/firewall level.


13 hours ago, MEVi said:

Indeed, it is much more efficient to block this at the server/firewall level.

But it would be convenient to have in the "Online Users" statistics : Robots, Guests, Members (and HTTP Errors).

From what I gather, there will be cache changes in an upcoming update that would make this not feasible 🙏 


Caching changes in an upcoming release mean that when a page is generated for a guest, that page will continue to be served to all guest visits for a period of time. That means a user could cause the page to be generated, and then 100 bots visit the page and see the cached copy. Or a bot could cause a page to be generated and 100 regular guest visitors then view the page.

Guest session tracking is not reliable due to caching and performance optimizations, and all bots are treated as guests, so what you are after simply isn't reliably tracked.
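To illustrate why the cached path cannot tell bots from guests, here is a minimal sketch of a guest page cache. It is not Invision Community's actual implementation: `serveGuestPage`, the `$cache` array, and the TTL value are hypothetical and only show the general pattern of serving one generated page to every guest until it expires.

```php
<?php
// Hypothetical guest page cache; not Invision Community's real code.
function serveGuestPage(array &$cache, string $url, int $now, int $ttl, callable $render): string
{
    // Cache hit: the stored HTML is returned as-is, so no page generation
    // and no per-visitor logic runs for this request.
    if (isset($cache[$url]) && $cache[$url]['expires'] > $now) {
        return $cache[$url]['html'];
    }

    // Cache miss: whoever arrives first (guest or bot) triggers generation.
    $html = $render($url);
    $cache[$url] = ['html' => $html, 'expires' => $now + $ttl];
    return $html;
}

// Usage: 101 guest requests within the TTL cause a single render.
$cache = [];
$renders = 0;
$render = function (string $url) use (&$renders): string {
    $renders++;
    return "<html>$url</html>";
};
for ($i = 0; $i < 101; $i++) {
    serveGuestPage($cache, '/topic/1', 1000 + $i, 900, $render);
}
```

Only the first of those requests reaches the code that generates the page, which is why per-visitor counts for guests and robots are unreliable under this kind of caching.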
