MEVi Posted February 13, 2021 (edited)

Hello,

More than 80% of the traffic comes from robots, which overloads the database or the bandwidth.

Could you add a feature to block bad robots in CP-Admin -> Ban Settings: Bots (AhrefsBot, Konqueror, MJ12bot, Zeus, ...)?

Something like this code in index.php:

    include 'Badbots.php';

And a second file, Badbots.php:

    <?php
    // Reject requests whose User-Agent contains a blacklisted substring.
    if (isset($_SERVER['HTTP_USER_AGENT'])) {
        $httpUserAgent = $_SERVER['HTTP_USER_AGENT'];
        $unwanted = array(
            "ahrefsbot",
            "cmscrawler",
            "mj12bot",
            "memorybot",
            "zbot",
        );
        foreach ($unwanted as $val) {
            // stripos() returns 0 for a match at position 0, so compare strictly against false.
            if (stripos($httpUserAgent, $val) !== false) {
                header("HTTP/1.0 451 Unavailable For Legal Reasons");
                readfile('451.shtml');
                exit;
            }
        }
    }

The PHP script above runs the check on every page request, which slows down the server. The idea would be to do the check only once: if the robot is in the blacklist, its IP address is blocked, and the robot is told through an HTTP error message that it is not welcome. If the robot insists, the IP address that was temporarily blocked for 24 to 72 hours is blocked for longer, or even permanently if the administrator validates it manually.

This would also make it possible to update the list of problem IPs through the Invision Power subscription.

Edited February 15, 2021 by MEVi
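A minimal sketch of the "check once, then block the IP with escalating durations" idea described above. Everything here is hypothetical: the function names, the in-memory `$bans` array standing in for a database table, and the 24 h / 72 h / manual-review steps are assumptions for illustration, not Invision Community APIs.

```php
<?php
// Hypothetical sketch: names and storage are assumptions, not Invision Community APIs.

/** Case-insensitive match of a user agent against a blacklist of substrings. */
function isUnwantedBot(string $userAgent, array $blacklist): bool
{
    foreach ($blacklist as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

/** Ban length in seconds: 24 h for the first offence, 72 h for the second, null afterwards. */
function banDuration(int $offences): ?int
{
    $hours = [1 => 24, 2 => 72];
    return isset($hours[$offences]) ? $hours[$offences] * 3600 : null;
}

/** Records one more offence for $ip in $bans and returns the updated ban record. */
function registerOffence(array &$bans, string $ip, int $now): array
{
    $offences = ($bans[$ip]['offences'] ?? 0) + 1;
    $duration = banDuration($offences);
    $bans[$ip] = [
        'offences'    => $offences,
        'until'       => $duration === null ? null : $now + $duration,
        // No automatic duration left: keep blocking until an administrator decides.
        'needsReview' => $duration === null,
    ];
    return $bans[$ip];
}

/** True while the IP has an unexpired ban or is waiting for manual review. */
function isBanned(array $bans, string $ip, int $now): bool
{
    if (!isset($bans[$ip])) {
        return false;
    }
    return $bans[$ip]['needsReview'] || $bans[$ip]['until'] > $now;
}
```

With this layout the user-agent comparison only needs to run for IPs that are not already banned; a banned IP can be rejected with a single lookup.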
MNOfficial Posted February 14, 2021

Yeah, the constant stream of guests roaming the site, even though it is unavailable to them, can really slow things down.
Linux-Is-Best Posted February 14, 2021 (edited)

Would that not be better handled at the DNS level, if you had a proxy DNS provider (Cloudflare, for example)? Or, if that does not interest you, perhaps at the server level through rewrite rules?

The more you rely on your forum to do this, the heavier the load will be. While I do believe your solution would be useful for denying access, you would still be asking your site to run the check on every individual page load.

Edited February 14, 2021 by Linux-Is-Best (clarification)
MEVi (Author) Posted February 15, 2021

Cloudflare raises data-privacy concerns for me.

It would be perfectly possible to check once whether the robot is in the list and then block the IP for 72 hours or more.
bfarber Posted February 15, 2021

I would recommend blocking unwanted traffic at the server level. By the time the request reaches PHP, a lot of unnecessary overhead has already occurred if your intention is to block the traffic.

You can already ban by IP address if you wish, and most legitimate spiders publish the IP address ranges used by their bots, but again, this is better handled at the server/firewall level.
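To illustrate what matching a request against a published IP range involves, here is a minimal IPv4-only sketch. The helper name `ipInCidr` is hypothetical, the ranges in the usage note are RFC 5737 documentation placeholders rather than real crawler ranges, and a 64-bit PHP build is assumed. As the post above says, a firewall or web-server rule is the better place for this check; the code only shows the logic.

```php
<?php
// IPv4-only sketch; assumes a 64-bit PHP build. Not an Invision Community API.

/** Returns true if IPv4 address $ip falls inside $cidr, e.g. "192.0.2.0/24". */
function ipInCidr(string $ip, string $cidr): bool
{
    // A bare address without "/bits" is treated as a /32 (single host).
    [$network, $bits] = explode('/', $cidr) + [1 => '32'];
    $ipLong  = ip2long($ip);
    $netLong = ip2long($network);
    $bits    = (int) $bits;
    if ($ipLong === false || $netLong === false || $bits < 0 || $bits > 32) {
        return false;
    }
    // Build the netmask; a /0 prefix matches every address.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}
```

For example, `ipInCidr('192.0.2.77', '192.0.2.0/24')` is true, while `ipInCidr('198.51.100.1', '192.0.2.0/24')` is false.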
MEVi (Author) Posted February 17, 2021

Indeed, it is much more efficient to block this at the server/firewall level.

But it would be convenient to have a breakdown in the "Online Users" statistics: Robots, Guests, Members (and HTTP Errors).
Jordan Miller Posted February 17, 2021

13 hours ago, MEVi said: "Indeed it is much more efficient to block this at the server/firewall level. But it would be convenient to have in the "Online Users" statistics: Robots, Guests, Members (and HTTP Errors)."

From what I gather, there will be cache changes in an upcoming update that would make this not feasible 🙏
MEVi (Author) Posted February 18, 2021

Identifying whether a visitor is a robot in the statistics should not be a cache problem. It is the same as when an HTTP error is recorded because someone tries to view a page that is not accessible to non-members.
bfarber Posted February 18, 2021

Caching changes in an upcoming release mean that when a page is generated for a guest, that page will continue to be served to all guest visits for a period of time. That means a user could cause the page to be generated, and then 100 bots visit the page and see the cached copy. Or a bot could cause a page to be generated, and 100 regular guest visitors then view the page.

Guest session tracking is not reliable due to caching and performance optimizations, and all bots are treated as guests, so what you are after simply isn't reliably tracked.
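The effect described above can be shown with a toy model. This is only a sketch: the class `GuestPageCache`, its methods, and the assumption that a cache hit bypasses session tracking entirely are illustrative and do not describe Invision Community's actual caching code.

```php
<?php
// Toy model of a guest page cache; not Invision Community's implementation.

final class GuestPageCache
{
    /** @var array<string, array{html: string, expires: int}> */
    private array $pages = [];

    /** Visitors the application actually saw, i.e. those that triggered a render. */
    public array $trackedVisitors = [];

    private int $ttl;

    public function __construct(int $ttlSeconds)
    {
        $this->ttl = $ttlSeconds;
    }

    public function serve(string $url, string $visitor, int $now): string
    {
        $entry = $this->pages[$url] ?? null;
        if ($entry !== null && $entry['expires'] > $now) {
            // Cache hit: the stored page is returned and the visitor is never tracked.
            return $entry['html'];
        }
        // Cache miss: the page is rendered, and only this visitor is recorded.
        $this->trackedVisitors[] = $visitor;
        $html = "<html>guest page rendered at $now</html>";
        $this->pages[$url] = ['html' => $html, 'expires' => $now + $this->ttl];
        return $html;
    }
}
```

If one human guest triggers the render and 100 bots request the same URL before the entry expires, `$trackedVisitors` holds a single entry, so a robots-versus-guests count taken from it would be misleading.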
MEVi (Author) Posted February 18, 2021

I understand. I will have to reinstall Piwik or an equivalent product.