
Recommended Posts

Posted (edited)

Hello,

1) How do I change a value in yourwebsite.tld/invision/robots.txt?

  • User-agent: *
    Crawl-delay: 10 (I would like to change this to 30)
    

This content is generated dynamically by the website; the robots.txt file does not exist on the server via FTP.

2) When I try to deactivate sitemap generation, I get an error. Whether I enter the correct information or not, I always get the same error message.

[Screenshot of the error in the sitemap settings]

 

3) This results in the website slowing down. Why are there no tools in Invision to see the bad crawlers that do not respect the rules of the robots.txt file? The site would gain in security (DDoS, brute force, flooding, ...) by blacklisting them, for example by blacklisting any crawler that ignores the site's robots.txt.

4) Even though the website is closed to the public behind the Invision banner (offline site), crawlers continue to access the images, which is problematic. How can I regenerate the hash codes of the images?

Edited by iProxy
Posted

1. Click on the Crawl Management tab (you can see it in your screenshot). Click on Custom and define your own. You can also click on “Do not use” and create your own file via FTP. Remember though… if you choose Custom or Do not use, you start with an empty file, so you may want to include the entries that are already in the auto-generated IPS version (see the robots.txt sketch after this list).

2. This is typically just the root of your install with the protocol, so https://yourwebsite.tld/invision/sitemap.php based on your example above.

3. Slowing down bad bots, etc. is not something you do at the software level. You do that at either the server level or the firewall level. Talk to your host for help with that. And bad bots don't respect robots.txt anyway, just FYI. 🙂

4. That’s a server/hosting issue. You might block them using custom .htaccess rules or via a firewall. 
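
Picking up item 1: a custom robots.txt could look roughly like the sketch below. This is only an illustration with the crawl delay set to 30; the Disallow lines and the sitemap URL are example placeholders, so compare against the IPS-generated version at yourwebsite.tld/invision/robots.txt and copy over whatever it actually contains before switching to Custom.

    User-agent: *
    Crawl-delay: 30
    # Example placeholders only -- carry over the real IPS-generated entries
    Disallow: /invision/admin/
    Disallow: /invision/search/
    Sitemap: https://yourwebsite.tld/invision/sitemap.php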

Posted (edited)
  1. I have tried it and still get the same error message
  2. I have tried it and still get the same error message
  3. I find your answer simplistic. When a crawler tries to access a topic that has been removed from public view and keeps trying, sometimes it's blatantly obvious and sometimes it's not. What do you expect the host to do? It's not their job to monitor developments in "my" community.
  4. Another example: if a topic has been made public and then hidden by a moderator (made invisible) but not deleted, a crawler can still download the images and files. I'm asking how to regenerate the hash codes for images/files without having to delete them and re-upload them one by one.
Edited by iProxy
Posted

28 minutes ago, iProxy said:

I find your answer simplistic. When a crawler tries to access a topic that has been removed from public view and keeps trying, sometimes it's blatantly obvious and sometimes it's not. What do you expect the host to do? It's not their job to monitor developments in "my" community.

When a crawler tries to access a topic that it does not have permission to access, it will get an access-denied message.  It can't do anything else.  In terms of IPB trying to figure out what speed your server or site can handle... it can't do this.  It does not know if it's on a shared host or some monster beast, etc.  It does not know if it's the only thing running on the server or if there are other sites/applications also running.  You need to monitor and control bad bots outside of the software.  Good bots will follow the robots.txt instructions you put in place.  Bad ones that don't... you should block them from reaching the site at all, which is done either at your server by denying their IP addresses or ASN (for example using CSF), or within some sort of WAF that sits in front of the site/server.

The best place I would suggest doing this would be within some sort of WAF such as Cloudflare.
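
As an illustration of the server-level blocking mentioned above, here is a minimal sketch assuming CSF (ConfigServer Security & Firewall) is installed; the IP address is an example placeholder from a documentation range:

    # Permanently deny an abusive crawler's IP, with a comment for your records
    csf -d 203.0.113.45 "crawler ignoring robots.txt"

    # Check whether an address is currently matched anywhere in the rules
    csf -g 203.0.113.45

    # Remove the deny again if it was added by mistake
    csf -dr 203.0.113.45

Blocking a whole ASN or adding rate limits is usually easier in a WAF such as Cloudflare, where a firewall rule can match the requester's ASN or user agent instead of you maintaining long IP lists on the server.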

32 minutes ago, iProxy said:

Another example: if a topic has been made public and then hidden by a moderator (made invisible) but not deleted, a crawler can still download the images and files. I'm asking how to regenerate the hash codes for images/files without having to delete them and re-upload them one by one.

There is no way to do this.  

34 minutes ago, iProxy said:
  • I have tried it and still get the same error message
  • I have tried it and still get the same error message

If you go to the Sitemap tab...  that value should be pre-filled in with something valid.

[Screenshot of the Sitemap tab with the pre-filled URL]

Can you actually open that path?  It should be whatever address you use to access the ACP, but without "/admin" and with /sitemap.php added on the end.

Out of curiosity, have you tried this in a different browser to make sure it's not a browser auto-fill issue and have you also tried disabling any 3rd party resources (applications/plugins)?
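
One quick way to take the browser out of the equation is to request the sitemap URL directly from a terminal; yourwebsite.tld/invision is the example base URL used earlier in this thread:

    # Fetch only the response headers; an HTTP 200 means the sitemap path is reachable
    curl -I https://yourwebsite.tld/invision/sitemap.php

If that returns 403, 404, or times out, the problem is with the path or the server rather than with the value typed into the ACP form.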

Posted (edited)

Thank you for this additional information

Sitemap bug: I have several websites running on Invision. The settings are pretty much the same, and on one site it doesn't work; that's why I asked for help. I won't go into detail on the subject.

Bad crawler/member: I have noticed on other products that guests can see photos in reduced dimensions (300x300) with an invitation to sign in to view them at full size. You tell me this is impossible, but IPS already does it, only in IP.Download. You must have realized that downloaded files are associated with a temporary key hash that prevents bots from downloading the files. Why not extend this feature to the forum and Pages?

Take a group of members discussing in a password-protected section of the forum: if a member opens their web browser and enables offline reading, they can view the page without being logged in and thus access the images. Worse, they can share the web links of those images, allowing others to view them without being a member of the community. For me this is a real problem with the new copyright laws. Some companies specialize in searching for and demonstrating the accessibility of these pictures; they even create an account on the forum so they are indexed as a member and not as a guest, and IPS has no function to notify you of this abuse. By the time we realize the problem, it is already too late, and all the methods used have proved ineffective in the long run. The policy recommends using hash regeneration (MD5, SHA-1, SHA-256) to solve the problem, which is a very good idea, but I don't know how to do it.

Hash regeneration should even be automatic in some cases, for example when a topic is hidden by a moderator or when the topic is moved.
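
For what it's worth, the "temporary key hash" idea generally works like the sketch below. This is not IPS code and not how IP.Download is implemented internally; it is just a generic PHP illustration of time-limited signed links, and every name in it (SECRET_KEY, makeSignedUrl, isValidSignedUrl) is hypothetical:

    <?php
    // Generic illustration of expiring signed links -- not IPS code.
    const SECRET_KEY = 'change-me'; // server-side secret, never sent to clients

    // Build a link that stops working after $ttl seconds.
    function makeSignedUrl(string $path, int $ttl = 900): string
    {
        $expires = time() + $ttl;
        $sig     = hash_hmac('sha256', $path . '|' . $expires, SECRET_KEY);
        return $path . '?expires=' . $expires . '&sig=' . $sig;
    }

    // Verify a link before serving the file; expired or tampered links are rejected.
    function isValidSignedUrl(string $path, int $expires, string $sig): bool
    {
        if ($expires < time()) {
            return false; // link has expired
        }
        $expected = hash_hmac('sha256', $path . '|' . $expires, SECRET_KEY);
        return hash_equals($expected, $sig);
    }

Rotating the secret invalidates every previously issued link at once, which is essentially the "regenerate the hashes" effect being asked for; whether IPS could expose such an option for forum and Pages attachments is exactly the kind of thing to raise as a feature suggestion.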

 

 

Edited by iProxy
Posted

As mentioned, the second item is something you would add as a suggestion. As for the sitemap issue, you have stated you won't go into detail, so I'm not sure what we can do to assist you, unfortunately.

Posted (edited)

Bad crawler/member: I followed your advice and created a topic.

Sitemap bug: I don't have this kind of problem with my other sites. Just connect to the website and you will see for yourself that the sitemap functionality is not usable.

Edited by iProxy
Posted
7 hours ago, iProxy said:

Just connect to the website and you will see for yourself that the sitemap functionality is not usable.

You are showing localhost in your screenshots, and you have stated you won't provide other details. If you can provide that information, we can certainly look.

Posted

I have the impression that there is some confusion. The website is online but closed to the public through the functionality provided for this purpose in IPS. Here is a demonstration of the bug on an active website:

PS: Sorry for the previous message; I clicked a button by mistake, could not find how to delete the post, and there is no cancel option.

 

Posted
37 minutes ago, iProxy said:

I have the impression that there is some confusion. The website is online but closed to the public through the functionality provided for this purpose in IPS. Here is a demonstration of the bug on an active website:

Unfortunately, this isn't something which I can reproduce on my own installation. I would recommend checking server permissions and security modules to ensure that the system is able to reach the given URL there and validate what is on the server.

Posted

When you save that form, the software makes an HTTP request to the URL that is entered to ensure it is valid. Your server is blocking that request, which is why it says it is invalid.

I would recommend contacting your host to ensure that your server can make HTTP requests to itself. Unfortunately, this is not a software issue.
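
A quick way for the host (or anyone with SSH access) to confirm this is to run the same kind of request from the server itself, assuming curl is available there; yourwebsite.tld/invision is again the example base URL:

    # Run this on the web server itself, not on your own computer
    curl -I https://yourwebsite.tld/invision/sitemap.php

If this times out or is refused on the server while the same URL loads fine in your browser, a firewall or security module is blocking the server's requests to itself, which matches the error shown in the ACP.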

Posted

Hello,

You are right: we had banned the server because there was suspicious activity. 😅

The problem is solved.

 

 

  • 3 weeks later...
Posted (edited)

Since the latest update (4.7.10 and later), I have a bug with the sitemap on all my websites: if I modify a value, I get an error.

[Screenshot of the sitemap error message]

 

If I modify any value, it tells me that the file is not compliant. The other problem is that I can't even put the file in another folder, so what's the point of asking where the file is?

Edited by iProxy