
Search Engine Optimization : Sitemap



Hello,

1) How do I change a value in yourwebsite.tld/invision/robots.txt?

  • User-agent: *
    Crawl-delay: 10 -> 30
    

This file is generated by the website, as robots.txt does not exist on the FTP server.

2) When I try to deactivate sitemap generation, I get an error. Whether I enter the correct information or not, I always get the same error message.

[Screenshot of the error message in the sitemap settings]

 

3) This slows the website down. Why are there no tools in Invision to identify bad crawlers that do not respect the rules in robots.txt? The site would gain in security by blacklisting them (DDoS, brute force, flooding, ...), for example any crawler that ignores robots.txt.

4) Although the website is closed to the public with the Invision banner (site offline), crawlers continue to access the images, which is problematic. How can I regenerate the hash codes of the images?

Edited by iProxy

1. Click on the Crawl Management tab (you can see it in your screenshot). Click on Custom and define your own rules. You can also click on "Do not use" and create your own file via FTP. Remember though… if you choose Custom or Do Not Use, you start with nothing in the file, so you might want to add even the stuff that is already in the auto-generated IPS version (see the example robots.txt sketch after this list).

2. This is typically just the root of your install with the protocol. So https://yourwebsite.tld/invision/sitemap.php based on your example above. 

3. Slowing down bad bots, etc. is not something you do at the software level. You do that at either the server level or the firewall level. Talk to your host for help with that. And bad bots don't respect robots.txt, just FYI. 🙂

4. That’s a server/hosting issue. You might block them using custom .htaccess rules or via a firewall. 
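For reference, a minimal custom robots.txt along the lines of what was asked in the opening post might look like the sketch below. The Disallow path is only a placeholder, and Crawl-delay is a non-standard directive that some crawlers (Googlebot included) ignore, so copy across whichever entries from the IPS-generated version you still want to keep.

    User-agent: *
    Crawl-delay: 30
    # placeholder: re-add the Disallow lines from the auto-generated file here
    Disallow: /invision/admin/
    Sitemap: https://yourwebsite.tld/invision/sitemap.php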


  1. I have tried it and still get the same error message
  2. I have tried it and still get the same error message
  3. I find your answer simplistic. When a crawler tries to access a topic that has been removed from public view and persists, sometimes it's blatantly obvious and sometimes it's not. What do you expect the host to do? It's not their job to monitor what happens in "my" community.
  4. Another example: if a topic has been made public and then removed by a moderator (hidden) but not deleted, a crawler can still download the images and files. I'm asking how to regenerate the hash codes for images/files without having to delete them and re-upload them one by one.
Edited by iProxy


28 minutes ago, iProxy said:

I find your answer simplistic. When a crawler tries to access a topic that has been removed from public view and persists, sometimes it's blatantly obvious and sometimes it's not. What do you expect the host to do? It's not their job to monitor what happens in "my" community.

When a crawler tries to access a topic that it does not have permission to access, it will get an access-denied message. It can't do anything else. As for IPB trying to figure out what speed your server or site can handle... it can't do this. It does not know if it's on a shared host or some monster of a machine, and it does not know if it's the only thing running on the server or if there are other sites/applications running as well. You need to monitor and control bad bots outside of the software. Good bots will follow the robots.txt instructions you put in place. Bad ones that don't... you should be blocking from reaching the site at all, which is done either at the server by denying their IP addresses or ASN (for example using CSF), or within some sort of WAF that sits in front of the site/server.

The best place I would suggest doing this would be within some sort of WAF such as Cloudflare.
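To illustrate the kind of server-level blocking described above (these are not IPS features, just hedged sketches assuming Apache 2.4+ with placeholder user agents and the documentation IP range 203.0.113.0/24; replace them with whatever you actually see misbehaving in your logs), rules like the following can go in .htaccess or the vhost config:

    # Refuse requests from known-bad user agents (names here are placeholders)
    <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
        RewriteRule .* - [F,L]
    </IfModule>

    # Or deny an entire IP range outright
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.0/24
    </RequireAll>

With CSF on the server, a one-off block would be csf -d 203.0.113.10 "bad bot", and a WAF such as Cloudflare can apply the same kind of rule before the request ever reaches your server.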

32 minutes ago, iProxy said:

Another example: if a topic has been made public and then removed by a moderator (hidden) but not deleted, a crawler can still download the images and files. I'm asking how to regenerate the hash codes for images/files without having to delete them and re-upload them one by one.

There is no way to do this.  

34 minutes ago, iProxy said:
  • I have tried it and still get the same error message
  • I have tried it and still get the same error message

If you go to the Sitemap tab...  that value should be pre-filled in with something valid.

[Screenshot of the Sitemap settings tab]

Can you actually open that path?  It should be whatever address you use to access the ACP, but without "/admin", and adding /sitemap.php on the end. 

Out of curiosity, have you tried this in a different browser to make sure it's not a browser auto-fill issue and have you also tried disabling any 3rd party resources (applications/plugins)?


Thank you for this additional information.

Sitemap bug: I have several websites running on Invision; the settings are pretty much the same, and on one site it doesn't work, which is why I asked for help. I won't go into detail on the subject.

Bad crawler/member: I have noticed on other products that guests see photos in reduced dimensions (300x300), with an invitation to log in to view them at full size. You tell me that this is impossible, but IPS already does it, though only in IP.Download. You must have realized that downloaded files are associated with a temporary key hash that prevents bots from downloading the files. Why not extend this feature to Forums and Pages?

A group of members is discussing in a password-protected section of the forum. If a member opens their web browser and enables offline reading, they can view the page without being logged in and thus access the images; worse, they can share the web links of those images so that others can view them without being a member of the community. This is, for me, a real problem with the new copyright laws. Some companies specialize in searching for and demonstrating the accessibility of these pictures; they even create an account on the forum and index it as a member rather than as a guest, and IPS has no function to notify us of this abuse. When we realize the problem it is already too late, and all the methods used have proved ineffective in the long run. The policy recommends using a hash regeneration function (MD5, SHA-1, SHA-256) to solve the problem, which is a very good idea, but I don't know how to do it.

Hash regeneration should even be automatic in some cases, for example when a topic is hidden by a moderator or when the topic is moved.
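To illustrate the kind of mechanism being described here, a minimal sketch (this is not IPS's actual implementation; the secret name and expiry window are assumptions) of a time-limited signed download link: the file is only served while the signature and expiry check out, so rotating the secret when a topic is hidden invalidates every previously shared URL without touching the files themselves.

    import hashlib
    import hmac
    import time

    # Hypothetical per-site secret; rotating it invalidates all previously issued links.
    SECRET = b"rotate-me-when-content-is-hidden"

    def sign_url(path: str, ttl: int = 3600) -> str:
        """Return path?expires=...&sig=..., valid for ttl seconds."""
        expires = int(time.time()) + ttl
        payload = f"{path}:{expires}".encode()
        sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        return f"{path}?expires={expires}&sig={sig}"

    def verify(path: str, expires: int, sig: str) -> bool:
        """Reject expired links and links signed with an old or unknown secret."""
        if time.time() > expires:
            return False
        payload = f"{path}:{expires}".encode()
        expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, sig)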

 

 

Edited by iProxy

Bad crawler/member: I followed your advice and created a topic.

Sitemap bug: I don't have this kind of problem with my other sites. Just connect to the website and you will see for yourself that the sitemap functionality is not usable.

Edited by iProxy

7 hours ago, iProxy said:

Just connect to the website and you will see for yourself that the sitemap functionality is not usable.

You are showing localhost in your screenshots, and you have stated you won't provide other details. If you can provide that information, we can certainly look.


I have the impression that there is some confusion. The website is online but closed to the public through the functionality provided for this purpose in IPS. Here is a demonstration of the bug on an active website:

PS: Sorry for the previous message; it happens that I click a button and then cannot find how to delete the post, and there is no option to cancel.

 


37 minutes ago, iProxy said:

I have the impression that there is some confusion. The website is online but closed to the public through the functionality provided for this purpose in IPS. Here is a demonstration of the bug on an active website:

Unfortunately, this isn't something which I can reproduce on my own installation. I would recommend checking server permissions and security modules to ensure that the system is able to reach the given URL there and validate what is on the server.


It is incomprehensible: if I install the old version 4.6, there is no problem; with version 4.7 there is this error.

If I use this erroneous URL, https://www.nospy.ch/info/blog/:

[Screenshot of the result]

 

If I use this URL, it works fine: https://www.nospy.ch/info/sitemap.php

But in the AdminCP interface I get an error.

I have disabled offline mode; it does not change anything. I deleted all the .htaccess files; no improvement.

 


When you save that form, the software makes an HTTP request to the URL that is entered to ensure it is valid. Your server is blocking that request, which is why it says it is invalid.

I would recommend contacting your host to ensure that your server can make HTTP requests to itself. Unfortunately, this is not a software issue.
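A quick way to confirm this (assuming shell access to the server and that curl is installed; the URL is the one posted earlier in the thread) is to run, from the web server itself:

    curl -I https://www.nospy.ch/info/sitemap.php

If that request hangs, is refused, or comes back with a 403 from a security module such as ModSecurity, the AdminCP save will fail in the same way, and that is what the host needs to fix.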


  • 3 weeks later...

Since the last update (4.7.10+++), I have had a bug with the sitemap on all my websites: if I modify a value, I get an error.

[Screenshot of the error message]

 

If I modify any value, it tells me that the file is not compliant. The other problem is that I can't even put it in another folder, so what's the point of asking where the file is?

Edited by iProxy
