
robots.txt recommendation for IPS 4.5.4.2


Recommended Posts

Hello,
I just checked the latest articles indexed by Google and noticed the following:


lots of "sign in" pages with different URLs, and the "create new topic" page from the topic view.

The URLs are the following:

https://pctroubleshooting.ro/login/?ref=some_random_characters
https://pctroubleshooting.ro/forum/45-jocuri-online/?do=add

From my point of view, I should forbid the indexing of these pages. What do you think?
How can I disallow these dynamic URLs?

Do you have any recommendations for robots.txt?

Thank you!


  • 2 weeks later...
4 minutes ago, IP-Gamers said:

Yes, yes, it also pisses me off that some unnecessary pages end up in the search engine, and I don't know how to deal with it. Yes, I'm a noob, and help wouldn't hurt.


You could use the live meta tag editor to add robots instructions to the pages that you don't want indexed.
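
For example, a "noindex" instruction of that kind ends up as a meta tag in the page's <head>. A generic sketch, not tied to any specific IPS template:

<!-- Tells crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">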

That said, we also made several improvements in IPS 4.6 to remove the CSRF key from URLs.


You can easily create a robots.txt file with wildcards:

For example, check the section "How to Use wildcards in robots.txt" in An SEO's Guide to Robots.txt, Wildcards, the X-Robots-Tag and Noindex (builtvisible.com).

To prevent the report URL or other sections from being indexed, we use, for example:

Disallow: /*?do=reportComment*
Disallow: /*?do=add
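
For completeness, every rule group in robots.txt needs a User-agent line, so a minimal file containing just the two rules above would look like this (a sketch, not a one-size-fits-all recommendation):

# Minimal sketch: block report and "add topic" URLs for all crawlers
User-agent: *
Disallow: /*?do=reportComment*
Disallow: /*?do=add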

As the robots.txt file is very site- and version-specific, and there are other methods (see the article mentioned above), I think every site owner has to create their own robots.txt file or "noindex" mechanism.


@Thomas P 

Thomas, have you also tested the following?

do=GetNewComment
do=GetLastComment

I am asking because Google also indexes the same topic several times with these query strings.

Also


Disallow: /*?do=reportComment*

This query string is already excluded by the meta robots tag, so it doesn't need to be added to robots.txt.


12 hours ago, Thomas P said:

As the robots.txt file is very site- and version-specific, and there are other methods (see the article mentioned above), I think every site owner has to create their own robots.txt file or "noindex" mechanism.

I agree with you, but I think that certain pages (like login, logout, /*?do=add, /*?do=reportComment*, etc.) should be disallowed by default...
Can you please tell me which pages you have disallowed in your robots.txt?


7 hours ago, gaby said:

I agree with you, but I think that certain pages (like login, logout, /*?do=add, /*?do=reportComment*, etc.) should be disallowed by default...
Can you please tell me which pages you have disallowed in your robots.txt?

 

16 hours ago, SeNioR- said:

@Thomas P 

Thomas, have you also tested the following?

do=GetNewComment
do=GetLastComment

I am asking because Google also indexes the same topic several times with these query strings.

Also

This query string is already excluded by the meta robots tag, so it doesn't need to be added to robots.txt.

 

Those are samples; as mentioned before, every site, site setting, and environment is specific.
For the URL parameters containing "do", Google recognized them, in our case, as a possible attribute of the same content, and decides by itself whether the parameter has no influence on the page content or whether it changes the page content. Check Google Search Console and the appropriate section.

There are different methods to declare that a page or URL parameter should not be indexed, be it the robots.txt file, meta tags in the ACP, the Google Search Console settings, etc.
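
Another of those methods, covered in the builtvisible article linked above, is the X-Robots-Tag HTTP response header. A minimal Apache sketch (the PDF pattern is only an illustrative assumption, and mod_headers must be enabled):

# Send a noindex header for all PDF files (illustrative example)
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>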

 

[Screenshot: Google Search Console URL parameters section]


You can exclude duplicates by using the following rules:

Disallow: /*do=*
Disallow: /*sort=*
Disallow: /*sortby=*
Disallow: /*csrf=*
Disallow: /*csrfKey=*
Disallow: */?tab=*
Disallow: */?_fromLogin=*
Disallow: */?_fromLogout=*
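
To illustrate how these wildcards match, here are some made-up example URLs (the paths are assumptions, not taken from a real site):

# /topic/123-example/?do=getNewComment  -> blocked by Disallow: /*do=*
# /forum/45-example/?sortby=date        -> blocked by Disallow: /*sortby=*
# /topic/123-example/?tab=comments      -> blocked by Disallow: */?tab=*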

My entire robots.txt: https://hoilik.com/robots-txt-for-invision-community-473d8aa32984


6 hours ago, Ilya Hoilik said:

You can exclude duplicates by using the following rules:

Disallow: /*do=*
Disallow: /*sort=*
Disallow: /*sortby=*
Disallow: /*csrf=*
Disallow: /*csrfKey=*
Disallow: */?tab=*
Disallow: */?_fromLogin=*
Disallow: */?_fromLogout=*

My entire robots.txt: https://hoilik.com/robots-txt-for-invision-community-473d8aa32984

That was very kind and considerate of you to share your full robots.txt with the community. Thank you for being helpful. 👍


  • 2 weeks later...

I took the full robots.txt from this link. At the moment I have moved away from Yandex, and I work with Bing.
So here's the line in robots.txt:

Host: https://ip-gamers.net

In Bing Webmaster, this line is marked with a red cross. What's wrong with it?


1 hour ago, IP-Gamers said:

In Bing Webmaster, this line is marked with a red cross. What's wrong with it?

I am sorry, but I know nothing about Bing. I focus most of my SEO (search engine optimization) on Baidu, and occasionally Google. Sorry, I could not be of further help. 😞

