
robots.txt recommendation for IPS 4.5.4.2


Recommended Posts

Hello,
I just checked the latest articles indexed by Google and noticed the following:


lots of "sign in" pages with different URLs, and the "create new topic" page from the topic view.

The URLs are the following:

https://pctroubleshooting.ro/login/?ref=some_random_characters
https://pctroubleshooting.ro/forum/45-jocuri-online/?do=add

From my point of view, I should forbid the indexing of these pages. What do you think?
How can I disallow these dynamic URLs?

Do you have any recommendations for robots.txt?

Thank you!


  • 2 weeks later...
4 minutes ago, IP-Gamers said:

Yes, yes, it also pisses me off that some unnecessary pages end up in the search engine, and I don't know how to deal with it. Yes, I'm a noob, and help wouldn't hurt.


You could use the live meta tag editor to add robots instructions to the pages that you don't want indexed.
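
For example, a "noindex" instruction of that kind ends up as a meta tag in the page's <head>. A generic sketch, not tied to any specific IPS template:

<!-- Tells crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">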

That said, we also made several improvements in IPS 4.6 to remove the CSRF key from URLs.


You can easily create a robots.txt file with wildcards:

For example, check the section "How to Use wildcards in robots.txt" in An SEO's Guide to Robots.txt, Wildcards, the X-Robots-Tag and Noindex (builtvisible.com).

To prevent the report URL or other sections from being indexed, we use, for example:

Disallow: /*?do=reportComment*
Disallow: /*?do=add
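
For completeness, every rule group in robots.txt needs a User-agent line, so a minimal file containing just the two rules above would look like this (a sketch, not a one-size-fits-all recommendation):

# Minimal sketch: block report and "add topic" URLs for all crawlers
User-agent: *
Disallow: /*?do=reportComment*
Disallow: /*?do=add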

As the robots.txt file is very site- and version-specific, and there are other methods (see the article mentioned above), I think every site owner has to create their own robots.txt file or "noindex" mechanism.


@Thomas P 

Thomas, have you also tested the following?

do=GetNewComment
do=GetLastComment

I am asking because Google also indexes the same topic several times with these query strings.

Also


Disallow: /*?do=reportComment*

This query string is already excluded by the meta robots tag, so it doesn't need to be added to robots.txt.


12 hours ago, Thomas P said:

As the robots.txt file is very site- and version-specific, and there are other methods (see the article mentioned above), I think every site owner has to create their own robots.txt file or "noindex" mechanism.

I agree with you, but I think that certain pages (like login, logout, /*?do=add, /*?do=reportComment*, etc.) should be disallowed by default...
Can you please tell me which pages you have disallowed in your robots.txt?


7 hours ago, gaby said:

I agree with you, but I think that certain pages (like login, logout, /*?do=add, /*?do=reportComment*, etc.) should be disallowed by default...
Can you please tell me which pages you have disallowed in your robots.txt?

 

16 hours ago, SeNioR- said:

@Thomas P 

Thomas, have you also tested the following?

do=GetNewComment
do=GetLastComment

I am asking because Google also indexes the same topic several times with these query strings.

Also

This query string is already excluded by the meta robots tag, so it doesn't need to be added to robots.txt.

 

Those are samples; as mentioned before, every site, site setting, and environment is specific.
For the URL parameters containing "do", Google recognized them, in our case, as a possible attribute of the same content, and decides by itself whether the parameter has no influence on the page content or whether it changes the page content. Check Google Search Console and the appropriate section.

There are different methods to declare that a page or URL parameter should not be indexed, be it the robots.txt file, meta tags in the ACP, the Google Search Console settings, etc.
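
Another of those methods, covered in the builtvisible article linked above, is the X-Robots-Tag HTTP response header. A minimal Apache sketch (the PDF pattern is only an illustrative assumption, and mod_headers must be enabled):

# Send a noindex header for all PDF files (illustrative example)
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>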

 

[Screenshot: Google Search Console URL parameters section]


You can exclude duplicates by using the following rules:

Disallow: /*do=*
Disallow: /*sort=*
Disallow: /*sortby=*
Disallow: /*csrf=*
Disallow: /*csrfKey=*
Disallow: */?tab=*
Disallow: */?_fromLogin=*
Disallow: */?_fromLogout=*
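
To illustrate how these wildcards match, here are some made-up example URLs (the paths are assumptions, not taken from a real site):

# /topic/123-example/?do=getNewComment  -> blocked by Disallow: /*do=*
# /forum/45-example/?sortby=date        -> blocked by Disallow: /*sortby=*
# /topic/123-example/?tab=comments      -> blocked by Disallow: */?tab=*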

My entire robots.txt: https://hoilik.com/robots-txt-for-invision-community-473d8aa32984


6 hours ago, Ilya Hoilik said:

You can exclude duplicates by using the following rules:

Disallow: /*do=*
Disallow: /*sort=*
Disallow: /*sortby=*
Disallow: /*csrf=*
Disallow: /*csrfKey=*
Disallow: */?tab=*
Disallow: */?_fromLogin=*
Disallow: */?_fromLogout=*

My entire robots.txt: https://hoilik.com/robots-txt-for-invision-community-473d8aa32984

That was very kind and considerate of you to share your full robots.txt with the community. Thank you for being helpful. 👍


  • 2 weeks later...

I took the full robots.txt from this link. At the moment I have moved away from Yandex, and I work with Bing.
So here's the line in robots.txt:

Host: https://ip-gamers.net

In Bing Webmaster, this line is marked with a red cross. What's wrong with it?


1 hour ago, IP-Gamers said:

In Bing Webmaster, this line is marked with a red cross. What's wrong with it?

I am sorry, but I know nothing about Bing. I focus most of my SEO (search engine optimization) on Baidu, and occasionally Google. Sorry, I could not be of further help. 😞

