gaby Posted April 7, 2021
Hello, I just checked the latest indexed pages in Google and noticed that they include lots of "sign in" pages with different URLs, plus the "create new topic" page from the topic view. The URLs look like this:
https://pctroubleshooting.ro/login/?ref=some_random_charcters
https://pctroubleshooting.ro/forum/45-jocuri-online/?do=add
From my point of view, I should forbid the indexing of these pages. What do you think? How can I disallow those dynamic URLs? Do you have any recommendations for robots.txt? Thank you!
IP-Gamers Posted April 22, 2021
Yes, it also pisses me off that some unnecessary pages end up in the search engine, and I do not know how to deal with it. I'm a noob, so some help would not hurt.
Daniel F Posted April 22, 2021
4 minutes ago, IP-Gamers said: "Yes, yes, it also pisses me off that some unnecessary pages end up in the search engine. And how to deal with this, I do not know. Yes, I'm a noob, and help would not hurt me."
You could use the live meta tag editor to add robots instructions to the pages that you don't want indexed. That said, we also made several improvements in IPS 4.6 to remove the CSRF key from URLs.
Thomas P Posted April 22, 2021
You can easily create a robots.txt file with wildcards. See, for example, the section "How to Use wildcards in robots.txt" in: An SEO's Guide to Robots.txt, Wildcards, the X-Robots-Tag and Noindex (builtvisible.com).
To prevent the report URL or other sections from being indexed, we use rules such as:
Disallow: /*?do=reportComment*
Disallow: /*?do=add
Because the robots.txt file is very site- and version-specific, and there are other methods (see the article mentioned above), I think every site owner has to create their own robots.txt file or "noindex" mechanism.
Edited April 22, 2021 by Thomas P
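To see how wildcard rules like the two above behave, here is a minimal Python sketch. It assumes Google-style matching (`*` matches any run of characters, a trailing `$` anchors the end, and a rule is matched from the start of the path plus query string). The helper names `rule_to_regex` and `is_blocked` are made up for illustration, and other crawlers may interpret wildcards differently.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; everything else is matched literally,
    starting at the beginning of the path plus query string.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(url_path, disallow_rules):
    """Return True if any Disallow rule matches the given path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)

# The two sample rules from the post above.
rules = ["/*?do=reportComment*", "/*?do=add"]

print(is_blocked("/forum/45-jocuri-online/?do=add", rules))
print(is_blocked("/forum/45-jocuri-online/", rules))
print(is_blocked("/login/?ref=abc", rules))  # 'abc' is a placeholder value
```

Under these assumptions the "create new topic" URL is blocked, while the plain forum URL and the login URL are not, so the `/login/?ref=...` pages from the first post would need a rule of their own.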
SeNioR- Posted April 22, 2021
@Thomas P Thomas, have you also tested do=GetNewComment and do=GetLastComment? I am asking because Google also indexes the same topic several times with these query strings.
Also, regarding "Disallow: /*?do=reportComment*": this query string is already excluded by the meta robots tag, so it doesn't need to be added to robots.txt.
Edited April 22, 2021 by SeNioR-
Daniel F Posted April 22, 2021
BTW, if you're hosted with us, you're of course able to create the robots.txt too 🙂 Just mentioning it here for our cloud clients who may run into this topic. 🙂
gaby Posted April 22, 2021 (Author)
12 hours ago, Thomas P said: "As the robots.txt file is very site and version specific and there are other methods, see article mentioned above, I think every site owner has to create his own robots.txt file or "no index" mechanism."
I agree with you, but I think that certain pages (like login, logout, /*?do=add, /*?do=reportComment*, etc.) should be disallowed by default. Can you please tell me which pages you have disallowed in robots.txt?
Thomas P Posted April 23, 2021
7 hours ago, gaby said: "I agree with you, but I think that certain pages (like login, logout, /*?do=add, /*?do=reportComment*, etc. ) should be disallowed by default...Can you tell me please which pages you have disallowed from robots.txt?"
16 hours ago, SeNioR- said: "@Thomas P Thomas have you tested too: do=GetNewComment do=GetLastComment ? I am asking because Google also indexes the same topic several times with these query strings. Also this query string is already excluded by the Meta Robots Tag so it doesn't need to be added to robots.txt."
Those are samples. As mentioned before, every site, every site setting and every environment is specific. In our case, Google recognized the URL parameters containing "do" as possible attributes of the same content, and it decides by itself whether the parameter has no influence on the page content or changes it. Check the appropriate section in Google Search Console.
There are different ways to declare that a page or URL parameter should not be indexed: the robots.txt file, meta tags in the ACP, the Google Search Console settings, etc.
Edited April 23, 2021 by Thomas P
Ilya Hoilik Posted April 23, 2021
You can exclude duplicates by using the following rules:
Disallow: /*do=*
Disallow: /*sort=*
Disallow: /*sortby=*
Disallow: /*csrf=*
Disallow: /*csrfKey=*
Disallow: */?tab=*
Disallow: */?_fromLogin=*
Disallow: */?_fromLogout=*
My entire robots.txt: https://hoilik.com/robots-txt-for-invision-community-473d8aa32984
Edited April 23, 2021 by Ilya Hoilik
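Before deploying a broad rule set like this, it can help to check which rule catches which URL. The Python sketch below does that for the rules in this post, assuming `*` means "any run of characters" and that rules match from the start of the path plus query string. The function name `first_matching_rule` and all sample paths are hypothetical and only serve as illustration.

```python
import re

# Rules copied from the post above.
RULES = [
    "/*do=*", "/*sort=*", "/*sortby=*", "/*csrf=*", "/*csrfKey=*",
    "*/?tab=*", "*/?_fromLogin=*", "*/?_fromLogout=*",
]

def first_matching_rule(url_path, rules=RULES):
    """Return the first rule that matches url_path, or None.

    Assumption: '*' matches any run of characters and each rule is
    matched from the start of the path plus query string.
    """
    for rule in rules:
        regex = "^" + ".*".join(re.escape(part) for part in rule.split("*"))
        if re.match(regex, url_path):
            return rule
    return None

# Hypothetical sample paths, not taken from any real site.
samples = [
    "/topic/12-example/?do=getNewComment",
    "/forum/45-jocuri-online/?sortby=title",
    "/topic/12-example/?undo=1",  # 'undo=' also contains 'do='
    "/topic/12-example/",
]

for path in samples:
    print(path, "->", first_matching_rule(path))
```

One thing the sketch makes visible: because `/*do=*` has no `?` or `&` in front of `do=`, it would also match a parameter whose name merely ends in `do`, such as the made-up `undo=` above. Whether that matters depends on the URLs your own site produces.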
Linux-Is-Best Posted April 23, 2021
6 hours ago, Ilya Hoilik said: "You can exclude duplicates by using following rules: Disallow: /*do=* Disallow: /*sort=* Disallow: /*sortby=* Disallow: /*csrf=* Disallow: /*csrfKey=* Disallow: */?tab=* Disallow: */?_fromLogin=* Disallow: */?_fromLogout=* My entire robots.txt https://hoilik.com/robots-txt-for-invision-community-473d8aa32984"
That was very kind and considerate of you to share your full robots.txt with the community. Thank you for being helpful. 👍
IP-Gamers Posted May 3, 2021
On 4/23/2021 at 4:32 PM, Linux-Is-Best said: "My entire robots.txt https://hoilik.com/robots-txt-for-invision-community-473d8aa32984"
I took the full robots.txt from this link. I have since moved away from Yandex and now work with Bing. The file contains this line:
Host: https://ip-gamers.net
Bing Webmaster marks it with a red cross. What's wrong with it?
Linux-Is-Best Posted May 3, 2021
1 hour ago, IP-Gamers said: "In Bing, Webmaster is marked with a red cross. What's wrong with her?"
I am sorry, but I know nothing about Bing. I focus most of my SEO (search engine optimization) with Baidu and occasionally Google in mind. Sorry, I could not be of further help. 😞