modman Posted July 31 Hi, I noticed a spike in URLs in Google Search Console after updating to 4.7.18. The count went from 600k to 1.3 million, all listed under the category "Duplicate page without user-selected canonical URL", so they are flagged as errors. Here is the list of URLs that should not be there:
Marc Posted July 31 Thank you for bringing this issue to our attention! I can confirm this should be reviewed further, and I have logged an internal bug report for our development team to investigate and address as necessary in a future maintenance release.
Marc Posted July 31 Have you just added a currency at all? Edit: On taking a closer look at this, it doesn't actually appear to be an issue, as these URLs are not being indexed.
modman (Author) Posted July 31 The currency plugin has been installed for some time, maybe years. However, those URLs have only appeared now, and I don't understand where they come from, let alone why they are there.
Jim M Posted July 31 1 hour ago, modman said: "The currency plugin has been installed for some time, maybe years. However, those URLs have only appeared now, and I don't understand where they come from, let alone why they are there." You would want to contact the author for assistance with that if those URLs are strictly related to their application.
Jim M Posted July 31 5 minutes ago, modman said: "What about the "&crfkey=" parameter?" That is part of that call.
modman (Author) Posted July 31 I understand, but it is the only parameter that changes across these URLs. What does it correspond to? Without it there wouldn't be 700k extra URLs...
modman (Author) Posted July 31 I can't find any "currency" application or plugin. Did you remove it?
Marc Posted August 1 There isn't a 'currency' application that I'm aware of. Note that the person who mentioned a currency plugin was yourself, rather than us. As far as we are concerned, it's part of Commerce. The csrfKey is something the software adds to URL parameters as a security measure to identify a user. You are chasing things here that, as far as we can see, are not an issue. Google is trying to index those pages, the software is telling it not to, and therefore it isn't. There is no issue here from our perspective, hence why I asked what issue it is causing you.
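To illustrate why the URL count grows, here is a minimal sketch (not Invision Community's actual code; the `do=currency` endpoint shape and parameter names are assumptions for illustration). Each visitor session, including each crawler visit, is issued its own random key, so the same logical currency link yields a different URL every time it is crawled.

```python
# Minimal sketch (not Invision Community's code) of why per-session CSRF tokens
# carried in GET links produce a new URL for every crawl.
import secrets
from urllib.parse import urlencode

def currency_link(base_url: str, currency: str, session_csrf_token: str) -> str:
    """Build a hypothetical state-changing link that carries the session's token."""
    query = urlencode({"do": "currency", "currency": currency, "csrfKey": session_csrf_token})
    return f"{base_url}?{query}"

# Each guest session (including each crawler visit) is issued a fresh token,
# so the same logical action yields a different URL every time it is crawled.
for _ in range(3):
    token = secrets.token_hex(16)  # stand-in for the per-session key
    print(currency_link("https://example.com/forum/", "EUR", token))
```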
modman (Author) Posted August 1 So, in your view, endlessly generating URLs with a random parameter like csrfKey, toying with Google and effectively blocking the indexing of real pages, is not an error? Can't you show the currency selector only to registered users?
Marc Posted August 1 Could you please advise where you have found that the indexing of real pages is being blocked here? From what you have said, there is no indication that this is happening, only the (intentional) blocking of links that shouldn't be indexed.
modman (Author) Posted August 1 On the page where these URLs are listed, Google asks "Did you fix the problem?" and it is marked in red. Is it really that hard to understand? And why did all this happen right after the update?
Marc Posted August 1 It is a problem only if those items are supposed to be indexed. They are not, and they are blocked by robots.txt, which I'm assuming is what that page is telling you. Unfortunately, I don't speak the language shown there, so indeed, it's quite hard for me to understand. It happening after your update is purely coincidental; nothing has changed in that area in the upgrade.
modman (Author) Posted August 1 No one wants links that multiply randomly every day, either as a user or from Google's point of view. It will be a continuous increase; why? Is all this caused only by the currency selector, where this csrfKey parameter appears and always changes randomly? Why doesn't it exist on the other pages of the site? Why don't you simply remove it for guests, so you don't clog up Google every day with useless URLs, when it already takes ages to finish indexing the correct ones? Do you understand that Google doesn't like this and that my site will be penalized even more? And maybe the indexing will never be completely finished if in the meantime we send thousands more a day? But then... for what absurd reason does this stuff have to exist???
Marc Posted August 1 We can take a look at blocking these from Google entirely, but I'm pointing out that there is not an issue as such here. Google will not penalize you for blocking items you don't wish to be indexed; it will simply not index them, which is exactly what we want it to do. As I said, we will look at it. Quote: "But then... for what absurd reason does this stuff have to exist???" As I mentioned above, the CSRF key is a security element of URLs relating to the identification of users.
Randy Calvert Posted August 1 Honestly, you're making a mountain out of a molehill here. This is not going to affect Google in any way, and it does not hurt your SEO. As a site owner, you're focusing on something that does not help your site or drive traffic to it. Time is a valuable thing; don't waste it. 🙂
modman (Author) Posted August 1 16 minutes ago, Randy Calvert said: "Honestly, you're making a mountain out of a molehill here. This is not going to affect Google in any way, and it does not hurt your SEO. As a site owner, you're focusing on something that does not help your site or drive traffic to it. Time is a valuable thing; don't waste it. 🙂" Hi, this is your opinion; you don't work at Google and you don't know Google's algorithms. From my point of view, I would like to open the service Google makes available to me, called Search Console, and not see hundreds of thousands of useless URLs with a random parameter every day, URLs that Google itself flags as problems on that same page, whether or not they are blocked by robots.txt (which is a waste of time...). So please don't come here and declare what damages SEO and what doesn't; those are your personal opinions, Google doesn't say this, and I DEMAND that my Search Console panel be free of this junk. I think that every now and then even customers like us should be taken into consideration. We don't pay here to be made fun of; do your job!
Matt (Management) Posted August 1 For others reading this, or coming in via search in the future: you can quickly stop Google from crawling pages with specific URL parameters by adjusting your robots.txt file. It takes a few minutes. Just add: Disallow: /*?currency=* @modman, getting angry, being rude and throwing insults around will just get you added to the moderation queue. Our team are here to help, not to be insulted if you don't appreciate the answer given.
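For anyone who wants to sanity-check such a rule before deploying it, here is a rough sketch of how the wildcard pattern matches URLs (the sample URLs are invented; only the "*" wildcard that major crawlers such as Googlebot accept is modelled, so this is not a full robots.txt parser):

```python
# Rough sketch of how the wildcard Disallow pattern above matches URLs:
# "*" matches any run of characters, everything else is literal.
# Not a full robots.txt parser (the "$" end anchor and Allow rules are ignored).
import re

def matches_disallow(pattern: str, path_and_query: str) -> bool:
    # Translate the robots.txt-style pattern into a regex anchored at the start.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(regex, path_and_query) is not None

RULE = "/*?currency=*"  # the rule suggested above

print(matches_disallow(RULE, "/?currency=EUR&csrfKey=0123abcd"))  # True  -> blocked from crawling
print(matches_disallow(RULE, "/topic/123-example/"))              # False -> crawled as normal
```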
modman (Author) Posted August 1 Adding that to robots.txt will not improve anything: there will still be thousands of random URLs added every day; they will simply move from the category "Duplicate page without canonical URL" to "Page found but blocked by robots.txt". What changes?
Jim M Posted August 1 1 hour ago, modman said: "Adding that to robots.txt will not improve anything: there will still be thousands of random URLs added every day; they will simply move from the category "Duplicate page without canonical URL" to "Page found but blocked by robots.txt". What changes?" robots.txt is the way to block unimportant items which you do not want search engines and bots to crawl. If you do not like this solution, you're welcome to recommend a new one. However, it is respected, and eventually Google will drop the URL(s). Here is Google's recommendation on what to use a robots.txt file for: Quote: "You can use a robots.txt file for web pages (HTML, PDF, or other non-media formats that Google can read), to manage crawling traffic if you think your server will be overwhelmed by requests from Google's crawler, or to avoid crawling unimportant or similar pages on your site." I highlighted the relevant part above; these URLs fall under what Google recommends blocking.
modman (Author) Posted August 1 Too bad that only Google (and not always) follows robots.txt; the other search engines either don't know what it is, don't care, or have no interest in it. So the other engines will carry out endless crawling and then penalize the site. Anyway, here are my questions: Why do I have to have a random link generator on my site? What is "csrfKey" for, and why does it issue random IDs if the bot is unique? Why is it only used in the currency selector and not, for example, in other links? Can its use really not be replaced by cookies (like, for example, the language selector)?
Jim M Posted August 1 23 minutes ago, modman said: "Too bad that only Google (and not always) follows robots.txt; the other search engines either don't know what it is, don't care, or have no interest in it." robots.txt is a widely accepted method for outlining items which a bot/crawler should not access. It is an internet standard, which is why it is incorporated into our software. 23 minutes ago, modman said: "Why do I have to have a random link generator on my site?" The CSRF key is not a random link generator. 23 minutes ago, modman said: "What is "csrfKey" for, and why does it issue random IDs if the bot is unique?" The CSRF key validates that the user making a request is the user it was issued to. To read more about it: https://brightsec.com/blog/csrf-token/ 23 minutes ago, modman said: "Why is it only used in the currency selector and not, for example, in other links?" It is used for various links. 23 minutes ago, modman said: "Can its use really not be replaced by cookies (like, for example, the language selector)?" The theme and language selectors work in a similar fashion to the currency selector.
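As a minimal sketch of the general CSRF-token idea described here (illustrative only, not Invision Community's actual implementation; the function and field names are invented): a random token is issued per session, and a state-changing request is honoured only if it echoes that token back.

```python
# Minimal sketch of the general CSRF-token idea (illustrative only,
# not Invision Community's actual code).
import hmac
import secrets

def new_session() -> dict:
    # Every visitor (member, guest, or crawler) is issued its own random token.
    return {"csrf_token": secrets.token_hex(16)}

def handle_state_change(session: dict, submitted_key: str) -> str:
    # A state-changing request is honoured only if it echoes back the token
    # that was issued to this same session.
    if not hmac.compare_digest(session["csrf_token"], submitted_key):
        return "rejected: CSRF check failed"
    return "ok: request accepted"

session = new_session()
print(handle_state_change(session, session["csrf_token"]))  # ok: request accepted
print(handle_state_change(session, "forged-or-stale-key"))  # rejected: CSRF check failed
```

Because the token is tied to the session that requested it, a forged or replayed request is rejected rather than acted upon, which is the protection the key exists to provide.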
modman (Author) Posted August 2 Hi, robots.txt is not a standard: it is not used by all search engines and many bots do not respect it; this is documented and I can demonstrate it! The theme and language selectors don't create thousands of random links, nor do they use csrfKey; I don't see any link containing this parameter. Why not adopt this (winning) solution for the currency selector too? Since all my links in Google Search Console relate ONLY to the currency selector... could you kindly tell me where I can find other links (for guests) containing this parameter?
Stuart Silvester Posted August 2 Robots exclusion (including robots.txt) is literally a standard: https://datatracker.ietf.org/doc/html/rfc9309 CSRF keys are required on anything that changes the state of the application. The theme and language selectors are slightly different in how they work: they are forms that POST data when the buttons (themes) are selected, and they still have CSRF key protection on them. In the time you've spent repeatedly complaining, the bug report has been reviewed (as Marc said it would be), addressed (for .19), and closed.
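A hedged sketch of the difference described here (URLs, parameter names, and values are invented for illustration): with a plain GET link the per-session token ends up in the crawlable URL, while with a POSTed form it travels in the request body, so the URL itself stays stable and never multiplies in Search Console.

```python
# Hedged sketch of the GET-link vs POST-form difference (names and URLs invented):
# a GET link carries the CSRF token in the URL, so each crawl sees a new URL;
# a POSTed form carries it in the request body, so the URL stays stable.
from urllib.parse import urlencode

csrf_token = "0123abcd"  # stand-in for the per-session key

# Currency selector as a plain link: the token becomes part of the crawlable URL.
get_url = "/forum/?" + urlencode({"do": "currency", "currency": "EUR", "csrfKey": csrf_token})

# Language selector as a form submission: the token travels in the POST body only.
post_url = "/forum/language/"
post_body = urlencode({"language_id": 2, "csrfKey": csrf_token})

print("GET  ->", get_url)                             # unique per session/crawl
print("POST ->", post_url, "with body:", post_body)   # URL identical for every visitor
```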