Durango Posted February 28, 2016

Hi,

I realized Google indexes three URLs for the same topic:

yoursite.com/topic/1874-name-of-topic/
yoursite.com/topic/1874-name-of-topic/?do=getNewComment
yoursite.com/topic/1874-name-of-topic/?do=getLastComment

That creates a lot of duplicate content and can harm your site's SEO. There may be more duplicate URLs in IPB4, but these are the first I noticed.

You could add this to your robots.txt to stop the duplication:

Disallow: /*?do=getNewComment
Disallow: /*?do=getLastComment

or, more simply, if I am correct:

Disallow: /*?

Remember that robots.txt only disallows crawling, not indexation.

You could also add an .htaccess rule to prevent indexation:

<FilesMatch "\getNewComment$">
Header set X-Robots-Tag "noindex, follow"
</Files>

<FilesMatch "\getLastComment$">
Header set X-Robots-Tag "noindex, follow"
</Files>

These last rules don't work. Any idea?
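One likely reason the rules above fail: `<FilesMatch>` only ever tests the filename part of the request, never the query string (and the closing tag would need to be `</FilesMatch>`, not `</Files>`, for Apache to accept the config at all). A hedged, untested sketch of an alternative that keys on the query string instead, assuming Apache 2.4 with mod_rewrite and mod_headers enabled:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Flag requests for the duplicate getNewComment/getLastComment views
    RewriteCond %{QUERY_STRING} ^do=(getNewComment|getLastComment)$ [NC]
    RewriteRule ^ - [E=NOINDEX_VIEW:1]
</IfModule>

<IfModule mod_headers.c>
    # Ask crawlers not to index those views, while still following links
    Header set X-Robots-Tag "noindex, follow" env=NOINDEX_VIEW
</IfModule>
```

The env= clause on Header means the X-Robots-Tag header is only added when the rewrite condition matched.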
sound Posted February 28, 2016

Don't the canonical tags get rid of these concerns? E.g. for this topic:

<link rel="canonical" href="https://community.invisionpower.com/topic/427126-ipb-4-duplicate-content/" />
tnn Posted February 29, 2016

13 hours ago, sound said:
doesn't the canonical tags get rid of these concerns ? eg for this topic <link rel="canonical" href="https://community.invisionpower.com/topic/427126-ipb-4-duplicate-content/" />

That should definitely be used, but I don't think it handles URL parameters such as /*?do=getNewComment, so a good way is with robots.txt. Google Webmaster Tools also has an advanced URL Parameters tool for this, but you have to know what you are doing; robots.txt is easier for me. If the canonical tag fixed all of these issues, I don't think their support docs would recommend the URL Parameters tool.

Quote:
or more simply if i am correct : Disallow: /*?

Disallow: /*?*

is probably what you meant; it blocks every URL that has a question mark after the / (e.g. /him?do=getLastComment), while

Disallow: /*?do=getLastComment

would of course only block that parameter.
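To make the wildcard semantics concrete: in Google's robots.txt handling, `*` matches any run of characters, a trailing `$` anchors the end of the URL, and otherwise a rule is a prefix match. A small Python sketch of that matching (the matcher and example URLs are illustrative, not IPB code):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    # Google-style robots.txt matching: '*' matches any run of
    # characters, a trailing '$' anchors the end of the URL, and
    # otherwise the rule is a prefix match against the path.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        pattern += "$"
    return re.match(pattern, path) is not None

topic = "/topic/1874-name-of-topic/"
print(rule_matches("/*?do=getLastComment", topic + "?do=getLastComment"))  # True
print(rule_matches("/*?do=getLastComment", topic))                         # False
print(rule_matches("/*?*", topic + "?page=2"))                             # True
```

The last line shows the catch: the broad /*?* rule also matches pagination URLs like ?page=2, which is exactly the problem raised later in this thread.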
Durango (Author) Posted February 29, 2016

Hi tnn,

You are right, canonical is not enough. Best is canonical + robots.txt + meta noindex. And yes, this is the right one:

Disallow: /*?*
sobrenome Posted March 7, 2016

On February 29, 2016 at 2:11 PM, Durango said:
Hi tnn You are right, Canonical is not enough Best is canonical + robots.txt + meta noindex you are right this is the right one : Disallow: /*?*

If you do this you are going to block pagination crawl.
Durango (Author) Posted March 7, 2016

You're right, Sobrenome. Then the right lines to add to robots.txt are:

Disallow: /profile/
Disallow: /*?do=getNewComment
Disallow: /*?do=getLastComment

I added /profile/ too: so many profiles are empty that they could harm the SEO, and keeping profiles out of the index also makes your board less attractive to profile spammers.
sobrenome Posted March 7, 2016

5 hours ago, Durango said:
You're right, Sobrenome. Then the right lines to add to robots.txt are: Disallow: /profile/ Disallow: /*?do=getNewComment Disallow: /*?do=getLastComment I added /profile/ too: so many profiles are empty that they could harm the SEO, and keeping profiles out of the index also makes your board less attractive to profile spammers.

You can block all the "do=" stuff!
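A sketch of what that could look like (the catch-all rule is a suggestion, not quoted from anyone above): since every such view uses a do= query parameter, a single rule covers them all while leaving pagination URLs like ?page=2 crawlable:

```
User-agent: *
Disallow: /profile/
# blocks any URL whose query string begins with do=
# (?do=getNewComment, ?do=getLastComment, and any other do= view)
Disallow: /*?do=
```

One limit to note: /*?do= only matches when do= comes immediately after the ?, so a URL like /topic/?page=2&do=getLastComment would still be crawled.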
Archived
This topic is now archived and is closed to further replies.