Durango Posted February 28, 2016

Hi,

I realized Google indexes three URLs for the same topic:

yoursite.com/topic/1874-name-of-topic/
yoursite.com/topic/1874-name-of-topic/?do=getNewComment
yoursite.com/topic/1874-name-of-topic/?do=getLastComment

That creates a lot of duplicate content and can harm your site's SEO. There may be more duplicate URLs in IPB4, but these are the first I noticed.

You could add this to your robots.txt to stop the duplication:

Disallow: /*?do=getNewComment
Disallow: /*?do=getLastComment

or, more simply, if I am correct:

Disallow: /*?

Remember that robots.txt only disallows crawling, not indexation.

You could also add an .htaccess rule to prevent indexation:

<FilesMatch "\getNewComment$">
Header set X-Robots-Tag "noindex, follow"
</Files>

<FilesMatch "\getLastComment$">
Header set X-Robots-Tag "noindex, follow"
</Files>

These last rules don't work. Any idea?
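One likely reason the rules above fail: `<FilesMatch>` only ever tests the filename part of the request, never the query string (and the closing tag would need to be `</FilesMatch>`, not `</Files>`, for Apache to accept the config at all). A hedged, untested sketch of an alternative that keys on the query string instead, assuming Apache 2.4 with mod_rewrite and mod_headers enabled:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Flag requests for the duplicate getNewComment/getLastComment views
    RewriteCond %{QUERY_STRING} ^do=(getNewComment|getLastComment)$ [NC]
    RewriteRule ^ - [E=NOINDEX_VIEW:1]
</IfModule>

<IfModule mod_headers.c>
    # Ask crawlers not to index those views, while still following links
    Header set X-Robots-Tag "noindex, follow" env=NOINDEX_VIEW
</IfModule>
```

The env= clause on Header means the X-Robots-Tag header is only added when the rewrite condition matched.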
sound Posted February 28, 2016

Don't the canonical tags get rid of these concerns? E.g. for this topic:

<link rel="canonical" href="https://community.invisionpower.com/topic/427126-ipb-4-duplicate-content/" />
tnn Posted February 29, 2016

13 hours ago, sound said:
doesn't the canonical tags get rid of these concerns ? eg for this topic <link rel="canonical" href="https://community.invisionpower.com/topic/427126-ipb-4-duplicate-content/" />

That should definitely be used, but I don't think it handles URL parameters such as /*?do=getNewComment, so a good way is with robots.txt. Google Webmaster Tools also has an advanced URL Parameters tool for this, but you have to know what you are doing; robots.txt is easier for me. If the canonical tag fixed all of these issues, I don't think their support docs would recommend the URL Parameters tool.

Quote:
or more simply if i am correct : Disallow: /*?

Disallow: /*?*

is probably what you meant; it blocks every URL that has a question mark after the / (e.g. /him?do=getLastComment), while

Disallow: /*?do=getLastComment

would of course only block that parameter.
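To make the wildcard semantics concrete: in Google's robots.txt handling, `*` matches any run of characters, a trailing `$` anchors the end of the URL, and otherwise a rule is a prefix match. A small Python sketch of that matching (the matcher and example URLs are illustrative, not IPB code):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    # Google-style robots.txt matching: '*' matches any run of
    # characters, a trailing '$' anchors the end of the URL, and
    # otherwise the rule is a prefix match against the path.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        pattern += "$"
    return re.match(pattern, path) is not None

topic = "/topic/1874-name-of-topic/"
print(rule_matches("/*?do=getLastComment", topic + "?do=getLastComment"))  # True
print(rule_matches("/*?do=getLastComment", topic))                         # False
print(rule_matches("/*?*", topic + "?page=2"))                             # True
```

The last line shows the catch: the broad /*?* rule also matches pagination URLs like ?page=2, which is exactly the problem raised later in this thread.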
Durango (Author) Posted February 29, 2016

Hi tnn,

You are right, canonical is not enough. Best is canonical + robots.txt + meta noindex. And yes, this is the right one:

Disallow: /*?*
sobrenome Posted March 7, 2016

On February 29, 2016 at 2:11 PM, Durango said:
Hi tnn You are right, Canonical is not enough Best is canonical + robots.txt + meta noindex you are right this is the right one : Disallow: /*?*

If you do this you are going to block pagination crawl.
Durango (Author) Posted March 7, 2016

You're right, Sobrenome. Then the right lines to add to robots.txt are:

Disallow: /profile/
Disallow: /*?do=getNewComment
Disallow: /*?do=getLastComment

I added /profile/ too: so many profiles are empty that they could harm the SEO, and keeping profiles out of the index also makes your board less attractive to profile spammers.
sobrenome Posted March 7, 2016

5 hours ago, Durango said:
You're right, Sobrenome. Then the right lines to add to robots.txt are: Disallow: /profile/ Disallow: /*?do=getNewComment Disallow: /*?do=getLastComment I added /profile/ too: so many profiles are empty that they could harm the SEO, and keeping profiles out of the index also makes your board less attractive to profile spammers.

You can block all the "do=" stuff!
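A sketch of what that could look like (the catch-all rule is a suggestion, not quoted from anyone above): since every such view uses a do= query parameter, a single rule covers them all while leaving pagination URLs like ?page=2 crawlable:

```
User-agent: *
Disallow: /profile/
# blocks any URL whose query string begins with do=
# (?do=getNewComment, ?do=getLastComment, and any other do= view)
Disallow: /*?do=
```

One limit to note: /*?do= only matches when do= comes immediately after the ?, so a URL like /topic/?page=2&do=getLastComment would still be crawled.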
Archived
This topic is now archived and is closed to further replies.