
Canonical for articles comments (pages)


Sonya*


Posted

The canonical URL for article comment pages is always set to the article's first page, regardless of which comment page is being viewed. For example, on page 5 of an article's comments we have:

<link rel="first" href="https://www.mysite.com/articles/category/article/">
<link rel="prev" href="https://www.mysite.com/articles/category/article/page/4/">
<link rel="next" href="https://www.mysite.com/articles/category/article/page/6/">
<link rel="last" href="https://www.mysite.com/articles/category/article/page/7/">
<link rel="canonical" href="https://www.mysite.com/articles/category/article/">  <----- canonical does not include page!!!

In forums, Gallery and Blogs, on the other hand, the page is included in the canonical:

<link rel="first" href="https://www.mysite.com/blogs/entry/211973-whatever/">
<link rel="prev" href="https://www.mysite.com/blogs/entry/211973-whatever/">
<link rel="canonical" href="https://www.mysite.com/blogs/entry/211973-whatever/page/2/">  <---- canonical includes page!!!

I think the canonical in articles should include the page as well, so that it works the same way as in other applications.

The same applies to article categories: the canonical link is always set to the category index page. The subsequent pages of a category view are never included in the canonical.
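To make the two behaviours concrete, here is a minimal Python sketch that reproduces both sets of tags above. The functions `page_url` and `head_links` and the `canonical` switch ("self" vs "root") are hypothetical illustrations, not IPS code; the sketch only assumes the `/page/N/` URL pattern visible in the examples, with page 1 living at the item's root URL.

```python
from __future__ import annotations

from html import escape


def page_url(base_url: str, page: int) -> str:
    """Page 1 is the item's root URL; later pages append page/N/ (pattern from the examples above)."""
    return base_url if page == 1 else f"{base_url}page/{page}/"


def head_links(base_url: str, page: int, last_page: int, canonical: str = "self") -> list[str]:
    """Build the pagination <link> tags for one page of a paginated item.

    canonical="self" points the canonical at the page being viewed (as in forums/blogs);
    canonical="root" always points it at page 1 (as in articles).
    """
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")
    if canonical not in ("self", "root"):
        raise ValueError("canonical must be 'self' or 'root'")

    rels: list[tuple[str, str]] = []
    if page > 1:
        rels.append(("first", page_url(base_url, 1)))
        rels.append(("prev", page_url(base_url, page - 1)))
    if page < last_page:
        rels.append(("next", page_url(base_url, page + 1)))
        rels.append(("last", page_url(base_url, last_page)))

    # The only difference between the two modes is where the canonical points.
    target = page_url(base_url, page) if canonical == "self" else base_url
    rels.append(("canonical", target))
    return [f'<link rel="{rel}" href="{escape(href)}">' for rel, href in rels]


for tag in head_links("https://www.mysite.com/articles/category/article/", 5, 7, canonical="root"):
    print(tag)
```

Note that the first/prev/next/last tags come out identical under both modes; only the canonical target changes.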

Posted

I cannot submit a bug report. We have not updated our live server to 4.4.4 because of notification issues. We are on 4.4.3 and waiting for a notifications patch or a new maintenance release. In every ticket, Support tells us they cannot help until we have upgraded, and they are not allowed to escalate if the site is not running the latest version.

The only way for me to report a bug right now is this forum. Perhaps somebody running 4.4.4 could submit it?

Posted
11 hours ago, Sonya* said:

I think the canonical in articles should include the page as well, so that it works the same way as in other applications.

I’m not so sure about that. With forums, you clearly have different content on each page. With articles, you always load the same article (just with different comments), so preventing duplicate content regarding the article by using one canonical URL sounds like exactly what canonical URLs are all about. If the article itself were hidden on the comment pages, then it might be a different case.

Quote

The rel=canonical element, often called the “canonical link”, is an HTML element that helps webmasters prevent duplicate content issues. It does this by specifying the “canonical URL”, the “preferred” version of a web page – the original source, even. Using it well improves a site’s SEO.

The idea is simple: if you have several similar versions of the same content, you pick one “canonical” version and point the search engines at it. This solves the duplicate content problem where search engines don’t know which version of the content to show in their results. This article takes you through how and when to use them, and how to avoid common mistakes.

It still may or may not be intentional behavior. I don’t know. IPS would have to clarify that. 

Either way: bug reports should not require the latest release on the client’s site as long as the problem can be replicated on any installation. There would be nothing to check on the client’s website; IPS support could just replicate it on a test installation.

Posted
Just now, opentype said:

I’m not so sure about that.

If this is working as intended, then these tags should be removed entirely:

<link rel="first" href="https://www.mysite.com/articles/category/article/">
<link rel="prev" href="https://www.mysite.com/articles/category/article/page/4/">
<link rel="next" href="https://www.mysite.com/articles/category/article/page/6/">
<link rel="last" href="https://www.mysite.com/articles/category/article/page/7/">

Those tags make no sense if the canonical is always set to the first page. A search engine would crawl the links, find that the pages are not canonical, and exclude them from the index. And since Google no longer supports rel="prev" and rel="next", these pages have no chance of ever getting into the index.

Posted
17 minutes ago, opentype said:

With articles, you always load the same article (just with different comments), so preventing duplicate content regarding the article by using one canonical URL sounds like exactly what canonical URLs are all about. If the article itself were hidden on the comment pages, then it might be a different case.

I can see your point. I am not sure what is better either. Perhaps IPS could consult an SEO expert on this? But we have the same situation with blog entries: the blog entry appears on every page, yet the canonical for the comment pages is set to /page/X. It should work the same way in every application.

  • Management
Posted

I've been thinking about this and for Pages, it makes sense that the canonical view is the root page, because unlike topics, Pages shows the same content article on every single page with just the comments changing. Topics are different because each page is unique.

I would be tempted to go back to param based pagination for Pages (?page=2) to make it more obvious to search engines but I don't think there is any 'real' value in doing that.
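For comparison, the two pagination styles mentioned here map onto each other mechanically. This is a minimal Python sketch, assuming the path style is always `.../page/N/` and that page 1 carries no page segment or parameter; the helpers `path_to_query` and `query_to_path` are hypothetical and say nothing about how IPS actually routes URLs.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Matches a path ending in page/N/, e.g. /articles/category/article/page/4/
PATH_PAGE = re.compile(r"^(?P<root>.*/)page/(?P<n>\d+)/$")


def path_to_query(url: str) -> str:
    """Rewrite .../page/N/ as .../?page=N; URLs without a page segment are returned unchanged."""
    parts = urlsplit(url)
    match = PATH_PAGE.match(parts.path)
    if not match:
        return url
    query = parse_qs(parts.query)
    n = int(match.group("n"))
    if n > 1:  # page 1 is the root URL, so no parameter is needed
        query["page"] = [str(n)]
    return urlunsplit((parts.scheme, parts.netloc, match.group("root"),
                       urlencode(query, doseq=True), parts.fragment))


def query_to_path(url: str) -> str:
    """Rewrite .../?page=N as .../page/N/; URLs without a page parameter are returned unchanged."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    pages = query.pop("page", None)
    if not pages:
        return url
    n = int(pages[0])
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    if n > 1:
        path += f"page/{n}/"
    return urlunsplit((parts.scheme, parts.netloc, path,
                       urlencode(query, doseq=True), parts.fragment))
```

Either style identifies the same page; the difference is only in how visibly the page number sits apart from the item's own path.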

Posted
13 minutes ago, Matt said:

I've been thinking about this and for Pages, it makes sense that the canonical view is the root page, because unlike topics, Pages shows the same content article on every single page with just the comments changing. Topics are different because each page is unique.

And what about blog entries and custom apps like Link Directory, Tutorials or Videos? Or Commerce and Downloads? If there are two approaches that are both valid depending on the content posted, then I would suggest creating a setting for it. For example, if the root page has little content but lots of rich comments/reviews, then it could be useful to have all the comment pages in the index.

Posted

@Matt I have just investigated further. For someone who uses Databases for reviews, it will be critical to exclude review pages from crawling. In this case the reviews ARE the rich content, not the record itself.

Posted
34 minutes ago, opentype said:

It would also be good to post the links to where you got that from. 

Example with reviews: https://www.trustpilot.com/review/www.notonthehighstreet.com This example has 640 reviews, spread across these pages:

With reviews, you always load the same record (in this case www.notonthehighstreet.com), just with different reviews. But the reviews ARE what is really important here, not the record itself. If databases (and Articles is a database) generally used the record's root page as the canonical and placed this canonical on every review page, then no reviews from pages 2, 3 and so on would ever be indexed.

It means that databases for reviews would not really make sense in Pages. In that case only the reviews on the record's first page would be indexed. For all newer (!) reviews on the subsequent pages, the canonical would be saying: "No! Please exclude these pages of reviews, as they are all duplicates of the first page." But they are not duplicates (see the example above). The pages are very important in this case, and this is also how databases can be used, not only for articles.

 

Posted
9 hours ago, Sonya* said:

It means that databases for reviews would not really make sense in Pages.

I disagree again. For such review pages, the record itself usually holds the information about the product (name, description, features, price, whatever) and then the average rating, which is also put in the schema information so search engines can use it. That is exactly what you want indexed and appear in Google. Let’s say I have a book review database. The book is described in the record and the schema rating will be tied to the record.  
The individual reviews on such a record are essentially additional comments explaining the average rating. I don’t know why you would say “the reviews ARE what is really important here”. Are you saying, people would google a specific phrase from a review? I don’t get it. You think it’s more important to be found for “I loved the book! Read it in two days” than for the record title in a query like “[book title]” or “[book title] review”?

By the way: canonical is about preference. It is not identical to “no index”. Instead of distributing the page rank among 5 pages of reviews for the same product, the canonical link makes sure it is all tied to the first page. 

Posted

@opentype, what you write is your point of view, but technically it is wrong. Earlier you quoted what canonical is for:

Quote

The rel=canonical element, often called the “canonical link”, is an HTML element that helps webmasters prevent duplicate content issues.

There is no duplicate content in the links I have added above. They are entirely different: each page contains different reviews, and none of them are duplicates.

There are indeed two different approaches: 

  1. Review written in the record body by the community owner (rich content) and comments on this review (probably thin content). In this case a canonical pointing to the record can be right.
  2. Reviews added by community users (rich content) on a record with almost no body content (thin content). In this case the review pages are the valuable content, not the record itself with its two lines of description. A canonical pointing to the record could be critical.

Databases in Pages can be used for both. 

 

Posted

First of all, please read the entire article I linked and don’t limit it to the single sentence I quoted. The article clearly also supports what I just said about preference. And please be careful about claiming I am wrong when you cannot demonstrate it beyond a shadow of a doubt. You haven’t done that in any way. You haven’t even said what exactly I am wrong about. Quote it, repeat it, whatever. Then provide the correction. Be clear and precise. You should know by now that I do not accept false accusations in any way.

Second of all, I do not care about the Trustpilot examples at all. They are not made with Pages, so they are irrelevant. Pages doesn’t work like this. I asked you for links supporting claims like “critical to exclude review pages from crawling”. Where did you “investigate” that? You can’t accuse me of just providing a “point of view” when you can’t back up what you are saying yourself, e.g. by pointing to a Google article (or at least expert pages) that supports it.

25 minutes ago, Sonya* said:

 

Reviews added by community users (rich content) on a record with almost no body content (thin content). In this case the review pages are the valuable content, not the record itself with its two lines of description.

Sorry, I can’t accept that as a premise for your argument. Who says the record has two lines? Why would that follow from allowing reviews on the record? That makes no sense. And one more time: canonical does not mean “no index”. Setting a canonical for your two-line record does NOT automatically mean Google will ignore the review texts. Again, that just doesn’t follow.

Posted
2 hours ago, opentype said:

For such review pages, the record itself usually holds the information about the product (name, description, features, price, whatever)

Please provide example links where you see such reviews. I do not know what you are talking about.

2 hours ago, opentype said:

That is exactly what you want indexed and appear in Google.

Perhaps you want this to be indexed. I don't. I would like the user reviews to be indexed: all pages with reviews, to be exact.

2 hours ago, opentype said:

Let’s say I have a book review database.

I am not talking about book reviews; please see the example links above. If you mean book reviews, please provide examples to illustrate what you mean.

2 hours ago, opentype said:

Are you saying, people would google a specific phrase from a review?

Yes. Over 90% of search phrases are long-tail phrases, not just titles of books, products or services.

2 hours ago, opentype said:

By the way: canonical is about preference. It is not identical to “no index”.

Pages whose canonical link differs from the page being viewed are excluded from the index. See the documentation in Google Webmaster Tools.

19 minutes ago, opentype said:

Second of all, I do not care about the Trustpilot examples at all.

Please tell me exactly which examples of reviews you want from me and would care about. I am not sure what kind of examples you are asking for.

 

  • Management
Posted

@Sonya* Regarding https://www.trustpilot.com/review/www.notonthehighstreet.com?page=2, each page there is different.

With a Pages record, at least half the page is a single article that is shown again on page 2, 3 and so on.

[Screenshot, 2019-06-19]

Based on this, canonical links pointing to each individual page of an article would be incorrect, because Google would then see the "Article text" as content duplicated across multiple pages.

Keep in mind that canonical is not a hard rule for Google. It uses it to decide which is the best page to index, it doesn't mean it will ignore multiple pages. We use the meta pagination tags to further inform Google about the structure of the document.

 

Posted

This is exactly what I am saying. Databases in Pages can be used in two different ways.

1 hour ago, Sonya* said:

There are indeed two different approaches: 

  1. Review written in the record body by the community owner (rich content) and comments on this review (probably thin content). In this case a canonical pointing to the record can be right.
  2. Reviews added by community users (rich content) on a record with almost no body content (thin content). In this case the review pages are the valuable content, not the record itself with its two lines of description. A canonical pointing to the record could be critical.

The first is an example of articles in Pages. But I can also create a Trustpilot-like database for reviews in Pages, right? In that case I would not want the canonical to point to the first page, as every page would be different.

And if you decide to leave it as is and set the canonical to the first page for all comment and review pages, then you should remove the following tags:

<link rel="first" href="https://www.mysite.com/articles/category/article/">
<link rel="prev" href="https://www.mysite.com/articles/category/article/page/4/">
<link rel="next" href="https://www.mysite.com/articles/category/article/page/6/">
<link rel="last" href="https://www.mysite.com/articles/category/article/page/7/">

They do not make sense if you say the canonical is the first page. Otherwise, what is the reason for pointing search engines to pages that are not canonical?

Posted

I would suggest you look for a developer to modify your database the way you want it. If you really want to, hide the record content from the following pages (otherwise you have serious duplicate content issues), then give each of the following pages its own canonical URL so they are all treated as unique, as you want. I do not see how you can make a case for that being a stock Pages feature, though.

But I’m not going to continue to debate it this way. It’s not constructive and going in circles. 
Looks like Matt is making sensible decisions about this anyway. 

 

Posted
9 minutes ago, opentype said:

I would suggest you look for a developer to modify your database the way you want it.

I do not want to modify anything. I am just trying to find the common logic behind canonical URLs in IPS. If the issue above is not an issue and this works as intended, then @Matt should make a clear statement: yes, we do not want comment or review pages for ANY database created in Pages to be listed in search engines, neither for articles nor for custom databases created e.g. for user reviews.

  • Management
Posted
18 minutes ago, Sonya* said:

This is exactly what I am saying. Databases in Pages can be used in two different ways.

The first is an example of articles in Pages. But I can also create a Trustpilot-like database for reviews in Pages, right? In that case I would not want the canonical to point to the first page, as every page would be different.

And if you decide to leave it as is and set the canonical to the first page for all comment and review pages, then you should remove the following tags:


<link rel="first" href="https://www.mysite.com/articles/category/article/">
<link rel="prev" href="https://www.mysite.com/articles/category/article/page/4/">
<link rel="next" href="https://www.mysite.com/articles/category/article/page/6/">
<link rel="last" href="https://www.mysite.com/articles/category/article/page/7/">

They do not make sense if you say the canonical is the first page. Otherwise, what is the reason for pointing search engines to pages that are not canonical?

Every single database record regardless of how it is configured will have a Title and Content field followed by comments and reviews. I don't think there's a way without modifying the code to have a review database that doesn't always show the "content" on each page.

You raise a good point about consistency and Blogs. I believe Blogs also should not set the canonical to the current page, but rather to the root page, for the same reasons.

But again, I do not think this is a do-or-die situation where you get unlisted by Google because the canonical is incorrect in your eyes. Google uses the canonical as a suggestion for what may be the root page.

For topics, the posts are important, so the canonical should be set to the current page.
For blogs and articles, the main content (not the comments) is the most important and SEO-rich part of the entire structure, so it makes sense for the canonical to be consistent across all pages.

Posted
1 minute ago, Matt said:

Every single database record regardless of how it is configured will have a Title and Content field followed by comments and reviews. I don't think there's a way without modifying the code to have a review database that doesn't always show the "content" on each page.

The "content" can consist of just one line, such as a short product description. In that case it is not a problem to repeat it on every review page, as 90% of the content will be made up of reviews, not the record's "content" field. This is how databases can be used; they do not require rich content in the "content" field.

8 minutes ago, Matt said:

Google uses the canonical as a suggestion for what may be the root page.

In my Google Webmaster Tools, all non-canonical pages really are excluded from the search index. It does not look like a suggestion; the non-canonical pages really are not listed.

[Screenshot of Google Search Console, 2019-06-19]

Recently we lost almost all topics in clubs because of this canonical issue; they were dropped from the index. The pages were completely different, but Google used the user-declared canonical to drop the topics in favor of the forum index page. Thousands of topics were removed from our site's search index just because of the wrong canonical.

22 minutes ago, Matt said:

For blogs and articles, the main content (not the comments) is the most important and SEO-rich part of the entire structure, so it makes sense for the canonical to be consistent across all pages.

Our user blogs have little content but are rich in comments. If a user enters just a few lines in a blog entry and gets 100 comments on it, then according to the canonical only the first page would go into the index. All other comment pages would be excluded or dropped. If you change this globally, PLEASE inform us. We get a lot of traffic to the comment pages and I do not want to lose it suddenly.

I still suggest making this a setting. All the databases, blogs, link directories and so on out there are different. Some community owners use the "content" field to create rich content; others rely on user-generated content, so their records consist mainly of comments and reviews and the "content" field hardly matters. If you make the canonical change mandatory for all of them, some sites could see a negative impact on organic search traffic.

You are of course free to define it your way, but I would just like to know what you decide and change in the next version, so that I can "save" my traffic to user blogs and to the custom databases without a rich "content" field.
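For illustration, a setting like the one proposed here could be as small as a default per content type plus a per-site override. The following is a minimal Python sketch under stated assumptions: the defaults follow the policy Matt describes above (topics point at the current page; blogs and articles point at the root page), a third mode omits the tag entirely, and the `/page/N/` URL pattern from the first post is used. The names `CanonicalMode`, `DEFAULTS` and `canonical_for`, and the content-type keys, are hypothetical and not part of IPS.

```python
from __future__ import annotations

from enum import Enum


class CanonicalMode(Enum):
    SELF = "self"  # canonical points at the page being viewed
    ROOT = "root"  # canonical always points at page 1 of the item
    NONE = "none"  # no canonical tag is emitted at all


# Hypothetical defaults following the policy described in this thread;
# the content-type keys are illustrative, not IPS identifiers.
DEFAULTS = {
    "topic": CanonicalMode.SELF,
    "blog_entry": CanonicalMode.ROOT,
    "article": CanonicalMode.ROOT,
}


def canonical_for(content_type: str, base_url: str, page: int,
                  overrides: dict[str, CanonicalMode] | None = None) -> str | None:
    """Return the canonical URL for a page, or None if the tag should be omitted.

    A per-site override (the proposed setting) wins over the default;
    unknown content types fall back to ROOT (an assumption).
    """
    mode = (overrides or {}).get(content_type, DEFAULTS.get(content_type, CanonicalMode.ROOT))
    if mode is CanonicalMode.NONE:
        return None
    if mode is CanonicalMode.SELF and page > 1:
        return f"{base_url}page/{page}/"
    return base_url
```

A review-heavy database could then opt into `CanonicalMode.SELF` without changing the behaviour for anyone who keeps the default.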

  • Management
Posted

I guess we could add a setting, but most people wouldn't understand what the setting is for or why they need to choose.

The best option may be to remove the canonical tag completely and let Google figure it out themselves.

Posted
9 minutes ago, Matt said:

The best option may be to remove the canonical tag completely and let Google figure it out themselves.

I think this would indeed be the best solution.

Archived

This topic is now archived and is closed to further replies.
