
Large community? You have problems with your sitemap!



I just noticed that all the sitemap URLs in the sitemap index at sitemap.php on this site have November 2018 dates. I would think the lastmod should change every time it runs, no? Or maybe only when a topic in that sitemap is updated? It's a bit odd. I'll have to look into this more.

<lastmod>2018-11-07T20:47:02-05:00</lastmod>
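The expected behavior can be met by setting each index entry's lastmod to the newest modification time among the URLs in that child sitemap. That way it moves only when a topic in it changes, not every time the file is regenerated. A minimal Python sketch under that assumption; the input shape and the example.com URLs are hypothetical, not IPB's internals:

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def build_sitemap_index(children):
    """Build a <sitemapindex> whose <lastmod> values track content, not generation time.

    children: dict mapping each child sitemap's URL to a list of
    (page_url, last_modified) tuples, last_modified being a timezone-aware datetime.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for sitemap_url, entries in children.items():
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(sitemap_url)}</loc>")
        if entries:  # an empty child sitemap gets no <lastmod> at all
            newest = max(modified for _, modified in entries)
            lines.append(f"    <lastmod>{newest.isoformat()}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Hypothetical example: one topic updated in 2019 moves this child's lastmod forward
children = {
    "https://example.com/sitemap_topics_1.xml": [
        ("https://example.com/topic/1/", datetime(2018, 11, 7, 20, 47, 2, tzinfo=timezone.utc)),
        ("https://example.com/topic/2/", datetime(2019, 3, 10, 12, 0, tzinfo=timezone.utc)),
    ],
}
print(build_sitemap_index(children))
```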


  • 1 month later...

I'd like to follow up on the steep decline of my site that began around August 2018: its US site rank dropped from ~15K to ~85K. Google's Medic update certainly caused part of it. However, I believe a good deal of it was also caused by flaws in the IPB sitemap that came back into play on my site (I should have been watching this more closely) and by flaws in how topics are indexed. Both were corrected in 4.4.

Around April 2018, I used this thread to fix the lastmod issue in the version 4.3 sitemap, which had a very positive impact in Google search: my site rose from a US rank of ~30K to ~15K. The problem was that I made the modifications in a way that could be overwritten, and at some point after August 2018 they were overwritten without my noticing. My site rank kept dropping for months.

I updated to 4.4 on March 10th, and since then my site has been gradually climbing back. This leads me to believe that IPB should backport those fixes to 4.3 (maybe even 4.2), because they really are flaws in the software that are dragging down the search rank of any site using those versions.

The main reason I am posting this, however, is to get opinions on a possible new plugin that would create a per-post sitemap, totally separate from the original sitemap.php (something like sitemap2.php), that would map every forum post. For example, links to individual posts like this one would be in that sitemap:

https://www.celiac.com/forums/topic/109854-searching-for-refractory-friends/?do=findComment&comment=933769

The sitemap functionality would work exactly the same as it does now, but it would just focus on the forum posts.

I would like to hear pro and con opinions on this approach. I don't see any reason why it would not have a positive impact in Google search. Please let me know your thoughts.
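Roughly, the proposed posts sitemap is a standard sitemaps.org urlset with one findComment URL per post. A minimal Python sketch, assuming a hypothetical Post record (topic URL, post ID, last-modified time); the ?do=findComment&comment= pattern is copied from the example URL above. Two details are worth noting: the & must be written as &amp; inside the XML, and the sitemaps protocol caps a single file at 50,000 URLs, so a large board needs several files listed in a sitemap index.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50_000  # sitemaps.org limit per sitemap file


@dataclass
class Post:  # hypothetical record, not IPB's data model
    topic_url: str
    post_id: int
    last_modified: datetime


def post_url(post: Post) -> str:
    # Same pattern as the example link: ?do=findComment&comment=<post id>
    return f"{post.topic_url}?do=findComment&comment={post.post_id}"


def build_posts_sitemap(posts: list[Post]) -> str:
    if len(posts) > MAX_URLS_PER_FILE:
        raise ValueError("split into several files and list them in a sitemap index")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for post in posts:
        lines.append("  <url>")
        # escape() turns "&" into "&amp;", as XML requires
        lines.append(f"    <loc>{escape(post_url(post))}</loc>")
        lines.append(f"    <lastmod>{post.last_modified.isoformat()}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```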


28 minutes ago, sadams101 said:

The main reason I am posting this, however, is to get opinions on a possible new plugin that would create a per-post sitemap, totally separate from the original sitemap.php (something like sitemap2.php), that would map every forum post.

I'll buy this sitemap plugin if it comes out. Technically, Google should be able to crawl all posts from the topic URL, but I'll try anything as a test. I would build in an option to exclude posts with fewer than "x" words, so crawlers don't waste time on useless one-word posts.
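The word-count threshold could be sketched like this in Python: strip tags from each post body, count whitespace-separated words, and keep only posts at or above min_words. The (post_id, html_body) input, the default of 20 words, and the regex-based tag stripping are all assumptions for illustration. A regex is a crude HTML stripper, and the count has to be computed for every post, which is not free on a large board.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude: matches anything that looks like a tag


def word_count(html: str) -> int:
    """Rough word count: replace tags with spaces, then split on whitespace."""
    return len(TAG_RE.sub(" ", html).split())


def filter_short_posts(posts, min_words=20):
    """posts: iterable of (post_id, html_body); yields IDs of posts long enough to list."""
    for post_id, body in posts:
        if word_count(body) >= min_words:
            yield post_id
```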

 


  • Management
1 hour ago, sadams101 said:

I updated to 4.4 on March 10th, and since that moment my site is gradually climbing back. This leads me to believe that IPB should really port those fixes now to 4.3 (maybe even 4.2), because they really are flaws in the software that are dragging down the search rank of any site using those versions.

Glad to hear that 4.4 has resolved the sitemap issues. 🙂


@DawPi has completed the custom sitemap application I mentioned, which handles all posts. You can see an example here:

https://www.celiac.com/sitemap_posts.php

It is modeled after the existing sitemap, but this one handles only forum posts. We were not able to add a minimum word count to this application, as one person suggested, because it slowed things down and put a load on the CPU. We were, however, able to add the ability to exclude certain forums. For example, my site has around 25 different forums, so I was able to exclude a general chit-chat forum and a technical support forum that are more or less off the main topic of my site.
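Excluding forums is likely much cheaper than a word count, since it only compares an ID the generator already has instead of reading each post's text. A Python sketch, with hypothetical forum IDs and a (post_id, forum_id) input shape:

```python
def exclude_forums(posts, excluded_forum_ids):
    """posts: iterable of (post_id, forum_id). Keep posts outside the excluded forums."""
    excluded = set(excluded_forum_ids)  # set lookup: constant time per post
    return [post_id for post_id, forum_id in posts if forum_id not in excluded]


# Hypothetical IDs: 7 = off-topic chit-chat forum, 12 = technical support forum
posts = [(101, 3), (102, 7), (103, 12), (104, 5)]
print(exclude_forums(posts, [7, 12]))  # [101, 104]
```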

Whether this will be a benefit or a liability in Google search is an open question, but I'll follow up here with any noticeable changes, positive or negative.

I believe this will have positive results, simply because we are making it easier for Google to index specific content and provide more relevant search results for any given query. Currently, Google's results may send people to a topic page like this one (/page/10/), and the searcher still has to dig through it for the specific answer to their query. With all posts indexed, Google should be able to send the person directly to the content that answers their query... at least that is my hope.

Anyone interested in this app can contact @DawPi.

Here is a screenshot of the admin page:

[Screenshot: admin page of the posts sitemap application]

Edited by sadams101
add screenshot

14 hours ago, sadams101 said:

I believe this will have positive results, simply because we are making it easier for Google to index specific content

I think this will have a negative result with Google. Google crawls the whole page and indexes the content of that specific page. Your sitemap_posts file contains tons of links to the same pages (they don't differ in code or in what is displayed), so you are just creating problems for Google. Moreover, to prevent duplication, Google provides a URL parameter management tool where we can tell Google which parameters change the content in an important way (like a 'page' parameter) and which do not (like an 'unread' parameter).
In short, this only creates problems for Google. If you wish to highlight or mark up the content on the page, you should use schema.org metadata and HTML5 tags. IPS already does all of that, so in general you don't need to do anything more.
Anyway, if you really want every post crawled as an individual piece of content, create a special view of the topic that shows only that one post (with a link back to the default view). That way you'll increase the number of crawled 'pages', but that won't necessarily be good for your SEO.
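To make the parameter distinction concrete, here is a Python sketch that normalizes a URL by dropping query parameters assumed not to change the content, keeping the rest, and discarding the #fragment (which browsers never send to the server). Treating 'unread' and 'tab' as ignorable is an assumption for illustration; which parameters matter depends on the software.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed for illustration: parameters that do not change what the page shows
IGNORABLE_PARAMS = {"unread", "tab"}


def normalize_url(url: str) -> str:
    """Drop ignorable query parameters and the #fragment; keep everything else."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in IGNORABLE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(normalize_url("https://example.com/forums/topic/1-x/?unread=1&page=2#comment-5"))
# https://example.com/forums/topic/1-x/?page=2
```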


Edited by Upgradeovec

I don't think it will have any impact, mostly because we use canonical tags to prevent duplicate "pages" from being indexed. When I clicked on this topic, I ended up at this URL:

https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/page/10/?tab=comments#comment-2784429

In the HTML page source I have

<link rel="canonical" href="https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/page/10/" />

So when Google visits the URL I visited, the URL it will index is

https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/page/10/

 

With your customization, you are sending Google the signal "look at this URL, and this URL, and this URL," all from the same topic with the same canonical tag, which differs from the URL in the sitemap. If anything, I think this will confuse Google's spiders along the lines of "why does their sitemap say to index URL X, but when we visit it they tell us the canonical URL is Y?"
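The mismatch described here can be checked for any single URL: read the page's link rel="canonical" and compare it with the URL listed in the sitemap. A Python sketch using only the standard library's html.parser; fetching is left out so the example runs offline, and the HTML string is a trimmed stand-in for the page source quoted above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also reaches this method via handle_startendtag
        if tag != "link" or self.canonical is not None:
            return
        attrs = dict(attrs)
        if "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def canonical_of(html: str):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def sitemap_url_matches_canonical(sitemap_url: str, html: str) -> bool:
    return canonical_of(html) == sitemap_url


page_source = ('<link rel="canonical" href="https://invisioncommunity.com/forums/topic/'
               '442742-large-community-you-have-a-problems-with-sitemap/page/10/" />')
print(canonical_of(page_source))
```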


7 hours ago, bfarber said:

I don't think it will have any impact, mostly because we use canonical tags to prevent duplicate "pages" from being indexed.

Good information, thanks for weighing in!


I appreciate your input here and your concerns about the new posts sitemap. As you know, it will take time to see whether this is a good or a bad thing. I do understand the canonical link situation, but one positive I can think of is that it will still direct Google to include every page of topics that run longer than one page. For example, this is not in the sitemap:

https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/page/10/

nor are pages 9, 8, 7, 6, etc. Only page 1 is in the sitemap. At least as I've learned to interpret Google's sly "Google Speak," by not including these pages in your sitemap you are telling Google that the content on those pages is not important enough to include in its index. After all, Google does say that if you want a page in its index, you should submit it. There are two ways to do this: 1) manually; 2) via a sitemap.

On a side note, we're also developing a separate sitemap just for those extra topic pages. Overkill? Maybe. We'll see...
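For a sitemap of the extra topic pages, the URLs can be derived from each topic's post count. A Python sketch, assuming the /page/N/ URL pattern seen in this thread's own URL and a hypothetical 25 posts per page (use whatever your board is actually set to). Page 1 is the bare topic URL already in the standard sitemap, so only pages 2 to N are emitted.

```python
import math


def extra_page_urls(topic_url: str, post_count: int, posts_per_page: int = 25) -> list[str]:
    """URLs for pages 2..N of a topic; page 1 is the topic URL itself."""
    if posts_per_page <= 0:
        raise ValueError("posts_per_page must be positive")
    pages = max(1, math.ceil(post_count / posts_per_page))
    base = topic_url.rstrip("/") + "/"
    return [f"{base}page/{n}/" for n in range(2, pages + 1)]


print(extra_page_urls("https://example.com/forums/topic/1-example/", 60))
# ['https://example.com/forums/topic/1-example/page/2/', 'https://example.com/forums/topic/1-example/page/3/']
```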

Edited by sadams101

  • 2 weeks later...

I want to report a HUGE drop in the number of my site's pages that are not in Google's index. As mentioned, I implemented the new custom posts sitemap on April 4th, which is also the day I added it to Google Search Console. As you can see below, since January I had been seeing VERY SLOW declines in the number of pages not in Google's index, and then on April 8th the "crawled but not indexed" count dropped dramatically, from 488,367 to only 186,713.

I realize this isn't necessarily proof of concept, but my new sitemap certainly has not hurt me as some of you predicted, and in my opinion it is solving this issue. I've read lots of Google's posts on this topic, and the bottom line is that if a page isn't in your sitemap, you are telling Google's spider that it isn't important.

[Screenshot: Google Search Console chart of crawled but not indexed pages, January to April]


Indeed, that is what is currently not in the index, due to the "noindex" tags now on profiles without data. But 80% of the pages in this list were regular forum topic and post pages, and those appear to be going back into the index now. It was a 61% drop in crawled but not indexed pages.

So what you used to see here were lots and lots of links that SHOULD HAVE BEEN in the index; many of those are now gone.
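The 61% figure follows from the two counts reported for April 8th; a quick check in Python (the exact value is about 61.8%):

```python
before = 488_367  # "crawled but not indexed" before April 8th
after = 186_713   # "crawled but not indexed" after April 8th

removed = before - after
pct_drop = 100 * removed / before
print(f"{removed:,} fewer pages, a {pct_drop:.1f}% drop")  # 301,654 fewer pages, a 61.8% drop
```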

Edited by sadams101

After further examination, many of the profiles currently in my "crawled but not indexed" list SHOULD be in the index, but I don't believe profiles are in the sitemap. If that's correct, my next project is a sitemap for profiles, and I am also looking into one for article comments. The more detailed the sitemap, the better.
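A profile sitemap along these lines might list only members whose profiles actually have content, which would also line up with the "noindex" on profiles without data mentioned earlier. A Python sketch; the Profile fields and the "has content" rule (at least one post or a non-empty bio) are assumptions for illustration, not IPB's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Profile:  # hypothetical record, not IPB's data model
    url: str
    post_count: int
    bio: str = ""


def has_content(profile: Profile) -> bool:
    # Assumed rule: at least one post, or a bio containing some non-whitespace text
    return profile.post_count > 0 or bool(profile.bio.strip())


def profile_sitemap_urls(profiles):
    """URLs of profiles worth listing in a profile sitemap."""
    return [p.url for p in profiles if has_content(p)]
```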


I ran an experiment for over a month. I increased the number of entries per sitemap from 500 to 1,500 and added a lastmod date to the sitemap index containing the last modification date of the newest entry in each sitemap (NOT the timestamp of sitemap generation). Result: there was a slight increase in pages indexed by Google, BUT no change in organic search traffic.
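The setup described here, 1,500 entries per sitemap with each index lastmod taken from the newest entry rather than the generation time, can be sketched in Python as follows. The (url, last_modified) entry shape is hypothetical, and the chunk size is a parameter.

```python
from datetime import datetime, timedelta, timezone


def chunk_entries(entries, size=1500):
    """Split (url, last_modified) pairs into consecutive sitemaps of at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def index_lastmods(chunks):
    """lastmod for each sitemap = newest last_modified among its own entries."""
    return [max(modified for _, modified in chunk) for chunk in chunks]


# Hypothetical data: 4,000 topics, each modified one hour after the previous one
start = datetime(2019, 1, 1, tzinfo=timezone.utc)
entries = [(f"https://example.com/topic/{i}/", start + timedelta(hours=i)) for i in range(4000)]
chunks = chunk_entries(entries)
print([len(c) for c in chunks])  # [1500, 1500, 1000]
```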

For me, it is not worth increasing the number of indexed pages if that has no influence on traffic. That's why I have given up trying to "optimize" the sitemap. Can somebody here prove that the number of pages in Google's index correlates with organic search traffic from Google? Traffic is the goal at the end of the day, not the number of indexed pages.


As a general SEO rule, the more keywords your site ranks for, the more organic search traffic you should see, simply because there are more ways to find your site in search. I saw the number of keywords my site ranked for drop in unison with my pages landing in the "crawled but not indexed" list. It looked to me like a direct correlation, and it makes sense: if 85% of my forum's pages are de-indexed, then so are any keywords associated with those pages.

I think the first step for anyone with the same issue is to get de-indexed content back in the search engine, and to do this Google says to either submit each page one by one in GSC, or include them in your sitemap. 

Can anyone think of a reason, for example, why profiles would not need to be in the sitemap? In my case, I have hundreds of profiles from doctors, researchers, etc., who have written articles for me over more than 20 years, and they should give my site a very nice E-A-T score from Google. Yet IPB does not include them in the sitemap, thus sending Google the message that the content is not valuable enough for its index. From what I've seen, none of my site's profiles are in the index: not just the ones without content, but ALL of them. Once I finish the custom profile sitemap, I predict that all profile pages with content will actually be indexed.


2 hours ago, sadams101 said:

As a general SEO rule, the more keywords your site is ranking for, the more organic search traffic you should see, and this is simply because there are more possible ways to find your site in search. I saw the number of keywords that my site was ranking for drop in unison with the number of pages that were crawled but not in their index--it looked to me like a direct correlation--and it makes sense, because if 85% of my forum's pages are de-indexed, then so are any key words associated with those pages.

This is not entirely correct. If 85% of your pages had almost no traffic before the drop, then there will be no decrease in traffic after the drop. That means the number of pages does not always correlate with traffic. Old pages, outdated pages, and pages without valuable content can be just a number in the index without giving you any traffic.

How did the HUGE change in indexed pages you observed on April 8th affect your site traffic? Did you see a drop in the Performance report (Google Search Console) or in Google Analytics?

 

 

Edited by Sonya*

I've gone through this in earlier posts, but yes, much of this seems to come down to sitemap issues. I am also working on a sitemap for article comments. A full sitemap lets Google know that the content is important enough to index. The lastmod date, which IPB's sitemap did not have until 4.4, is crucial for efficient crawling of your site's latest content. Without it, you just have to hope the spider will find it.

Again, others don't need to follow me down this path of full and complete sitemaps that cover every inch of the site, each with a proper lastmod date, but this is the path I am taking, and so far I do see positive results in natural search traffic, which corresponds directly with the increase in the number of keywords indexed.

PS: Having 750,000+ pages of content that have been crawled but are not in Google's index can't be helpful for you in Google search, can it? I'd love for someone here to explain how it could be. That was where I was in January, and now I'm down to 186K, with a big change after the posts sitemap I explained earlier.


@sadams101 the pages that are crawled but not indexed have already been found by the spider. Otherwise it wouldn't know the pages even exist. So I'm missing the logic here: why do you think crawled but not indexed pages are there because of a "false" sitemap? 😊

20 minutes ago, sadams101 said:

A full sitemap let's google know that the content is important enough to index.

That would mean that if a page is found by the spider but is not included in the sitemap, it will not be indexed, but if the same page is included in the sitemap, it will be indexed?


I also used to think that indexing more pages would correlate with more traffic, but that just isn't so if the content isn't high quality. Pages with content Google sees as not valuable (such as empty profile pages) can get dropped, and if they don't, they don't rank for anything good anyway. The default IPS sitemap does include profile pages but excludes ones with no content, which I think is better. You don't want spiders wasting time on pages that have no valuable or ranking content.

However, I do think some forum pages are not getting crawled for some reason, likely because Google doesn't see their content as valuable or because spiders aren't getting to them. I also think older topics rank lower than newer ones.

One thing I know works is having well-written topic titles, which in turn become your page title tags. I've edited many topic titles and have seen them rank better afterwards.

