
Large community? You have a problem with your sitemap!



The more pages you send to Google to crawl, the more pages Google has to split its resources over. Google has a finite "crawl budget" for your site: the amount of resources it will spend crawling it. If you send Google a ton of low-quality links to crawl, you ultimately take resources away from Google crawling your higher-quality content. As usual, it's quality over quantity.

You are incorrect about profiles not being included in the sitemap (there is an option to toggle it off in the AdminCP - make sure it's enabled).

Example: https://invisioncommunity.com/sitemap.php?file=sitemap_profiles_1


I'm curious to see the follow-up to these experiences after 3 or 6 months.

I agree that you're going to see an initial bump because you're essentially forcing Google's bots to spider every content item. And in the beginning, it will artificially boost your rankings. But I wonder whether Google will simply adjust its algorithm in the future and delist the content again?


You guys do what you like...there is no sitemap for posts--there should be. If all sitemaps are set up correctly, Google only crawls everything in the maps once, then only crawls the new content when it sees the lastmod date change. There is no finite crawl budget for my site, this is nonsense.

PS - If there were such a thing as a crawl budget, then leaving the lastmod date out of the sitemap would certainly cause Google's spider to waste a ton of time trying to find new content...IPB currently has this issue if you're not running 4.4.


13 minutes ago, sadams101 said:

You guys do what you like...there is no sitemap for posts--there should be. If all sitemaps are set up correctly, Google only crawls everything in the maps once, then only crawls the new content when it sees the lastmod date change. There is no finite crawl budget for my site, this is nonsense.

PS - If there were such a thing as a crawl budget, then leaving the lastmod date out of the sitemap would certainly cause Google's spider to waste a ton of time trying to find new content...IPB currently has this issue if you're not running 4.4.

I agree, Posts should be part of a forum sitemap. Didn't know they weren't.


31 minutes ago, sadams101 said:

You guys do what you like...there is no sitemap for posts--there should be. If all sitemaps are set up correctly, Google only crawls everything in the maps once, then only crawls the new content when it sees the lastmod date change. There is no finite crawl budget for my site, this is nonsense.

PS - If there were such a thing as a crawl budget, then leaving the lastmod date out of the sitemap would certainly cause Google's spider to waste a ton of time trying to find new content...IPB currently has this issue if you're not running 4.4.

https://searchengineland.com/google-explains-crawl-budget-means-webmasters-267597

https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html

I'm afraid it's not "nonsense". Google has a limited amount of resources it can dedicate to each site, and it allocates them accordingly.

20 minutes ago, kihon said:

I agree, Posts should be part of a forum sitemap. Didn't know they weren't.

Posts do not represent a separate page. In fact, the canonical URL embedded in the page points back to the topic (which is included in the sitemap).
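
To check where a single-post URL points, you can read the canonical link out of the page markup. This is only a minimal sketch using Python's standard library; the sample HTML and the example.com URL are hypothetical and not taken from any real community.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first canonical <link> tag.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical


# Hypothetical markup for a single-post URL on an example forum.
sample = """
<html><head>
<link rel="canonical" href="https://example.com/topic/123-sitemaps/">
</head><body>post #11</body></html>
"""
print(find_canonical(sample))
```

If the canonical URL differs from the URL you requested, search engines are being told to treat the topic, not the individual post, as the page to index.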


I feel some are overthinking this and assuming that if something is not in your sitemap, it won't get indexed. That really isn't the way Google and other search engines work. The goal for any site is to get its content indexed, and a sitemap is basically there to kick-start that: it gives search engines enough information to begin, and from there they will naturally index the whole site by following the links on it as normal.

I wouldn't think of a sitemap as a must-have for every single item. Think of it more as a way to push-start the indexing process, and then let the search engines do their thing by indexing the complete site as they crawl it. If a topic is being crawled, search engines will certainly pick up the replies to that topic and index them too.  🙂

 

 


So the fact that you know about a crawl budget, yet haven't back-ported the lastmod date to your sitemap, is malpractice.

And what is really meant here is just that: if your sitemap isn't efficiently guiding Google to the latest content, then don't expect your latest content to get indexed in a timely manner. It may get discovered weeks later...like if you have no lastmod date in your sitemap.

Google loves detail when it comes to efficiently guiding its bot to your newest content. If you do that, you'll never have to worry about any crawl budget--Google will find the newest content each time it crawls with no issues.
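
For anyone unsure what a lastmod date in a sitemap looks like, here is a rough sketch that builds a small sitemap with `<loc>` and `<lastmod>` entries. It uses only Python's standard library, the URLs and dates are made up, and it is not how IPS generates its own sitemaps.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, timezone-aware datetime) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # <lastmod> is written as a W3C datetime in UTC.
        stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")
        ET.SubElement(url, "lastmod").text = stamp
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical topics with their last reply times.
entries = [
    ("https://example.com/topic/1-welcome/", datetime(2019, 4, 1, 9, 30, tzinfo=timezone.utc)),
    ("https://example.com/topic/2-sitemaps/", datetime(2019, 5, 1, 12, 0, tzinfo=timezone.utc)),
]
print(build_sitemap(entries))
```

The point of the date is that a crawler can compare it with its previous visit and skip anything that has not changed.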


I've added my RSS feeds. Thank you, because I did not know we could do this in Google sitemaps...I recommend everyone do this.

My "Crawled - currently not indexed" count dropped another 17% on April 30th, and the number of search terms ranking on mobile doubled in the last week:

image.thumb.png.4ca96b268463aa31d7cd58a76529fd75.png

Edited by sadams101

4 hours ago, sadams101 said:

I've added my RSS feeds. Thank you, because I did not know we could do this in Google sitemaps...I recommend everyone do this.

My "Crawled - currently not indexed" count dropped another 17% on April 30th, and the number of search terms ranking on mobile doubled in the last week:

image.thumb.png.4ca96b268463aa31d7cd58a76529fd75.png

That's good news. Which sitemap was it: your posts sitemap or your profiles sitemap? Most importantly, have you seen an uptick in organic traffic from this?


So the profiles do exist in the default sitemap, and there are various settings for profiles in the system, so building another sitemap for them was unnecessary (I did add the Profile Meta Tags plugin, because profiles did not include that data by default). Given how important EAT is in Google's ranking now, the profiles in IPS seem to have been sorely neglected, at least until recently...especially for sites like mine that use Pages and have thousands of articles by authors with high EAT authority.

I am currently using the mentioned custom posts sitemap:
https://www.celiac.com/sitemap_posts.php

and custom extra pages sitemap, so that posts on pages like this one, #11 in this thread, will be in my sitemap (they are not currently for an unknown reason):
https://www.celiac.com/sitemap_pages.php

and I will soon be launching a new article comments custom sitemap, that will basically do the same as the posts sitemap, but for comments on my articles.

So far the improvement I am seeing has been across the board: an increase in natural search traffic, in the total number of keywords for which my site ranks, etc.

Of course this could all be just coincidence...has anyone else here who has complained about having a high rate of "Crawled but not in google's index" seen any improvement like I've demonstrated? If so, please share it here.


Why would you guys want profiles to be indexed, since Google will treat them mostly as thin content? My forum has 240,000 members and 180,000 topics; now imagine what the crawler must think of this, not to mention the problem with crawl budget (PS: 180k topics with 1.4 million posts).

About the algorithm, I don't think Google will check EAT authority using your own site. If that were the case, people would manipulate it very easily.

Don't you think Google will get the author name from the topic itself and compare it using other factors? Like where the author's name is mentioned OUTSIDE the site? Social media? Etc.?

If the author's content is what matters, Google will follow the forum structure and find it. But if Google finds the same content on the author page and in the forum, the chances of it being treated as duplicate content will increase, right? (Not a rhetorical question.)

Edited by FabioPaz

37 minutes ago, FabioPaz said:

Why would you guys want profiles to be indexed, since Google will treat them mostly as thin content?

Most profiles in most communities are indeed almost empty and do not contain any valuable content. But if the "About" field is filled with a lot of content, then that might be valuable and unique. Therefore IPS does not include empty profiles in the sitemap as far as I know, but does include filled profiles.


21 minutes ago, Sonya* said:

 Therefore IPS does not include empty profiles in sitemap as far as I know, but does include filled profiles.

That's really good to know.

The only question is how IPB decides whether a profile is empty or filled.

Most people don't take the time to fill out their profile completely with rich content and just fill in the basics. That could lead IPB to include the profile in the sitemap even though it's still thin content (in a large community this is not good).

Off-topic:

Does anyone know a way to remove "dead content"? Like topics that didn't receive any traffic in the last "X" months (this would mostly indicate zero SERP traffic/no valuable information/not backlink-worthy).

Edited by FabioPaz

1 hour ago, FabioPaz said:

The only question is how IPB decides whether a profile is empty or filled.

It's based on post count:

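// When a minimum post count is configured, only members with at least
// that many posts are included in the profile sitemap.
if ( \IPS\Settings::i()->sitemap_profiles_content > 0 )
{
	$where = array( array( 'member_posts >= ?', \IPS\Settings::i()->sitemap_profiles_content ) );
}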
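
To spell out what that condition does, here is the same rule restated as a small Python sketch, purely for illustration. The function name and the member dictionaries are hypothetical, and it mirrors only the quoted snippet; the real IPS code may apply further conditions that are not shown in this thread.

```python
def profiles_for_sitemap(members, min_posts):
    """Return the members whose profiles pass the post-count threshold.

    A min_posts of 0 (or less) means no threshold is applied, matching the
    quoted snippet, which only adds the filter when the setting is > 0.
    """
    if min_posts > 0:
        return [m for m in members if m["member_posts"] >= min_posts]
    return list(members)


# Hypothetical members; only the post count matters for this rule.
members = [
    {"name": "alice", "member_posts": 0},
    {"name": "bob", "member_posts": 3},
    {"name": "carol", "member_posts": 250},
]
print([m["name"] for m in profiles_for_sitemap(members, 5)])
```

Note that the rule looks at post count only, so a profile with a detailed "About" field but few posts would be left out, and an empty profile with many posts would be kept.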

 


All my authors have detailed "About Me" info, and since in IPB the About Me is more or less a hidden field (I mean you really have to work to find it, especially on mobile), I've added an "About Me" app that shows this content with the article. For me there is unique content there for sure, but more importantly, many of my authors are well known enough that people search for their names in Google.

Assuming that Google will--for ANY of your site's content--simply "follow the forum structure and find it" is exactly where you may run into a problem with a Google crawl budget. The bot may waste lots of bandwidth on your site using this approach.

My approach is to map everything so Google only needs to hit the new content where the lastmod date changed, thus using less overall bandwidth and picking up ALL new content more efficiently.
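
The idea can be illustrated with a toy model of a crawler that only revisits URLs whose lastmod is newer than its previous visit. This is just a sketch of the argument, not a description of how Googlebot actually schedules crawls; the sitemap content and dates are invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(sitemap_xml, last_crawl):
    """Return URLs whose <lastmod> is newer than last_crawl.

    Entries without a <lastmod> are returned too, because a crawler has
    no way to know whether they changed.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None or datetime.fromisoformat(lastmod) > last_crawl:
            changed.append(loc)
    return changed


# Hypothetical sitemap: one stale topic, one updated topic, one undated topic.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/topic/1-old/</loc><lastmod>2019-04-01T00:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/topic/2-new/</loc><lastmod>2019-05-02T00:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/topic/3-undated/</loc></url>
</urlset>"""

last_crawl = datetime(2019, 4, 30, tzinfo=timezone.utc)
print(changed_since(sample, last_crawl))
```

In this model the stale topic is skipped, while the undated one has to be fetched again every time, which is the cost of leaving lastmod out.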

In any case, I'll keep reporting how this goes, but so far, so good.

 

Edited by sadams101

  • 2 weeks later...

Doesn't Google treat all your comment links as redirected links?

E.g. the submitted link below

 

https://www.celiac.com/articles.html/celiac-disease-amp-related-diseases-and-disorders/fibromyalgia-and-celiac-disease/fibromyalgia-and-celiac-disease-by-ronald-hoggan-r117/?do=findComment&comment=1&d=2&tab=comments

actually redirects to a different URL.

On my site, any URL with 'findComment' in it is not indexed by Google; it is classed by Google as a 'Page with redirect' and given 'Status: Excluded'.
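
If you want to keep such links out of a custom sitemap, one option is to filter on the `do=findComment` query parameter before writing the URL. A minimal sketch with Python's standard library follows; the function name and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def is_find_comment_url(url):
    """True if the URL carries a do=findComment query parameter."""
    query = parse_qs(urlsplit(url).query)
    # Compare case-insensitively, since the value is sometimes written 'findcomment'.
    return any(value.lower() == "findcomment" for value in query.get("do", []))


# Hypothetical candidate URLs for a custom sitemap.
candidates = [
    "https://example.com/topic/123-sitemaps/",
    "https://example.com/topic/123-sitemaps/?do=findComment&comment=456",
]
sitemap_urls = [u for u in candidates if not is_find_comment_url(u)]
print(sitemap_urls)
```

This only recognises the URL pattern. Confirming that a given link really redirects would still require requesting it and looking at the response.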

 
Edited by sound

  • 4 months later...

I wanted to follow up with something interesting. As you may recall, I had some custom sitemaps developed, which include a map to every post made and a map for all comments.

After submitting the maps I saw "Discovered - currently not indexed" shoot up to ~923K, and it just stayed there for months. Most of the URLs there were the individual post links.

Last month I saw a big change--a fast drop in "Discovered - currently not indexed", as you can see below. It dropped to 320K. As this started happening I also saw my organic traffic and the number of keywords indexed increase.

image.thumb.png.9575d0479577ec1a7b5c481427601681.png

