bfarber Posted May 1, 2019 The more pages you send to Google to crawl, the more pages Google has to split its resources over. Google has a finite "crawl budget" for your site - how many resources it will spend crawling it. If you send Google a ton of low-quality links to crawl, you ultimately take resources away from Google crawling your higher-quality content. You will find that, as usual, it's quality over quantity. You are incorrect about profiles not being included in the sitemap (there is an option to toggle it off in the AdminCP - make sure it's enabled). Example: https://invisioncommunity.com/sitemap.php?file=sitemap_profiles_1
Joel R Posted May 2, 2019 I'm curious to see the follow-up to these experiences after 3 months or 6 months. I agree that you're going to see an initial bump because you're essentially forcing Google's bots to spider every content item. And in the beginning, it will artificially boost your rankings. But I wonder whether Google won't simply adjust its algorithm in the future to delist the content again?
sadams101 Posted May 2, 2019 You guys do what you like...there is no sitemap for posts--there should be. If all sitemaps are set up correctly, Google only crawls everything in the maps once, then only crawls the new content when it sees the lastmod date change. There is no finite crawl budget for my site, this is nonsense. PS - If there were such a thing as a crawl budget, then leaving the lastmod date out of the sitemap would certainly cause Google's spider to waste a ton of time trying to find new content...IPB currently has this issue if you're not running 4.4.
kihon Posted May 2, 2019 13 minutes ago, sadams101 said: You guys do what you like...there is no sitemap for posts--there should be. If all sitemaps are set up correctly, Google only crawls everything in the maps once, then only crawls the new content when it sees the lastmod date change. There is no finite crawl budget for my site, this is nonsense. PS - If there were such a thing as a crawl budget, then leaving the lastmod date out of the sitemap would certainly cause Google's spider to waste a ton of time trying to find new content...IPB currently has this issue if you're not running 4.4. I agree, posts should be part of a forum sitemap. Didn't know they weren't.
bfarber Posted May 2, 2019 31 minutes ago, sadams101 said: You guys do what you like...there is no sitemap for posts--there should be. If all sitemaps are set up correctly, Google only crawls everything in the maps once, then only crawls the new content when it sees the lastmod date change. There is no finite crawl budget for my site, this is nonsense. PS - If there were such a thing as a crawl budget, then leaving the lastmod date out of the sitemap would certainly cause Google's spider to waste a ton of time trying to find new content...IPB currently has this issue if you're not running 4.4. https://searchengineland.com/google-explains-crawl-budget-means-webmasters-267597 https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html I'm afraid it's not "nonsense". Google has a limited amount of resources it can dedicate to each site, and it allocates them accordingly. 20 minutes ago, kihon said: I agree, posts should be part of a forum sitemap. Didn't know they weren't. Posts do not represent a separate page, and in fact the canonical URL embedded in the page points back to the topic (which is included in the sitemap).
Rhett Posted May 2, 2019 I feel some are overthinking this, thinking that if it's not in your sitemap then it won't get indexed. That really isn't the way Google and other search engines work; a sitemap is basically there to kick-start indexing. The goal for any site is to get the content indexed. A sitemap kick-starts this and gives search engines the info to get started; as they index, they will naturally cover the whole site by following the links on your site as normal. I wouldn't think of a sitemap as a must-have for every single item. Think of it more as a way to push-start the indexing process and let the search engines do their thing by indexing the complete site as they crawl. If a topic is being crawled, search engines will certainly pick up the replies to this topic and index them too. 🙂
sadams101 Posted May 2, 2019 So the fact that you know of a crawl budget, yet haven't back-ported the lastmod date to the sitemap for older versions, is malpractice. And what is really meant here is just this: if your sitemap isn't efficiently guiding Google to the latest content, then don't expect your latest content to get indexed in a timely manner. It may get discovered weeks later...like if you have no lastmod date in your sitemap. Google loves detail when it comes to efficiently guiding its bot to your newest content. If you do that, you'll never have to worry about any crawl budget--Google will find the newest content each time it crawls with no issues.
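For readers who have not looked inside a sitemap, the lastmod argument above can be illustrated with a minimal Python sketch. It is not IPS code: `build_sitemap` and the example URL are hypothetical names used only for illustration, and the only parts taken from the sitemaps.org protocol are the `urlset`, `url`, `loc`, and `lastmod` elements and the namespace.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified must be a timezone-aware datetime; it is written as a
    W3C datetime in UTC inside the <lastmod> element.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        stamp = last_modified.astimezone(timezone.utc)
        ET.SubElement(url, "lastmod").text = stamp.strftime("%Y-%m-%dT%H:%M:%S+00:00")
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical topic URL, for illustration only.
    example = [
        ("https://example.com/topic/1-hello/",
         datetime(2019, 5, 2, 12, 0, tzinfo=timezone.utc)),
    ]
    print(build_sitemap(example))
```

Whether and how quickly a crawler acts on `lastmod` is up to the search engine; the sketch only shows where the date lives in the file.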
Sonya* Posted May 2, 2019 @sadams101, have you tried using RSS to submit your newest content to Google? You can create RSS feeds in the ACP and then submit them along with your sitemap.
sadams101 Posted May 2, 2019 Thanks, but I did not see a place in GSC to submit RSS, only sitemaps...is there a separate area for this? I do have RSS feeds.
Sonya* Posted May 3, 2019 You submit RSS feeds like sitemaps, in the same area; they are simply listed along with your sitemaps in Google Webmaster Tools. https://webmasters.googleblog.com/2014/10/best-practices-for-xml-sitemaps-rssatom.html Edited May 3, 2019 by Sonya*
sadams101 Posted May 6, 2019 I've added my RSS feeds, thank you, because I did not know we could do this in Google sitemaps...I recommend everyone do this. My "Crawled - currently not indexed" count dropped another 17% on April 30th, and the number of search terms ranking doubled in mobile in the last week. Edited May 6, 2019 by sadams101
AlexWebsites Posted May 7, 2019 4 hours ago, sadams101 said: I've added my RSS feeds, thank you, because I did not know we could do this in Google sitemaps...I recommend everyone do this. My "Crawled - currently not indexed" count dropped another 17% on April 30th, and the number of search terms ranking doubled in mobile in the last week. That's good news. Which sitemap...your posts or profiles sitemap? Most important, have you seen an uptick in organic traffic from this?
sadams101 Posted May 7, 2019 So the profiles do exist in the default sitemap, and there are various settings for the profiles in the system, so doing another sitemap for them was unnecessary (I did add the Profile Meta Tags plugin, because profiles did not include that data by default). Given how important EAT is in Google's ranking now, the profiles in IPS seem to have been sorely neglected, at least until recently...especially for sites like mine that use Pages and have thousands of articles by authors with high EAT authority. I am currently using the custom posts sitemap mentioned earlier: https://www.celiac.com/sitemap_posts.php and a custom extra pages sitemap, so that posts on pages like this one, #11 in this thread, will be in my sitemap (they are not currently, for an unknown reason): https://www.celiac.com/sitemap_pages.php and I will soon be launching a new custom sitemap for article comments that will basically do the same as the posts sitemap, but for comments on my articles. So far the improvement I am seeing has been across the board: an increase in natural search traffic, in the total number of keywords for which my site ranks, etc. Of course this could all be just coincidence...has anyone else here who has complained about having a high rate of "Crawled but not in Google's index" seen any improvement like I've demonstrated? If so, please share it here.
FabioPaz Posted May 8, 2019 Why would you guys want profiles to be indexed, since Google will treat them mostly as thin content? My forum has 240,000 members and 180,000 topics; now imagine what the crawler must think of this, not even mentioning the problem with crawl budget (PS: 180k topics with 1.4 million posts). About the algorithm, I don't think Google will check EAT authority using your own site; if that were the case, people would manipulate it very easily. Don't you think Google will get the author name from the topic itself and compare it using other factors? Like where the name of the author is mentioned OUTSIDE the site? Social media? Etc.? If the author's content is what matters, Google will follow the forum structure and find it. But if Google finds the same content on the author page and in the forum, the chances of duplicate content will increase, right? (Not a rhetorical question.) Edited May 8, 2019 by FabioPaz
Sonya* Posted May 8, 2019 37 minutes ago, FabioPaz said: Why would you guys want profiles to be indexed, since Google will treat them mostly as thin content? Most profiles in most communities are indeed almost empty and do not contain any valuable content. But if the "About" field is filled with a lot of content, then this is something that might be valuable and unique. Therefore IPS does not include empty profiles in the sitemap, as far as I know, but does include filled profiles.
FabioPaz Posted May 8, 2019 21 minutes ago, Sonya* said: Therefore IPS does not include empty profiles in the sitemap, as far as I know, but does include filled profiles. That's really good to know. The only question is how IPB decides whether a profile is empty or filled. Most people don't take the time to fill out the profile completely with rich content and just fill out the basic stuff; this could make IPB include the profile in the sitemap even though it's still thin content (in a large community this is not good). Off-topic: Does anyone know a way to remove "dead content"? Like topics that haven't received any traffic in the last "X" months (this would mostly indicate zero SERP traffic/no valuable information/not backlink-worthy). Edited May 8, 2019 by FabioPaz
DawPi Posted May 8, 2019 1 hour ago, FabioPaz said: The only question is how IPB decides whether a profile is empty or filled. It's based on the post count:

    if ( \IPS\Settings::i()->sitemap_profiles_content > 0 )
    {
        $where = array( array( 'member_posts >= ?', \IPS\Settings::i()->sitemap_profiles_content ) );
    }
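The excerpt above builds a database filter on `member_posts` whenever the `sitemap_profiles_content` setting is greater than zero. A rough Python sketch of the same rule follows; `profiles_for_sitemap` and the sample member dictionaries are hypothetical, and the sketch assumes that no filter at all is applied when the setting is zero, which the excerpt does not show.

```python
def profiles_for_sitemap(members, min_posts):
    """Return the members whose profiles would be listed in the sitemap.

    members:   list of dicts with a 'member_posts' count (hypothetical shape).
    min_posts: stands in for the sitemap_profiles_content setting.
    """
    if min_posts > 0:
        # Mirrors the 'member_posts >= ?' condition from the excerpt.
        return [m for m in members if m["member_posts"] >= min_posts]
    # Assumption: a setting of 0 means no post-count filter.
    return list(members)


if __name__ == "__main__":
    sample = [
        {"name": "lurker", "member_posts": 0},
        {"name": "regular", "member_posts": 25},
    ]
    print(profiles_for_sitemap(sample, 10))
```

Note that this rule measures activity, not how much of the profile is filled in, which is the distinction FabioPaz was asking about.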
sadams101 Posted May 8, 2019 My authors all have detailed "About Me" info, and, since in IPB the About Me is more or less a hidden field (I mean you really have to work to find it, especially on mobile), I've added an "About Me" app that shows this content with the article. For me there is unique content there for sure, but, more importantly, many of my authors are well known enough that people search their names in Google. Assuming that Google will--for ANY of your site's content--simply "follow the forum structure and find it" is exactly where you may run into a problem with a Google crawl budget. The bot may waste lots of bandwidth on your site using this approach. My approach is to map everything so Google only needs to hit the new content where the lastmod date changed, thus using less overall bandwidth and more efficiently picking up ALL new content. In any case, I'll keep reporting how this goes, but so far, so good. Edited May 8, 2019 by sadams101
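The idea of only revisiting entries whose lastmod changed can be sketched as below. This is an illustration of the argument, not a description of how Googlebot actually schedules crawls; `changed_since` and the example URLs are hypothetical.

```python
from datetime import datetime, timezone


def changed_since(entries, last_crawl):
    """Return the URLs whose lastmod is newer than the previous crawl.

    entries:    iterable of (url, lastmod) pairs, as read from a sitemap.
    last_crawl: datetime of the previous visit (timezone-aware).
    """
    return [loc for loc, lastmod in entries if lastmod > last_crawl]


if __name__ == "__main__":
    # Hypothetical sitemap contents, for illustration only.
    entries = [
        ("https://example.com/topic/1-old/",
         datetime(2019, 1, 10, tzinfo=timezone.utc)),
        ("https://example.com/topic/2-new/",
         datetime(2019, 5, 8, tzinfo=timezone.utc)),
    ]
    last_crawl = datetime(2019, 5, 1, tzinfo=timezone.utc)
    print(changed_since(entries, last_crawl))
```

Without a lastmod value there is nothing to compare against, so a crawler in this simplified model would have to refetch every URL to find out what changed.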
sadams101 Posted May 22, 2019 I've also added @DawPi's new article comment sitemap: https://www.celiac.com/sitemap_comments.php Unless I've missed something, I believe I now have ALL content on my site mapped in agonizing detail! 😅
sound Posted May 22, 2019 Doesn't Google treat all your comment links as redirected links? For example, the submitted link below https://www.celiac.com/articles.html/celiac-disease-amp-related-diseases-and-disorders/fibromyalgia-and-celiac-disease/fibromyalgia-and-celiac-disease-by-ronald-hoggan-r117/?do=findComment&comment=1&d=2&tab=comments actually redirects to a different URL on my site. Any URL with 'findComment' is not indexed by Google and is classed by Google as a 'Page with redirect' and given the 'Status: Excluded'. Edited May 22, 2019 by sound
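One way to keep such links out of a custom sitemap is to filter on the `do=findComment` query parameter before writing the file. The sketch below is an assumption about how that could be done, not the fix that was actually applied to the celiac.com sitemaps; `is_find_comment_url` and `drop_redirecting_urls` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlsplit


def is_find_comment_url(url):
    """True if the URL carries do=findComment in its query string."""
    query = parse_qs(urlsplit(url).query)
    return "findComment" in query.get("do", [])


def drop_redirecting_urls(urls):
    """Keep only URLs that are not findComment redirect links."""
    return [u for u in urls if not is_find_comment_url(u)]


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    candidates = [
        "https://example.com/topic/1-hello/",
        "https://example.com/topic/1-hello/?do=findComment&comment=7",
    ]
    print(drop_redirecting_urls(candidates))
```

A stricter check would request each URL and list only those that return a 200 status without redirecting, at the cost of one HTTP request per entry.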
sadams101 Posted May 22, 2019 Thanks for noticing this, we'll look into it and get this updated.
sadams101 Posted May 29, 2019 We've fixed this issue, which also existed in the posts sitemap...now all is fine, thank you!
sadams101 Posted October 3, 2019 I wanted to follow up with something interesting. As you may recall, I had some custom sitemaps developed which include a map to every post made, and a map for all comments. After submitting the maps I saw "Discovered - currently not indexed" shoot up to ~923K, and it just stayed there for months. Most of the URLs there were the individual post links. Last month I saw a big change--a fast drop in "Discovered - currently not indexed", as you can see below. It dropped to 320K. As this started happening I also saw my organic traffic and the number of keywords indexed increase.
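To put the two reported figures in proportion, the relative drop can be worked out as follows. The inputs are the rounded numbers from the post above (~923K and 320K), so the result is only approximate; `percent_drop` is a hypothetical helper.

```python
def percent_drop(before, after):
    """Relative decrease from before to after, as a percentage."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Rounded figures quoted in the post above.
    print(round(percent_drop(923_000, 320_000), 1))  # roughly 65.3
```

In other words, about two thirds of the URLs left the "Discovered - currently not indexed" bucket, though the figures alone do not say how many of them ended up indexed.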
AlexWebsites Posted October 3, 2019 Can you post a screenshot of your indexed URLs? They must have shot up at the same time your discovered URLs dropped. How much did your organic traffic increase?