mark007 Posted January 31, 2018 37 minutes ago, jair101 said: I believe this issue is with the share links, not with the embeds: No ... and no. That's the problem in every post (the date of each post is a direct link to it), for example here: https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/?do=findComment&comment=2728237
jair101 Posted January 31, 2018 4 minutes ago, mark007 said: No ... and no. That's the problem in every post (the date of each post is a direct link to it), for example here: https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/?do=findComment&comment=2728237 Click on the icon to share a post and see what the link looks like...
sudo Posted January 31, 2018 The question is whether these links are inside the sitemaps (I don't know, to be honest). Redirecting links like that are fine and shouldn't be in the index, although I guess the links should have nofollow set.
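To illustrate the nofollow idea, here is a minimal Python sketch (IPS itself is PHP; this is not its template code). It assumes, hypothetically, that links whose query string carries `do=findComment`, `do=getNewComment` or `do=getLastComment` are the ones that answer with a 301, and adds `rel="nofollow"` only to those.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# Hypothetical set of "do=" actions that answer with a 301 redirect
# instead of serving content directly (an assumption, not an IPS list).
REDIRECT_ACTIONS = {"findComment", "getNewComment", "getLastComment"}


def is_redirect_link(url):
    """True if the URL's query string carries a redirecting do= action."""
    query = parse_qs(urlsplit(url).query)
    return any(action in REDIRECT_ACTIONS for action in query.get("do", []))


def render_link(url, text):
    """Render an <a> tag, adding rel="nofollow" only to redirecting links."""
    rel = ' rel="nofollow"' if is_redirect_link(url) else ""
    return f'<a href="{escape(url)}"{rel}>{escape(text)}</a>'
```

Canonical topic links keep normal follow behaviour; only the redirecting share/date links are marked.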
SebastienG Posted February 1, 2018 (edited) Hello, I have the same problem, plus another one. My sitemap breaks when a blog is deleted but its blog entries are still there: sitemap generation just stops. Would it be possible to add a safeguard so that an entry is not added if entry_author_id=0? As for the topic date, my script is updated, but the date does not appear in the sitemap. Edited February 1, 2018 by Xavier Hallade
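The safeguard asked for above can be sketched as a filter applied before entries reach the sitemap. This is a minimal Python illustration, not IPS code: `entry_author_id` comes from the post, while `BlogEntry`, `blog_id` and the set of existing blog ids are hypothetical stand-ins for the real database rows.

```python
from dataclasses import dataclass


@dataclass
class BlogEntry:
    entry_id: int
    blog_id: int          # hypothetical: id of the parent blog
    entry_author_id: int  # field named in the post


def sitemap_entries(entries, existing_blog_ids):
    """Yield only entries that can still be rendered safely."""
    for entry in entries:
        if entry.blog_id not in existing_blog_ids:
            continue  # orphaned: the parent blog was deleted
        if entry.entry_author_id == 0:
            continue  # the check suggested in the post
        yield entry
```

Skipping bad rows keeps one orphaned entry from stopping the whole sitemap run.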
sadams101 Posted February 1, 2018 The starter of the topic corrected me and said to run his script only once per minute, not once per second.
SeNioR- Posted February 2, 2018 Also check your robots.txt file. A good configuration looks like this:

# Sitemap...
Sitemap: http://yoursite.com/sitemap.php

User-agent: *

# Disallow directory
Disallow: /api/
Disallow: /cgi-bin/
Disallow: /datastore/
Disallow: /plugins/
Disallow: /system/
Disallow: /themes/
Disallow: /go/

# Disallow files
Disallow: /403error.php
Disallow: /404error.php
Disallow: /500error.php
Disallow: /Credits.txt
Disallow: /error.php
Disallow: /upgrading.html

# Querystring
Disallow: /?tab=*
Disallow: /index.php?*
Disallow: /*?app=*
Disallow: /*sortby=*
Disallow: /*/?do=download
Disallow: /profil/*/?do=*
Disallow: /profil/*/content/
Disallow: /*?do=add
Disallow: /*?do=email
Disallow: /*?do=getNewComment
Disallow: /*?do=getLastComment
Disallow: /*?do=findComment*
Disallow: /*?do=reportComment*

# Allow specific parts
Allow: /applications/core/interface/imageproxy/imageproxy.php?img=*
SebastienG Posted February 2, 2018 7 hours ago, SeNioR- said: Also check your robots.txt file. A good configuration looks like this: We could add admin too. In my case, with more than 1,200 sitemap files, I think it's more useful to regenerate the list of the latest topics and blog comments frequently than to regenerate everything. If I run the script every minute, my latest-topics sitemap only gets rebuilt about every 20 hours. So I modified the script so that every x runs it forces a rebuild of the latest blogs and topics sitemaps.
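The modification described above can be sketched as a scheduler that picks one sitemap file per run. This is a hypothetical Python illustration, not SebastienG's actual script: it assumes file 0 holds the newest topics and blog entries, rebuilds it on every `every`-th run, and walks the other files round-robin otherwise (it needs `total_files >= 2` and `every >= 2`).

```python
def next_sitemap(run, total_files, every):
    """Return the index of the sitemap file to rebuild on this run.

    File 0 is assumed to hold the newest content (hypothetical layout).
    """
    if run % every == 0:
        return 0
    # How many "ordinary" (non-latest) runs happened before this one.
    ordinary = run - run // every - 1
    return 1 + ordinary % (total_files - 1)
```

With 1,200 files and one run per minute, a plain round-robin revisits any given file every 1,200 minutes (20 hours). With `every=60`, the latest file is rebuilt hourly, at the cost of the older files cycling slightly more slowly.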
Management Matt Posted February 2, 2018 Just so you know, we're watching this topic and looking at our own stats to build a better picture.

The facts we know:
1) Almost every site I have access to (via friends, etc.) has seen a massive drop in indexed pages since June. This is not exclusive to Invision Community powered sites; I've seen the same with WordPress.
2) Google slipped in an update in 2017 that targets several things, one of which is poor backlinks and other poor-quality links. It looks like this means that user profiles with no content have been dropped from the index, along with links that 301. That is fine. You don't want Google storing the 301 link, as long as it stores the real link (and it does seem to).
3) A drop in what is indexed doesn't actually correlate with the health of the site. We've seen our index volume drop, but clicks, engagement and discovery have slightly increased (probably due to better quality results?).

As always, Google says nothing, so we're left guessing.

We will look at stopping user profiles from being submitted. For example, we see nearly 380k links as 'discovered' that Google has chosen not to index. Looking through the list, it's all user profiles.

This means:
1) Sitemaps are working fine. There's no massive problem with them that correlates with a drop in indexed pages.
2) The cornerstones of good SEO are taken care of in the software.
3) Google is being weird and mysterious, as always.

What can we do in the short term?
1) Stop sending profiles with no content to the sitemap. They are now ignored, and Google appears to be dropping them from its indexes.
2) Add nofollow to links that 301 so Google doesn't bother 'discovering' them at all.
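Short-term fix 1) amounts to a filter on members before their profile URLs are written out. A minimal Python sketch, assuming a hypothetical `content_count` field that counts posts, status updates and similar content (IPS's real setting and columns may differ):

```python
from dataclasses import dataclass


@dataclass
class Member:
    member_id: int
    name: str
    content_count: int  # hypothetical: posts, status updates, etc.


def profiles_for_sitemap(members, min_content=1):
    """Keep only profiles with at least `min_content` pieces of content."""
    return [m for m in members if m.content_count >= min_content]
```

Raising `min_content` is one way to also drop near-empty profiles, not just completely empty ones.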
AlexWebsites Posted February 2, 2018 Profiles that are not being indexed seem to be ones whose owners have not posted anything, so the pages have little to no content. At least that's what I'm seeing. I don't know if I would exclude all profiles: content, status updates and such can make up an indexable page that could draw in traffic. I would change the page title to "xxxxx's profile page - site name" or something like that, with the ability to customize it, since most display names are short, unlike topic names. I would add lastmod, as mentioned, and update-frequency tags to the sitemap. I would also see whether a separate image sitemap could be generated as part of the Gallery app, with Google's recommended tags. Also, some SEO things: I would give the option of changing the automatic meta description length in settings, since Google seems to be allowing longer descriptions, and add the ability to automatically include tags as meta keywords; even though Google may not use keywords, other traffic sources do.
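For the image sitemap suggestion, here is a minimal Python sketch using the standard sitemap namespace and Google's image-sitemap extension namespace. It is not Gallery app code; the input mapping of page URLs to image URLs is a hypothetical shape chosen for the example.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SM)        # default namespace for <urlset>, <url>, <loc>
ET.register_namespace("image", IMG)  # image:image, image:loc


def image_sitemap(pages):
    """pages: dict mapping a page URL to the list of image URLs shown on it."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMG}}}image")
            ET.SubElement(image, f"{{{IMG}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")
```

Each gallery page lists its own images, so Google can associate an image with the page it appears on.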
SebastienG Posted February 2, 2018 I think accounts with 0 posts can be excluded from the sitemap. Other options:
- allow excluding topics with 0 replies
- add lastmod to all sitemap content (to get Google to crawl new content)
- allow excluding / noindexing a particular topic / profile / page
- generate a robots.txt with sensible values
- allow refreshing the sitemap more frequently for the most recently added content. I've been using IPboard since 2004; if my old topics are not regenerated in the sitemap often, that's not a problem, but for my latest topics / blogs / profiles I would like a more frequent refresh. I have more than 1,000 sitemap files; with a 15-minute refresh, my new topics only make it into the sitemap every 10 days.
- report errors during sitemap generation. I have old entries, old blogs, old content, ... My sitemap was not updated for 2 months because 10 blog entries were linked to a deleted blog. There was no error, but the sitemap was not updated ($e->last_message empty but not $e).
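The last point, a run that fails silently, can be addressed by rendering entries one at a time and recording failures instead of aborting. A minimal Python sketch (not IPS code; `render` is a hypothetical per-entry callback): logging `repr(exc)` keeps the exception type visible even when its message is empty, which is the situation described above.

```python
import logging

log = logging.getLogger("sitemap")


def build_entries(items, render):
    """Render each item; skip and record failures instead of aborting the run."""
    rendered, failures = [], []
    for item in items:
        try:
            rendered.append(render(item))
        except Exception as exc:  # broad on purpose: one bad row must not stop the run
            failures.append((item, exc))
            # repr() shows the exception type even if its message is empty.
            log.warning("skipping %r: %r", item, exc)
    return rendered, failures
```

The returned `failures` list can then be surfaced in the admin area rather than leaving the sitemap stale with no visible error.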
AlexWebsites Posted February 2, 2018 8 minutes ago, SebastienG said: I think accounts with 0 posts can be excluded from the sitemap. Other options: - allow excluding topics with 0 replies ... Good list, but I am not in favor of excluding topics with no replies. Most of the time the first post contains all the initial topic content and should index well, as long as members aren't starting topics with a very low word count.
SebastienG Posted February 2, 2018 It's a problem for me in particular forums. If we could have an Enable/Disable option (allow topics with 0 replies to be indexed, per forum or globally), I think everybody would be happy.
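The proposed option can be sketched as a global default plus optional per-forum overrides. Both settings are hypothetical (IPS did not necessarily ship them); topics with replies are always included, as AlexWebsites argued.

```python
def include_topic(replies, forum_id, global_default, per_forum):
    """Decide whether a topic goes into the sitemap.

    Topics with replies are always included. For zero-reply topics, a
    per-forum override (forum_id -> bool) wins; otherwise the global
    setting applies. Both settings are hypothetical.
    """
    if replies > 0:
        return True
    return per_forum.get(forum_id, global_default)
```

Keeping the override per forum means a board with many low-value zero-reply topics can opt out without affecting the rest of the site.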
Management Matt Posted February 2, 2018 I think we need to be mindful that the sitemap is just one way that Google discovers and crawls links. What goes in the sitemap isn't a hard rule that Google must only check out those links, so there's little point in adding lots of restrictions here and there; it'll be mostly pointless. You'll submit fewer links, but Google will still pull up the ones you didn't add. I did add a setting for profiles, because the huge number of 'dead' profiles clutters up the sitemap, which is just a waste. What is or isn't in the sitemap doesn't explain why Google is shedding indexed pages. That said, when using the new Search Console, the figures are totally different. We have 92k indexed pages. We have about 400k pages that Google has either 'discovered' or 'crawled but not indexed' due to its own algorithms. These are 301 redirect links (this is OK; it has no reason to store these) and empty profiles with almost zero content. But it's important to realise that Google is not punishing us; it is just working harder to index content that it thinks others will find useful, and "Johnny@11", who registered in 2011 and has never posted, doesn't count any more.
Nesa Posted February 2, 2018 4 hours ago, Matt said: 1) Sitemaps are working fine. There's no massive problem with them that correlates with a drop in indexed pages. Yes, they're doing great. Then how would you explain this? Google says that this topic in my forum has 25 posts written by 16 authors. But the reality is different: that topic has 1,238 replies written by who knows how many authors. Who is to blame for Google not seeing 98% of the posts in this topic? Who is responsible, if the sitemap is working fine? Please give me a reasonable explanation. If I understand your post correctly, IPS only wants to reduce the number of pages that are excluded, not to increase the number of indexed pages.
Management Matt Posted February 2, 2018 Again, the sitemap is not a "YOU CAN ONLY LOOK AT THESE LINKS, GOOGLE, LOL" list. The sitemap just informs Google of "important" URLs on your site, which it uses as a base to spider out from. I have no idea why Google is not updating the metadata of your indexed URL. That's not down to the sitemap; that's down to Google not refreshing the data. Google pulls the replies metadata from the page itself. To save me the bother, what is the URL of that topic? I'd like to review the meta tags in the JSON-LD to make sure they're correct.
Nesa Posted February 2, 2018 1 minute ago, Matt said: To save me bother, what is the URL to that topic? I'd like to review the meta tags in the json LD to make sure they're correct. https://www.fiat-lancia.org.rs/forum/index.php?/topic/46923-zatamnjena-stakla/
AlexWebsites Posted February 2, 2018 3 minutes ago, Nesa said: Google says that this topic in my forum has 25 posts written by 16 authors. But the reality is different: that topic has 1,238 replies written by who knows how many authors. ... Wouldn't Google only capture how many authors are part of that page/URL? If you want to test, change how many posts you show per page, which will reduce or increase your topic's page count. Maybe I'm wrong, though. The question for topics with a lot of replies is: how well are those additional pages indexed? Are the page titles, URLs, etc. search-engine friendly and free of duplicate meta info?
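AlexWebsites's explanation can be checked with a little arithmetic. Assuming the forum shows 25 posts per page (inferred from Google reporting exactly 25 posts; the real setting is not stated in the thread), a topic with 1,238 replies plus its opening post spans many URLs, and each page URL only carries its own 25 posts:

```python
import math


def topic_pages(replies, posts_per_page):
    """Number of paginated URLs for a topic: the opening post plus its replies."""
    posts = replies + 1
    return math.ceil(posts / posts_per_page)
```

Under that assumption, `topic_pages(1238, 25)` gives 50 pages, so "25 posts by 16 authors" would describe a single page rather than the whole topic.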
Management Matt Posted February 2, 2018 OK, right away I can see the LD is fine.

"interactionStatistic": [
    {
        "@type": "InteractionCounter",
        "interactionType": "http://schema.org/ViewAction",
        "userInteractionCount": 80927
    },
    {
        "@type": "InteractionCounter",
        "interactionType": "http://schema.org/CommentAction",
        "userInteractionCount": 1239
    },
    {
        "@type": "InteractionCounter",
        "interactionType": "http://schema.org/FollowAction",
        "userInteractionCount": 3
    }
],

Testing the link using Google's tool shows the metadata is being received perfectly. Invision Community is doing its job.
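Anyone wanting to run the same check on their own topics can parse the JSON-LD and read the counters. A minimal Python sketch; the excerpt Matt posted is a fragment, so it is wrapped in braces here to make it a valid JSON object (an assumption about the surrounding structure). Fetching the page and extracting the `<script type="application/ld+json">` block is left out.

```python
import json

# Excerpt from the post, wrapped in {} so it parses as a JSON object.
LD = """{"interactionStatistic": [
  {"@type": "InteractionCounter", "interactionType": "http://schema.org/ViewAction", "userInteractionCount": 80927},
  {"@type": "InteractionCounter", "interactionType": "http://schema.org/CommentAction", "userInteractionCount": 1239},
  {"@type": "InteractionCounter", "interactionType": "http://schema.org/FollowAction", "userInteractionCount": 3}
]}"""


def interaction_counts(ld):
    """Map each schema.org interaction type name to its userInteractionCount."""
    return {
        stat["interactionType"].rsplit("/", 1)[-1]: stat["userInteractionCount"]
        for stat in ld.get("interactionStatistic", [])
        if stat.get("@type") == "InteractionCounter"
    }


counts = interaction_counts(json.loads(LD))
```

Here `counts["CommentAction"]` is 1239, which matches the topic's real size, supporting Matt's point that the markup itself is correct.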
Nesa Posted February 2, 2018 1 minute ago, AlexWebsites said: Wouldn't Google only capture how many authors are part of that page/URL? If you want to test, change how many posts you show per page, which will reduce or increase your topic's page count. Maybe I'm wrong, though. Yes, that sounds logical. Like you, I'm not sure... I've noticed that new posts and new topics do not appear in Google's index for 10 days, and that same 10-day delay was also mentioned by other people in this topic.
Management Matt Posted February 2, 2018 10 days might be fine, depending on how often Google visits your site. Again, how often Google visits your site has nothing to do with the sitemap. In 4.3, we have added the lastmod timestamp, and added a button to rebuild your index from scratch. Also, just double-check your forum and topic permissions. Remember, if a guest cannot see the page, then Google cannot either.
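For reference, a sitemap `<url>` entry with `lastmod` looks like the fragment produced below. This is a minimal Python sketch of the sitemap protocol's format, not the 4.3 implementation; it expects a timezone-aware datetime and writes it in W3C datetime form. The fragment is meant to sit inside a `<urlset>` element.

```python
import xml.etree.ElementTree as ET
from datetime import timezone


def url_entry(loc, last_modified):
    """Build one sitemap <url> fragment; last_modified must be timezone-aware."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = (
        last_modified.astimezone(timezone.utc).isoformat(timespec="seconds")
    )
    return ET.tostring(url, encoding="unicode")
```

Using the time of the latest reply as `lastmod` is what lets a crawler tell an active topic from a dormant one.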
SebastienG Posted February 2, 2018 14 minutes ago, Matt said: I think we need to be mindful that the sitemap is just one way that Google discovers and crawls links. What goes in the sitemap isn't a hard rule that Google must only check out those links ... Of course, I agree with you. I have about 1.2M non-indexed pages and 200K indexed ones. But the sitemap is important: Google analyses it and can plan its crawl around the submitted URLs. If I have 200k indexed pages and Google crawls about 30,000 pages a day, Google needs about a week to recrawl my indexed pages. I think lastmod should be set, and we could probably use the changefreq value too:
New topic: daily or hourly
Topic not updated for 1 week: daily
Topic not updated for 1 month: weekly
Topic not updated for 1 year: yearly
The goal is to give the different crawlers new content quickly and not to waste crawl resources on old content that hasn't been updated.
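SebastienG's scheme maps directly to a function of the time since a topic's last update. A minimal Python sketch: the thresholds follow the post, choosing "hourly" (rather than "daily") for new topics is an assumption, and a month is approximated as 30 days. Google's own sitemap documentation says it ignores `changefreq`, so this mainly matters for other crawlers; `lastmod` is the more important signal.

```python
from datetime import timedelta


def changefreq(last_update, now):
    """Pick a sitemap changefreq value from the time since the last update."""
    age = now - last_update
    if age >= timedelta(days=365):
        return "yearly"
    if age >= timedelta(days=30):  # "1 month", approximated as 30 days
        return "weekly"
    if age >= timedelta(weeks=1):
        return "daily"
    return "hourly"  # new or recently active topic
```

The sitemap generator would call this with the topic's last-post time and the current time, both timezone-aware.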
Management Matt Posted February 5, 2018 Also, if you have switched to HTTPS, make sure you add your HTTPS site to Google's Search Console, or it won't pick up those hits and indexes. We've now seen this be the reason for apparent drop-offs in multiple cases. There isn't actually a drop-off; it's just Google dropping HTTP indexes and picking up HTTPS indexes.
sadams101 Posted February 5, 2018 @SeNioR- can you tell me where this robots.txt is from? Also, what is the current standard robots.txt?
Morgin Posted February 5, 2018 1 minute ago, sadams101 said: @SeNioR- can you tell me where this robots.txt is from? Also, what is the current standard robots.txt? Unless it’s changed recently, I don’t think there is a robots.txt in the 4 series download.
SebastienG Posted February 6, 2018 10 hours ago, sadams101 said: @SeNioR- can you tell me where this robots.txt is from? Also, what is the current standard robots.txt? You have to create it. 15 hours ago, Matt said: Also, make sure if you have switched to HTTPS that you add your HTTPS link to Google's search console, or it won't pick up those hits and indexes. We've seen this being the reason that people have seen drop offs in multiple cases now. There isn't a drop off, it's just Google dropping http indexes and picking up https indexes. I have both the http and https URLs in Google Search Console, and I have still seen drop-offs.