
Large community? You have a problem with your sitemap!



12 hours ago, sadams101 said:

One last question... is it OK to put this file in the root, or should I put it in admin?

mycustomsitemapupdater.php

You can place it anywhere, including outside the web server's root folder. In fact, putting this file outside the web root is much better (as with any tool like this). For it to work, you just need to write the correct path to init.php. Then you can run it manually from the CLI, with cron, or by any other method.


This could be totally unrelated, but shortly after making these changes I got a notice from Google about an increase in crawl errors:

Quote

Googlebot for smartphones identified a significant increase in the number of URLs on https://www.celiac.com/ that return a 404 (not found) error. If these pages exist on your desktop site, showing an error for mobile users can be a bad user experience. This misconfiguration can also prevent Google from showing the correct page in mobile search results. If these URLs don't exist, no action is necessary.

When I look at the errors, I see they relate to the Report Comment link, for example:

https://www.celiac.com/gluten-free/topic/77816-is-kimchi-gluten-less/?do=reportComment&comment=676584

and

https://www.celiac.com/gluten-free/topic/29519-chocoholics-there-is-hope/?do=reportComment&comment=317896

Here is the error:

Quote

Sorry, there is a problem

Admin

Error code: 2S136/4

 

Would your changes cause this?

Edited by sadams101

5 hours ago, sadams101 said:


Would your changes cause this?

IPS throws error 2S136/4 in only one situation: when somebody tries to use the report system. Just disable this ability for guests and everything will be fine. Google is drawing your attention to it now because it has started crawling your site better and found that problem (it existed before these changes, and these changes are not linked to it) :thumbsup:

Your example links point to this too (?do=reportComment; guests shouldn't use the report system).


This was a bug, which I have fixed for IPS 4.3 (the report and reportComment form pages returned response code 404 instead of 403).

6 hours ago, sadams101 said:


Would your changes cause this?

 


On 1/15/2018 at 6:50 PM, sadams101 said:

Ok. In my case it is very clear that something very bad happened to IPB's sitemap around the end of June 2017. See below if you doubt this:

[Image: google_index.jpg]

Same situation on my forum too:

[Image: ips001.jpg]

I had noticed this big drop earlier, but I did not know the reason.
This problem has to be resolved quickly; I don't think it can wait a few more months until version 4.3.0 comes out.
We all know that if the forum is not indexed properly, there are no visits ---> no activity ---> the forum is gone.

@Matt @Daniel F @Lindy


@opentype

Okay, then why was this written:

On 1/16/2018 at 11:56 AM, Matt said:

We've added the timestamp into the sitemap and we're looking to add a tool to quickly rebuild the sitemap on demand.

...and this:
https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/?do=findComment&comment=2725509

Edited by Nesa

5 minutes ago, Nesa said:

@opentype

Okay, then why was this written:

Because there are indeed improvements to be made to the sitemap. But their absence so far is NOT the reason pages were dropped from the index. That was just a false assumption by a single user who joined the discussion. Correlation is not causation.

Edited by opentype

On 12/25/2017 at 4:51 AM, Upgradeovec said:

A small improvement (at one run per minute, 5214 elements take more than 3 days to update). You can speed this up further. Just measure the time needed for one run:


time php mycustomsitemapupdater.php   # reports something like 4 sec

With that, you can add a loop inside the script that runs $generator->buildNextSitemap(); X times. In my case, for example, 10 times in one minute. So for 5214 elements I will need about 521 minutes for a full update (about 8.7 hours, not bad).
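The estimate above is simple arithmetic. A minimal PHP sketch of it, assuming (as the post implies) that each `buildNextSitemap()` call updates one element and that each once-a-minute cron run spends only a fixed time budget on passes; the function name `rebuildEstimate` is hypothetical, not part of IPS:

```php
<?php
/**
 * Estimate how long a full sitemap refresh takes when cron starts the
 * script once a minute and the script loops over buildNextSitemap().
 * Assumes each pass updates exactly one element.
 *
 * @return array{runsPerMinute: int, minutes: int}
 */
function rebuildEstimate(int $elements, float $secondsPerRun, float $budgetPerMinute): array
{
    // How many passes fit in the per-minute budget; cron still starts at
    // least one pass per minute even if a single pass exceeds the budget.
    $runsPerMinute = max(1, (int) floor($budgetPerMinute / $secondsPerRun));

    // Round up: a partially filled final minute still has to run.
    $minutes = (int) ceil($elements / $runsPerMinute);

    return ['runsPerMinute' => $runsPerMinute, 'minutes' => $minutes];
}

$estimate = rebuildEstimate(5214, 4.0, 40.0);
printf("%d runs/minute, %d minutes (%.1f hours)\n",
    $estimate['runsPerMinute'], $estimate['minutes'], $estimate['minutes'] / 60);
```

With the post's figures (about 4 s per run, 10 runs in a 40 s budget) this prints `10 runs/minute, 522 minutes (8.7 hours)`; the one extra minute compared with 521 comes from rounding the last partial minute up.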

So I am still 4 days behind on the last indexed item in my sitemap. When I run the command above, I get:

Quote

real    0m0.039s
user    0m0.019s
sys     0m0.020s

Do I need to run my cron job for mycustomsitemapupdater.php more often than once a minute?

Also, even though this has been running for ~1 week, I haven't seen any improvement in the Google index graph shown above. Any idea how long an improvement could take (if the issue is related to the sitemap)?


23 hours ago, opentype said:

That was just a false assumption by a single user who joined the discussion. Correlation is not causation.

Upgradeovec, ProSkill, and me. It's not a single user.
It is common for us to have somewhat larger forums, i.e., boards with over 700k posts.


Sorry, but it's hard for me to accept your opinion because:
- you never said that even one line of code (written in this topic by the topic starter) is wrong.
- the official IPS answer is that they will fix things with the sitemap.

According to you, it turns out that everything written in this topic is rubbish.

I only urged that this be settled faster, rather than waiting a few more months for version 4.3. You think this will not affect the number of pages indexed by Google; I think it will.
I have been a client of this company for 10 years, and it seems to me that I have the right to ask, at least once, for something to be resolved faster.


15 hours ago, sadams101 said:

Do I need to run my cron job for mycustomsitemapupdater.php more often than once a minute?

You can't do this with cron alone. I meant that you should measure this time so you can write a loop inside the PHP script. I didn't share my loop, to prevent people from copy-pasting my version without understanding it: too many repetitions may run for more than 60 seconds, and that can exhaust your memory. You get 0.039s per run, so you can do 40-60 repetitions per minute (started by cron). Just write something like for ($i = 0; $i < 60; $i++) { $generator->buildNextSitemap(); } in the example script instead of the single line $generator->buildNextSitemap(); and that makes 60 runs per minute.
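A minimal sketch of that loop as a standalone helper, so it can be tried without IPS. It adds one assumption to the post: besides a maximum pass count, it stops starting new passes once a time budget is spent, which addresses the worry about a loop running past the next cron start. The name `runWithinBudget` and the budget values are hypothetical, not IPS code:

```php
<?php
/**
 * Call $step repeatedly, up to $maxRuns times, but do not START a new pass
 * once $budgetSeconds of wall-clock time have elapsed. A pass that is
 * already running is allowed to finish, so keep the budget well under the
 * 60-second cron interval to leave headroom for the last pass.
 *
 * Returns the number of passes that were completed.
 */
function runWithinBudget(callable $step, int $maxRuns, float $budgetSeconds): int
{
    $start = microtime(true);
    $runs = 0;

    while ($runs < $maxRuns) {
        if (microtime(true) - $start >= $budgetSeconds) {
            break; // budget spent: leave the remaining work for the next cron run
        }
        $step();
        $runs++;
    }

    return $runs;
}
```

Inside mycustomsitemapupdater.php, with `$generator` set up exactly as in the existing script, the single `$generator->buildNextSitemap();` line could become `runWithinBudget(function () use ($generator) { $generator->buildNextSitemap(); }, 60, 45.0);`, while cron keeps starting the script once a minute. The 60-pass cap and 45-second budget are illustrative values only.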


1 hour ago, Nesa said:

You think this will not affect the number of pages indexed by Google …

Don’t put words in my mouth. Read what I actually say.
Of course a sitemap that is created faster will lead to faster indexing. That’s beyond question.
But nothing discussed in this topic will lead to “Google dropping already indexed pages”, as shown several times in the Search Console graphs. That is my point. It’s not that hard to understand.


1 hour ago, opentype said:

But nothing discussed in this topic will lead to “Google dropping already indexed pages”, as shown several times in the Search Console graphs. That is my point.

But already indexed pages are not static content... new content is added to topics, and this new content is missing from Google because Google has not indexed it, due to the bad sitemap.

Example from my board:
A topic started March 3, 2011, with 1238 replies so far:
https://www.fiat-lancia.org.rs/forum/index.php?/topic/46923-zatamnjena-stakla/

If you search Google for any post written in this topic on January 16th (or earlier), you will see that Google has indexed it, for example:

[Image: Screenshot_21.jpg]

The same sentence is indexed; here it is on Google:

[Image: Screenshot_28.jpg]

But anything written in this topic on January 17th (or later), for example:

[Image: Screenshot_29.jpg]

...still does not exist in the Google index:

[Image: Screenshot_34.jpg]

What is your explanation?
Why did new content from the board use to appear on Google the same day, while now there is a delay of at least 10 days?
Now multiply this impact by the more than 32,000 topics on the board, and...

Edited by Nesa

opentype, I fully understand your position, and it does seem nonsensical to think Google would drop already indexed pages for lack of a sitemap. The thing is, the Google 'Fred' algorithm update actually culled thousands, if not millions, of pages from Google's index arbitrarily, not as a penalty for black hat, but as a measure to curb Google's own storage needs, and the only thing getting them back into Google's index is the sitemap. It's not the lack of a sitemap entry itself that causes them to be dropped; Fred dropped them, and the sitemap is the main route to recovery.

Edited by Marcher Technologies

6 hours ago, Upgradeovec said:

...So just write something like for ($i = 0; $i < 60; $i++) { $generator->buildNextSitemap(); } in the example script instead of the line $generator->buildNextSitemap(); and that makes 60 runs per minute.

I wish I could understand what you mean here. Any chance you could just paste an example of mycustomsitemapupdater.php with this code in it? Then I'll set it up to run every second.


Google makes it clear that sitemaps will always be of some benefit, especially to larger sites:

https://support.google.com/webmasters/answer/156184?hl=en

Quote

Do I need a sitemap?
If your site’s pages are properly linked, our web crawlers can usually discover most of your site. Even so, a sitemap can improve the crawling of your site, particularly if your site meets one of the following criteria:

  • Your site is really large. As a result, it’s more likely Google web crawlers might overlook crawling some of your new or recently updated pages.

It's not a guarantee, of course, though it does say a sitemap will always be of some benefit.

So IPS may as well optimise it as best they can by ensuring it's as complete as possible by the time Googlebot reads it. :)

Edited by Optic14

On 1/16/2018 at 5:56 AM, Matt said:

We've added the timestamp into the sitemap and we're looking to add a tool to quickly rebuild the sitemap on demand.

@Matt, can IPS release the code you used to add the timestamp, specifically into the topic sitemaps, so we don't need to wait for an update?
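For reference, a timestamp in a standard sitemap is the `<lastmod>` element from the sitemaps.org protocol, written in W3C Datetime format. A minimal PHP sketch of one `<url>` entry; it is not IPS's code, and the function name `sitemapUrlEntry` and the example URL are hypothetical. Note that any `&` in a URL must be escaped as `&amp;` inside the XML:

```php
<?php
/**
 * Build one <url> entry for a standard XML sitemap (sitemaps.org protocol).
 * $lastModified is a Unix timestamp, e.g. the time of the latest post in a topic.
 */
function sitemapUrlEntry(string $loc, int $lastModified): string
{
    // URLs must be XML-escaped: "&" becomes "&amp;", quotes become entities.
    $escapedLoc = htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    // W3C Datetime, e.g. 2018-01-16T00:00:00+00:00 (gmdate() formats in UTC).
    $lastmod = gmdate(DATE_W3C, $lastModified);

    return "<url>\n"
        . "  <loc>{$escapedLoc}</loc>\n"
        . "  <lastmod>{$lastmod}</lastmod>\n"
        . "</url>";
}

echo sitemapUrlEntry('https://example.com/topic/1-example/?page=2&sort=asc', 1516060800), "\n";
```

The timestamp 1516060800 is 2018-01-16 00:00:00 UTC, so the entry carries `<lastmod>2018-01-16T00:00:00+00:00</lastmod>` and the URL is emitted with `&amp;sort=asc`.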

Also, if you are running Gallery, there is no image sitemap, only image page URLs, and those have no timestamp either. See: https://support.google.com/webmasters/answer/178636?hl=en
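An image sitemap, as described on the Google page linked above, adds `<image:image>` blocks under each page's `<url>`, using the `http://www.google.com/schemas/sitemap-image/1.1` namespace. A minimal PHP sketch of that markup, not something Gallery currently generates; the function names and example URLs are hypothetical:

```php
<?php
/** Escape text for use inside an XML element. */
function xmlText(string $s): string
{
    return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

/**
 * One <url> entry for an image sitemap: the page that shows the images,
 * plus one <image:image> block per image URL.
 */
function imageSitemapEntry(string $pageLoc, array $imageLocs): string
{
    $xml = "<url>\n  <loc>" . xmlText($pageLoc) . "</loc>\n";
    foreach ($imageLocs as $imageLoc) {
        $xml .= "  <image:image>\n"
            . "    <image:loc>" . xmlText($imageLoc) . "</image:loc>\n"
            . "  </image:image>\n";
    }
    return $xml . "</url>";
}

/** Wrap entries in a <urlset> that declares both the sitemap and image namespaces. */
function imageSitemapDocument(array $entries): string
{
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
        . ' xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">' . "\n"
        . implode("\n", $entries) . "\n"
        . "</urlset>\n";
}

echo imageSitemapDocument([
    imageSitemapEntry('https://example.com/gallery/image/1-sunset/', ['https://example.com/uploads/sunset.jpg']),
]);
```

Each gallery image page would get its own `<url>` entry; a page showing several images simply lists several `<image:image>` blocks.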

Edited by AlexWebsites

Apparently I am also a victim of whatever issue is going on here. :(

I will have to read through this whole thread and see if there is a solution buried in it.

Thank you

[Image: index loss.JPG]

On 1/25/2018 at 5:49 AM, Nesa said:

I had noticed this big drop earlier, but I did not know the reason.
This problem has to be resolved quickly; I don't think it can wait a few more months until version 4.3.0 comes out.
We all know that if the forum is not indexed properly, there are no visits ---> no activity ---> the forum is gone.

It's killing me. I was wondering why my registrations have tanked. I dug in and figured out that my indexed pages have gone from 75K+ to ~1K.

REALLY BAD NEWS!!

Edited by superj707

On 1/27/2018 at 1:02 PM, AlexWebsites said:

@Matt, can IPS release the code you used to add the timestamp, specifically into the topic sitemaps, so we don't need to wait for an update?

Yes, I would like to see this update, and to be able to rebuild the sitemap on demand soon as well. :)
