Sonya* Posted June 13, 2019

There are empty runs in sitemap generation that slow it down, sometimes delaying the sitemaps by days. The problem concerns the sitemap settings and the task runs spent on sitemaps that are not included in the sitemap.

This is how it is designed: if you do not include, e.g., profiles in the sitemap (ACP settings), those sitemaps still require one run for each sitemap that WOULD be generated if profiles were enabled. This means that if you have over 200 000 users and exclude profiles from the sitemap, you will not speed up its generation. There are still 400 runs needed to generate the database entries:

sitemap_profiles_1
sitemap_profiles_2
sitemap_profiles_X

and so on until all sitemap entries for profiles are created. Yes, they have no data in them, BUT each of them needs ONE run of the sitemapgenerator task. This task runs every 15 minutes. Generating 400 sitemaps that are not really needed takes, in this example, about 100 hours, or more than 4 days. 4 days for nothing.

A small workaround is not only to exclude the items but also to set "Number of items to include" to 0 (this works for profiles, but not for gallery images). But caution! If you set the number of items to include to 0 for Pages or Databases, then your sitemap stops updating entirely (ticket #1037936). And I mean not only the sitemap for Pages or Databases: all other sitemaps will not be updated any more!

The empty, unneeded sitemaps are not generated just once. They are generated again and again, which means the "4 days for nothing" action is repeated over and over. The same applies to all other applications and items. Excluding large amounts of data from the sitemap does not mean the sitemap will be generated faster. This is very inefficient.

I suggest NOT including excluded sitemaps in the task, so that only sitemaps that are really needed and contain data for output get the highest priority.
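The arithmetic behind the "4 days for nothing" figure can be checked with a short sketch. It only assumes what the post states: one task run per sitemap entry and a 15-minute task interval. The function name `full_cycle` is made up for illustration.

```python
from datetime import timedelta

# From the post: the sitemap generator task runs every 15 minutes.
TASK_INTERVAL = timedelta(minutes=15)


def full_cycle(sitemap_entries: int, interval: timedelta = TASK_INTERVAL) -> timedelta:
    """Time for one pass if every sitemap entry costs exactly one task run."""
    return sitemap_entries * interval


# 400 empty profile sitemaps, the figure quoted in the post.
wasted = full_cycle(400)
print(wasted)                         # 4 days, 4:00:00
print(wasted.total_seconds() / 3600)  # 100.0
```

The same function can be fed any other count of empty entries to estimate how much of a full cycle they consume.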
bfarber Posted June 13, 2019

I've logged a bug report to have this looked into. Thanks for raising the concern!
Sonya* Posted June 15, 2019

Thank you! I have investigated my database further. I have a total of 938 sitemaps in the table core_sitemap, but 241 of them are empty (because those sitemaps are disabled in the ACP). A run is still required for each of these 241 sitemaps to "generate" an empty entry. Again and again. I need roughly 9.7 days for all sitemaps to be updated, and I could save 2.5 days if there were no empty task runs for the sitemaps I have disabled.
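A count like the one above could be obtained with a single query. The sketch below is only an illustration against an in-memory SQLite stand-in: the table name `core_sitemap` and the `data` column come from this thread, while the key column name `sitemap`, the row contents and the entry `sitemap_example_1` are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal schema: only `core_sitemap` and `data` are named in the thread.
conn.execute("CREATE TABLE core_sitemap (sitemap TEXT PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO core_sitemap VALUES (?, ?)",
    [
        ("sitemap_example_1", "<urlset>...</urlset>"),  # made-up filled entry
        ("sitemap_profiles_1", None),                   # empty entry (data is NULL)
        ("sitemap_profiles_2", None),                   # empty entry (data is NULL)
    ],
)

# Count all entries and those whose data column is NULL.
total, empty = conn.execute(
    "SELECT COUNT(*), SUM(data IS NULL) FROM core_sitemap"
).fetchone()
print(total, empty)  # 3 2
```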
bfarber Posted June 18, 2019

I've taken a closer look at this and am a little confused. You said a workaround for profiles was to set "number to include to 0" in addition to excluding profiles from sitemaps, but that is the only way to exclude profiles from sitemaps.

Are you referring to the following when you indicate you've excluded profiles from sitemaps? If so, then that checkbox only results in the priority (for profiles) being excluded from the sitemap, not profiles as a whole.

It appears that if you set the number of profiles to include to 0, this works correctly. I'm still looking into your other reported issues, but wanted to get some clarification on this bit too.
Sonya* Posted June 18, 2019

Ehm, yes. Probably I do not understand what this should do.

Profiles
1. If I set the number of profiles to Unlimited and do not select "Do not include", then all profile sitemaps are generated and filled.
2. If I set the number of profiles to Unlimited and do select "Do not include", then I have "empty runs" for all profile sitemaps. Each one requires one task run, but the column data in the database is set to NULL. All these empty entries are generated and replaced again and again in the table core_sitemap.
3. If I set the number of profiles to 0, then I do not have any profile sitemaps. There are no entries for them in the database and no empty runs.

What was the purpose of the second setting? I thought that if I tick "Do not include", then no sitemap should be generated.

Gallery images
1. If I set the number of images to Unlimited and do not select "Do not include", then all image sitemaps are generated and filled.
2. If I set the number of images to Unlimited and do select "Do not include", then I have "empty runs" for all image sitemaps. Each one requires one task run, but the column data in the database is set to NULL. All these empty entries are generated and replaced again and again in the table core_sitemap.
3. If I set the number of images to 0, then I still have "empty runs" for all image sitemaps with an empty data column. It seems that I cannot suppress generation of image sitemaps (database entries) entirely.

Pages
1. If I set the number of pages to Unlimited and do not select "Do not include", then all pages sitemaps are generated and filled.
2. If I set the number of pages to Unlimited and do select "Do not include", then I have "empty runs" for all pages sitemaps. Each one requires one task run, but the column data in the database is set to NULL. All these empty entries are generated and replaced again and again in the table core_sitemap.
3. If I set the number of pages to 0, then the whole sitemap stops updating. The reason is that no database entry for the pages sitemap is generated in this case. The task tries to generate it but fails silently (NULL is returned instead of a database entry). And it tries again every 15 minutes. Again and again. This is an endless loop that prevents any other sitemap from ever being updated.

The questions are:
1. What should "Do not include" do? I thought it should leave the sitemap out.
2. How can I stop "empty sitemaps" with data=NULL from being generated again and again, consuming task runs that generate nothing?
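The endless loop described for Pages with 0 items can be illustrated with a toy model. This is not the actual task code: the selection rule (process a missing entry first, otherwise the stalest one), the function `run_task` and the names `sitemap_pages_1` and `sitemap_topics_1` are all assumptions made up for the sketch. It only shows how one entry that never gets written can starve every other sitemap.

```python
from typing import Dict, List, Set


def run_task(expected: List[str], stored: Dict[str, int], tick: int, broken: Set[str]) -> str:
    """One task run: pick a missing entry first, otherwise the stalest stored one."""
    missing = [name for name in expected if name not in stored]
    target = missing[0] if missing else min(stored, key=stored.get)
    if target not in broken:
        stored[target] = tick  # entry written, so the next run can move on
    # For a broken sitemap nothing is written: it stays "missing" forever.
    return target


expected = ["sitemap_pages_1", "sitemap_profiles_1", "sitemap_topics_1"]
stored = {"sitemap_profiles_1": 0, "sitemap_topics_1": 0}
broken = {"sitemap_pages_1"}  # generation silently returns nothing

picked = [run_task(expected, stored, tick, broken) for tick in range(1, 6)]
print(picked)  # the same pages entry is picked on all five runs
```

With an empty `broken` set, the same loop rotates through all three entries, which is the behaviour one would expect from a healthy queue.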
bfarber Posted June 19, 2019

The setting means "do not include the priority element in the sitemap". Random example: https://invisioncommunity.com/sitemap.php?file=22_sitemap_database_categories

Thus, what you are describing as #2 for each type makes no sense: the sitemap task should still run and generate the sitemaps, just without the priority element being included in the XML file.

The issues outlined for Gallery #3 and Pages #3 I've resolved for the next maintenance release (some areas weren't honoring "include 0 items" properly), while profiles were already behaving correctly in this regard.
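To make the intended meaning of the setting concrete, here is a minimal sketch of a sitemap written with and without the optional `<priority>` element from the sitemaps.org protocol. It is not the software's own generator; the function `build_urlset` and the example.com URL are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

# Namespace defined by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(urls: Iterable[str], priority: Optional[float]) -> str:
    """Build a <urlset>; <priority> is only written when a value is given."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if priority is not None:
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


urls = ["https://example.com/profile/1-alice/"]  # hypothetical URL
without_priority = build_urlset(urls, priority=None)
with_priority = build_urlset(urls, priority=0.5)
print(without_priority)
print(with_priority)
```

In both cases the `<loc>` entries are present; only the `<priority>` element differs, which is the behaviour described above for "Do not include".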
Sonya* Posted June 20, 2019

21 hours ago, bfarber said:
"Thus, what you are describing for each type #2 makes no sense - the sitemap task should still run and generate the sitemaps, just without the priority element being included in the XML file."

But this is exactly what happens here, e.g. with status updates.

These are the settings for the status update sitemap:

The database entries for status updates (all there, but without data, and updated again and again).

If I untick "Do not include", then all sitemaps are generated with data despite number of items = 0.
bfarber Posted June 20, 2019

"Number of items" set to 0 was not being respected properly for content item classes (of which status updates are one). This has been resolved for the next release, as I said.
Sonya* Posted June 20, 2019

It was just a confirmation that ticking or unticking "Do not include" does wipe out the whole data of the sitemap, not only the priority. If I untick "Do not include" and leave number of items = 0, then sitemaps are generated with data. This means it is not the number of items that results in empty sitemaps. It is the "Do not include" setting.
bfarber Posted June 20, 2019

Gotcha - I've looked closer and I see where that inconsistent behavior is stemming from and have corrected that as well.
Archived
This topic is now archived and is closed to further replies.