
Featured Replies

Posted

The IPS sitemap generator uses a special database table as its source for refreshing: core_sitemap.

The primary sitemap URL that search engines fetch is https://example.com/sitemap.php, which is an index of sub-sitemap files. You can see the list of those files by following that link.

Each of those files contains no more than 1000 URLs to individual pages (profile statuses, topics (without page numbers or comment anchors) and the other elements that provide a sitemap extension in the core).

One of our cases is a forum with more than 100k topics, more than 4.2 million posts and more than 6 million users. With simple math that gives us 5214 sitemap files (you can count those files with this query):

select count(*) from core_sitemap; // 5214
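As an aside, if you want to see which files are the stale ones, a minimal sketch like this lists every sub-sitemap with its last update time (it assumes an IPS 4.x install, run from the suite root next to init.php, and that core_sitemap stores the file name in a sitemap column next to updated):

<?php

require 'init.php';

// List every sub-sitemap file and when it was last rebuilt, oldest first
foreach ( \IPS\Db::i()->select( 'sitemap, updated', 'core_sitemap', NULL, 'updated ASC' ) as $row )
{
    print_r( $row['sitemap'] . ' => ' . gmdate( 'Y-m-d H:i:s', $row['updated'] ) . ' UTC' . PHP_EOL );
}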

The sitemap generator task runs by default once every 15 minutes and updates only the single oldest file from that big list. With simple math we can try to answer the question 'how long does it take to update everything?' (because users post not only in the newest topics but also in old ones... and a newly created topic is only added to a sitemap file once ALL the older files have been regenerated more recently than the file that should contain it). So, how much time do we need for a full update?

5214 * 15 = 78210 minutes = 1303 hours = 54 days! 54! days! A search engine will pick up your newest content 54 days after it was posted. Incredible, isn't it? Don't believe it? Or want to know the lag for your own community? You can check your own lag time with this SQL:

select FROM_UNIXTIME(updated,'%a %b %d %H:%i:%s UTC %Y') from core_sitemap order by updated asc limit 1; // Wed Nov 01 14:13:49 UTC 2017

Yep... In our case the oldest file was last updated on 1 November...
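If you want to turn the file count into that full-refresh estimate for your own board, here is a rough sketch of the same arithmetic; it assumes the default 15-minute task interval and that each run refreshes exactly one file, as described above:

<?php

require 'init.php';

$files    = \IPS\Db::i()->select( 'COUNT(*)', 'core_sitemap' )->first();
$interval = 15; // default 'sitemap generator' task frequency, in minutes

// One file per run -> a full refresh takes files * interval minutes
$minutes = $files * $interval;
print_r( $files . ' files -> full refresh in about ' . round( $minutes / 60 / 24, 1 ) . ' days' . PHP_EOL );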

What should we do to fix it? A very quick solution: create a temporary file, e.g. 'mycustomsitemapupdater.php', with this content:

<?php

// Bootstrap the IPS framework (place this file in the suite root, next to init.php)
require 'init.php';

// Rebuild the single oldest sub-sitemap file, the same thing the background task does
$generator = new \IPS\Sitemap;
$generator->buildNextSitemap();

// Print the timestamp of the oldest sub-sitemap file after the rebuild
$last = \IPS\Db::i()->select('FROM_UNIXTIME(updated, "%a %b %d %H:%i:%s UTC %Y")', 'core_sitemap', null, 'updated asc', 1)->first();
print_r('Oldest time now: ' . $last . PHP_EOL);

And run it via web or CLI as many times as you need (until the oldest file is no longer so old).
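If running it by hand gets tedious, here is a sketch of a small batch runner built on the same calls; the one-day threshold is just an example value I picked, not an IPS setting:

<?php

require 'init.php';

$maxAge = 86400; // keep rebuilding until the oldest file is less than one day old

do
{
    // Rebuild the single oldest sub-sitemap file, exactly like the task does
    $generator = new \IPS\Sitemap;
    $generator->buildNextSitemap();

    $oldest = \IPS\Db::i()->select( 'MIN(updated)', 'core_sitemap' )->first();
    print_r( 'Oldest file is now from: ' . gmdate( 'Y-m-d H:i:s', $oldest ) . ' UTC' . PHP_EOL );
}
while ( $oldest < time() - $maxAge );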

A solution for the longer term: add this script to cron and run it every minute, or, better, change the 'sitemap generator' task interval from 15 minutes to one minute (this may still not solve your particular problem; if you need updates even faster, tune it sensibly).
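For the cron variant, a line like this would run the helper once a minute (the PHP binary and forum path are placeholders for your server):

* * * * * /usr/bin/php /path/to/forum/mycustomsitemapupdater.php > /dev/null 2>&1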

The best solution of all: wait for IPS to update this part of the system.

Thanks for your attention!

P.S. If my text reads as negative, that is not the intent. I love IPS and just want to draw attention to this problem and help others with their large communities. ^_^


Most Popular Posts

  • We've added the timestamp into the sitemap and we're looking to add a tool to quickly rebuild the sitemap on demand.

  • So many misconceptions. If @daveoh were to share his site, you would see it's very clean, well organized, has high quality content and a glance shows his backlink profile is strong. He also has l…

  • Just so you know, we're watching this topic and looking at our own stats to build a better picture. The facts we know: 1) Almost every site I've got access to (via friends, etc) have seen a…



I am also not sure what ProSkill is asking about for example. The original post is about the speed of generating sitemaps for large sites, ProSkill talks about “decrease in cached URLs”. Not sure what that is and how it relates to sitemap generation. 

6 hours ago, opentype said:

I am also not sure what ProSkill is asking about for example. The original post is about the speed of generating sitemaps for large sites, ProSkill talks about “decrease in cached URLs”. Not sure what that is and how it relates to sitemap generation. 

I don't think he read it correctly, lol.

  • 1 month later...

following this

  • 2 weeks later...
  • 1 year later...
13 minutes ago, sadams101 said:

You guys do what you like...there is no sitemap for posts--there should be. If all sitemaps are set up correctly, Google only crawls everything in the maps once, then only crawls the new content when it sees the lastmod date change. There is no finite crawl budget for my site, this is nonsense.

PS - If there were such a thing as a crawl budget, then certainly not putting a lastmod date in the sitemap would cause Google's spider to waste a ton of time trying to find new content... IPB currently has this issue if you're not running 4.4.

I agree, Posts should be part of a forum sitemap. Didn't know they weren't.

  • 6 months later...
6 hours ago, Upgradeovec said:

Wow! I see that too, @Sonya*! I think IPS has just fixed it!


No, we didn't. I think you might have paginated from page 1 using AJAX and ended up looking at the canonical tag that was initially loaded, or simply made a mistake. 😛


  • 3 years later...
1 hour ago, SeNioR- said:

no

Great, thank you!
