Jump to content

Sphinx kills resources when rebuilding a large community's indexes


Recommended Posts

After bringing this issue up with IPS in a ticket, I was directed to post the question in the server management forum. Hopefully, a few of you who use Sphinx on a large community can offer some advice!

So, according to the official docs page on setting up Sphinx with the IPS suite, the index needs to be wiped and fully rebuilt once a day. I understand the purpose of this rebuild is to ensure the integrity of edited posts and such in the index; but the process causes significant resource problems on my community with over 800,000 posts.

  • Some major database locking appears to be going on, as the site becomes completely inaccessible while the nightly cronjob runs.
  • The rebuild process is so intensive that the server load surpasses 220 as it runs; it is around 3-4 or so during normal operation.
  • The massive load causes significant server-wide slowdown for everything else that runs on the machine, including other sites. Some services even crash.

Even though I have set Sphinx's rebuild to run at an off-peak time, it still results in roughly 15-20 minutes of nightly downtime, which is becoming increasingly unacceptable as the community grows. Sphinx returns better search results than MySQL fulltext and allows us to use InnoDB for our posts table (which has been immensely beneficial for performance - I have no intention of switching back to MyISAM); but the downtime it causes every night is an annoyance, to say the least, and it only gets worse as the database grows larger.

We do have a replicated copy of our database set up for backup purposes, so it would be possible to run Sphinx's rebuilds against that to avoid the locking issue; however, that would only exacerbate the load problem as the replicated database is stored on a hard drive array (the production database sits on SSD's). An approach enabling us to drop full nightly rebuilds entirely, one way or another, would be ideal, but I'm not sure how that can be done without creating a data integrity problem. Or perhaps there is a way to simply make Sphinx play nice with the server resources?

For reference, this is running on a dedicated server equipped as follows:

  • Intel Xeon E3-1230 @ 3.2 GHz (quad-core)
  • 32 GB RAM
  • 2 SSD's in software RAID1
  • 2 7200 RPM HDD's in software RAID1

The OS, production MySQL databases, and applications are all installed on the SSD array. My site's Sphinx indexes are currently stored on it as well. Website files, including user uploads, as well as a replicated MySQL installation (used for backups, as noted above) are on the hard drive array. It's hardly a low-performance setup, but Sphinx manages to nearly kill it every night.

Link to comment
Share on other sites

We used to have an identical problem. We have almost identical setups on our servers except that our CPUs are arguably better and it still used to take our site down.

Do you use wildcard indexes, as in so your users can use * in their search terms? I found that with that enabled, it took 15 minutes to build the indexes for our 5m posts, and during that time it would do exactly as you described. Without wildcard searching, the indexes build in about 2 minutes and it doesn't take long enough to cause any downtime.

I spent quite some time researching this and even trying to get help from the Sphinx forums but I couldn't find any solution:
http://sphinxsearch.com/forum/view.html?id=9706

I did just notice this blog post though which has a couple of suggested throttling options which I'll try out later today:
http://sphinxsearch.com/blog/2012/08/14/indexing-tips-tricks/

Link to comment
Share on other sites

Do you use wildcard indexes, as in so your users can use * in their search terms? I found that with that enabled, it took 15 minutes to build the indexes for our 5m posts, and during that time it would do exactly as you described. Without wildcard searching, the indexes build in about 2 minutes and it doesn't take long enough to cause any downtime


Indeed, I have wildcard indexes enabled. I suppose disabling them for the time being may fix the downtime issue, but they're a great Sphinx feature that makes search a lot more useful.

Let me know how your throttling tweaks work out. It's reassuring to see that I'm not the first to run into this problem.

Why doesn't IPB have multiple tables and keep new updated/new posted content on differrent indexes ?

The need of rebuilding index every day is not wise at all.

A nightly index rebuild seems like an extremely inefficient way to approach the handling of updated and deleted content to me as well. It simply becomes unmaintainable once you're into hundreds of thousands, or even millions, of posts. IPS have stated several times that virtually rewriting the suite's search systems is a major goal of the 4.x series, so here's hoping there will be significant improvements in not just the search algorithm, but also the indexing performance. I'd advise starting a feedback topic for specific suggestions on improving the search system.

Later versions of Sphinx support real time indexes. Search is set to be a big focus for IPS4 so I suspect real time indexes may play a part in those changes.

I'm really happy to hear that, Andy. :smile: Sphinx looks like a really robust piece of software that is currently used in a rather barebones implementation in the IPS suite, but the better relevancy and ability to use InnoDB tables alone make it worthwhile for larger forums. If an IPS-backed solution to the index rebuild problem can be found, that will be excellent.

Link to comment
Share on other sites

the wildcard search is so useful, really do hope IPB works in sphinx integration.

I couldn't agree more! I love my IP.Board, but search is such a weak point. I installed Sphinx, not for performance but for functionality. The standard search just doesn't work. My nightly sphinx index rotation was failing. It turns out that I was running out of memory so to fix the issue had to reduce the memory for the indexer (no real noticeable depreciation in performance). I would love to see better integration with Sphinx, or better functionality of search in general. While I'm glad that we can use Sphinx, I sure wish that there was more attention to it so we can fully utilize the latest versions.

Link to comment
Share on other sites

  • 1 month later...

Archived

This topic is now archived and is closed to further replies.

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...