Sphinx isn't just for big boards....


I have been wanting to try Sphinx for a while now, and although we don't have a big board, we are around 200,000 posts and growing. I know the problem has more to do with the results that MySQL returns for search queries than with IPB itself, because we got the same thing with the other software we used to use. It always seemed like search never returned what you were looking for, or if it did, it was buried among less-than-stellar results. There have been quite a few times I have relied on Google to search the site.

I installed and set up Sphinx yesterday and configured IPB per the article in the documentation here on the site. I have also set it to allow 2-character words rather than 3. The search results we get now are exactly what you would want, with none that make you think "What the heck?" So even though we are not a big board, Sphinx has been a nice improvement.

Having Sphinx set up will also be nice since we are debating switching some of the tables to InnoDB, now that we are on MySQL 5.5, due to occasional table locks.


Personally I don't care for Sphinx. If I'm searching I want real-time results; waiting for items to be indexed is not my ideal situation. While Sphinx does have its place for very large sites which have no other option, it's not for smaller boards imo. As for switching to InnoDB, it would be better to find the reason your tables are locking and resolve that issue imo.



Sphinx does appear to now offer real-time instant indexing,
but I haven't seen any word on whether Invision plans to take advantage of this yet.


What is considered a large site in your opinion? We have ~3 million posts, and I am very frustrated with Sphinx since I am not sure how to give additional weight to the topic name. We are a celebrity site, and when a person types, say, "Angelina Jolie", the matching topic may be on page 3 or so of the search results. If the topic title == search query, that should be the number one result.

Do you think the traditional search would yield more relevant results? I can't switch easily because when we were on IPB2 we moved the posts forms to InnoDB because of MyISAM's full table locks vs. the row-level locking of InnoDB (which solved our performance issues at the time). I could move back to MyISAM, but I worry our performance will go bad. We are currently on IPB 3.3.2.

Any info on either the weighting or traditional full-text search would be appreciated.

3 million would be a large board imo... and would perform better with Sphinx; however, as you are seeing, Sphinx results are much different than a MySQL search.

My first question would be: do you need 3 million posts, or can you get away with archiving half of those? If you can, returning to MyISAM and MySQL search would be my option; however, more info would be needed (server specs, traffic, etc.) for an informed decision.


The problem is not Sphinx at all: it is that Invision doesn't support sorting results by search score. Sphinx will happily return a search ranking in which some fields have higher weights, closer matches are considered better, etc. ... But the board software will not allow you to sort by that ranking, only by things like last post date, title, etc. You might consider weighing in on my feature suggestion here.

For an overview of the available options see this Sphinx page.
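
To make the difference concrete, here is a toy sketch in Python. It is not IPB or Sphinx code; the `Hit` class, the titles, dates, and weight values are all invented for illustration. It only shows why sorting by last post date can bury an exact title match that a relevance sort would put first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    title: str
    last_post: date
    weight: int  # relevance score as a search engine might report it (made-up value)


# Hypothetical hits for the query "Angelina Jolie"; every value here is invented.
hits = [
    Hit("Celebrity news roundup", date(2012, 6, 20), 1200),
    Hit("Angelina Jolie", date(2012, 3, 2), 9800),
    Hit("Upcoming movies", date(2012, 6, 18), 1500),
]


def sort_by_last_post(results):
    """Newest activity first, which is the kind of sort the board offers."""
    return sorted(results, key=lambda h: h.last_post, reverse=True)


def sort_by_relevance(results):
    """Highest search weight first, which is what is being asked for."""
    return sorted(results, key=lambda h: h.weight, reverse=True)


by_date = [h.title for h in sort_by_last_post(hits)]
by_score = [h.title for h in sort_by_relevance(hits)]
```

With these made-up numbers the exact title match ends up last in `by_date` and first in `by_score`, even though the search engine scored it highest in both cases.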



3 million would be a large board imo... and would perform better with Sphinx; however, as you are seeing, Sphinx results are much different than a MySQL search.

My first question would be: do you need 3 million posts, or can you get away with archiving half of those? If you can, returning to MyISAM and MySQL search would be my option; however, more info would be needed (server specs, traffic, etc.) for an informed decision.




Regarding archiving, I am afraid not. Again, we are a celebrity-type forum, so sometimes certain people are not too popular and can go unposted for years, but that does not mean I want to lock the topic, because if they do a new movie or something, they need to be posted to again.

As far as stats
5.4million pageviews last month
650k visitors
about half are unique.

My server specs are dual quad-core Xeons (8 cores total), 24 GB RAM, 4x 10k rpm 900 GB drives in RAID 5.

I would personally be really interested in what eGullet mentioned if there were some way to adjust the rankings. Anyways I will deal with it for now, I just hoped there was an easy way.

My board basically requires Sphinx since the native search is FAR too limiting. We have a lot of posts with hyphenated names such as C-160 and the SQL search simply does NOT work with these things. On 3.2 I modified Sphinx and the IPB PHP code to allow the hyphenated names and it worked nicely; those changes have been incorporated into 3.3.2, making the search FAR more useful in my opinion. Yes, there is a wait for indexing, but for my forum that is a VERY small price to pay. I didn't go with Sphinx for performance, but for functionality.

Go ahead, go to the top of this topic and type C-160 into the search box, and although it is clearly here twice, you will get no results. On a Sphinx search, it'll pull it right up! Major win for Sphinx and I agree with the OP.
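
A rough way to see why a term like C-160 can vanish is to look at how the text gets split into words. The Python sketch below is a simplification and not MySQL's or Sphinx's actual tokenizer. The `tokenize` helper is hypothetical; the minimum length of 4 mirrors MySQL's default `ft_min_word_len` for MyISAM full-text indexes, and the second call assumes a setup where the hyphen counts as a word character and 2-character words are allowed.

```python
import re


def tokenize(text, word_chars=r"A-Za-z0-9_", min_len=1):
    """Split text into lowercase tokens made of word_chars, dropping short ones."""
    pattern = re.compile(f"[{word_chars}]+")
    return [t.lower() for t in pattern.findall(text) if len(t) >= min_len]


# Hyphen treated as a separator, words shorter than 4 characters dropped:
# "C" and "160" are both too short, so nothing of "C-160" is left to search for.
sql_like = tokenize("C-160 transport", min_len=4)

# Hyphen treated as part of the word, minimum length 2 (assumed settings):
# "c-160" survives as a single searchable token.
hyphen_aware = tokenize("C-160 transport", word_chars=r"A-Za-z0-9_\-", min_len=2)
```

Under the first set of rules only `transport` gets indexed; under the second, `c-160` is kept whole.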



I would personally be really interested in what eGullet mentioned if there were some way to adjust the rankings.



When using Sphinx you can fine-tune the rankings in mind-numbing detail, but really, right out of the box, if you just make the title count for more than the content, the new proximity stuff in the latest release is fantastic. It really is just a matter of figuring out how to allow custom sorting on a per-search-engine basis in IPB (since obviously SQL search doesn't have this field at all).
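
As a minimal sketch of "make the title count for more than the content", the Python helper below builds a SphinxQL query that uses the `field_weights` and `ranker` options and sorts by weight. This is not how IPB talks to Sphinx. The function name, the index name `posts_index`, the field names `title` and `content`, and the weights 10 and 1 are all assumptions for illustration. `WEIGHT()` is what newer Sphinx releases use (older ones expose `@weight` instead), and the `%s` placeholder assumes a MySQL client library that fills in parameters that way.

```python
def build_ranked_query(index, terms, field_weights, ranker="proximity_bm25", limit=20):
    """Return (sql, params) for a SphinxQL search sorted by relevance.

    index and the field names are hypothetical; adjust to the real sphinx.conf.
    """
    weights = ", ".join(f"{field}={w}" for field, w in field_weights.items())
    sql = (
        f"SELECT id, WEIGHT() AS relevance FROM {index} "
        "WHERE MATCH(%s) ORDER BY relevance DESC "
        f"LIMIT {int(limit)} "
        f"OPTION ranker={ranker}, field_weights=({weights})"
    )
    return sql, (terms,)


# Title matches count ten times as much as matches in the post body (assumed weights).
sql, params = build_ranked_query(
    "posts_index", "Angelina Jolie", {"title": 10, "content": 1}
)
```

The query string and parameters would then be handed to whatever MySQL-protocol client is connected to searchd's SphinxQL port.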


Very interesting! Could you give an example of these settings you made in Sphinx?

If you have a look at this page you can see all the available rankers. Unfortunately IPB does not support using any of them. There is currently no way to get results sorted by anything other than the few fields that are actually in the DB, since those are the only things that SQL can sort by, and Invision didn't implement a way to have different sort options for different search engines.



My board basically requires Sphinx since the native search is FAR too limiting. We have a lot of posts with hyphenated names such as C-160 and the SQL search simply does NOT work with these things. On 3.2 I modified Sphinx and the IPB PHP code to allow the hyphenated names and it worked nicely; those changes have been incorporated into 3.3.2, making the search FAR more useful in my opinion. Yes, there is a wait for indexing, but for my forum that is a VERY small price to pay. I didn't go with Sphinx for performance, but for functionality.



Go ahead, go to the top of this topic and type C-160 into the search box, and although it is clearly here twice, you will get no results. On a Sphinx search, it'll pull it right up! Major win for Sphinx and I agree with the OP.




I have to agree, which is why we went with Sphinx. The size of our site had nothing to do with wanting to try Sphinx; it was the results. We had a thread that was posted a good many months ago, and someone tried finding it, going as far as posting for help to find it. I tried searching for it and couldn't find it either, so I went on Google, did a site search, and found it. Now that we are using Sphinx, if you search the site for that thread with even fewer keywords, it is one of the first couple of results. Our search results and relevance are much better now.

We've also got our Sphinx re-index running every 5 minutes, rather than the default fifteen: I think this is a good compromise, yielding almost real-time search results. Presumably someday Invision will add support for Sphinx's realtime indexing, and then we won't have to worry about it at all.
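
For anyone curious what a timed re-index boils down to, here is a rough Python sketch. In practice this is normally a cron entry rather than a script, and this is not the setup from the IPB documentation. The helper names and the index name `forum_posts_delta` are hypothetical; `indexer` with `--rotate` and `--config` is the standard Sphinx command-line tool.

```python
import subprocess
import time


def reindex_command(index_names, config_path=None):
    """Build the indexer command line that rebuilds the given indexes in place."""
    cmd = ["indexer", "--rotate"]
    if config_path:
        cmd += ["--config", config_path]
    return cmd + list(index_names)


def run_every(interval_seconds, index_names, runs):
    """Re-run the indexer a fixed number of times, sleeping between runs."""
    for _ in range(runs):
        subprocess.run(reindex_command(index_names), check=False)
        time.sleep(interval_seconds)


# A 5-minute schedule would mean interval_seconds=300; the index name is made up.
cmd = reindex_command(["forum_posts_delta"])
```

With a 5-minute interval, a new post waits at most about five minutes plus the indexing time before it shows up in search.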


The realtime is a completely different beast: it's much more than just changing the sphinx.conf file. Every time someone posts something you have to make calls to the Sphinx server. It's going to be some real work on Invision's part, and since only a fraction of their customers use Sphinx (admittedly probably their larger customers, but we pay the same as anyone else) they just don't prioritize it highly.
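
To give a feel for what "calls to the Sphinx server" on every post could look like, here is a minimal Python sketch using SphinxQL statements against a real-time index. It is not Invision's code and only a guess at the shape of the work. The function names, the index name `rt_posts`, and the `title`/`content` fields are assumptions; `RecordingCursor` is a stand-in so the example runs without a server, where a real setup would use a MySQL-protocol cursor connected to searchd.

```python
def index_post_realtime(cursor, post_id, title, content, index="rt_posts"):
    """Insert or overwrite one post in the real-time index (called on post/edit)."""
    cursor.execute(
        f"REPLACE INTO {index} (id, title, content) VALUES (%s, %s, %s)",
        (post_id, title, content),
    )


def remove_post_realtime(cursor, post_id, index="rt_posts"):
    """Drop one post from the real-time index (called on delete)."""
    cursor.execute(f"DELETE FROM {index} WHERE id = %s", (post_id,))


class RecordingCursor:
    """Stand-in cursor that records statements instead of sending them."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=()):
        self.calls.append((sql, tuple(params)))


cursor = RecordingCursor()
index_post_realtime(cursor, 42, "Angelina Jolie", "New movie announced")
remove_post_realtime(cursor, 42)
```

Every place in the board software that creates, edits, moves, or deletes a post would need a hook like this, which is why it is more than a sphinx.conf change.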


Archived

This topic is now archived and is closed to further replies.
