Huai Posted October 9, 2010
Hi, I really like using IPB. But it uses either MySQL fulltext search or manual search, which is very slow. The only way to solve this is to use Sphinx. Is it possible to improve it the way phpBB does? phpBB has its own search method that uses PCRE and mbstring on the back end, and it works very well with non-Latin characters.
Enkidu Posted October 14, 2010
+1 from me :) I really think the search system should be improved, especially for those who can't install Sphinx.
bfarber Posted October 28, 2010
Huai said: "Hi, I really like using IPB. But it uses either MySQL fulltext search or manual search, which is very slow. The only way to solve this is to use Sphinx. Is it possible to improve it the way phpBB does? phpBB has its own search method that uses PCRE and mbstring on the back end, and it works very well with non-Latin characters."
I have no idea how phpBB works, but I can't fathom how any PHP libraries would help with database searching, unless it's to convert the content to some normalized form before sending it to the database. In any event, I don't think this is necessary at all. You just need your database configured correctly for the languages you use on your site.
Huai (Author) Posted October 31, 2010
bfarber said: "I have no idea how phpBB works, but I can't fathom how any PHP libraries would help with database searching, unless it's to convert the content to some normalized form before sending it to the database. In any event, I don't think this is necessary at all. You just need your database configured correctly for the languages you use on your site."
IPB only uses MySQL's native search, which has really bad support for languages such as Korean, Japanese and Chinese. I'm not sure how phpBB handles it; I think they are converting the content, as you said. The only way I know to solve the problem is to use Sphinx...
Mark Posted October 31, 2010
You'll have to excuse me if any of this is a little off. I'm by no means an expert on Asian languages, but I have come across this problem while working on a Japanese site.
The problem with MySQL fulltext and Asian languages is that fulltext searches by word, and by default it assumes that words are separated by spaces. This isn't the case in Chinese, Japanese and Korean. Before I carry on, I should point out that normal (non-fulltext) search does work fine, and we do have an option to use that.
The best thing you can do with MySQL is pre-process the text before it is stored, separating the words. I'm not sure if phpBB does this (last time I checked they just used non-fulltext search), but it sounds possible from what you're describing. Once the words are separated by a delimiter, MySQL can use that as its word delimiter for fulltext searches. This isn't perfect, though. A few problems come into play:
- You'd need to remove the delimiter when showing the content, so in general it adds a fair amount of overhead (and could cause struggles with very long posts).
- You wouldn't be able to have a bilingual site if you were doing this.
- You'd need to configure MySQL to allow fulltext searches on 1 character (since just 1 character can represent a word). Not only is this not ideal, many hosting providers will be unwilling to do it.
When you add up all those problems, I have to wonder if you're not just better off using non-fulltext search (I never did any extensive tests, but if I had to guess, I'd say you'd be very close to losing any benefit fulltext gives you at all). Or using Sphinx: after all, you're going to need to change your MySQL configuration anyway, so you may as well install Sphinx.
As an alternative to code-level changes, if you're really set on making this work, there is a project called Senna which is like MySQL but with support for fulltext searching Asian languages. The above problems still apply, but it should in theory make fulltext searching work without any code-level changes to IP.Board.
All in all, I think it's a very invasive change to make for little benefit when 3 solutions (non-fulltext searching, Sphinx and Senna) are already available.
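Editor's note: the pre-processing idea described in the post above can be sketched roughly as follows. This is only an illustration, not IP.Board or phpBB code. The function names (`segment_for_storage`, `restore_for_display`, `indexable_words`) are hypothetical, the character ranges are a rough assumption covering Han and kana only, and the "index" is a plain whitespace split with a minimum word length standing in for MySQL's fulltext behaviour.

```python
import re

# Assumption: a rough set of ranges for Han ideographs, Hiragana and Katakana.
# Hangul is deliberately left out because Korean text already contains spaces.
CJK = r"\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff"

# Zero-width position between two adjacent CJK characters.
_BETWEEN_CJK = re.compile(f"(?<=[{CJK}])(?=[{CJK}])")
# A single space that sits between two CJK characters.
_SPACE_BETWEEN_CJK = re.compile(f"(?<=[{CJK}]) (?=[{CJK}])")


def segment_for_storage(text: str) -> str:
    """Insert a space between adjacent CJK characters (one character = one 'word')."""
    return _BETWEEN_CJK.sub(" ", text)


def restore_for_display(text: str) -> str:
    """Remove the inserted spaces again before showing the post.

    Limitation: a space the author really typed between two CJK
    characters is removed as well.
    """
    return _SPACE_BETWEEN_CJK.sub("", text)


def indexable_words(text: str, min_word_len: int) -> list[str]:
    """Stand-in for a word-based index: split on whitespace, drop short words."""
    return [word for word in text.split() if len(word) >= min_word_len]


if __name__ == "__main__":
    post = "IPB 全文検索 test"
    stored = segment_for_storage(post)
    print(stored)                       # text as it would be stored
    print(restore_for_display(stored))  # text as it would be displayed
    # Unsegmented, the whole CJK run is treated as a single word.
    print(indexable_words(post, 1))
    # Segmented, a minimum word length of 4 discards every CJK character...
    print(indexable_words(stored, 4))
    # ...which is why the minimum length would have to be lowered to 1.
    print(indexable_words(stored, 1))
```

The sketch also shows where the listed problems come from: every post is rewritten on the way in and on the way out, real spaces between CJK characters are lost, and single-character words only survive if the minimum word length is 1.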
Huai (Author) Posted November 1, 2010
Mark said: "All in all, I think it's a very invasive change to make for little benefit when 3 solutions (non-fulltext searching, Sphinx and Senna) are already available."
Yes, I know those 3 solutions. But non-fulltext searching is really slow, and the other 2 have to be set up manually. So I was wondering whether there are better choices or not. I'm not a coder, so I'm just asking questions here, sorry for this. Thank you for the detailed explanation!
Archived
This topic is now archived and is closed to further replies.