Invision Community 4: SEO, prepare for v5 and dormant account notifications By Matt Monday at 02:04 PM
nylyon Posted November 7, 2015 Posted November 7, 2015 Since 4.1 removed Sphinx, I need to get the SQL search working for my forum. The primary item I need is the ability to search hyphenated words. Here's what I have tried: In /usr/share/mysql/charsets/ I edited latin1.xml (where character_set_server and character_set_database point to) and in the ctable entry changed 10 to 01 in the 3rd row 14th column (0x2D) I then restarted mysql and even apache, rebuilt the search index and still am unable to search for hyphenated words. Can anyone help me with what I may be doing wrong?
Flitterkill Posted November 8, 2015 Posted November 8, 2015 Without editing the character set, using my 2-character enabled MySql search: D-4 Not found D-23 Found No quotes wrapped around the search terms needed. The dash-enabled search stuff was asked for and tackled way way back in the alpha days a year ago. What are some examples of terms you are looking for?
nylyon Posted November 8, 2015 Author Posted November 8, 2015 Searching for C-160. If I reduce the MySQL search to 3 characters it will find EVERY occurance of 160 not necessarily C-160.
Flitterkill Posted November 8, 2015 Posted November 8, 2015 Repeating my test from above. Made two new posts in my test install: C-160. 160. Search for C-160 finds: Post with C-160, Post with 160. Search for "C-160" finds: Just C-160. Wrap the search term in quotes.
nylyon Posted November 8, 2015 Author Posted November 8, 2015 Hey, thank you for your help. While it still doesn't function as I think it should, it gets me closer to where it's probably livable. I had to change ft_min_word_len=3 for the searches to find anything. I made 6 posts one: C-160 two: D-160 three: C 160 four: D 160 five: 160 six: C160 Search for: C-160: one, two, three, four, five "C-160": one, three <-- why did it find three? "C 160": one, three <-- why did it find one? -160: one, two, three, four, five "-160": one, two, three, four, five 160: zero results <-- shouldn't that find them all? "160": one, two, three, four, five <-- not sure why 160 gets zero and "160" finds them all? c160: six Sphinx saw the hyphen more as a character, something is still not seeing the hyphen as a character in SQL search I think. I can probably live with this, but would have hoped for a more precise search, losing Sphinx removes a more comprehensive search for me.
Flitterkill Posted November 8, 2015 Posted November 8, 2015 I'm not entirely sure, I'd have to poke the code to see how the hyphen is being accounted for. I suspect it is accounted for but also ignored (that's a horrible way of saying it but I just woke up ) ft_min_length? You may wish to convert your posts table to inno_db instead of MyIsam if not for anything else the stopwords list is much much smaller with innoDB.. https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html If it already is innoDB then you need the other var setting: [mysqld] innodb_ft_min_token_size=2 ft_min_word_len=2 Or change it to 3 if you like. The index size is bigger but not anywhere near as big as I thought it would be when set to two characters. Let me enter your test entries and give it a whirl...
Flitterkill Posted November 8, 2015 Posted November 8, 2015 I'm diving a bit deeper into this now. Gonna wait to hear from you regarding whether you have inno or myisam set and/or configured right before I share what I just got results wise. The IPS search index looks pretty good though. Everything is there minus html elements. If IPS's goal was to drop Sphinx in order to make the default MySql search more robust they have an opportunity to do so. Might even be an opportunity for an app/plugin to bulk up the query routine on it's own.
nylyon Posted November 8, 2015 Author Posted November 8, 2015 My test system is myisam the production has innodb for the post and topic tables.
Flitterkill Posted November 8, 2015 Posted November 8, 2015 These tests on your test install or production?
nylyon Posted November 8, 2015 Author Posted November 8, 2015 On the test, production is running Sphinx
Flitterkill Posted November 8, 2015 Posted November 8, 2015 For MyISAM you'd have to change it to 3 character no matter what as the default for that is 4 character (inno is three by default). For my test install 160 or "160" finds everything. No idea why 160 is not working for you. Additional FYI:
Flitterkill Posted November 9, 2015 Posted November 9, 2015 Hell of a rabbit hole you are sending me on. I'll leave it at that. :)
nylyon Posted November 9, 2015 Author Posted November 9, 2015 My production forum is running 4.0.13.1, I am biting the bullet and converting to SQL search. Currently rebuilding the index. It is as yours is as well, were searching for 160 is bringing up everything. This is livable, but not optimal.
Flitterkill Posted November 9, 2015 Posted November 9, 2015 Make sure you have both ft_min_word_len and innodb_ft_min_token_size set to what you want for character searches. The routine actually reads in the MyIsam ft_min_word_len to determine the min character search on the IPS side. If not found it defaults to 3. So even if you have the search index as innodb, etc. etc... Like I said, rabbit hole...
Recommended Posts
Archived
This topic is now archived and is closed to further replies.