Invision Community 4: SEO, prepare for v5 and dormant account notifications Matt November 11, 2024Nov 11
Posted November 7, 20159 yr Since 4.1 removed Sphinx, I need to get the SQL search working for my forum. The primary item I need is the ability to search hyphenated words. Here's what I have tried: In /usr/share/mysql/charsets/ I edited latin1.xml (where character_set_server and character_set_database point to) and in the ctable entry changed 10 to 01 in the 3rd row 14th column (0x2D) I then restarted mysql and even apache, rebuilt the search index and still am unable to search for hyphenated words. Can anyone help me with what I may be doing wrong?
November 8, 20159 yr Without editing the character set, using my 2-character enabled MySql search: D-4 Not found D-23 Found No quotes wrapped around the search terms needed. The dash-enabled search stuff was asked for and tackled way way back in the alpha days a year ago. What are some examples of terms you are looking for?
November 8, 20159 yr Author Searching for C-160. If I reduce the MySQL search to 3 characters it will find EVERY occurance of 160 not necessarily C-160.
November 8, 20159 yr Repeating my test from above. Made two new posts in my test install: C-160. 160. Search for C-160 finds: Post with C-160, Post with 160. Search for "C-160" finds: Just C-160. Wrap the search term in quotes.
November 8, 20159 yr Author Hey, thank you for your help. While it still doesn't function as I think it should, it gets me closer to where it's probably livable. I had to change ft_min_word_len=3 for the searches to find anything. I made 6 posts one: C-160 two: D-160 three: C 160 four: D 160 five: 160 six: C160 Search for: C-160: one, two, three, four, five "C-160": one, three <-- why did it find three? "C 160": one, three <-- why did it find one? -160: one, two, three, four, five "-160": one, two, three, four, five 160: zero results <-- shouldn't that find them all? "160": one, two, three, four, five <-- not sure why 160 gets zero and "160" finds them all? c160: six Sphinx saw the hyphen more as a character, something is still not seeing the hyphen as a character in SQL search I think. I can probably live with this, but would have hoped for a more precise search, losing Sphinx removes a more comprehensive search for me.
November 8, 20159 yr I'm not entirely sure, I'd have to poke the code to see how the hyphen is being accounted for. I suspect it is accounted for but also ignored (that's a horrible way of saying it but I just woke up ) ft_min_length? You may wish to convert your posts table to inno_db instead of MyIsam if not for anything else the stopwords list is much much smaller with innoDB.. https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html If it already is innoDB then you need the other var setting: [mysqld] innodb_ft_min_token_size=2 ft_min_word_len=2 Or change it to 3 if you like. The index size is bigger but not anywhere near as big as I thought it would be when set to two characters. Let me enter your test entries and give it a whirl...
November 8, 20159 yr I'm diving a bit deeper into this now. Gonna wait to hear from you regarding whether you have inno or myisam set and/or configured right before I share what I just got results wise. The IPS search index looks pretty good though. Everything is there minus html elements. If IPS's goal was to drop Sphinx in order to make the default MySql search more robust they have an opportunity to do so. Might even be an opportunity for an app/plugin to bulk up the query routine on it's own.
November 8, 20159 yr Author My test system is myisam the production has innodb for the post and topic tables.
November 8, 20159 yr For MyISAM you'd have to change it to 3 character no matter what as the default for that is 4 character (inno is three by default). For my test install 160 or "160" finds everything. No idea why 160 is not working for you. Additional FYI:
November 9, 20159 yr Author My production forum is running 4.0.13.1, I am biting the bullet and converting to SQL search. Currently rebuilding the index. It is as yours is as well, were searching for 160 is bringing up everything. This is livable, but not optimal.
November 9, 20159 yr Make sure you have both ft_min_word_len and innodb_ft_min_token_size set to what you want for character searches. The routine actually reads in the MyIsam ft_min_word_len to determine the min character search on the IPS side. If not found it defaults to 3. So even if you have the search index as innodb, etc. etc... Like I said, rabbit hole...
Archived
This topic is now archived and is closed to further replies.