
Stemming and Synonyms in search


WebCMS


IC does not use full-text search, and its search also lacks basic stemming, synonym, and plural handling.

If we search for the word "consist", related words such as consists, consistent, and consistently are not matched. The search does an exact word match only, which makes it far from accurate.

As you are aware, stemming works by reducing the searched word to its root and then matching the variations derived from that root.

Searched word: manage

Root: manag

Matched variations: manage, manager, managers, manages, management, managerial, managing...

Now we can use these roots with wildcards in the search logic for a much broader search. Plurals are covered by stemming too, since a wildcard on the root also matches the plural forms.
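
As a rough illustration of the root-plus-wildcard idea, here is a minimal Python sketch. It is not the Porter algorithm and not how IC's search works: the suffix list, `naive_root`, `wildcard_pattern`, and `matches` are all hypothetical names made up for this example, and a real implementation would use one of the Porter stemmer libraries linked below.

```python
import re

# Hypothetical, deliberately tiny suffix list for illustration only.
# A real Porter stemmer applies many more rules than this.
SUFFIXES = ("erial", "ement", "ers", "ing", "er", "es", "ed", "e", "s")


def naive_root(word: str, min_len: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_len characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: -len(suffix)]
    return word


def wildcard_pattern(word: str) -> str:
    """Build a SQL LIKE-style prefix pattern from the root of the searched word."""
    return naive_root(word) + "%"


def matches(pattern: str, text: str) -> list[str]:
    """Emulate the prefix wildcard against the words of a text (demo only)."""
    root = pattern.rstrip("%")
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w.startswith(root)]


print(wildcard_pattern("manage"))
print(matches(wildcard_pattern("manage"), "The manager manages managerial tasks"))
```

A prefix wildcard can over-match (a very short root matches many unrelated words), so a minimum root length, as assumed above, is worth keeping.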

https://www.roscripts.com/php_search_engine-119/

https://github.com/hugsbrugs/php-synonym
https://www.google.com/search?q=download+thesaurus+database

https://stackoverflow.com/questions/2475045/php-script-to-find-synonyms
https://github.com/jmagnone/codeigniter-googlesearch-api
https://www.hitbullseye.com/Vocab/List-of-Synonyms.php

https://github.com/markfullmer/porter2

https://tartarus.org/martin/PorterStemmer

https://tartarus.org/martin/PorterStemmer/php.txt

https://www.javatpoint.com/stemming-words-using-python

https://www.phpclasses.org/package/12888-PHP-PHP-extension-to-implement-the-Porter-stemmer.html

https://www.geeksforgeeks.org/introduction-to-stemming/

https://pecl.php.net/package/stem

https://en.wikipedia.org/wiki/Stemming

Soundex - https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_soundex
(this could be implemented as an optional "Sounds Like" checkbox on the Search page)
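
For anyone unfamiliar with Soundex, here is a small Python sketch of the standard four-character variant, just to show what a "Sounds Like" match would compare. The function name `soundex` and the lookup table are written for this example; MySQL's own SOUNDEX() is documented as returning an arbitrarily long string, so its output can differ from this sketch.

```python
# Letter-to-digit table of standard American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the four-character Soundex code, or "" if the word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


# Two words "sound alike" when their codes are equal.
print(soundex("Robert"), soundex("Rupert"))
```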

Please implement stemming and synonyms in search logic.

Edited by WebCMS

I saw that already, but it is meant for large, high-traffic sites.

My suggestion is for smaller sites: some basic stemming and synonyms would give wider search coverage than an exact word match search, which is limiting and not user-friendly.


Why is Elasticsearch not included on CiC managed hosting?

I understand that web, database, auth, APIs, load balancing, media, caching, search, etc. can be hosted on separate servers. But leaving Elasticsearch out of CiC and locating it on an external server means search is not managed: it requires self-hosting and maintenance that cost $$$, or a separate subscription of $100+ per month. Instead of each CiC client setting up Elasticsearch separately, it could be set up on CiC, and those who wish to use it could just switch to it inside the ACP.

The current search without stemming, synonyms, plurals, etc. is limited and sub-optimal.

Edited by WebCMS

This is something they've been working on previously, based on other live streams, etc. It's not as simple as tossing up a random OpenSearch/Elasticsearch instance when you consider the size and scale that IPS operates at. In addition, there are BIG costs associated with an enterprise-class platform, so it has to be done in a way that is economical for IPS as well; otherwise that hosting cost could get a decent-sized bump!


PHP stemming class + thesaurus DB for synonyms => poor man's full-text search for free 😀

Most sites are small to mid-size and don't really need Elasticsearch.
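
To make the thesaurus half of that idea concrete, here is a hedged Python sketch of synonym expansion feeding a parameterized LIKE query. Everything in it is an assumption for illustration: the `THESAURUS` entries, the `content` column name, and the helpers `expand_terms` and `build_where` are hypothetical, and a real setup would load synonyms from a thesaurus database.

```python
# Hypothetical thesaurus; in practice this would come from a thesaurus database.
THESAURUS = {
    "manage": ["administer", "supervise"],
    "search": ["find", "lookup"],
}


def expand_terms(query: str, thesaurus: dict[str, list[str]]) -> list[str]:
    """Return each query word followed by its synonyms, without duplicates."""
    terms: list[str] = []
    for word in query.lower().split():
        for candidate in [word, *thesaurus.get(word, [])]:
            if candidate not in terms:
                terms.append(candidate)
    return terms


def build_where(terms: list[str], column: str = "content") -> tuple[str, list[str]]:
    """Build an OR-ed LIKE clause with placeholders; column must be a trusted constant."""
    clause = " OR ".join(f"{column} LIKE ?" for _ in terms)
    params = [f"%{term}%" for term in terms]
    return clause, params


terms = expand_terms("manage users", THESAURUS)
clause, params = build_where(terms)
print(clause)
print(params)
```

The search terms are passed as bound parameters rather than pasted into the SQL string, which keeps user input out of the query text.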

Edited by WebCMS
