
Elasticsearch improvements


Makoto


Playing around with Elasticsearch, I've noticed some things that could potentially use improvement.

First, let's start with a basic query,

jMhfsxB.png

Ignore the gibberish. It's lorem ipsum text, but it works well for demonstration.

So, what's my problem with these results?

Well, further down, on page 3, there is a post that contains this exact phrase in the content body,

sZl8hLM.png

As far as I can see, exact matches in thread titles are prioritized correctly. It's only exact matches in content bodies that aren't given proper priority.

Alright, now before we go further, let's look at the Elasticsearch query for that,

{
  "query": {
    "function_score": {
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "must": [
                {
                  "multi_match": {
                    "query": "Et dolore consequatur ",
                    "fields": [
                      "index_content",
                      "index_title^5"
                    ],
                    "operator": "or"
                  }
                }
              ],
              "must_not": [],
              "filter": [
                {
                  "bool": {
                    "should": [
                      [
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\core\\Statuses\\Status",
                              "IPS\\core\\Statuses\\Reply"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\forums\\Topic\\Post"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\downloads\\File",
                              "IPS\\downloads\\File\\Comment",
                              "IPS\\downloads\\File\\Review"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\blog\\Entry",
                              "IPS\\blog\\Entry\\Comment"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\gallery\\Image",
                              "IPS\\gallery\\Image\\Comment",
                              "IPS\\gallery\\Image\\Review"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\gallery\\Album\\Item",
                              "IPS\\gallery\\Album\\Comment",
                              "IPS\\gallery\\Album\\Review"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\calendar\\Event",
                              "IPS\\calendar\\Event\\Comment",
                              "IPS\\calendar\\Event\\Review"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\cms\\Pages\\PageItem"
                            ]
                          }
                        },
                        {
                          "terms": {
                            "index_class": [
                              "IPS\\nexus\\Package\\Item",
                              "IPS\\nexus\\Package\\Review"
                            ]
                          }
                        }
                      ]
                    ]
                  }
                },
                {
                  "terms": {
                    "index_permissions": [
                      4,
                      "m1",
                      "ca",
                      "cm",
                      "c1",
                      "*"
                    ]
                  }
                }
              ]
            }
          },
          "linear": {
            "index_date_updated": {
              "scale": "120d",
              "decay": "0.5"
            }
          }
        }
      },
      "script_score": {
        "script": {
          "source": "doc['index_author'].value == params.param_memberId ? ( _score * Float.parseFloat( params.param_booster ) ) : _score",
          "lang": "painless",
          "params": {
            "param_memberId": 1,
            "param_booster": "1.5"
          }
        }
      }
    }
  },
  "sort": [],
  "from": 50,
  "size": 25
}

Now the next question: how can we improve this so the above doesn't happen?

It's worth noting that searching for the exact phrase does return this topic without issue, but wouldn't it be better for it to appear at the top of the search results without us having to do anything at all?

One option I see is to perform a bool query that gives priority to phrase matches, even when we're not explicitly searching for a phrase.

For example,

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field": "web developer"
          }
        },
        {
          "match_phrase": {
            "field": {
              "query": "web developer",
              "boost": 10
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

We're using multi_match above, but the concept is the same: run both an OR/AND query and a phrase query, then give a major boost to the phrase query.
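As a rough illustration of how that could look against the fields from the query above, here is a minimal Python sketch that builds the request body as a plain dict. The function name `boosted_phrase_query` and the default boost of 10 are assumptions made for the example; this is not how IPS builds its query.

```python
def boosted_phrase_query(term, fields, phrase_boost=10):
    """Build a bool query that matches any term but ranks exact phrases higher."""
    return {
        "bool": {
            "should": [
                # Broad match: any of the terms may appear (the current behaviour).
                {"multi_match": {"query": term, "fields": fields, "operator": "or"}},
                # Phrase match: the same terms in order, with a large boost.
                {
                    "multi_match": {
                        "query": term,
                        "fields": fields,
                        "type": "phrase",
                        "boost": phrase_boost,
                    }
                },
            ],
            # At least one of the two clauses has to match.
            "minimum_should_match": 1,
        }
    }


query = boosted_phrase_query(
    "Et dolore consequatur", ["index_content", "index_title^5"]
)
```

The resulting dict would take the place of the `multi_match` clause in the `must` array. The surrounding filters and function scores stay as they are.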

This should give us some rock solid search results without requiring the user to fiddle around with the advanced search tools at all.


Actually, of even more interest might be using slop with phrase matching,

https://www.elastic.co/guide/en/elasticsearch/guide/current/_closer_is_better.html

Compare the two now:

kabOtBT.png

T4ELpJM.png

This was accomplished by merely adding a slop value to the search query,

// Phrase query with slop: all terms must be present, and closer terms score higher
$searchConditions[] = array(
	'multi_match' => array(
		'query'  => $term,
		'fields' => array( 'index_content', $titleField ),
		'type'   => 'phrase',
		'slop'   => 50,
	),
);

It's no longer an exact match, no, but this is more along the lines of what I'd expect a general search query to produce outside of a MySQL based engine. It ranks results by how close the terms are to each other, not just by whether the terms are present.

The caveat with this is that it is still at its core phrase matching and it requires all of the terms to be present.

Essentially, I would consider this a prime drop-in replacement for the AND operator.
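If requiring every term turns out to be too strict, one possible variation is to keep the OR query as the required clause and use the sloppy phrase query only to influence ranking. The sketch below shows that shape as a Python dict. The function name is hypothetical, and the slop of 50 is simply the value used above.

```python
def proximity_ranked_query(term, fields, slop=50):
    """Match on any term, then reward documents where the terms sit close together."""
    return {
        "bool": {
            # Decides which documents are returned: any term is enough.
            "must": [
                {"multi_match": {"query": term, "fields": fields, "operator": "or"}}
            ],
            # Only affects the score: documents containing all terms within
            # `slop` positions of each other are ranked higher.
            "should": [
                {
                    "multi_match": {
                        "query": term,
                        "fields": fields,
                        "type": "phrase",
                        "slop": slop,
                    }
                }
            ],
        }
    }


query = proximity_ranked_query(
    "Et dolore consequatur", ["index_content", "index_title^5"]
)
```

With this arrangement a document that is missing one of the terms can still be returned. It just scores lower than one where the terms appear near each other.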


Good ideas! I have not yet migrated to Elasticsearch because I did not see improvements in search results.

In my community we do lots of searches on parts of words. We search for electronic circuit codes, and most of the time we can only find a circuit by typing just the important part of the circuit board model, as with LIKE %search%.

I asked, I complained, I begged. But the IPS gurus did not hear me. I wanted the AdminCP to have options for customizing searches, so that forum administrators could tailor search to the needs of their community.

For now I have solved the problem by manually modifying the IPS core scripts, but it would be great if this were a native feature in IPS.
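For reference, the closest Elasticsearch counterpart to `LIKE %search%` is probably a `wildcard` query. The sketch below is only an illustration: the helper name `contains_query` is made up, and it assumes the field's analyzer lowercases terms, since wildcard patterns are not analyzed. On an analyzed text field the pattern is compared against individual terms rather than the whole post, and a leading `*` can be slow on large indexes.

```python
def contains_query(fragment, field="index_content"):
    """Build a wildcard query that matches terms containing `fragment`."""
    # Lowercased on the assumption that indexed terms are lowercase;
    # the wildcard pattern itself is not passed through the analyzer.
    pattern = "*" + fragment.lower() + "*"
    return {"wildcard": {field: {"value": pattern}}}


query = contains_query("BC547")
```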


@Makoto I have a problem with Elasticsearch. Could you help me?

[2019-04-03T19:34:53,376][DEBUG][o.e.a.b.TransportShardBulkAction] [elastic-node1] [content][1] failed to execute bulk item (index) index {[content][content][forums_topic_post-1509514], source[{"index_title":null,"index_date_created":1423006305,"index_author":148190,"index_date_updated":1423006305,"index_hidden":0,"index_comments":1616,"index_is_last_comment":false,"index_permissions":["*"],"index_tags":[],"index_container_class":"IPS\\forums\\Forum","index_club_id":null,"index_class":"IPS\\forums\\Topic\\Post","index_date_commented":1423006305,"index_class_type_id_hash":"bccbfa12944544291b6a049e047de5b6","index_content":"Łącznie 200 tak pierwszy raz robię. Jak robić to robić  raz się żyje i dlatego pytam czy ktoś może coś powiedzieć.  Ze swojej praktyki. Czy forte zaprawiac czy coś innego. Jakie. Opryski zaraz po posadzeniu ...A nie głupie teksty ze tyle robisz a nie wiesz jak czy ktoś od razu z was wszystko wiedział jeżeli tak to po co to forum po to żeby głupio dosrywac innym","index_item_index_id":"forums_topic_post-213758","index_item_id":11619,"index_item_author":5451,"index_object_id":1509514,"index_participants":"[5451,19455,4564,14802,10119,20623,7068,18756,619,5521,5131,25138,18755,26122,13925,32680,32380,29397,24526,31520,20652,34622,32336,3900,37397,37470,10840,26512,2518,27475,20646,979,13640,11076,31674,43840,27895,56413,38641,505,13669,18336,36837,5790,381,5038,60682,21821,64609,34075,29952,46886,34309,5638,38982,8845,30205,38251,21715,52634,22138,73592,39512,59882,98572,80103,136554,129416,123495,64740,38291,47474,66684,501,23659,15673,143857,48363,32916,5169,116635,72290,23673,4951,136145,61978,44178,148190,67766,16890,148260,4870,77660,38742,57201,7923,44189,56747,129947,48863,146828,141726,25459,56801,61349,10470,40989,25319,12937,100704,44708,137538,168417,33155,21115,18806,0,206053,11830,9026,188978,209470,47917,31055,12132,216648,41629,38172,49348,154151,45675,37494,219324,30776,212853,149991,65808,35512,8008,175390,26235,5620,28376,221533,145954,45930,57340,218317,171650,151184,133845,57943,195018,150131,21376,201733,53216,13119,23902,227916,227970,223057,220193,228909,58212,153886,227944,34021]","index_reviews":null,"index_prefix":null,"index_container_id":43}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [index_participants] of type [long] in document with id 'forums_topic_post-1509514'
        at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:303) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:488) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:616) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:410) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:384) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:96) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:69) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:281) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:799) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:775) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:744) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$executeIndexRequestOnPrimary$3(TransportShardBulkAction.java:454) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:477) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:452) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:216) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:159) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:151) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:139) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:79) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1050) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1028) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:105) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:424) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:370) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:273) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:240) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2561) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:987) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:369) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:324) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:311) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) [x-pack-security-6.7.0.jar:6.7.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308) [x-pack-security-6.7.0.jar:6.7.0]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:686) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.0.jar:6.7.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.lang.IllegalArgumentException: For input string: "[5451,19455,4564,14802,10119,20623,7068,18756,619,5521,5131,25138,18755,26122,13925,32680,32380,29397,24526,31520,20652,34622,32336,3900,37397,37470,10840,26512,2518,27475,20646,979,13640,11076,31674,43840,27895,56413,38641,505,13669,18336,36837,5790,381,5038,60682,21821,64609,34075,29952,46886,34309,5638,38982,8845,30205,38251,21715,52634,22138,73592,39512,59882,98572,80103,136554,129416,123495,64740,38291,47474,66684,501,23659,15673,143857,48363,32916,5169,116635,72290,23673,4951,136145,61978,44178,148190,67766,16890,148260,4870,77660,38742,57201,7923,44189,56747,129947,48863,146828,141726,25459,56801,61349,10470,40989,25319,12937,100704,44708,137538,168417,33155,21115,18806,0,206053,11830,9026,188978,209470,47917,31055,12132,216648,41629,38172,49348,154151,45675,37494,219324,30776,212853,149991,65808,35512,8008,175390,26235,5620,28376,221533,145954,45930,57340,218317,171650,151184,133845,57943,195018,150131,21376,201733,53216,13119,23902,227916,227970,223057,220193,228909,58212,153886,227944,34021]"
        at org.elasticsearch.common.xcontent.support.AbstractXContentParser.toLong(AbstractXContentParser.java:199) ~[elasticsearch-x-content-6.7.0.jar:6.7.0]
        at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:220) ~[elasticsearch-x-content-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$7.parse(NumberFieldMapper.java:714) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$7.parse(NumberFieldMapper.java:685) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:1050) ~[elasticsearch-6.7.0.jar:6.7.0]
        at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297) ~[elasticsearch-6.7.0.jar:6.7.0]
        ... 42 more

The problem is with

Quote

org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [index_participants] of type [long] in document with id 'forums_topic_post-1509514'

index_participants is mapped as type long, which is the same as BIGINT in MySQL, but you are sending a list of integers (as a single quoted string, per the log above). Change the type to keyword.
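In the logged document, `index_participants` arrives as one string, `"[5451,19455,...]"`, rather than as a JSON array. A field mapped as `long` can hold an array of numbers, but it cannot parse that string. Besides changing the mapping, another option would be to decode the value before indexing. A minimal sketch, assuming a hypothetical helper on the indexing side (this is not IPS code):

```python
import json


def normalize_participants(value):
    """Return participant IDs as a list of ints, whatever form they arrive in."""
    if value is None:
        return []
    if isinstance(value, str):
        # The value was JSON-encoded into a string, e.g. "[5451,19455]".
        value = json.loads(value)
    return [int(v) for v in value]


participants = normalize_participants("[5451,19455,4564]")
```

Sent as a real array, the same IDs would fit the existing `long` mapping.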


On 3/31/2019 at 9:44 AM, DSyste said:

Good ideas! I have not yet migrated to Elasticsearch because I did not see improvements in search results.

In my community we do lots of searches on parts of words. We search for electronic circuit codes, and most of the time we can only find a circuit by typing just the important part of the circuit board model, as with LIKE %search%.

I asked, I complained, I begged. But the IPS gurus did not hear me. I wanted the AdminCP to have options for customizing searches, so that forum administrators could tailor search to the needs of their community.

For now I have solved the problem by manually modifying the IPS core scripts, but it would be great if this were a native feature in IPS.

I also did not see better results from using Elasticsearch, but looking over this topic, maybe @Makoto's ideas could improve it.


  • 2 weeks later...

I've got a simple application written up that patches in a few of the changes mentioned above, but before submitting it to the marketplace I need to check with Daniel that it's okay to post, due to the nature of the application (it completely replaces the existing search method and re-uses a lot of existing IPS code).

hwDR17B.png


Just a note here that Elasticsearch 7 (and likely 6.7 or higher) is not compatible with Invision Community at the moment. Keep your installation at a version less than 6.7. If you have other people doing your server maintenance, make sure they don't upgrade ES as a part of routine maintenance.

On the plus side recovery is simple. Stop service, uninstall current ES, go to var/lib/elasticsearch and remove the nodes directory (this is to clear the current indexes - also assumes you do NOT have other things being indexed with ES - if all you are using ES for is IPS search indexing, go ahead and remove this directory), install a version of ES lower than 6.7, confirm working via systemctl and the curl hit (see ES docs), open up your ACP, Content Discovery:Search, rebuild indexes. All better.


3 hours ago, All Astronauts said:

Just a note here that Elasticsearch 7 (and likely 6.7 or higher) is not compatible with Invision Community at the moment. Keep your installation at a version less than 6.7. If you have other people doing your server maintenance, make sure they don't upgrade ES as a part of routine maintenance.

I don't know about ES 7, but I'm running version 6.7.1 without any visible issues.


6.7 is a demarcation point (see the ES upgrade pages), which is why I threw that in there (better safe etc...), though it may just be because it's the end of the 6 line. 7 definitely has breaking changes. For one thing, IPS sends ES an index entry with two simultaneous types [_doc, content] when ES only wants one, and 7 chokes on that alone.

As for 6.7, check your var/log/elasticsearch directory for the ES log and take a peek. You'll probably see some indexing errors, but they are effectively meaningless on the front end as far as searching goes (the ones I saw didn't prevent the entry from being indexed, but they were there nonetheless).

I did test on 7 setting a new index in the acp and letting it rip but aside from seeing the new index being created in the ES logs, the reindex did nothing. Maybe the other indexes still lying around in the nodes dir were causing problems but I doubt it.

More fun is 7 was released a few days ago and 8 alpha is already being pushed. These ES guys are moving way faster than I thought and there are definitely backend config changes alone that need to be accounted for on upgrading to 7 (and 8). 5 to 6 was nothing. Ahh the joys of production env...

I tossed a support request in so they might consider warning folks before someone upgrades inadvertently. As far as I can see, aside from a 5.6 min version constant in the code itself, there is no min or max version warning anywhere that I can find here on this site.
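A warning like that could be as simple as a range check on the version the ES server reports. The sketch below is only an illustration in Python. The 5.6 lower bound is the constant mentioned above, while the 6.7 upper bound is just my conservative cutoff and not an official limit. It also assumes plain numeric versions such as "6.6.2", so pre-release tags like "-alpha1" are not handled.

```python
def parse_version(text):
    """Turn a version string such as '6.6.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_supported(version, minimum=(5, 6), below=(6, 7)):
    """True if minimum <= version < below (tuples compare element by element)."""
    return minimum <= parse_version(version) < below


checks = {v: is_supported(v) for v in ["5.5.3", "5.6.0", "6.6.2", "6.7.0", "7.0.0"]}
```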

PS: Kibana 7 is nice


On 4/22/2019 at 1:18 AM, All Astronauts said:

As for 6.7, check your var/log/elasticsearch directory for the ES log and take a peek. You'll probably see some indexing errors, but they are effectively meaningless on the front end as far as searching goes (the ones I saw didn't prevent the entry from being indexed, but they were there nonetheless).

Yes, you are right, I see the index errors in the Elasticsearch logs. I opened a ticket with IPS and they told me it's addressed for 4.4.3.


Archived

This topic is now archived and is closed to further replies.
