Friday, March 22, 2019

Elastic Stack Multiple Index Search Examples

I am working with Elastic Stack (elasticsearch, logstash, and kibana) for a report where data needed to be joined with two indexes, where the level_X_id to level_Y_id could be matched upon.  The level_X_id to level_Y_id attributes exist in both indexes.  NOTE: You can have multiple documents in each index that have level_X_id to level_Y_id, not just one document that matches with an exact match.

Environment:
Elasticsearch 5.0.2
Logstash 5.0.2
Kibana 5.0.2

Here are multiple ways to do that:
1.  Use multiple indexes in your _search API.  This returns all documents where field name(s) match in both indexes.

GET /cherryshoe_primary_idx,cherryshoe_secondary_idx/_search
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "analyze_wildcard": false,
          "query": "level_1_id:7268 AND level_2_id:7292"
        }
      }
    }
  }
}

2. Using Terms Query to specify primary index to match on, only returning the secondary index’s records.  This returns all documents where field name(s) match in both indexes.  The Elasticsearch API for Terms Query was not very clear, it took a while for me to get the query to work so I will explain it in detail below:
  • Retrieves cherryshoe_secondary_idx documents, where both cherryshoe_primary_idx and cherryshoe_secondary_idx documents have matching "level_1_id" value of "3629".  
  • query.terms.level_1_id json attribute refers to the cherryshoe_secondary_idx index attribute.
  • query.terms.level_1_id.path value of level_1_id refers to the query.terms.level_1_id.index document json structure "_source.level_1_id".  You can see this in Kibana -> Discover -> cherryshoe_primary_idx.  Expand one of the results -> and instead of the "Table" view look at the "JSON" view.  You'll notice the "_source" JSON object holds all the index attributes.
  • query.terms.level_1_id.type json attribute refers to the document json structure "_type".  You can see this in Kibana -> Discover -> cherryshoe_primary_idx.  Expand one of the results -> and instead of the "Table" view look at the "JSON" view.  You'll notice the "_type" JSON attribute has value "logs".

Single "terms":
GET cherryshoe_secondary_idx/_search
{
    "query" : {
        "terms" : {
            "level_1_id" : {
                "index" : "cherryshoe_primary_idx",
                "type" : "logs",
                "id" : 3629,
                "path" : "level_1_id"
            }
        }
    }
}

I thought I could immediately put multiple Terms in the query, to add additional attributes, but you can't have multiple Terms be defined and return the results you expect.  For example, the below runs with valid syntax, but doesn't return any data.  I haven't been able to find documentation to say that you cannot have multiple Terms in a query work.  Interesting because you can also have one Term.

Multiple "terms":
GET cherryshoe_secondary_idx/_search
{
    "query" : {
   "bool": {
    "must": 
    [{
     "terms" : {
      "level_1_id" : {
       "index" : "cherryshoe_primary_idx",
       "type" : "logs",
       "id" : 3629,
       "path" : "level_1_id"
      }
     }
    },
    {
     "terms" : {
      "level_2_id" : {
       "index" : "cherryshoe_primary_idx",
       "type" : "logs",
       "id" : 3719,
       "path" : "level_2_id"
      }
     }
    }]
   }
  }
}

3. Using the multi-search template, which allows you to execute several search template requests within the same API.  It returns records from either index depending on the query criteria that you want from each respective index.  NOTE:  each "index" and "query" json should not span multiple lines.

POST /_msearch
{"index": "cherryshoe_primary_idx" }
{"query":{"bool":{"must":{"query_string":{"analyze_wildcard":false,"query":"level_1_id:3629 AND tier_2_fa_id:level_2_id"}}}}}
{"index": "cherryshoe_secondary_idx" }
{"query":{"prefix":{"level_id":"_3629_3719_"}}}



I've now found at least one solution that works where you query the primary index to find valid level_X_id permutations for the use case, append all the possible level_X_id attribute(s) into one field which is stored in secondary index, let's call it "level_id", then you can do a substring "prefix" match to find all possible records.

You have to surround all level_X_id's with a special character (let's use underscore) to be able to correctly match on it level_1_id down to level_X_id (in other words you have to start from the beginning, you can't have a substring match that starts in the middle of the "level_id" string).

If you have a level_id of _1_2_3_, and you want to match on _1_2_, then _1_2_3 and _1_2_3_20 and _1_2_3_98 and _1_2_3_99_99 should all be returned.

The solution is to use a Prefix Query, where you can do a substring match starting with the beginning of the word.  NOTE:  This only works if you have an index attribute data type of "string".

GET /cherryshoe_secondary_idx/_search
{
  "query": {
      "prefix" : { 
          "level_id": "_3629_3719_" 
      }
  }
}

Another solution is to use the multi-search template where you can return all documents needed with each index having it's own respective query.

Other helpful links
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/multi-index.html
https://stackoverflow.com/questions/27694934/elasticsearch-combining-query-string-and-bool-query-in-filter
https://stackoverflow.com/questions/6467067/how-to-search-for-a-part-of-a-word-with-elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/search-search.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/mapping-index-field.html
http://teknosrc.com/execute-multiple-search-query-elasticsearch/

No comments:

Post a Comment

I appreciate your time in leaving a comment!