Environment:
Elasticsearch 5.0.2
Windows 10
There were two issues that came up in recent months:
1. Looking at all field mappings for a particular index, you can see that fields with type "text" have a "keyword" sub-field capped at 256 characters by "ignore_above": 256. This is the default dynamic mapping for string values. Perform the following GET to retrieve the index field mappings -
- curl http://localhost:9200/{index_name}
- e.g. curl http://localhost:9200/cherryshoe_idx
{
  "cherryshoe_idx": {
    "aliases": {},
    "mappings": {
      "logs": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "text_data_that_can_be_very_long": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "unique_id": {
            "type": "long"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1546610232085",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "cC1mdfLfSi68sZe6r-QNLA",
        "version": {
          "created": "5000299"
        },
        "provided_name": "cherryshoe_idx"
      }
    }
  }
}
PROBLEM and SOLUTION:
One of the filters used the "text_data_that_can_be_very_long" field; values longer than 256 characters were not indexed into its keyword sub-field because of "ignore_above": 256, so the filter sometimes failed to match them. To fix this, an additional field was added to hold the filter's "id" value (text_data_that_can_be_very_long_id), the query was updated to filter on that "id" field instead, and the "ignore_above": 256 restriction was removed from "text_data_that_can_be_very_long" so the full value remains available for data display purposes.
Updated field mapping json snippet:
"text_data_that_can_be_very_long": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"text_data_that_can_be_very_long_id": {
"type": "long"
}
2. As I mentioned above, the report can specify multiple filters, one of them being the state filter -
PROBLEM and SOLUTION:
If a user chose only "Virginia", the query returned both "Virginia" and "West Virginia" records (it should have returned only "Virginia"). The Lucene query portion constructed was "state_name:Virginia", which retrieves documents that have either "Virginia" or "West Virginia" for the state_name attribute.
This is because Elasticsearch analyzes the state_name text field into individual tokens, so a "West Virginia" document contains both a "West" token and a "Virginia" token, and "West Virginia" documents were incorrectly retrieved along with "Virginia" records.
The fix was to add .keyword after the state_name attribute (i.e. "state_name.keyword:Virginia"). The keyword sub-field comes by default (meaning it didn't have to be specially defined) and indexes the entire value as a single term, in other words an exact match.
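Assuming the Lucene syntax above is passed through a query_string query (and that state_name is a text field with the default keyword sub-field), the two variants look like this:

# Analyzed text field: matches any document containing the token "Virginia",
# so "West Virginia" records are returned too
curl -XGET "http://localhost:9200/{index_name}/_search" -d '
{
  "query": {
    "query_string": { "query": "state_name:Virginia" }
  }
}'

# keyword sub-field: the whole value is indexed as a single term,
# so only exact "Virginia" records match
curl -XGET "http://localhost:9200/{index_name}/_search" -d '
{
  "query": {
    "query_string": { "query": "state_name.keyword:Virginia" }
  }
}'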