The built-in language analyzers are available globally and don’t need to be configured before being used. They can be specified directly in the field mapping:
PUT /my_index
{
  "mappings": {
    "blog": {
      "properties": {
        "title": {
          "type":     "string",
          "analyzer": "english"
        }
      }
    }
  }
}
Of course, by passing text through the english analyzer, we lose information: we can’t tell if the document mentions one fox or many foxes; the word not is a stopword and is removed, so we can’t tell whether the document is happy about foxes or not. By using the english analyzer, we have increased recall as we can match more loosely, but we have reduced our ability to rank documents accurately.
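The loss of information can be made visible with the analyze API. The request below is a sketch: it assumes the my_index mapping shown earlier and the older query-string form of the API that matches the string mappings used in this chapter (newer releases expect a JSON body instead). The sample sentence is made up for illustration.

```console
# Assumes my_index exists with title mapped to the english analyzer.
# The sample sentence is illustrative only.
GET /my_index/_analyze?field=title
I'm not happy about the foxes
```

With the english analyzer, this should emit the tokens i'm, happi, about, and fox: foxes has been stemmed to fox, and not has been dropped as a stopword.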
To get the best of both worlds, we can use multifields to index the title field twice: once with the english analyzer and once with the standard analyzer:
PUT /my_index
{
  "mappings": {
    "blog": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "english": {
              "type":     "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
The main title field uses the standard analyzer.
The title.english subfield uses the english analyzer.
With this mapping in place, we can index some test documents to demonstrate how to use both fields at query time:
PUT /my_index/blog/1
{ "title": "I'm happy for this fox" }

PUT /my_index/blog/2
{ "title": "I'm not happy about my fox problem" }

GET /_search
{
  "query": {
    "multi_match": {
      "type":   "most_fields",
      "query":  "not happy foxes",
      "fields": [ "title", "title.english" ]
    }
  }
}
Use the most_fields query type to match the same text in as many fields as possible.
Even though neither of our documents contains the word foxes, both documents are returned as results thanks to the word stemming on the title.english field. The second document is ranked as more relevant, because the word not matches on the title field.
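To see why the two fields behave differently, the query string itself can be run through each field's analyzer. This is a sketch that assumes the multifield mapping above and the older query-string form of the analyze API (newer releases expect a JSON body instead).

```console
# Assumes the multifield mapping above; title uses the default standard analyzer.
GET /my_index/_analyze?field=title
not happy foxes

# The title.english subfield uses the english analyzer.
GET /my_index/_analyze?field=title.english
not happy foxes
```

The first request should return not, happy, and foxes unchanged, while the second should return only happi and fox. That is why not can match only on the title field, and why foxes can match fox only on title.english.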