In the same way that the match query is the go-to query for standard
full-text search, the match_phrase query
is the one you should reach for
when you want to find words that are near each other:
GET /my_index/my_type/_search
{
"query": {
"match_phrase": {
"title": "quick brown fox"
}
}
}
Like the match query, the match_phrase query first analyzes the query
string to produce a list of terms. It then searches for all the terms, but
keeps only documents that contain all of the search terms, in the same
positions relative to each other. A query for the phrase quick fox
would not match any of our documents, because no document contains the word
quick immediately followed by fox.
The match_phrase query can also be written as a match query with type
phrase:
"match": {
"title": {
"query": "quick brown fox",
"type": "phrase"
}
}
When a string is analyzed, the analyzer returns not only a list of terms, but also the position, or order, of each term in the original string:
GET /_analyze?analyzer=standard
Quick brown fox
This returns the following:
{
"tokens": [
{
"token": "quick",
"start_offset": 0,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "brown",
"start_offset": 6,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "fox",
"start_offset": 12,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 3
}
]
}
Positions can be stored in the inverted index, and position-aware queries like
the match_phrase query can use them to match only documents that contain
all the words in exactly the order specified, with no words in between.
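To make the idea of stored positions concrete, here is a minimal Python sketch of a positional inverted index. It is only an illustration: the `analyze` and `index_positions` helpers are hypothetical names, the tokenizer is a simple regular expression rather than the real standard analyzer, and positions start at 1 only to mirror the response shown above.

```python
import re


def analyze(text):
    """Toy stand-in for an analyzer: lowercased word tokens with offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,  # 1-based, mirroring the response above
        })
    return tokens


def index_positions(docs):
    """Build a positional inverted index: term -> {doc_id: [positions]}."""
    index = {}
    for doc_id, text in docs.items():
        for tok in analyze(text):
            index.setdefault(tok["token"], {}).setdefault(doc_id, []).append(tok["position"])
    return index


if __name__ == "__main__":
    for tok in analyze("Quick brown fox"):
        print(tok)
```

With an index of this shape, a position-aware query can look up each term and compare the stored positions instead of rereading the original text.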
For a document to be considered a match for the phrase “quick brown fox,” the following must be true:
- quick, brown, and fox must all appear in the field.
- The position of brown must be 1 greater than the position of quick.
- The position of fox must be 2 greater than the position of quick.
If any of these conditions is not met, the document is not considered a match.
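The conditions above can be sketched in a few lines of Python. This is a simplified model, not how Elasticsearch implements phrase matching: `positions_by_term` and `phrase_matches` are hypothetical helpers, and analysis is approximated by lowercasing and splitting on whitespace.

```python
def positions_by_term(text):
    """Map each lowercased term to the list of positions where it occurs."""
    positions = {}
    for position, term in enumerate(text.lower().split()):
        positions.setdefault(term, []).append(position)
    return positions


def phrase_matches(phrase, text):
    """Return True if the terms of `phrase` occur in `text` at consecutive positions."""
    terms = phrase.lower().split()
    if not terms:
        return False
    doc = positions_by_term(text)
    # Condition 1: every term must appear in the field.
    if any(term not in doc for term in terms):
        return False
    # Remaining conditions: the term at offset i in the phrase must sit
    # exactly i positions after some occurrence of the first term.
    for start in doc[terms[0]]:
        if all(start + offset in doc[term] for offset, term in enumerate(terms)):
            return True
    return False


if __name__ == "__main__":
    print(phrase_matches("quick brown fox", "The quick brown fox jumps"))
    print(phrase_matches("quick fox", "The quick brown fox jumps"))
```

In this sketch, "quick fox" fails against a field containing "quick brown fox" because fox sits two positions after quick rather than one.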
Internally, the match_phrase query uses the low-level span query family to
do position-aware matching.
Span queries are term-level queries, so they have
no analysis phase; they search for the exact term specified.
Thankfully, most people never need to use the span queries directly, as the
match_phrase query is usually good enough. However, certain specialized
fields, like patent searches, use these low-level queries to perform very
specific, carefully constructed positional searches.
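For comparison, a phrase can be expressed by hand with span queries. The Python sketch below builds a request body using `span_near` with `span_term` clauses, `slop` set to 0, and `in_order` set to true. The `span_phrase_query` helper is a hypothetical name. Because span queries have no analysis phase, the sketch assumes the terms are already in the form the analyzer would have indexed (lowercased here).

```python
import json


def span_phrase_query(field, terms):
    """Build a span_near request body that requires `terms` in order with no gaps."""
    return {
        "query": {
            "span_near": {
                "clauses": [{"span_term": {field: term}} for term in terms],
                "slop": 0,         # no extra positions allowed between terms
                "in_order": True,  # terms must appear in the given order
            }
        }
    }


# Span queries skip analysis, so the terms are supplied pre-lowercased
# (an assumption about how the field was indexed).
body = span_phrase_query("title", ["quick", "brown", "fox"])

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

The resulting body would be sent to the same _search endpoint as the earlier examples.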