The match query is the go-to query: the first query that you should reach for whenever you need to query any field. It is a high-level full-text query, meaning that it knows how to deal with both full-text fields and exact-value fields.

That said, the main use case for the match query is full-text search. So let's take a look at how full-text search works with a simple example.
First, we’ll create a new index and index some documents using the bulk API:
DELETE /my_index

PUT /my_index
{ "settings": { "number_of_shards": 1 }}

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" }
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog" }
The DELETE request removes the index in case it already exists. Later, in Relevance Is Broken!, we explain why we created this index with only one primary shard.
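The bulk API reads its body as newline-delimited JSON: each action line is followed by its source line on the next line, and the body must end with a newline. As a minimal sketch, assuming plain Python with only the standard library, the same body can be built programmatically; `DOCS` and `build_bulk_body` are hypothetical names, and sending the request is left to whatever HTTP client you use.

```python
import json

# The four sample documents from the bulk request above, keyed by _id.
DOCS = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def build_bulk_body(docs):
    """Return a newline-delimited bulk body: one action line followed by
    one source line per document, plus the required trailing newline."""
    lines = []
    for doc_id, title in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps({"title": title}))  # source line
    return "\n".join(lines) + "\n"


body = build_bulk_body(DOCS)
print(body, end="")
```

Because `json.dumps` without an `indent` argument always emits a single line, each document stays on one line, which is what the format requires.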
Our first example explains what happens when we use the match query to search within a full-text field for a single word:
GET /my_index/my_type/_search
{
"query": {
"match": {
"title": "QUICK!"
}
}
}

Elasticsearch executes the preceding match query as follows:
Check the field type.
The title field is a full-text (analyzed) string field, which means that the query string should be analyzed too.

Analyze the query string.
The query string QUICK! is passed through the standard analyzer, which results in the single term quick. Because we have just a single term, the match query can be executed as a single low-level term query.

Find matching docs.
The term query looks up quick in the inverted index and retrieves the list of documents that contain that term: in this case, documents 1, 2, and 3.

Score each doc.
The term query calculates the relevance _score for each matching document by combining the term frequency (how often quick appears in the title field of each document), the inverse document frequency (how many documents in the index contain quick in their title field), and the length of each field (shorter fields are considered more relevant). See What Is Relevance?.
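The analysis and lookup steps can be sketched in a few lines of Python. This is a toy model, not how Elasticsearch or Lucene is implemented: `analyze` is a hypothetical approximation of the standard analyzer (lowercased word tokens from a regular expression, not full Unicode text segmentation), and the inverted index is a plain dictionary from each term to the IDs of the documents that contain it.

```python
import re
from collections import defaultdict

# The four sample documents, keyed by _id.
DOCS = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def analyze(text):
    """Hypothetical stand-in for the standard analyzer: extract word
    tokens with a regular expression and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, title in docs.items():
        for term in analyze(title):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def match_single_term(index, query_string):
    """Analyze the query string, then look up its single term."""
    terms = analyze(query_string)
    if len(terms) != 1:
        raise ValueError("this sketch only handles single-term queries")
    return index.get(terms[0], [])


INDEX = build_inverted_index(DOCS)
print(analyze("QUICK!"))  # ['quick']
print(match_single_term(INDEX, "QUICK!"))  # [1, 2, 3]
```

Analyzing the query string with the same function used at index time is what lets QUICK! find documents that contain quick.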
This process gives us the following (abbreviated) results:
"hits": [
{
"_id": "1",
"_score": 0.5,
"_source": {
"title": "The quick brown fox"
}
},
{
"_id": "3",
"_score": 0.44194174,
"_source": {
"title": "The quick brown fox jumps over the quick dog"
}
},
{
"_id": "2",
"_score": 0.3125,
"_source": {
"title": "The quick brown fox jumps over the lazy dog"
}
}
]
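These scores can be reproduced, under some assumptions, with Lucene's classic TF/IDF similarity, which Elasticsearch used by default before version 5.0 switched to BM25. For a single-term query that formula reduces to tf × idf × fieldNorm, where tf is the square root of the term frequency, idf is 1 + ln(numDocs / (docFreq + 1)), and fieldNorm is 1 / √(number of terms), stored lossily in a single byte. The sketch assumes all four documents live on the one primary shard, uses a hypothetical regex approximation of the standard analyzer, and approximates the one-byte norm encoding by truncating the mantissa to three bits; `classic_score` and `encode_norm` are hypothetical names.

```python
import math
import re

# The four sample documents, keyed by _id, all on a single shard.
DOCS = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def analyze(text):
    """Hypothetical stand-in for the standard analyzer (lowercased word tokens)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def encode_norm(value):
    """Approximate Lucene's one-byte norm encoding by keeping only three
    mantissa bits (truncating), so 1/3 is stored as 0.3125."""
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent
    return math.floor(mantissa * 16) / 16 * 2 ** exponent


def classic_score(term, docs):
    """Score every document containing `term` as tf * idf * fieldNorm."""
    tokenized = {doc_id: analyze(title) for doc_id, title in docs.items()}
    doc_freq = sum(1 for tokens in tokenized.values() if term in tokens)
    idf = 1 + math.log(len(docs) / (doc_freq + 1))
    scores = {}
    for doc_id, tokens in tokenized.items():
        freq = tokens.count(term)
        if freq == 0:
            continue  # non-matching documents are not scored
        tf = math.sqrt(freq)
        field_norm = encode_norm(1 / math.sqrt(len(tokens)))
        scores[doc_id] = tf * idf * field_norm
    return scores


SCORES = classic_score("quick", DOCS)
for doc_id, score in sorted(SCORES.items(), key=lambda item: -item[1]):
    print(doc_id, round(score, 8))
```

With four documents and three of them containing quick, idf works out to exactly 1, so the ranking is driven by term frequency and field length: document 3 contains quick twice but is as long as document 2, and the one-byte encoding turns the norm of 1/3 for both nine-term titles into 0.3125.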