We are working on updating this book for the latest version. Some content might be out of date.
The uppermost level of a mapping is known as the root object. It may contain the following:
- A properties section, which lists the mapping for each field that a document may contain
- Various metadata fields, all of which start with an underscore, such as _type, _id, and _source
- Settings, which control how the dynamic detection of new fields is handled, such as analyzer, dynamic_date_formats, and dynamic_templates
- Other settings, which can be applied both to the root object and to fields of type object, such as enabled, dynamic, and include_in_all
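To make the layout of the root object concrete, here is a minimal Python sketch that builds one as a plain dictionary and sorts its keys into the categories listed above. The field names (title, created) and the helper classify_root_keys are hypothetical, and the underscore test simply applies the naming convention described in the list; it is not how Elasticsearch itself parses a mapping.

```python
import json

# Hypothetical root object for illustration; the field names are made up.
root_object = {
    "_source": {"enabled": True},            # metadata field (leading underscore)
    "dynamic_date_formats": ["yyyy-MM-dd"],  # dynamic-detection setting
    "dynamic": "strict",                     # setting shared with object fields
    "properties": {                          # per-field mappings
        "title": {"type": "string"},
        "created": {"type": "date"},
    },
}


def classify_root_keys(root):
    """Group the keys of a root object into the categories described above."""
    groups = {"properties": [], "metadata": [], "settings": []}
    for key in root:
        if key == "properties":
            groups["properties"].append(key)
        elif key.startswith("_"):
            groups["metadata"].append(key)
        else:
            groups["settings"].append(key)
    return groups


groups = classify_root_keys(root_object)
print(json.dumps(groups, indent=2))
```

Everything that is neither the properties section nor an underscore-prefixed metadata field ends up under settings in this sketch.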
We have already discussed the three most important settings for document fields or properties in Core Simple Field Types and Complex Core Field Types:
- type: The datatype that the field contains, such as string or date
- index: Whether a field should be searchable as full text (analyzed), searchable as an exact value (not_analyzed), or not searchable at all (no)
- analyzer: Which analyzer to use for a full-text field, both at index time and at search time
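As a rough illustration of how these three settings combine, the following Python sketch builds string field mappings as dictionaries. The helper string_field and the field names are hypothetical, and the check that rejects an analyzer on a field that is not analyzed is a sanity check added for this example, not a documented Elasticsearch validation rule.

```python
import json

VALID_INDEX_OPTIONS = ("analyzed", "not_analyzed", "no")


def string_field(index="analyzed", analyzer=None):
    """Build the mapping for one string field (hypothetical helper)."""
    if index not in VALID_INDEX_OPTIONS:
        raise ValueError("index must be one of %s" % (VALID_INDEX_OPTIONS,))
    # Assumption for this sketch: an analyzer only makes sense on full-text fields.
    if analyzer is not None and index != "analyzed":
        raise ValueError("an analyzer only applies to analyzed fields")
    mapping = {"type": "string", "index": index}
    if analyzer is not None:
        mapping["analyzer"] = analyzer
    return mapping


properties = {
    "title": string_field(analyzer="english"),  # full text, English analysis
    "tag": string_field(index="not_analyzed"),  # exact value only
}
print(json.dumps(properties, indent=2, sort_keys=True))
```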
We will discuss other field types such as ip, geo_point, and geo_shape in
the appropriate sections later in the book.
By default, Elasticsearch
stores the JSON string representing the
document body in the _source field. Like all stored fields, the _source
field is compressed before being written to disk.
This is almost always desired functionality because it means the following:
- The full document is available directly from the search results—no need for a separate round-trip to fetch the document from another data store.
- Partial update requests will not function without the _source field.
- When your mapping changes and you need to reindex your data, you can do so directly from Elasticsearch instead of having to retrieve all of your documents from another (usually slower) data store.
- Individual fields can be extracted from the _source field and returned in get or search requests when you don’t need to see the whole document.
- It is easier to debug queries, because you can see exactly what each document contains, rather than having to guess their contents from a list of IDs.
That said, storing the _source field does use disk space. If none of the
preceding reasons is important to you, you can disable the _source field with
the following mapping:
PUT /my_index
{
  "mappings": {
    "my_type": {
      "_source": {
        "enabled": false
      }
    }
  }
}

In a search request, you can ask for only certain fields by specifying the
_source parameter in the request body:
GET /_search
{
  "query": { "match_all": {}},
  "_source": [ "title", "created" ]
}

Values for these fields will be extracted from the _source field and
returned instead of the full _source.
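Conceptually, this extraction behaves like picking keys out of the stored JSON document. The sketch below models that in Python for top-level field names only. The function filter_source and the sample document are hypothetical, and the sketch makes no attempt to reproduce how Elasticsearch actually implements source filtering.

```python
def filter_source(source, fields):
    """Return only the requested top-level fields that exist in the source."""
    return {name: source[name] for name in fields if name in source}


# Hypothetical stored document.
doc = {
    "title": "My first post",
    "created": "2014-01-01",
    "body": "A long body that the client does not need right now",
}

hit_source = filter_source(doc, ["title", "created"])
print(hit_source)
```

Fields that were requested but are absent from the document are simply left out of the result.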
In Search Lite, we introduced the _all field: a special field that
indexes the
values from all other fields as one big string. The query_string
query clause (and searches performed as ?q=john) defaults to searching in
the _all field if no other field is specified.
The _all field is useful during the exploratory phase of a new application,
while you are still unsure about the final structure that your documents will
have. You can throw any query string at it and you have a good chance of
finding the document you’re after:
GET /_search
{
  "query": {
    "match": {
      "_all": "john smith marketing"
    }
  }
}

As your application evolves and your search requirements become more exacting,
you will find yourself using the _all field less and less. The _all field
is a shotgun approach to search. By querying individual fields, you have more
flexibility, power, and fine-grained control over which results are considered
to be most relevant.
One of the important factors taken into account by the relevance algorithm
is the length of the field: the shorter the field, the more weight a match in it carries. A term
that appears in a short title field is likely to be more important than the
same term that appears somewhere in a long content field. This distinction
between field lengths disappears in the _all field.
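To see why the distinction disappears, consider the field-length norm used by Lucene's classic TF/IDF similarity, which is 1 / sqrt(number of terms in the field). The sketch below assumes that similarity is in use and ignores index-time boosts and the lossy compact encoding that Lucene applies to stored norms. The term counts are invented for illustration.

```python
import math


def field_length_norm(num_terms):
    """Classic TF/IDF field-length norm: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


title_norm = field_length_norm(4)      # a 4-term title field
content_norm = field_length_norm(400)  # a 400-term content field
all_norm = field_length_norm(4 + 400)  # both fields copied into _all

print("title:  ", title_norm)
print("content:", content_norm)
print("_all:   ", all_norm)
```

Queried separately, a match in the title is weighted ten times more heavily than a match in the content. Inside _all, every term shares the same single norm, whichever field it originally came from.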
If you decide that you no longer need the _all field, you can disable it
with this mapping:
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "_all": { "enabled": false }
  }
}

Inclusion in the _all field can be controlled on a field-by-field basis
by using the include_in_all setting, which defaults to true. Setting
include_in_all on an object (or on the root object) changes the
default for all fields within that object.
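The way a per-field include_in_all overrides the default inherited from the root object can be modeled with a few lines of Python. This is a simplified sketch for flat documents only. The function build_all, the sample document, and its field names are hypothetical, and the real _all field is built by Elasticsearch at index time rather than by joining strings like this.

```python
def build_all(doc, field_settings, root_include_in_all=True):
    """Join the values of every field whose effective include_in_all is true."""
    parts = []
    for name, value in doc.items():
        # A per-field setting wins; otherwise fall back to the root default.
        include = field_settings.get(name, {}).get(
            "include_in_all", root_include_in_all
        )
        if include:
            parts.append(str(value))
    return " ".join(parts)


doc = {"title": "Quick brown fox", "views": 42, "secret": "internal note"}

# Default is true everywhere; one field opts out.
opt_out = build_all(doc, {"secret": {"include_in_all": False}})
print(opt_out)  # Quick brown fox 42

# Default is false at the root; one field opts in.
opt_in = build_all(
    doc, {"title": {"include_in_all": True}}, root_include_in_all=False
)
print(opt_in)  # Quick brown fox
```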
You may find that you want to keep the _all field around to use
as a catchall full-text field just for specific fields, such as
title, overview, summary, and tags. Instead of disabling the _all
field completely, disable include_in_all for all fields by default,
and enable it only on the fields you choose:
PUT /my_index/my_type/_mapping
{
  "my_type": {
    "include_in_all": false,
    "properties": {
      "title": {
        "type": "string",
        "include_in_all": true
      },
      ...
    }
  }
}

Remember that the _all field is just
an analyzed string field. It
uses the default analyzer to analyze its values, regardless of which
analyzer has been set on the fields where the values originate. And
like any string field, you can configure which analyzer the _all
field should use:
PUT /my_index/my_type/_mapping
{
  "my_type": {
    "_all": { "analyzer": "whitespace" }
  }
}

There are four metadata fields associated with document identity:
- _id: The string ID of the document
- _type: The type name of the document
- _index: The index where the document lives
- _uid: The _type and _id concatenated together as type#id
By default, the _uid field is stored (can be retrieved) and
indexed (searchable). The _type field is indexed but not stored,
and the _id and _index fields are neither indexed nor stored, meaning
they don’t really exist.
In spite of this, you can query the _id field as though it were a real
field. Elasticsearch uses the _uid field to derive the _id. Although you
can change the index and store settings for these fields, you almost
never need to do so.
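The relationship between _uid and _id can be sketched in a few lines of Python. The helpers make_uid and id_from_uid are hypothetical names for this illustration, and splitting on the first # assumes that the type name itself contains no # character.

```python
def make_uid(doc_type, doc_id):
    """Concatenate type and ID in the type#id form described above."""
    return "%s#%s" % (doc_type, doc_id)


def id_from_uid(uid):
    """Recover the ID by splitting on the first '#'.

    Assumption for this sketch: the type name contains no '#'.
    """
    _doc_type, doc_id = uid.split("#", 1)
    return doc_id


uid = make_uid("blog", "123")
print(uid)               # blog#123
print(id_from_uid(uid))  # 123
```

Because the ID can always be recovered from the stored _uid in this way, there is no need to index or store _id separately.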