Merged
Changes from 5 commits
16 changes: 15 additions & 1 deletion docs/reference/elasticsearch/mapping-reference/keyword.md
@@ -70,7 +70,21 @@ The following parameters are accepted by `keyword` fields:
: Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations.

[`ignore_above`](/reference/elasticsearch/mapping-reference/ignore-above.md)
: Do not index any string longer than this value. Defaults to `2147483647` in standard indices so that all values would be accepted, and `8191` in logsdb indices to protect against Lucene's term byte-length limit of `32766`. Please however note that default dynamic mapping rules create a sub `keyword` field that overrides this default by setting `ignore_above: 256`.
: Do not index any field containing a string with more characters than this value. This is important because {{es}}
will reject entire documents if they contain keyword fields that exceed `32766` bytes when UTF-8 encoded.

To avoid any risk of document rejection, set this value to `8191` or less. Fields with strings exceeding this
length will be excluded from indexing.

Member

Does this work on text fields? Or only keyword fields?

Also further down you say:

`logsdb` indices: `8191`. `keyword` fields longer than `8191` characters won't be indexed, but the documents are
      accepted and the unindexed values are available from `_source`.

Does the previous statement only apply to logsdb indices? Or to standard indices as well? If both, that feels important.

What about this:

Skip indexing of a keyword value whose UTF-8–encoded size is larger than ignore_above. The value is still kept in _source, but the field won’t be searchable or aggregatable.

If you do not set ignore_above, {es} will reject entire documents if they contain one or more keyword fields exceeding a UTF-8–encoded size of 32766.

To avoid any risk of document rejection, set this value to 8191 or less.

Contributor

> Does this work on text fields? Or only keyword fields?

This setting is only available on `keyword` fields. On `text` fields, some tokenizers have a `max_token_length` setting, which does not ignore tokens that exceed this length but splits them instead, so the behavior is quite different.

> What about this:

I think it might be a bit clearer to specify characters/bytes, like "UTF-8–encoded size of 32766 bytes." and "set this value to 8191 characters or less."
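
The characters-versus-bytes distinction raised in this thread can be checked with a small sketch. It is only an illustration in Python, not Elasticsearch code: the constants are the two numbers quoted in the diff, `utf8_len` is a hypothetical helper, and Python's `len` counts code points, which may not be exactly how Elasticsearch counts characters. The point is just that a UTF-8 character takes at most 4 bytes, so 8191 characters can never exceed 32766 bytes.

```python
# Numbers quoted in the diff under review.
LUCENE_MAX_TERM_BYTES = 32766  # term byte-length limit
SAFE_IGNORE_ABOVE = 8191       # recommended ignore_above value, in characters


def utf8_len(value: str) -> int:
    """Size of the value in bytes once UTF-8 encoded (hypothetical helper)."""
    return len(value.encode("utf-8"))


# Best case: ASCII, one byte per character.
ascii_case = "a" * SAFE_IGNORE_ABOVE
# Worst case: every character needs 4 bytes in UTF-8 (here U+1F600).
worst_case = "\U0001F600" * SAFE_IGNORE_ABOVE

assert utf8_len(ascii_case) == 8191
assert utf8_len(worst_case) == 8191 * 4 == 32764
assert utf8_len(worst_case) <= LUCENE_MAX_TERM_BYTES

# One more 4-byte character would cross the byte limit.
too_long = "\U0001F600" * (SAFE_IGNORE_ABOVE + 1)
assert utf8_len(too_long) > LUCENE_MAX_TERM_BYTES
```

This is why stating the units explicitly ("32766 bytes", "8191 characters") makes the recommendation easier to follow.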


The defaults are complicated:
* Standard indices: `2147483647` (effectively unbounded). Documents containing `keyword` fields longer than `32766`
bytes will be rejected.
* `logsdb` indices: `8191`. `keyword` fields longer than `8191` characters won't be indexed, but the documents are
accepted and the unindexed values are available from `_source`.

Member

@bmorelli25, Jun 13, 2025

What about a table for this information?

The defaults are complicated:

| Index type | Default | Effect |
| --- | --- | --- |
| Standard indices | `2147483647` (effectively unbounded) | Documents will be rejected if any keyword exceeds `32766` bytes. |
| `logsdb` indices | `8191` | Documents are never rejected. Keywords exceeding this limit are still kept in `_source`, but won’t be searchable or aggregatable. |

Member

Ahh I see this is in definition list already, so maybe a table won't work. But if you like my wording you can update accordingly.

Contributor

Me like that wording :)

Contributor

I think "Documents are never rejected" might be a bit too strongly worded; maybe something like:

Documents won't be rejected if a keyword field exceeds this limit and the field will still be kept in _source, but it won’t be searchable or aggregatable.
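
The two defaults discussed in this thread can be summarized as a toy model. This is a sketch of the behavior as described in the diff and comments, not of how Elasticsearch implements it: `keyword_outcome`, its return strings, and the `index_mode` labels are all hypothetical names chosen for illustration, and character counting uses Python's `len`.

```python
# Values taken from the diff: defaults per index type and the byte limit.
LUCENE_MAX_TERM_BYTES = 32766
DEFAULT_IGNORE_ABOVE = {"standard": 2147483647, "logsdb": 8191}


def keyword_outcome(value, index_mode="standard", ignore_above=None):
    """Describe what happens to one keyword value under this simplified model."""
    if ignore_above is None:
        limit = DEFAULT_IGNORE_ABOVE[index_mode]
    else:
        limit = ignore_above
    # Longer than ignore_above (in characters): skipped, document still accepted.
    if len(value) > limit:
        return "kept in _source, not indexed"
    # Not skipped, but too large for a single term once UTF-8 encoded.
    if len(value.encode("utf-8")) > LUCENE_MAX_TERM_BYTES:
        return "document rejected"
    return "indexed"


long_value = "a" * 40000
print(keyword_outcome(long_value, "standard"))
print(keyword_outcome(long_value, "logsdb"))
print(keyword_outcome(long_value, "standard", ignore_above=8191))
```

Under these assumptions, the same 40000-character value causes a rejection with the standard default, but is merely left unindexed with the `logsdb` default or with an explicit `ignore_above` of `8191`.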

* The [dynamic mapping](docs-content://manage-data/data-store/mapping/dynamic-mapping.md) for string fields
defaults to a `text` field with a [sub](/reference/elasticsearch/mapping-reference/multi-fields.md)-`keyword`
field with an `ignore_above` of `256`. String fields longer than 256 characters are available for full text
search but won't have a value in their `.keyword` sub-field, so they cannot be used for exact matching over `_search`.

Member

This part I struggle to understand. But it feels separate from the defaults above? Maybe this can be in a new paragraph. I think you're saying that...

When ES finds a new string field without an explicit mapping, it automatically:

  1. Maps the field to a text field so the entire value is searchable with full-text search.
  2. Adds a sub keyword field with ignore_above set to 256 bytes. This means that values less than 256 bytes are available for exact matching over _search. Values longer than that are still searchable via the text field, but are not indexed as keywords.

Contributor

I agree I am a bit confused by the very last sentence in this paragraph.
@bmorelli25 I like your suggested rewrite, but I believe it should be "256 characters" not bytes
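
For readers following this thread, the dynamic mapping being described can be sketched as a plain Python dictionary mirroring the mapping JSON, together with a toy check of which sub-fields would hold an indexed value. The helper `indexed_as` is hypothetical and only models the rule stated above (values longer than 256 characters are skipped by the `keyword` sub-field). It is not Elasticsearch code.

```python
import json

# Mapping shape described in the diff for a dynamically mapped string field:
# a text field with a keyword sub-field limited by ignore_above.
DYNAMIC_STRING_MAPPING = {
    "type": "text",
    "fields": {
        "keyword": {"type": "keyword", "ignore_above": 256},
    },
}


def indexed_as(value, mapping=DYNAMIC_STRING_MAPPING):
    """Return which (sub-)fields would index the value in this toy model."""
    targets = ["text"]  # the text field always indexes the full value
    keyword = mapping["fields"]["keyword"]
    # Only values longer than ignore_above are skipped by the keyword sub-field.
    if len(value) <= keyword["ignore_above"]:
        targets.append("keyword")
    return targets


print(json.dumps(DYNAMIC_STRING_MAPPING, indent=2))
print(indexed_as("short value"))
print(indexed_as("x" * 300))
```

A value of 300 characters is still reachable through full-text search on the `text` field, but exact-match queries, sorting, and aggregations on the `.keyword` sub-field would not see it.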


[`index`](/reference/elasticsearch/mapping-reference/mapping-index.md)
: Should the field be quickly searchable? Accepts `true` (default) and `false`. `keyword` fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) enabled can still be queried, albeit slower.