diff --git a/docs/redirects.yml b/docs/redirects.yml index 5d0c161511b8a..860d4ac06ed97 100644 --- a/docs/redirects.yml +++ b/docs/redirects.yml @@ -105,3 +105,10 @@ redirects: 'reference/query-languages/esql/kibana/docs/functions/st_geohash_to_long.md': 'reference/query-languages/esql/esql-functions-operators.md' 'reference/query-languages/esql/kibana/docs/functions/st_geotile_to_long.md': 'reference/query-languages/esql/esql-functions-operators.md' 'reference/query-languages/esql/kibana/docs/functions/st_geohex_to_long.md': 'reference/query-languages/esql/esql-functions-operators.md' + + 'reference/elasticsearch/mapping-reference/text.md': + to: 'reference/elasticsearch/mapping-reference/text.md' + anchors: { } # pass-through unlisted anchors in the `many` ruleset + many: + - to: 'reference/elasticsearch/mapping-reference/match-only-text.md' + anchors: { 'match-only-text-field-type', 'match-only-text-params' } diff --git a/docs/reference/elasticsearch/mapping-reference/field-data-types.md b/docs/reference/elasticsearch/mapping-reference/field-data-types.md index 0265546eedbec..a941af41fcc0a 100644 --- a/docs/reference/elasticsearch/mapping-reference/field-data-types.md +++ b/docs/reference/elasticsearch/mapping-reference/field-data-types.md @@ -77,8 +77,8 @@ Dates ### Text search types [text-search-types] -[`text` fields](/reference/elasticsearch/mapping-reference/text.md) -: The text family, including `text` and `match_only_text`. Analyzed, unstructured text. +[`text` fields](/reference/elasticsearch/mapping-reference/text-type-family.md) +: The text family, including `text`, `match_only_text`, and `pattern_text`. Analyzed, unstructured text. [`annotated-text`](/reference/elasticsearch-plugins/mapper-annotated-text.md) : Text containing special markup. Used for identifying named entities. 
diff --git a/docs/reference/elasticsearch/mapping-reference/match-only-text.md b/docs/reference/elasticsearch/mapping-reference/match-only-text.md new file mode 100644 index 0000000000000..8db554d046488 --- /dev/null +++ b/docs/reference/elasticsearch/mapping-reference/match-only-text.md @@ -0,0 +1,43 @@ +--- +navigation_title: "Match Only Text" +mapped_pages: + - https://www.elastic.co/guide/en/elasticsearch/reference/current/match-only-text.html +--- + +# Match-only text field type [match-only-text-field-type] + +A variant of [`text`](/reference/elasticsearch/mapping-reference/text.md) that trades scoring and efficiency of positional queries for space efficiency. This field effectively stores data the same way as a `text` field that only indexes documents (`index_options: docs`) and disables norms (`norms: false`). Term queries are as fast as on `text` fields, if not faster; however, queries that need positions, such as the [`match_phrase` query](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md), are slower because they need to look at the `_source` document to verify whether a phrase matches. All queries return a constant score of 1.0. + +Analysis is not configurable: text is always analyzed with the [default analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-default-analyzer) ([`standard`](/reference/text-analysis/analysis-standard-analyzer.md) by default). + +[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or use the [`text`](/reference/elasticsearch/mapping-reference/text.md) field type if you absolutely need span queries. + +Other than that, `match_only_text` supports the same queries as `text`. Like `text`, it does not support sorting and has only limited support for aggregations.
+ +```console +PUT logs +{ + "mappings": { + "properties": { + "@timestamp": { + "type": "date" + }, + "message": { + "type": "match_only_text" + } + } + } +} +``` + + +## Parameters for match-only text fields [match-only-text-params] + +The following mapping parameters are accepted: + +[`fields`](/reference/elasticsearch/mapping-reference/multi-fields.md) +: Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers. + +[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md) +: Metadata about the field. + diff --git a/docs/reference/elasticsearch/mapping-reference/pattern-text.md b/docs/reference/elasticsearch/mapping-reference/pattern-text.md new file mode 100644 index 0000000000000..3bf04523cee66 --- /dev/null +++ b/docs/reference/elasticsearch/mapping-reference/pattern-text.md @@ -0,0 +1,85 @@ +--- +navigation_title: "Pattern Text" +mapped_pages: + - https://www.elastic.co/guide/en/elasticsearch/reference/current/pattern-text.html +--- + +# Pattern text field type [pattern-text-field-type] +```{applies_to} +serverless: preview +stack: preview 9.2 +``` +:::{note} +This feature requires a [subscription](https://www.elastic.co/subscriptions). +::: + +The `pattern_text` field type is a variant of [`text`](/reference/elasticsearch/mapping-reference/text.md) with improved space efficiency for log data. +Internally, it decomposes values into static parts that are likely to be shared among many values, and dynamic parts that tend to vary. +The static parts usually come from the explanatory text of a log message, while the dynamic parts are the variables that were interpolated into the logs. +This decomposition allows for improved compression on log-like data. + +We call the static portion of the value the `template`. 
+Although the template cannot be accessed directly, a separate subfield called `.template_id` is accessible. +This subfield contains a hash of the template and can be used to group similar values. + +Analysis is configurable but defaults to a delimiter-based analyzer. +This analyzer applies a lowercase filter and then splits on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`. + +## Limitations + +Unlike most mapping types, `pattern_text` does not support multiple values for a given field per document. +If a document contains multiple values for a `pattern_text` field, an error is returned. + +[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or use the [`text`](/reference/elasticsearch/mapping-reference/text.md) field type if you absolutely need span queries. + +Like `text`, `pattern_text` does not support sorting and has only limited support for aggregations. + +## Phrase matching +Pattern text supports an `index_options` parameter with valid values of `docs` and `positions`. +The default value is `docs`, which makes `pattern_text` behave similarly to `match_only_text` for phrase queries. +Specifically, positions are not stored, which reduces the index size at the cost of slower phrase queries. +If `index_options` is set to `positions`, positions are stored and `pattern_text` supports fast phrase queries. +In both cases, all queries return a constant score of 1.0. + +## Index sorting for improved compression +The compression provided by `pattern_text` can be significantly improved if the index is sorted by the `template_id` subfield. +A typical approach is to sort first by `message.template_id`, then by `@timestamp`, as shown in the following example.
+ +```console +PUT logs +{ + "settings": { + "index": { + "sort.field": [ "message.template_id", "@timestamp" ], + "sort.order": [ "asc", "desc" ] + } + }, + "mappings": { + "properties": { + "@timestamp": { + "type": "date" + }, + "message": { + "type": "pattern_text" + } + } + } +} +``` + + +## Parameters for pattern text fields [pattern-text-params] + +The following mapping parameters are accepted: + +[`analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md) +: The [analyzer](docs-content://manage-data/data-store/text-analysis.md) which should be used for the `pattern_text` field, both at index-time and at search-time (unless overridden by the [`search_analyzer`](/reference/elasticsearch/mapping-reference/search-analyzer.md)). +Supports a delimiter-based analyzer and the standard analyzer, as is used in `match_only_text` mappings. +Defaults to the delimiter-based analyzer, which applies a lowercase filter and then splits on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`. + +[`index_options`](/reference/elasticsearch/mapping-reference/index-options.md) +: What information should be stored in the index, for search and highlighting purposes. Valid values are `docs` and `positions`. Defaults to `docs`. + +[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md) +: Metadata about the field. 
+ diff --git a/docs/reference/elasticsearch/mapping-reference/text-type-family.md b/docs/reference/elasticsearch/mapping-reference/text-type-family.md new file mode 100644 index 0000000000000..d6a6d85553ff5 --- /dev/null +++ b/docs/reference/elasticsearch/mapping-reference/text-type-family.md @@ -0,0 +1,15 @@ +--- +navigation_title: "Text type family" +mapped_pages: + - https://www.elastic.co/guide/en/elasticsearch/reference/current/text-type-family.html +--- + +# Text type family [text] + + +The text family includes the following field types: + +* [`text`](/reference/elasticsearch/mapping-reference/text.md), the traditional field type for full-text content such as the body of an email or the description of a product. +* [`match_only_text`](/reference/elasticsearch/mapping-reference/match-only-text.md), a space-optimized variant of `text` that disables scoring and performs more slowly on queries that need positions. It is best suited for indexing log messages. +* [`pattern_text`](/reference/elasticsearch/mapping-reference/pattern-text.md), a variant of `text` optimized for log messages that contain sequences shared among many messages. By compressing these shared sequences, `pattern_text` provides better space efficiency than `match_only_text`. + diff --git a/docs/reference/elasticsearch/mapping-reference/text.md b/docs/reference/elasticsearch/mapping-reference/text.md index 612ce067cd4b3..962e4a15e8482 100644 --- a/docs/reference/elasticsearch/mapping-reference/text.md +++ b/docs/reference/elasticsearch/mapping-reference/text.md @@ -4,16 +4,7 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html --- -# Text type family [text] - - -The text family includes the following field types: - -* [`text`](#text-field-type), the traditional field type for full-text content such as the body of an email or the description of a product.
-* [`match_only_text`](#match-only-text-field-type), a space-optimized variant of `text` that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages. - - -## Text field type [text-field-type] +# Text field type [text-field-type] A field to index full-text values, such as the body of an email or the description of a product. These fields are `analyzed`, that is they are passed through an [analyzer](docs-content://manage-data/data-store/text-analysis.md) to convert the string into a list of individual terms before being indexed. The analysis process allows Elasticsearch to search for individual words *within* each full text field. Text fields are not used for sorting and seldom used for aggregations (although the [significant text aggregation](/reference/aggregations/search-aggregations-bucket-significanttext-aggregation.md) is a notable exception). @@ -301,43 +292,3 @@ PUT my-index-000001 } } ``` - - -## Match-only text field type [match-only-text-field-type] - -A variant of [`text`](#text-field-type) that trades scoring and efficiency of positional queries for space efficiency. This field effectively stores data the same way as a `text` field that only indexes documents (`index_options: docs`) and disables norms (`norms: false`). Term queries perform as fast if not faster as on `text` fields, however queries that need positions such as the [`match_phrase` query](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) perform slower as they need to look at the `_source` document to verify whether a phrase matches. All queries return constant scores that are equal to 1.0. - -Analysis is not configurable: text is always analyzed with the [default analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-default-analyzer) ([`standard`](/reference/text-analysis/analysis-standard-analyzer.md) by default). 
- -[span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field, use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or the [`text`](#text-field-type) field type if you absolutely need span queries. - -Other than that, `match_only_text` supports the same queries as `text`. And like `text`, it does not support sorting and has only limited support for aggregations. - -```console -PUT logs -{ - "mappings": { - "properties": { - "@timestamp": { - "type": "date" - }, - "message": { - "type": "match_only_text" - } - } - } -} -``` - - -### Parameters for match-only text fields [match-only-text-params] - -The following mapping parameters are accepted: - -[`fields`](/reference/elasticsearch/mapping-reference/multi-fields.md) -: Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers. - -[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md) -: Metadata about the field. - - diff --git a/docs/reference/elasticsearch/rest-apis/highlighting-settings.md b/docs/reference/elasticsearch/rest-apis/highlighting-settings.md index a1874cc637f9d..ff3fe35f45439 100644 --- a/docs/reference/elasticsearch/rest-apis/highlighting-settings.md +++ b/docs/reference/elasticsearch/rest-apis/highlighting-settings.md @@ -51,10 +51,10 @@ encoder : Indicates if the snippet should be HTML encoded: `default` (no encoding) or `html` (HTML-escape the snippet text and then insert the highlighting tags) fields -: Specifies the fields to retrieve highlights for. You can use wildcards to specify fields. 
For example, you could specify `comment_*` to get highlights for all [text](/reference/elasticsearch/mapping-reference/text.md), [match_only_text](/reference/elasticsearch/mapping-reference/text.md#match-only-text-field-type), and [keyword](/reference/elasticsearch/mapping-reference/keyword.md) fields that start with `comment_`. +: Specifies the fields to retrieve highlights for. You can use wildcards to specify fields. For example, you could specify `comment_*` to get highlights for all [text](/reference/elasticsearch/mapping-reference/text.md), [match_only_text](/reference/elasticsearch/mapping-reference/match-only-text.md), [pattern_text](/reference/elasticsearch/mapping-reference/pattern-text.md), and [keyword](/reference/elasticsearch/mapping-reference/keyword.md) fields that start with `comment_`. ::::{note} - Only text, match_only_text, and keyword fields are highlighted when you use wildcards. If you use a custom mapper and want to highlight on a field anyway, you must explicitly specify that field name. + Only text, match_only_text, pattern_text, and keyword fields are highlighted when you use wildcards. If you use a custom mapper and want to highlight on a field anyway, you must explicitly specify that field name. :::: $$$fragmenter$$$ @@ -147,4 +147,4 @@ tags_schema $$$highlighter-type$$$ type -: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to `unified`. \ No newline at end of file +: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to `unified`. 
diff --git a/docs/reference/elasticsearch/toc.yml b/docs/reference/elasticsearch/toc.yml index fa1e66aa2364c..748929ea120a4 100644 --- a/docs/reference/elasticsearch/toc.yml +++ b/docs/reference/elasticsearch/toc.yml @@ -174,7 +174,11 @@ toc: - file: mapping-reference/semantic-text.md - file: mapping-reference/shape.md - file: mapping-reference/sparse-vector.md - - file: mapping-reference/text.md + - file: mapping-reference/text-type-family.md + children: + - file: mapping-reference/text.md + - file: mapping-reference/pattern-text.md + - file: mapping-reference/match-only-text.md - file: mapping-reference/token-count.md - file: mapping-reference/unsigned-long.md - file: mapping-reference/version.md