Commit e82c457

Add documentation for pattern_text mapping type (#135856)
Add documentation for pattern_text type. Since the main text page was getting large, create a new text-type-family page with links to text, match_only_text, and pattern_text.
1 parent 60a05a9 commit e82c457

8 files changed, +161 -56 lines changed

docs/redirects.yml

Lines changed: 7 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -105,3 +105,10 @@ redirects:
105105
'reference/query-languages/esql/kibana/docs/functions/st_geohash_to_long.md': 'reference/query-languages/esql/esql-functions-operators.md'
106106
'reference/query-languages/esql/kibana/docs/functions/st_geotile_to_long.md': 'reference/query-languages/esql/esql-functions-operators.md'
107107
'reference/query-languages/esql/kibana/docs/functions/st_geohex_to_long.md': 'reference/query-languages/esql/esql-functions-operators.md'
108+
109+
'reference/elasticsearch/mapping-reference/text.md':
110+
to: 'reference/elasticsearch/mapping-reference/text.md'
111+
anchors: { } # pass-through unlisted anchors in the `many` ruleset
112+
many:
113+
- to: 'reference/elasticsearch/mapping-reference/match-only-text.md'
114+
anchors: { 'match-only-text-field-type', 'match-only-text-params' }

docs/reference/elasticsearch/mapping-reference/field-data-types.md

Lines changed: 2 additions & 2 deletions
@@ -77,8 +77,8 @@ Dates
7777

7878
### Text search types [text-search-types]
7979

80-
[`text` fields](/reference/elasticsearch/mapping-reference/text.md)
81-
: The text family, including `text` and `match_only_text`. Analyzed, unstructured text.
80+
[`text` fields](/reference/elasticsearch/mapping-reference/text-type-family.md)
81+
: The text family, including `text`, `match_only_text`, and `pattern_text`. Analyzed, unstructured text.
8282

8383
[`annotated-text`](/reference/elasticsearch-plugins/mapper-annotated-text.md)
8484
: Text containing special markup. Used for identifying named entities.

docs/reference/elasticsearch/mapping-reference/match-only-text.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
---
2+
navigation_title: "Match Only Text"
3+
mapped_pages:
4+
- https://www.elastic.co/guide/en/elasticsearch/reference/current/match-only-text.html
5+
---
6+
7+
# Match-only text field type [match-only-text-field-type]
8+
9+
A variant of [`text`](/reference/elasticsearch/mapping-reference/text.md) that trades scoring and efficiency of positional queries for space efficiency. This field effectively stores data the same way as a `text` field that only indexes documents (`index_options: docs`) and disables norms (`norms: false`). Term queries perform as fast as, if not faster than, those on `text` fields. However, queries that need positions, such as the [`match_phrase` query](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md), perform slower because they need to look at the `_source` document to verify whether a phrase matches. All queries return constant scores that are equal to 1.0.
10+
11+
Analysis is not configurable: text is always analyzed with the [default analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-default-analyzer) ([`standard`](/reference/text-analysis/analysis-standard-analyzer.md) by default).
12+
13+
[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or the [`text`](/reference/elasticsearch/mapping-reference/text.md) field type if you absolutely need span queries.
14+
15+
Other than that, `match_only_text` supports the same queries as `text`. And like `text`, it does not support sorting and has only limited support for aggregations.
16+
17+
```console
18+
PUT logs
19+
{
20+
"mappings": {
21+
"properties": {
22+
"@timestamp": {
23+
"type": "date"
24+
},
25+
"message": {
26+
"type": "match_only_text"
27+
}
28+
}
29+
}
30+
}
31+
```
32+
33+
34+
## Parameters for match-only text fields [match-only-text-params]
35+
36+
The following mapping parameters are accepted:
37+
38+
[`fields`](/reference/elasticsearch/mapping-reference/multi-fields.md)
39+
: Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers.
40+
41+
[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md)
42+
: Metadata about the field.
43+
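As a sketch of the span-query alternative mentioned for `match_only_text`, the following request uses an `intervals` query to match the terms of a phrase in order with no gaps between them. The `logs` index and `message` field come from the mapping example in this file; the search phrase is an illustrative assumption.

```console
GET logs/_search
{
  "query": {
    "intervals": {
      "message": {
        "match": {
          "query": "connection timed out",
          "max_gaps": 0,
          "ordered": true
        }
      }
    }
  }
}
```

Because positions are not indexed for `match_only_text`, a positional query like this one is expected to run slower than on a `text` field, as the page explains for `match_phrase`.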

docs/reference/elasticsearch/mapping-reference/pattern-text.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
---
2+
navigation_title: "Pattern Text"
3+
mapped_pages:
4+
- https://www.elastic.co/guide/en/elasticsearch/reference/current/pattern-text.html
5+
---
6+
7+
# Pattern text field type [pattern-text-field-type]
8+
```{applies_to}
9+
serverless: preview
10+
stack: preview 9.2
11+
```
12+
:::{note}
13+
This feature requires a [subscription](https://www.elastic.co/subscriptions).
14+
:::
15+
16+
The `pattern_text` field type is a variant of [`text`](/reference/elasticsearch/mapping-reference/text.md) with improved space efficiency for log data.
17+
Internally, it decomposes values into static parts that are likely to be shared among many values, and dynamic parts that tend to vary.
18+
The static parts usually come from the explanatory text of a log message, while the dynamic parts are the variables that were interpolated into the logs.
19+
This decomposition allows for improved compression on log-like data.
20+
21+
We call the static portion of the value the `template`.
22+
Although the template cannot be accessed directly, a separate field called `<field_name>.template_id` is accessible.
23+
This field is a hash of the template and can be used to group similar values.
24+
25+
Analysis is configurable but defaults to a delimiter-based analyzer.
26+
This analyzer applies a lowercase filter and then splits on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`.
27+
28+
## Limitations
29+
30+
Unlike most mapping types, `pattern_text` does not support multiple values for a given field per document.
31+
If a document is created with multiple values for a `pattern_text` field, an error is returned.
32+
33+
[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or the [`text`](/reference/elasticsearch/mapping-reference/text.md) field type if you absolutely need span queries.
34+
35+
Like `text`, `pattern_text` does not support sorting and has only limited support for aggregations.
36+
37+
## Phrase matching
38+
Pattern text supports an `index_options` parameter with valid values of `docs` and `positions`.
39+
The default value is `docs`, which makes `pattern_text` behave similarly to `match_only_text` for phrase queries.
40+
Specifically, positions are not stored, which reduces the index size at the cost of slowing down phrase queries.
41+
If `index_options` is set to `positions`, positions are stored and `pattern_text` will support fast phrase queries.
42+
In both cases, all queries return a constant score of 1.0.
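
A minimal sketch of opting in to stored positions, followed by a phrase query that would benefit from them. The index name `logs-positions` and the search phrase are hypothetical; the `@timestamp` and `message` fields mirror the other `logs` examples in this documentation.

```console
PUT logs-positions
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "pattern_text",
        "index_options": "positions"
      }
    }
  }
}

GET logs-positions/_search
{
  "query": {
    "match_phrase": {
      "message": "connection timed out"
    }
  }
}
```

Leaving `index_options` at its default of `docs` keeps the same phrase query valid, at the cost of slower execution.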
43+
44+
## Index sorting for improved compression
45+
The compression provided by `pattern_text` can be significantly improved if the index is sorted by the `template_id` field.
46+
For example, a typical approach would be to sort first by `message.template_id`, then by `@timestamp`, as shown in the following example.
47+
48+
```console
49+
PUT logs
50+
{
51+
"settings": {
52+
"index": {
53+
"sort.field": [ "message.template_id", "@timestamp" ],
54+
"sort.order": [ "asc", "desc" ]
55+
}
56+
},
57+
"mappings": {
58+
"properties": {
59+
"@timestamp": {
60+
"type": "date"
61+
},
62+
"message": {
63+
"type": "pattern_text"
64+
}
65+
}
66+
}
67+
}
68+
```
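
The page states that `<field_name>.template_id` can be used to group similar values. One way this might look is a `terms` aggregation on `message.template_id`. This is a sketch that assumes the sub-field can be aggregated on; the page does not spell out which aggregations it supports. The aggregation name `by_template` is hypothetical.

```console
GET logs/_search
{
  "size": 0,
  "aggs": {
    "by_template": {
      "terms": {
        "field": "message.template_id",
        "size": 10
      }
    }
  }
}
```

Under that assumption, each bucket would correspond to one template hash, so the bucket counts would approximate how many log messages share the same static text.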
69+
70+
71+
## Parameters for pattern text fields [pattern-text-params]
72+
73+
The following mapping parameters are accepted:
74+
75+
[`analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md)
76+
: The [analyzer](docs-content://manage-data/data-store/text-analysis.md) which should be used for the `pattern_text` field, both at index-time and at search-time (unless overridden by the [`search_analyzer`](/reference/elasticsearch/mapping-reference/search-analyzer.md)).
77+
Supports a delimiter-based analyzer and the standard analyzer, as is used in `match_only_text` mappings.
78+
Defaults to the delimiter-based analyzer, which applies a lowercase filter and then splits on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`.
79+
80+
[`index_options`](/reference/elasticsearch/mapping-reference/index-options.md)
81+
: What information should be stored in the index, for search and highlighting purposes. Valid values are `docs` and `positions`. Defaults to `docs`.
82+
83+
[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md)
84+
: Metadata about the field.
85+

docs/reference/elasticsearch/mapping-reference/text-type-family.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
---
2+
navigation_title: "Text type family"
3+
mapped_pages:
4+
- https://www.elastic.co/guide/en/elasticsearch/reference/current/text-type-family.html
5+
---
6+
7+
# Text type family [text]
8+
9+
10+
The text family includes the following field types:
11+
12+
* [`text`](/reference/elasticsearch/mapping-reference/text.md), the traditional field type for full-text content such as the body of an email or the description of a product.
13+
* [`match_only_text`](/reference/elasticsearch/mapping-reference/match-only-text.md), a variant of the `text` field type with limited functionality. Scoring is always disabled and the `standard` analyzer is always used. It is suited for match-only free-text use cases, where the fact that there is a match is important but scoring and where the match happens are not relevant. Positional queries are possible but slow.
14+
* [`pattern_text`](/reference/elasticsearch/mapping-reference/pattern-text.md), a variant of `text` that is optimized for space-efficient storage of log messages. Pattern text reduces space usage for messages that contain many repeated sequences, like the explanatory text of a log message. Pattern text also disables scoring, but unlike `match_only_text`, positional data can be stored for fast phrase queries.
15+

docs/reference/elasticsearch/mapping-reference/text.md

Lines changed: 1 addition & 50 deletions
@@ -4,16 +4,7 @@ mapped_pages:
44
- https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html
55
---
66

7-
# Text type family [text]
8-
9-
10-
The text family includes the following field types:
11-
12-
* [`text`](#text-field-type), the traditional field type for full-text content such as the body of an email or the description of a product.
13-
* [`match_only_text`](#match-only-text-field-type), a space-optimized variant of `text` that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages.
14-
15-
16-
## Text field type [text-field-type]
7+
# Text field type [text-field-type]
178

189
A field to index full-text values, such as the body of an email or the description of a product. These fields are `analyzed`, that is they are passed through an [analyzer](docs-content://manage-data/data-store/text-analysis.md) to convert the string into a list of individual terms before being indexed. The analysis process allows Elasticsearch to search for individual words *within* each full text field. Text fields are not used for sorting and seldom used for aggregations (although the [significant text aggregation](/reference/aggregations/search-aggregations-bucket-significanttext-aggregation.md) is a notable exception).
1910

@@ -304,43 +295,3 @@ PUT my-index-000001
304295
}
305296
}
306297
```
307-
308-
309-
## Match-only text field type [match-only-text-field-type]
310-
311-
A variant of [`text`](#text-field-type) that trades scoring and efficiency of positional queries for space efficiency. This field effectively stores data the same way as a `text` field that only indexes documents (`index_options: docs`) and disables norms (`norms: false`). Term queries perform as fast if not faster as on `text` fields, however queries that need positions such as the [`match_phrase` query](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) perform slower as they need to look at the `_source` document to verify whether a phrase matches. All queries return constant scores that are equal to 1.0.
312-
313-
Analysis is not configurable: text is always analyzed with the [default analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-default-analyzer) ([`standard`](/reference/text-analysis/analysis-standard-analyzer.md) by default).
314-
315-
[span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field, use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or the [`text`](#text-field-type) field type if you absolutely need span queries.
316-
317-
Other than that, `match_only_text` supports the same queries as `text`. And like `text`, it does not support sorting and has only limited support for aggregations.
318-
319-
```console
320-
PUT logs
321-
{
322-
"mappings": {
323-
"properties": {
324-
"@timestamp": {
325-
"type": "date"
326-
},
327-
"message": {
328-
"type": "match_only_text"
329-
}
330-
}
331-
}
332-
}
333-
```
334-
335-
336-
### Parameters for match-only text fields [match-only-text-params]
337-
338-
The following mapping parameters are accepted:
339-
340-
[`fields`](/reference/elasticsearch/mapping-reference/multi-fields.md)
341-
: Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers.
342-
343-
[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md)
344-
: Metadata about the field.
345-
346-

docs/reference/elasticsearch/rest-apis/highlighting-settings.md

Lines changed: 3 additions & 3 deletions
@@ -51,10 +51,10 @@ encoder
5151
: Indicates if the snippet should be HTML encoded: `default` (no encoding) or `html` (HTML-escape the snippet text and then insert the highlighting tags)
5252

5353
fields
54-
: Specifies the fields to retrieve highlights for. You can use wildcards to specify fields. For example, you could specify `comment_*` to get highlights for all [text](/reference/elasticsearch/mapping-reference/text.md), [match_only_text](/reference/elasticsearch/mapping-reference/text.md#match-only-text-field-type), and [keyword](/reference/elasticsearch/mapping-reference/keyword.md) fields that start with `comment_`.
54+
: Specifies the fields to retrieve highlights for. You can use wildcards to specify fields. For example, you could specify `comment_*` to get highlights for all [text](/reference/elasticsearch/mapping-reference/text.md), [match_only_text](/reference/elasticsearch/mapping-reference/match-only-text.md), [pattern_text](/reference/elasticsearch/mapping-reference/pattern-text.md), and [keyword](/reference/elasticsearch/mapping-reference/keyword.md) fields that start with `comment_`.
5555

5656
::::{note}
57-
Only text, match_only_text, and keyword fields are highlighted when you use wildcards. If you use a custom mapper and want to highlight on a field anyway, you must explicitly specify that field name.
57+
Only text, match_only_text, pattern_text, and keyword fields are highlighted when you use wildcards. If you use a custom mapper and want to highlight on a field anyway, you must explicitly specify that field name.
5858
::::
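
To illustrate the wildcard behavior of `fields` described above, here is a sketch of a highlight request. The index name `my-index` and the field `comment_body` are hypothetical; only the `comment_*` pattern comes from the text.

```console
GET my-index/_search
{
  "query": {
    "match": {
      "comment_body": "timeout"
    }
  },
  "highlight": {
    "fields": {
      "comment_*": {}
    }
  }
}
```

With this pattern, only matching fields of type `text`, `match_only_text`, `pattern_text`, or `keyword` are highlighted. A field that uses a custom mapper would have to be listed by its exact name.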
5959

6060
$$$fragmenter$$$
@@ -147,4 +147,4 @@ tags_schema
147147
$$$highlighter-type$$$
148148

149149
type
150-
: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to `unified`.
150+
: The highlighter to use: `unified`, `plain`, or `fvh`. Defaults to `unified`.

docs/reference/elasticsearch/toc.yml

Lines changed: 5 additions & 1 deletion
@@ -174,7 +174,11 @@ toc:
174174
- file: mapping-reference/semantic-text.md
175175
- file: mapping-reference/shape.md
176176
- file: mapping-reference/sparse-vector.md
177-
- file: mapping-reference/text.md
177+
- file: mapping-reference/text-type-family.md
178+
children:
179+
- file: mapping-reference/text.md
180+
- file: mapping-reference/pattern-text.md
181+
- file: mapping-reference/match-only-text.md
178182
- file: mapping-reference/token-count.md
179183
- file: mapping-reference/unsigned-long.md
180184
- file: mapping-reference/version.md
