Add documentation for pattern_text mapping type #135856
base: main
@@ -0,0 +1,43 @@
---
navigation_title: "Match Only Text"
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/match-only-text.html
---

# Match-only text field type [match-only-text-field-type]

A variant of [`text`](/reference/elasticsearch/mapping-reference/text.md) that trades scoring and the efficiency of positional queries for space efficiency. This field effectively stores data the same way as a `text` field that only indexes documents (`index_options: docs`) and disables norms (`norms: false`). Term queries are as fast as, if not faster than, term queries on `text` fields. However, queries that need positions, such as the [`match_phrase` query](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md), are slower because they need to look at the `_source` document to verify whether a phrase matches. All queries return a constant score of 1.0.

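To make this trade-off concrete, here is a toy in-memory model in plain Python. It is an illustration under simplifying assumptions, not how Lucene or Elasticsearch actually stores data. The index records only which documents contain each term, so a term query is a set lookup that gives every hit the same score of 1.0. A phrase query first narrows candidates through the index and then re-analyzes each candidate's original text (standing in for `_source`) to check term order. The `tokenize` helper is a hypothetical, simplified stand-in for the `standard` analyzer, and `DocsOnlyIndex` is a made-up name.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


class DocsOnlyIndex:
    """Toy model of a docs-only field: term -> doc ids, with no positions and no norms."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.source: dict[int, str] = {}  # stands in for the stored _source

    def add(self, doc_id: int, text: str) -> None:
        self.source[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def term_query(self, term: str) -> dict[int, float]:
        # Only membership is known, so every hit gets the same constant score.
        return {doc_id: 1.0 for doc_id in self.postings.get(term.lower(), set())}

    def phrase_query(self, phrase: str) -> dict[int, float]:
        terms = tokenize(phrase)
        if not terms:
            return {}
        # Step 1: cheap candidate selection from the index (docs containing every term).
        candidates = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = {}
        for doc_id in candidates:
            # Step 2: no positions are indexed, so re-analyze the source to verify order.
            tokens = tokenize(self.source[doc_id])
            n = len(terms)
            if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
                hits[doc_id] = 1.0
        return hits
```

For example, after adding `"Connection reset by peer"` as document 1 and `"Peer reset by timeout, connection closed"` as document 2, a term query for `reset` matches both, while a phrase query for `reset by peer` keeps only document 1: document 2 passes the index check in step 1 but fails the source check in step 2. That second step is the extra per-document work that makes positional queries slower on this field type.
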
Analysis is not configurable: text is always analyzed with the [default analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-default-analyzer) ([`standard`](/reference/text-analysis/analysis-standard-analyzer.md) by default).

[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or the [`text`](/reference/elasticsearch/mapping-reference/text.md) field type if you absolutely need span queries.

Other than that, `match_only_text` supports the same queries as `text`. Like `text`, it does not support sorting and has only limited support for aggregations.

```console
PUT logs
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "match_only_text"
      }
    }
  }
}
```

## Parameters for match-only text fields [match-only-text-params]

The following mapping parameters are accepted:

[`fields`](/reference/elasticsearch/mapping-reference/multi-fields.md)
: Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers.

[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md)
: Metadata about the field.
@@ -0,0 +1,85 @@
---
navigation_title: "Pattern Text"
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/pattern-text.html
---

# Pattern text field type [pattern-text-field-type]
```{applies_to}
serverless: preview
stack: preview 9.2
```
:::{note}
This feature requires a [subscription](https://www.elastic.co/subscriptions).
:::

The `pattern_text` field type is a variant of [`text`](/reference/elasticsearch/mapping-reference/text.md) with improved space efficiency for log data.
Internally, it decomposes each value into static parts, which are likely to be shared among many values, and dynamic parts, which tend to vary.
The static parts usually come from the explanatory text of a log message, while the dynamic parts are the variables that were interpolated into the message.
This decomposition allows for improved compression on log-like data.

We call the static portion of the value the `template`.
The template itself cannot be accessed directly, but a separate field called `<field_name>.template_id` is available.
This field contains a hash of the template and can be used to group similar values.

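The decomposition and the role of `template_id` can be sketched with a deliberately naive heuristic: treat any space-separated token that contains a digit as a dynamic part, keep everything else as static template text, and hash the template so that messages with the same shape share an ID. This is an assumption made purely for illustration. The actual pattern detection and hash function used by `pattern_text` are not described on this page, and `split_template`, `template_id`, `reconstruct`, `group_by_template`, and the `%W` placeholder are all hypothetical names.

```python
import hashlib
import re
from collections import defaultdict

PLACEHOLDER = "%W"  # hypothetical marker for a dynamic slot in the template


def split_template(message: str) -> tuple[str, list[str]]:
    """Split a message into a static template and the dynamic values removed from it."""
    template_tokens, args = [], []
    for token in message.split(" "):
        if re.search(r"\d", token):  # naive rule: tokens containing digits are variables
            template_tokens.append(PLACEHOLDER)
            args.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), args


def template_id(template: str) -> str:
    # Any stable hash works for grouping; truncated SHA-1 is an arbitrary choice here.
    return hashlib.sha1(template.encode("utf-8")).hexdigest()[:16]


def reconstruct(template: str, args: list[str]) -> str:
    """Rebuild the original message by filling placeholders in order.

    Assumes the original message never contains the placeholder text itself.
    """
    values = iter(args)
    return " ".join(next(values) if t == PLACEHOLDER else t for t in template.split(" "))


def group_by_template(messages: list[str]) -> dict[str, list[str]]:
    """Group messages whose templates hash to the same ID."""
    groups: dict[str, list[str]] = defaultdict(list)
    for message in messages:
        template, _ = split_template(message)
        groups[template_id(template)].append(message)
    return dict(groups)
```

With this heuristic, `"Started job 1742 in 35ms"` and `"Started job 1743 in 12ms"` both reduce to the template `"Started job %W in %W"` and land in the same group, while `"Disk /dev/sda1 is 91% full"` gets a different template and ID. Storing each distinct template once and keeping only the short dynamic values per document is the intuition behind the space savings.
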
Analysis is configurable but defaults to a delimiter-based analyzer.
This analyzer applies a lowercase filter and then splits on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`.

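The default analyzer's behavior as described above can be approximated in a few lines of Python. This is a sketch under the assumption that "whitespace" means what the regular expression `\s` matches and that empty tokens produced by adjacent delimiters are discarded; the real analyzer may differ in edge cases such as Unicode handling. `delimiter_analyze` is a hypothetical name.

```python
import re

# Whitespace plus the listed delimiters: = ? : [ ] { } " \ '
_DELIMITERS = re.compile(r"""[\s=?:\[\]{}"\\']+""")


def delimiter_analyze(text: str) -> list[str]:
    """Approximate the default pattern_text analyzer: lowercase, then split on delimiters."""
    return [token for token in _DELIMITERS.split(text.lower()) if token]
```

For example, `delimiter_analyze('User=Bob logged in from [10.0.0.1]')` yields `['user', 'bob', 'logged', 'in', 'from', '10.0.0.1']`. Because `.` and `/` are not in the delimiter list, IP addresses and paths such as `/api` stay intact as single tokens.
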
## Limitations

Unlike most field types, `pattern_text` does not support multiple values for a given field per document.
If a document is indexed with multiple values for a `pattern_text` field, an error is returned.

[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or the [`text`](/reference/elasticsearch/mapping-reference/text.md) field type if you absolutely need span queries.

Like `text`, `pattern_text` does not support sorting and has only limited support for aggregations.

## Phrase matching
Pattern text supports an `index_options` parameter with valid values of `docs` and `positions`.
The default value is `docs`, which makes `pattern_text` behave similarly to `match_only_text` for phrase queries.
Specifically, positions are not stored, which reduces the index size at the cost of slower phrase queries.
If `index_options` is set to `positions`, positions are stored and `pattern_text` supports fast phrase queries.
In both cases, all queries return a constant score of 1.0.

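Storing positions is what lets a phrase query be answered from the index alone. The following toy model in plain Python illustrates the general technique: a positional inverted index where a phrase matches when its terms appear at consecutive positions. It is not the Lucene implementation. Documents are assumed to be given as already-analyzed token lists, and `build_positional_index` and `phrase_match` are hypothetical names.

```python
from collections import defaultdict


def build_positional_index(docs: dict[int, list[str]]) -> dict[str, dict[int, list[int]]]:
    """Map each term to the docs containing it and the token positions where it occurs."""
    index: dict[str, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            index[term][doc_id].append(position)
    return index


def phrase_match(index: dict[str, dict[int, list[int]]], phrase: list[str]) -> set[int]:
    """Return the doc ids where the phrase terms occur at consecutive positions."""
    if not phrase or any(term not in index for term in phrase):
        return set()
    # Candidate docs must contain every term of the phrase.
    candidates = set(index[phrase[0]])
    for term in phrase[1:]:
        candidates &= set(index[term])
    hits = set()
    for doc_id in candidates:
        # A match starts at position p if the k-th phrase term appears at p + k.
        starts = index[phrase[0]][doc_id]
        if any(all(p + k in index[term][doc_id] for k, term in enumerate(phrase)) for p in starts):
            hits.add(doc_id)
    return hits
```

Compared with the `docs` setting, term order is verified from the stored positions, with no need to re-read and re-analyze each candidate's `_source`. The price is the extra position data kept for every term occurrence, which is why `positions` increases index size.
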
## Index sorting for improved compression
The compression provided by `pattern_text` can be significantly improved if the index is sorted by the `template_id` field.
A typical approach is to sort first by `message.template_id` and then by `@timestamp`, as in the following example.

```console
PUT logs
{
  "settings": {
    "index": {
      "sort.field": [ "message.template_id", "@timestamp" ],
      "sort.order": [ "asc", "desc" ]
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "pattern_text"
      }
    }
  }
}
```

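One way to build intuition for why this sort order helps is to count how many runs of identical `template_id` values appear in document order. Fewer, longer runs mean that neighboring documents share the same template, which is the kind of repetition that encodings such as run-length or dictionary compression exploit. The sketch below is illustrative only. Lucene's actual storage and compression are considerably more involved. The document shape (dicts with `template_id` and an integer `timestamp`) and the helper names `count_runs` and `sort_docs` are assumptions.

```python
from itertools import groupby


def count_runs(values: list[str]) -> int:
    """Number of maximal runs of identical adjacent values."""
    return sum(1 for _ in groupby(values))


def sort_docs(docs: list[dict]) -> list[dict]:
    """Mimic the index sort above: template_id ascending, then timestamp descending."""
    return sorted(docs, key=lambda d: (d["template_id"], -d["timestamp"]))
```

For four documents arriving with interleaved template IDs `b, a, b, a`, `count_runs` reports 4 runs in arrival order but only 2 after `sort_docs`, and within each template the newest document comes first, matching `"sort.order": [ "asc", "desc" ]`.
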
## Parameters for pattern text fields [pattern-text-params]

The following mapping parameters are accepted:

[`analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md)
: The [analyzer](docs-content://manage-data/data-store/text-analysis.md) that should be used for the `pattern_text` field, both at index time and at search time (unless overridden by the [`search_analyzer`](/reference/elasticsearch/mapping-reference/search-analyzer.md)).
Supports the delimiter-based analyzer and the `standard` analyzer (as used by `match_only_text`).
Defaults to the delimiter-based analyzer, which applies a lowercase filter and then splits on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`.

[`index_options`](/reference/elasticsearch/mapping-reference/index-options.md)
: What information should be stored in the index, for search and highlighting purposes. Valid values are `docs` and `positions`. Defaults to `docs`.

[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md)
: Metadata about the field.
@@ -0,0 +1,15 @@
---
navigation_title: "Text type family"
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/text-type-family.html
---

# Text type family [text]

The text family includes the following field types:

* [`text`](/reference/elasticsearch/mapping-reference/text.md), the traditional field type for full-text content such as the body of an email or the description of a product.
* [`match_only_text`](/reference/elasticsearch/mapping-reference/match-only-text.md), a space-optimized variant of `text` that disables scoring and is slower on queries that need positions. It is best suited for indexing log messages.
* [`pattern_text`](/reference/elasticsearch/mapping-reference/pattern-text.md), a variant of `text` optimized for log messages that contain sequences shared by many other messages. By compressing these shared sequences, `pattern_text` provides better space efficiency than `match_only_text`.

@jordan-powers and @martijnvg What do y'all think of this blurb? I'm having trouble coming up with a description that is succinct and describes the difference between `pattern_text` and `match_only_text`.

Match-only text is targeted at match-only cases, where relevance and where matches happen are not important. This can be achieved with
The `pattern_text` field type is better suited to indexing short, repeating messages like log messages. That gives a real space-saving benefit. So I would document the differences with these things in mind.

What do you think about this?