Add documentation for pattern_text mapping type #135856
Open
parkertimmins wants to merge 23 commits into elastic:main from parkertimmins:parker/pattern-text-docs
+161
−56
8698e45 First version of pattern_text docs (parkertimmins)
657670c Some syntax fixes (parkertimmins)
9ec090f Incorrect sort settings (parkertimmins)
db7491f broken link (parkertimmins)
fa2d731 Add applies_to badge (parkertimmins)
67aa9dd Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
cd99011 Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
180ba1e Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
fe11cdf Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
75aa09d Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
5426984 Reorder so limitations are in separate section (parkertimmins)
4e6f29c Add mention of subscription (parkertimmins)
5fd13e8 mention standard analyzer is supported (parkertimmins)
64b2bd5 Split text family types into separate docs pages (parkertimmins)
1f2c619 Remove match_only_text and pattern_text from text docs page (parkertimmins)
438454b Merge branch 'main' into parker/pattern-text-docs (parkertimmins)
dd93135 Fix a few build errors (parkertimmins)
90933ca Add redirects, fix some anchors (parkertimmins)
db1ede3 Change redirect syntax (parkertimmins)
acb57e2 Add to toc.yml (parkertimmins)
823863f Fix incorrect file name (parkertimmins)
ab3d63f Another incorrect anchor (parkertimmins)
a6d1c80 Change pattern_text description (parkertimmins)
43 changes: 43 additions & 0 deletions
docs/reference/elasticsearch/mapping-reference/match-only-text.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
--- | ||
navigation_title: "Match Only Text" | ||
mapped_pages: | ||
- https://www.elastic.co/guide/en/elasticsearch/reference/current/match-only-text.html | ||
--- | ||
|
||
# Match-only text field type [match-only-text-field-type] | ||
|
||
A variant of [`text`](/reference/elasticsearch/mapping-reference/text.md) that trades scoring and efficiency of positional queries for space efficiency. This field effectively stores data the same way as a `text` field that only indexes documents (`index_options: docs`) and disables norms (`norms: false`). Term queries perform as fast as, if not faster than, on `text` fields. However, queries that need positions, such as the [`match_phrase` query](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md), perform slower because they need to look at the `_source` document to verify whether a phrase matches. All queries return a constant score of 1.0. | ||
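
As an illustration of a query that needs positions, the following sketch runs a `match_phrase` query against a `match_only_text` field. The index name `logs`, the field name `message`, and the phrase itself are assumptions chosen for illustration.

```console
GET logs/_search
{
  "query": {
    "match_phrase": {
      "message": "connection refused"
    }
  }
}
```

Because positions are not indexed for this field type, a query like this is expected to verify candidate matches against `_source`, which is why it can be slower than the same query on a `text` field.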
|
||
Analysis is not configurable: text is always analyzed with the [default analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-default-analyzer) ([`standard`](/reference/text-analysis/analysis-standard-analyzer.md) by default). | ||
|
||
[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or the [`text`](/reference/elasticsearch/mapping-reference/text.md) field type if you absolutely need span queries. | ||
|
||
Other than that, `match_only_text` supports the same queries as `text`. And like `text`, it does not support sorting and has only limited support for aggregations. | ||
|
||
```console | ||
PUT logs | ||
{ | ||
"mappings": { | ||
"properties": { | ||
"@timestamp": { | ||
"type": "date" | ||
}, | ||
"message": { | ||
"type": "match_only_text" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
|
||
## Parameters for match-only text fields [match-only-text-params] | ||
|
||
The following mapping parameters are accepted: | ||
|
||
[`fields`](/reference/elasticsearch/mapping-reference/multi-fields.md) | ||
: Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers. | ||
|
||
[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md) | ||
: Metadata about the field. | ||
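
To illustrate the `fields` parameter described above, here is a minimal sketch that adds a `keyword` multi-field to a `match_only_text` field so the same value can also be used for sorting and aggregations. The sub-field name `raw` is hypothetical, and the `logs` index and `message` field are reused only for illustration.

```console
PUT logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "match_only_text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
```

With a mapping like this, searches would target `message`, while sorting or aggregating would target `message.raw`.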
|
85 changes: 85 additions & 0 deletions
docs/reference/elasticsearch/mapping-reference/pattern-text.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
--- | ||
navigation_title: "Pattern Text" | ||
mapped_pages: | ||
- https://www.elastic.co/guide/en/elasticsearch/reference/current/pattern-text.html | ||
--- | ||
|
||
# Pattern text field type [pattern-text-field-type] | ||
```{applies_to} | ||
serverless: preview | ||
stack: preview 9.2 | ||
``` | ||
:::{note} | ||
This feature requires a [subscription](https://www.elastic.co/subscriptions). | ||
::: | ||
|
||
The `pattern_text` field type is a variant of [`text`](/reference/elasticsearch/mapping-reference/text.md) with improved space efficiency for log data. | ||
Internally, it decomposes values into static parts that are likely to be shared among many values, and dynamic parts that tend to vary. | ||
The static parts usually come from the explanatory text of a log message, while the dynamic parts are the variables that were interpolated into the logs. | ||
This decomposition allows for improved compression on log-like data. | ||
|
||
We call the static portion of the value the `template`. | ||
Although the template cannot be accessed directly, a separate field called `<field_name>.template_id` is accessible. | ||
This field is a hash of the template and can be used to group similar values. | ||
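
One way to group similar values is sketched below, under the assumption that `message.template_id` can be aggregated like a `keyword` field (this page does not state its exact type). The index name `logs` and the aggregation name `by_template` are illustrative.

```console
GET logs/_search
{
  "size": 0,
  "aggs": {
    "by_template": {
      "terms": {
        "field": "message.template_id"
      }
    }
  }
}
```

Each bucket would then correspond to one template hash, with its document count indicating how many log messages share that static text.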
|
||
Analysis is configurable but defaults to a delimiter-based analyzer. | ||
This analyzer applies a lowercase filter and then splits on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`. | ||
|
||
## Limitations | ||
|
||
Unlike most mapping types, `pattern_text` does not support multiple values for a given field per document. | ||
If a document is created with multiple values for a `pattern_text` field, an error is returned. | ||
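
For illustration, a request like the following sketch would be expected to fail, because it supplies an array of values for a `pattern_text` field. It assumes an index `logs` whose `message` field is mapped as `pattern_text`. The sample values are made up, and the exact error response is not shown here.

```console
POST logs/_doc
{
  "@timestamp": "2025-01-01T00:00:00Z",
  "message": [ "first log line", "second log line" ]
}
```

A likely workaround is to index each message as its own document.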
|
||
[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or the [`text`](/reference/elasticsearch/mapping-reference/text.md) field type if you absolutely need span queries. | ||
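
As a sketch of the interval query alternative, the following request looks for two terms appearing in order with no gap between them. The index name `logs`, the field name `message`, and the query text are assumptions for illustration.

```console
GET logs/_search
{
  "query": {
    "intervals": {
      "message": {
        "match": {
          "query": "connection refused",
          "max_gaps": 0,
          "ordered": true
        }
      }
    }
  }
}
```

Relaxing `max_gaps` or setting `ordered` to `false` gives proximity matching similar to what span queries are often used for.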
|
||
Like `text`, `pattern_text` does not support sorting and has only limited support for aggregations. | ||
|
||
## Phrase matching | ||
Pattern text supports an `index_options` parameter with valid values of `docs` and `positions`. | ||
The default value is `docs`, which makes `pattern_text` behave similarly to `match_only_text` for phrase queries. | ||
Specifically, positions are not stored, which reduces the index size at the cost of slowing down phrase queries. | ||
If `index_options` is set to `positions`, positions are stored and `pattern_text` will support fast phrase queries. | ||
In both cases, all queries return a constant score of 1.0. | ||
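
A minimal mapping sketch that opts into stored positions is shown here. The index name `logs` and the field name `message` are illustrative.

```console
PUT logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "pattern_text",
        "index_options": "positions"
      }
    }
  }
}
```

This trades a larger index for faster phrase queries. Omitting `index_options` keeps the default of `docs`.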
|
||
## Index sorting for improved compression | ||
The compression provided by `pattern_text` can be significantly improved if the index is sorted by the `template_id` field. | ||
A typical approach is to sort first by `message.template_id`, then by `@timestamp`, as shown in the following example. | ||
|
||
```console | ||
PUT logs | ||
{ | ||
"settings": { | ||
"index": { | ||
"sort.field": [ "message.template_id", "@timestamp" ], | ||
"sort.order": [ "asc", "desc" ] | ||
} | ||
}, | ||
"mappings": { | ||
"properties": { | ||
"@timestamp": { | ||
"type": "date" | ||
}, | ||
"message": { | ||
"type": "pattern_text" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
|
||
## Parameters for pattern text fields [pattern-text-params] | ||
|
||
The following mapping parameters are accepted: | ||
|
||
[`analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md) | ||
: The [analyzer](docs-content://manage-data/data-store/text-analysis.md) which should be used for the `pattern_text` field, both at index-time and at search-time (unless overridden by the [`search_analyzer`](/reference/elasticsearch/mapping-reference/search-analyzer.md)). | ||
Supports a delimiter-based analyzer and the standard analyzer, as is used in `match_only_text` mappings. | ||
Defaults to the delimiter-based analyzer, which applies a lowercase filter and then splits on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`. | ||
|
||
[`index_options`](/reference/elasticsearch/mapping-reference/index-options.md) | ||
: What information should be stored in the index, for search and highlighting purposes. Valid values are `docs` and `positions`. Defaults to `docs`. | ||
|
||
[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md) | ||
: Metadata about the field. | ||
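
As a sketch of the `analyzer` parameter described above, the following mapping selects the `standard` analyzer instead of the default delimiter-based one. The index name `logs` and the field name `message` are illustrative.

```console
PUT logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "pattern_text",
        "analyzer": "standard"
      }
    }
  }
}
```

Choosing `standard` may be useful when tokenization should be consistent with existing `match_only_text` fields.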
|
15 changes: 15 additions & 0 deletions
docs/reference/elasticsearch/mapping-reference/text-type-family.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
--- | ||
navigation_title: "Text type family" | ||
mapped_pages: | ||
- https://www.elastic.co/guide/en/elasticsearch/reference/current/text-type-family.html | ||
--- | ||
|
||
# Text type family [text] | ||
|
||
|
||
The text family includes the following field types: | ||
|
||
* [`text`](/reference/elasticsearch/mapping-reference/text.md), the traditional field type for full-text content such as the body of an email or the description of a product. | ||
* [`match_only_text`](/reference/elasticsearch/mapping-reference/match-only-text.md), a space-optimized variant of `text` that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages. | ||
* [`pattern_text`](/reference/elasticsearch/mapping-reference/pattern-text.md), a variant of `text` that is optimized for log messages containing sequences that are shared between many messages. By compressing these shared sequences, `pattern_text` provides improved space efficiency relative to `match_only_text`. | ||
|
@jordan-powers and @martijnvg What do y'all think of this blurb? I'm having trouble coming up with a description that is succinct and describes the difference between `pattern_text` and `match_only_text`.