Add documentation for pattern_text mapping type #135856
Open
parkertimmins wants to merge 23 commits into elastic:main from parkertimmins:parker/pattern-text-docs
+161
−56
Changes from 5 commits
8698e45 First version of pattern_text docs (parkertimmins)
657670c Some syntax fixes (parkertimmins)
9ec090f Incorrect sort settings (parkertimmins)
db7491f broken link (parkertimmins)
fa2d731 Add applies_to badge (parkertimmins)
67aa9dd Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
cd99011 Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
180ba1e Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
fe11cdf Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
75aa09d Update docs/reference/elasticsearch/mapping-reference/text.md (parkertimmins)
5426984 Reorder so limitations are in separate section (parkertimmins)
4e6f29c Add mention of subscription (parkertimmins)
5fd13e8 mention standard analyzer is supported (parkertimmins)
64b2bd5 Split text family types into separate docs pages (parkertimmins)
1f2c619 Remove match_only_text and pattern_text from text docs page (parkertimmins)
438454b Merge branch 'main' into parker/pattern-text-docs (parkertimmins)
dd93135 Fix a few build errors (parkertimmins)
90933ca Add redirects, fix some anchors (parkertimmins)
db1ede3 Change redirect syntax (parkertimmins)
acb57e2 Add to toc.yml (parkertimmins)
823863f Fix incorrect file name (parkertimmins)
ab3d63f Another incorrect anchor (parkertimmins)
a6d1c80 Change pattern_text description (parkertimmins)
@@ -11,6 +11,7 @@ The text family includes the following field types:

* [`text`](#text-field-type), the traditional field type for full-text content such as the body of an email or the description of a product.
* [`match_only_text`](#match-only-text-field-type), a space-optimized variant of `text` that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages.
* [`pattern_text`](#pattern-text-field-type), a variant of `text` with improved space efficiency when storing log messages.


## Text field type [text-field-type]

@@ -341,3 +342,82 @@ The following mapping parameters are accepted:
: Metadata about the field.


## Pattern text field type [pattern-text-field-type]
```{applies_to}
serverless: preview
stack: preview 9.2
```

::::{warning}
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
::::

A variant of [`text`](#text-field-type) with improved space efficiency for log data.

Internally, it decomposes each value into static parts that are likely to be shared among many values and dynamic parts that tend to vary.
The static parts usually come from the explanatory text of a log message, while the dynamic parts are the variables that were interpolated into the message.
This decomposition allows for improved compression on log-like data.

The static portion of a value is called the `template`.
Although the template cannot be accessed directly, a separate field called `<field_name>.template_id` is accessible.
This field is a hash of the template and can be used to group similar values.
As this feature is in technical preview, the internal structure of the template is subject to change.
Because `<field_name>.template_id` is derived from the template, its values are also subject to change in future releases.

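As a rough sketch of grouping by template, the request below counts documents per template with a `terms` aggregation. The index name `logs` and field name `message` are illustrative, and the request assumes that `message.template_id` can be used as the field of a `terms` aggregation, which the text above does not state explicitly.

```console
# Illustrative only: `logs` and `message` are assumed names.
# Assumes `message.template_id` is aggregatable.
GET logs/_search
{
  "size": 0,
  "aggs": {
    "by_template": {
      "terms": {
        "field": "message.template_id",
        "size": 10
      }
    }
  }
}
```

Because the hash may change while the feature is in preview, bucket keys from a request like this are best treated as opaque and not stored long term.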

Unlike most mapping types, `pattern_text` does not support multiple values for a given field per document.
If a document is created with multiple values for a `pattern_text` field, an error is returned.
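To make the single-value restriction concrete, here is a minimal sketch with two indexing requests. It assumes an index named `logs` whose `message` field is mapped as `pattern_text`; the document contents are invented for illustration, and the exact error returned by the second request is not shown because the source does not specify it.

```console
# Accepted: a single value for the pattern_text field.
POST logs/_doc
{
  "@timestamp": "2025-01-01T00:00:00Z",
  "message": "Connection from 10.0.0.5 timed out after 30s"
}

# Expected to be rejected with an error: an array of values.
POST logs/_doc
{
  "@timestamp": "2025-01-01T00:00:01Z",
  "message": [ "first message", "second message" ]
}
```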

Analysis is configurable but defaults to a delimiter-based analyzer.
This analyzer lowercases the text and splits it on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`.
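One way to inspect the tokens a field's analyzer produces is the analyze API with the `field` parameter. The sketch below assumes an index named `logs` whose `message` field is mapped as `pattern_text` with the default analyzer; the sample text is made up.

```console
# Assumes `logs` maps `message` as pattern_text with the default analyzer.
GET logs/_analyze
{
  "field": "message",
  "text": "User=alice action=[login] status:OK"
}
```

Going by the description above, the response should contain lowercase tokens split at whitespace and at the `=`, `[`, `]`, and `:` characters.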

[Span queries](/reference/query-languages/query-dsl/span-queries.md) are not supported with this field. Use [interval queries](/reference/query-languages/query-dsl/query-dsl-intervals-query.md) instead, or use the [`text`](#text-field-type) field type if you absolutely need span queries.
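As a sketch of the suggested alternative, the following `intervals` query looks for the terms of a phrase in order with no gaps between them. The index name `logs`, the field name `message`, and the query text are assumptions chosen for illustration.

```console
# Illustrative only: `logs` and `message` are assumed names.
GET logs/_search
{
  "query": {
    "intervals": {
      "message": {
        "match": {
          "query": "connection timed out",
          "max_gaps": 0,
          "ordered": true
        }
      }
    }
  }
}
```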

Like `text`, `pattern_text` does not support sorting and has only limited support for aggregations.

### Phrase matching
`pattern_text` supports an `index_options` parameter with valid values of `docs` and `positions`.
The default value is `docs`, which makes `pattern_text` behave similarly to `match_only_text` for phrase queries.
Specifically, positions are not stored, which reduces the index size at the cost of slowing down phrase queries.
If `index_options` is set to `positions`, positions are stored and `pattern_text` supports fast phrase queries.
In both cases, all queries return a constant score of 1.0.
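A minimal sketch of opting into stored positions, followed by a phrase query against the field. The index name `logs-positions` and the query text are hypothetical.

```console
# Hypothetical index that stores positions for faster phrase queries.
PUT logs-positions
{
  "mappings": {
    "properties": {
      "message": {
        "type": "pattern_text",
        "index_options": "positions"
      }
    }
  }
}

# Phrase query; per the text above, every hit scores a constant 1.0.
GET logs-positions/_search
{
  "query": {
    "match_phrase": {
      "message": "timed out"
    }
  }
}
```

With the default `docs` setting the same `match_phrase` query is still expected to work, only more slowly.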

### Index sorting for improved compression
The compression provided by `pattern_text` can be significantly improved if the index is sorted by the `template_id` field.
For example, a typical approach would be to sort first by `message.template_id`, then by `@timestamp`, as shown in the following example.

```console
PUT logs
{
  "settings": {
    "index": {
      "sort.field": [ "message.template_id", "@timestamp" ],
      "sort.order": [ "asc", "desc" ]
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "pattern_text"
      }
    }
  }
}
```


### Parameters for pattern text fields [pattern-text-params]

The following mapping parameters are accepted:

[`analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md)
: The [analyzer](docs-content://manage-data/data-store/text-analysis.md) that should be used for the `pattern_text` field, both at index time and at search time (unless overridden by the [`search_analyzer`](/reference/elasticsearch/mapping-reference/search-analyzer.md)). Defaults to a custom delimiter-based analyzer.
This analyzer lowercases the text and splits it on whitespace and the following delimiters: `=`, `?`, `:`, `[`, `]`, `{`, `}`, `"`, `\`, `'`.

[`index_options`](/reference/elasticsearch/mapping-reference/index-options.md)
: What information should be stored in the index, for search and highlighting purposes. Valid values are `docs` and `positions`. Defaults to `docs`.

[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md)
: Metadata about the field.
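
The sketch below sets all three parameters in one mapping. The index name `logs-custom` and the `meta` entry are made up. The `standard` analyzer is used only because a commit title in this pull request mentions it as supported; whether other analyzers are accepted is not stated here.

```console
# Hypothetical index; the `meta` key and value are illustrative.
# `standard` is assumed to be accepted, based on a commit title in this PR.
PUT logs-custom
{
  "mappings": {
    "properties": {
      "message": {
        "type": "pattern_text",
        "analyzer": "standard",
        "index_options": "docs",
        "meta": {
          "source": "application-logs"
        }
      }
    }
  }
}
```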
This could use some clarification: which is better for log messages, `match_only_text` or `pattern_text`?