[DOCS] Update documentation for index sorting and routing for logsdb #120721

@@ -1,9 +1,9 @@
[[logs-data-stream]]
== Logs data stream

IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
and self-managed Elasticsearch as of version 8.17, and is enabled by default for
logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].

A logs data stream is a data stream type that stores log data more efficiently.

@@ -54,57 +54,48 @@ DELETE _index_template/my-index-template
=== Synthetic source

If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
field. Instead, the document source is synthesized from doc values or stored fields when the document is retrieved.

If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.

Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.

When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values
are preserved for <<synthetic-source,synthetic source>> reconstruction. In `logsdb`, the default value is `arrays`,
which retains both duplicate values and the order of entries. However, the exact structure of
array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
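
As a minimal sketch, assuming a hypothetical index template named `my-logs-template` that matches data streams named `my-logs-*`, you could override this default at the index level, for example when your multi-value fields don't depend on duplicates or ordering:

[source,console]
----
PUT _index_template/my-logs-template
{
  "index_patterns": ["my-logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.synthetic_source_keep": "none" <1>
    }
  }
}
----
<1> Overrides the `logsdb` default of `arrays`. Check the <<synthetic-source,synthetic source>> documentation for the values your version accepts.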

[discrete]
[[logsdb-sort-settings]]
=== Index sort settings

In `logsdb` index mode, indices are sorted by the fields `host.name` and `@timestamp` by default, with `host.name`
in ascending order and `@timestamp` in descending order to prioritize the latest data. The `@timestamp` field is
automatically injected if it is not present. The `host.name` field is automatically injected as a `keyword` field if it
is not present and can be injected; this might not be possible if, for example, `host` is mapped as a `keyword` field.
If `host.name` can't be injected or can't be used for sorting (for example, because it's mapped as an `ip` field),
sorting is applied only to the `@timestamp` field.

NOTE: If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as an object
field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`, a single
`host.name` field is mapped as a `keyword` field.

You can override the default sort configuration by setting `index.sort.field` and `index.sort.order`. For more
details, see <<index-modules-index-sorting>>. To modify the sort configuration of an existing data stream, update the
data stream's component templates, and then perform or wait for a <<data-streams-rollover,rollover>>.

NOTE: If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
automatically added to the list of sort fields. It is highly recommended to include it manually as the last sort
field, with `desc` ordering.
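
As a minimal sketch of a custom sort configuration, assuming a hypothetical template named `my-logs-template` and a hypothetical `service.name` field, the following request sorts by `service.name` and keeps `@timestamp` as the last sort field with `desc` ordering:

[source,console]
----
PUT _index_template/my-logs-template
{
  "index_patterns": ["my-logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.sort.field": ["service.name", "@timestamp"], <1>
      "index.sort.order": ["asc", "desc"] <2>
    },
    "mappings": {
      "properties": {
        "service.name": { "type": "keyword" } <3>
      }
    }
  }
}
----
<1> `@timestamp` is listed explicitly, because it is not added to custom sort fields automatically.
<2> One order value per sort field, in the same order as `index.sort.field`.
<3> The illustrative `service.name` field is mapped explicitly so that it exists when the index is created.

Sort settings take effect when an index is created, so an existing data stream only uses them for backing indices created after the next rollover.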

[discrete]
[[logsdb-host-name]]
==== Existing data streams

If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.
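
One way to check the existing mapping before switching modes is the get field mapping API. A minimal sketch, assuming a hypothetical data stream named `my-logs-app`:

[source,console]
----
GET my-logs-app/_mapping/field/host.name
----

If the response shows a type other than `keyword` for `host.name` in any backing index, the mapping conflict described above might occur.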

To avoid mapping conflicts, consider these options:

@@ -114,7 +105,30 @@ To avoid mapping conflicts, consider these options:

* **Switch to a different <<index-mode-setting,index mode>>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.

IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).
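
To apply the change without waiting for an automatic rollover, you can roll over the data stream manually. A minimal sketch, assuming a hypothetical data stream named `my-logs-app`:

[source,console]
----
POST my-logs-app/_rollover
----

The new write index is created from the current matching templates; existing backing indices keep their original index mode.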

[discrete]
[[logsdb-sort-routing]]
==== Optimized routing on sort fields

To further reduce the storage footprint of `logsdb` indices, you can enable a routing optimization that uses the
fields in the sort configuration (except for `@timestamp`) to route documents to shards. The storage savings depend on
the sort configuration and the nature of the logged data. In our benchmarks, we observed a 20% storage reduction
compared to the default configuration for `logsdb` mode, with a negligible penalty to ingest performance (1-4%).
This makes the optimization a good option for data streams that are expected to grow substantially over time.

To configure the routing optimization:

* Set the index setting `index.logsdb.route_on_sort_fields` to `true` in the data stream configuration.
* <<index-modules-index-sorting,Configure index sorting>> with two or more fields, in addition to `@timestamp`.
* Make sure the <<mapping-id-field,`_id` field>> is not populated in ingested documents, because it must be auto-generated.
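
The requirements above can be sketched as one index template plus one indexing request. This is a minimal sketch under assumptions: the template name `my-logs-template`, the data stream `my-logs-app`, the sort fields `service.name` and `cloud.region`, and the sample document are all hypothetical, and your version of {es} must support `index.logsdb.route_on_sort_fields`.

[source,console]
----
PUT _index_template/my-logs-template
{
  "index_patterns": ["my-logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.logsdb.route_on_sort_fields": true, <1>
      "index.sort.field": ["service.name", "cloud.region", "@timestamp"], <2>
      "index.sort.order": ["asc", "asc", "desc"]
    },
    "mappings": {
      "properties": {
        "service.name": { "type": "keyword" },
        "cloud.region": { "type": "keyword" }
      }
    }
  }
}

POST my-logs-app/_doc <3>
{
  "@timestamp": "2025-01-20T12:00:00Z",
  "service.name": "checkout",
  "cloud.region": "eu-west-1",
  "message": "Order processed"
}
----
<1> Enables routing on the sort fields, excluding `@timestamp`.
<2> Two illustrative sort fields in addition to `@timestamp`, which stays last with `desc` ordering. Pick fields from your own data.
<3> The request omits a document ID, so `_id` is auto-generated.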

A custom sort configuration is required to improve storage efficiency and to minimize hotspots from logging spikes
that route documents to a single shard. For best results, use a few sort fields that have a relatively low cardinality
and don't co-vary (for example, `host.name` and `host.id` are not optimal).

Logging spikes can cause hotspots by producing documents that all get routed to a single
shard. To prevent hotspots and improve storage efficiency, your configuration should use a few sort fields that have a relatively low cardinality and don't co-vary (for example, `host.name` and `host.id` are not optimal).

The logic requires a custom sort config to reduce the likelihood of hotspots, as opposed to working with the default sort config. I think the updated text (and my version, possibly) missed this part. Maybe we can clarify this better?

Ah, thanks, I see what you mean. I'd suggest losing the logging spikes sentence -- WDYT of this?

A custom sort configuration is required, to minimize hotspots and improve storage efficiency. For best results, use a few sort fields that have a relatively low cardinality and don't co-vary
(for example, `host.name` and `host.id` are not optimal).

Sounds good, though it'd be nice to explain what leads to hotspots - I don't think this is mentioned elsewhere in this page. Another possibility is to include such a note above, where we describe the option for custom sort config.

OK one more try 🙂
A custom sort configuration is required, to improve storage efficiency and to
minimize hotspots from logging spikes that route documents to a single shard.
For best results, use a few sort fields that have a relatively low cardinality and
don't co-vary (for example, `host.name` and `host.id` are not optimal).