Commit 40d0eb0

kkrik-es and marciw authored
[DOCS] Update documentation for index sorting and routing for logsdb (elastic#120721) (elastic#120904)
* [DOCS] Update documentation for index sorting and routing for logsdb
* update
* Apply suggestions from code review
* Update logs.asciidoc
* Update docs/reference/data-streams/logs.asciidoc
* Update logs.asciidoc

---------

Co-authored-by: Marci W <[email protected]>
1 parent 7245c05 commit 40d0eb0

1 file changed: +52 -38 lines changed

docs/reference/data-streams/logs.asciidoc

Lines changed: 52 additions & 38 deletions
@@ -1,9 +1,9 @@
11
[[logs-data-stream]]
22
== Logs data stream
33

4-
IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
5-
and self-managed Elasticsearch as of version 8.17, and is enabled by default for
6-
logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].
4+
IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
5+
and self-managed Elasticsearch as of version 8.17, and is enabled by default for
6+
logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].
77

88
A logs data stream is a data stream type that stores log data more efficiently.
99

@@ -54,57 +54,49 @@ DELETE _index_template/my-index-template
5454
=== Synthetic source
5555

5656
If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
57-
field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.
57+
field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.
5858

5959
If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.
6060

61-
Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.
61+
Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.
6262

6363
When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values
6464
are preserved for <<synthetic-source,synthetic source>> reconstruction. In `logsdb`, the default value is `arrays`,
6565
which retains both duplicate values and the order of entries. However, the exact structure of
66-
array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
67-
log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
66+
array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
67+
log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
6868

6969
[discrete]
7070
[[logsdb-sort-settings]]
7171
=== Index sort settings
7272

73-
In `logsdb` index mode, the following sort settings are applied by default:
73+
In `logsdb` index mode, indices are sorted by the fields `host.name` and `@timestamp` by default.
7474

75-
`index.sort.field`: `["host.name", "@timestamp"]`::
76-
Indices are sorted by `host.name` and `@timestamp` by default. The `@timestamp` field is automatically injected if it is not present.
77-
78-
`index.sort.order`: `["desc", "desc"]`::
79-
Both `host.name` and `@timestamp` are sorted in descending (`desc`) order, prioritizing the latest data.
80-
81-
`index.sort.mode`: `["min", "min"]`::
82-
The `min` mode sorts indices by the minimum value of multi-value fields.
83-
84-
`index.sort.missing`: `["_first", "_first"]`::
85-
Missing values are sorted to appear `_first`.
86-
87-
You can override these default sort settings. For example, to sort on different fields
88-
and change the order, manually configure `index.sort.field` and `index.sort.order`. For more details, see
89-
<<index-modules-index-sorting>>.
90-
91-
When using the default sort settings, the `host.name` field is automatically injected into the index mappings as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and retrieved based on the `host.name` and `@timestamp` fields.
92-
93-
NOTE: If `subobjects` is set to `true` (default), the `host` field is mapped as an object field
94-
named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
75+
* If the `@timestamp` field is not present, it is automatically injected.
76+
* If the `host.name` field is not present, it is automatically injected as a `keyword` field, if possible.
77+
** If `host.name` can't be injected (for example, `host` is a keyword field) or can't be used for sorting
78+
(for example, its value is an IP address), only the `@timestamp` is used for sorting.
79+
** If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as
80+
an object field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
9581
a single `host.name` field is mapped as a `keyword` field.
82+
* To prioritize the latest data, `host.name` is sorted in ascending order and `@timestamp` is sorted in
83+
descending order.
84+
85+
You can override the default sort settings by manually configuring `index.sort.field`
86+
and `index.sort.order`. For more details, see <<index-modules-index-sorting>>.
9687

97-
To apply different sort settings to an existing data stream, update the data stream's component templates, and then
98-
perform or wait for a <<data-streams-rollover,rollover>>.
88+
To modify the sort configuration of an existing data stream, update the data stream's
89+
component templates, and then perform or wait for a <<data-streams-rollover,rollover>>.
9990

100-
NOTE: In `logsdb` mode, the `@timestamp` field is automatically injected if it's not already present. If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
101-
automatically added to the list of sort fields.
91+
NOTE: If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
92+
automatically added to the list of sort fields. For best results, include it manually as the last sort
93+
field, with `desc` ordering.
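
The NOTE above can be sketched as a small helper that builds custom sort settings and always places `@timestamp` last with `desc` ordering. This is an illustrative sketch, not part of the commit: the function name `logsdb_sort_settings` and the `service.name` field are hypothetical, and only the setting names `index.mode`, `index.sort.field`, and `index.sort.order` come from the documentation being changed.

```python
def logsdb_sort_settings(fields, orders):
    """Return index settings for a custom logsdb sort, with @timestamp last."""
    if len(fields) != len(orders):
        raise ValueError("fields and orders must have the same length")
    # Drop any caller-supplied @timestamp entry so it can be re-added at the end.
    pairs = [(f, o) for f, o in zip(fields, orders) if f != "@timestamp"]
    # Custom sort settings do not add @timestamp automatically, so add it here.
    pairs.append(("@timestamp", "desc"))
    return {
        "index.mode": "logsdb",
        "index.sort.field": [f for f, _ in pairs],
        "index.sort.order": [o for _, o in pairs],
    }


# "service.name" is a hypothetical field used only for illustration.
settings = logsdb_sort_settings(["service.name", "host.name"], ["asc", "asc"])
print(settings)
```

The resulting dictionary could go in the `settings` section of a component template. As the text notes, the new sort configuration would only take effect on the next rollover.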
10294

10395
[discrete]
10496
[[logsdb-host-name]]
10597
==== Existing data streams
10698

107-
If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.
99+
If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.
108100

109101
To avoid mapping conflicts, consider these options:
110102

@@ -114,7 +106,29 @@ To avoid mapping conflicts, consider these options:
114106

115107
* **Switch to a different <<index-mode-setting,index mode>>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.
116108

117-
IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).
109+
IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).
110+
111+
[discrete]
112+
[[logsdb-sort-routing]]
113+
==== Optimized routing on sort fields
114+
115+
To reduce the storage footprint of `logsdb` indexes, you can enable routing optimizations. A routing optimization uses the fields in the sort configuration (except for `@timestamp`) to route documents to shards.
116+
117+
In benchmarks,
118+
routing optimizations reduced storage requirements by 20% compared to the default `logsdb` configuration, with a negligible penalty to ingestion
119+
performance (1-4%). Routing optimizations can benefit data streams that are expected to grow substantially over
120+
time. Exact results depend on the sort configuration and the nature of the logged data.
121+
122+
To configure a routing optimization:
123+
124+
* Include the index setting `[index.logsdb.route_on_sort_fields:true]` in the data stream configuration.
125+
* <<index-modules-index-sorting, Configure index sorting>> with two or more fields, in addition to `@timestamp`.
126+
* Make sure the <<mapping-id-field,`_id`>> field is not populated in ingested documents. It should be
127+
auto-generated instead.
128+
129+
A custom sort configuration is required, to improve storage efficiency and to minimize hotspots
130+
from logging spikes that may route documents to a single shard. For best results, use a few sort fields
131+
that have a relatively low cardinality and don't co-vary (for example, `host.name` and `host.id` are not optimal).
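
As a rough sketch of the routing configuration steps above, the following helper assembles an index template body that enables `index.logsdb.route_on_sort_fields` together with a custom sort on two or more fields plus `@timestamp`. The function name `routing_template`, the index pattern `logs-myapp-*`, and the `service.name` field are assumptions made for illustration. The ascending order for the non-timestamp fields is also an assumption, mirroring the default `host.name` ordering described earlier.

```python
import json


def routing_template(index_patterns, sort_fields):
    """Build an index template body with routing on sort fields enabled."""
    if "@timestamp" in sort_fields:
        raise ValueError("pass only the non-@timestamp sort fields")
    if len(sort_fields) < 2:
        raise ValueError("need two or more sort fields besides @timestamp")
    return {
        "index_patterns": index_patterns,
        "data_stream": {},
        "template": {
            "settings": {
                "index.mode": "logsdb",
                # @timestamp goes last; it is not used for routing.
                "index.sort.field": [*sort_fields, "@timestamp"],
                "index.sort.order": ["asc"] * len(sort_fields) + ["desc"],
                "index.logsdb.route_on_sort_fields": True,
            }
        },
    }


# Hypothetical pattern and fields, chosen only for illustration.
body = routing_template(["logs-myapp-*"], ["host.name", "service.name"])
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `PUT _index_template/<name>` request. The sketch does not cover the third requirement: ingested documents must not set `_id`, which has to be ensured on the ingestion side.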
118132

119133
[discrete]
120134
[[logsdb-specialized-codecs]]
@@ -123,7 +137,7 @@ IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-
123137
By default, `logsdb` index mode uses the `best_compression` <<index-codec,codec>>, which applies {wikipedia}/Zstd[ZSTD]
124138
compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint.
125139

126-
The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
140+
The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
127141
encoded using the following sequence of codecs:
128142

129143
* **Delta encoding**:
@@ -173,9 +187,9 @@ _characters._ Using UTF-8 encoding, this results in a limit of 32764 bytes, depe
173187

174188
The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value
175189
defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default
176-
behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.
190+
behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.
177191

178-
If you need to customize the limit, you can override it at the mapping level or change the index level default.
192+
If you need to customize the limit, you can override it at the mapping level or change the index level default.
179193

180194
[discrete]
181195
[[logs-db-ignore-limit]]
@@ -202,7 +216,7 @@ reconstructing the original value.
202216
[[logsdb-settings-summary]]
203217
=== Settings reference
204218

205-
The `logsdb` index mode uses the following settings:
219+
The `logsdb` index mode uses the following settings:
206220

207221
* **`index.mode`**: `"logsdb"`
208222
