* [DOCS] Update documentation for index sorting and routing for logsdb
* update
* Apply suggestions from code review
* Update logs.asciidoc
* Update docs/reference/data-streams/logs.asciidoc
* Update logs.asciidoc
---------
Co-authored-by: Marci W <[email protected]>

If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.

If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.

Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.

When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values
are preserved for <<synthetic-source,synthetic source>> reconstruction. In `logsdb`, the default value is `arrays`,
which retains both duplicate values and the order of entries. However, the exact structure of
array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
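
The following Python model contrasts the two behaviors for a multi-value keyword field. It is an illustration, not Elasticsearch code. It assumes that without array preservation (modeled here as the value `none`), synthetic source returns keyword values sorted and deduplicated, while `arrays` returns them as ingested. The function `reconstruct_keyword_array` is hypothetical.

```python
# Illustrative model only: this is not how Elasticsearch implements synthetic _source.
# Assumption: without array preservation, keyword values are synthesized from
# doc values, which come back sorted and deduplicated.


def reconstruct_keyword_array(ingested: list, keep: str) -> list:
    """Model the values returned for a keyword field under synthetic _source."""
    if keep == "arrays":
        # `arrays` keeps the original array: duplicates and order survive.
        return list(ingested)
    if keep == "none":
        # Doc-values-based synthesis: values sorted, duplicates removed.
        return sorted(set(ingested))
    raise ValueError(f"unsupported keep mode in this model: {keep!r}")


# A-record style values stored in a keyword field, one repeated, in resolver order.
dns_a_records = ["10.0.0.7", "10.0.0.2", "10.0.0.7"]

print(reconstruct_keyword_array(dns_a_records, "arrays"))  # ['10.0.0.7', '10.0.0.2', '10.0.0.7']
print(reconstruct_keyword_array(dns_a_records, "none"))    # ['10.0.0.2', '10.0.0.7']
```

With `none`, both the repetition and the original order are lost, which is why `arrays` is the safer default for log data.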

[discrete]
[[logsdb-sort-settings]]
=== Index sort settings

In `logsdb` index mode, indices are sorted by the fields `host.name` and `@timestamp` by default.

* If the `@timestamp` field is not present, it is automatically injected.
* If the `host.name` field is not present, it is automatically injected as a `keyword` field, if possible.
** If `host.name` can't be injected (for example, `host` is already mapped as a `keyword` field) or can't be used for sorting
(for example, its value is an IP address), only `@timestamp` is used for sorting.
** If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as
an object field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
a single `host.name` field is mapped as a `keyword` field.
* `host.name` is sorted in ascending order, and `@timestamp` is sorted in descending order to prioritize
the latest data.
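
As a rough illustration of the resulting order, this Python sketch sorts a few sample documents the way the default index sort is described above: `host.name` ascending, then `@timestamp` descending, with a fallback to `@timestamp` alone. The documents and the `default_logsdb_order` helper are made up for the example. Elasticsearch applies index sorting when it writes segments, not in client code.

```python
# Illustrative model of the default logsdb sort order: host.name ascending,
# then @timestamp descending. Not Elasticsearch code.
from datetime import datetime

docs = [
    {"host.name": "web-2", "@timestamp": datetime(2024, 5, 1, 12, 0)},
    {"host.name": "web-1", "@timestamp": datetime(2024, 5, 1, 9, 30)},
    {"host.name": "web-1", "@timestamp": datetime(2024, 5, 1, 11, 45)},
    {"host.name": "web-2", "@timestamp": datetime(2024, 5, 1, 8, 15)},
]


def default_logsdb_order(documents, use_host_name=True):
    """Return documents in the order the default index sort would store them."""
    # Sort on the secondary key first. Python's sort is stable, so a second
    # pass on host.name keeps the @timestamp ordering within each host.
    by_time = sorted(documents, key=lambda d: d["@timestamp"], reverse=True)
    if not use_host_name:
        # Fallback when host.name can't be injected or used for sorting.
        return by_time
    return sorted(by_time, key=lambda d: d["host.name"])


for doc in default_logsdb_order(docs):
    print(doc["host.name"], doc["@timestamp"].isoformat())
```

Each host's newest entries come first, and all entries for one host sit next to each other, which is what lets similar values compress well.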

You can override the default sort settings by manually configuring `index.sort.field`
and `index.sort.order`. For more details, see <<index-modules-index-sorting>>.

To modify the sort configuration of an existing data stream, update the data stream's
component templates, and then perform or wait for a <<data-streams-rollover,rollover>>.

NOTE: If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
automatically added to the list of sort fields. For best results, include it manually as the last sort
field, with `desc` ordering.
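
A custom sort is usually set in a component template. This Python sketch builds one possible request body that sorts on chosen fields and, following the note above, appends `@timestamp` as the last sort field with `desc` order. The template name, the example fields, the choice of `asc` for the non-timestamp fields, and the `logsdb_sort_component_template` helper are assumptions for illustration.

```python
# Sketch of a component template body for a logsdb data stream with custom
# index sorting. Field names, the template name, and the helper are illustrative.
import json


def logsdb_sort_component_template(sort_fields):
    """Build a component template body that sorts on `sort_fields`, then @timestamp."""
    fields = [f for f in sort_fields if f != "@timestamp"]
    # @timestamp is not added to custom sort settings automatically, so append
    # it explicitly as the last sort field, in descending order.
    fields.append("@timestamp")
    orders = ["asc"] * (len(fields) - 1) + ["desc"]
    return {
        "template": {
            "settings": {
                "index.mode": "logsdb",
                "index.sort.field": fields,
                "index.sort.order": orders,
            },
            "mappings": {
                # Map the custom sort fields as keywords so they can be sorted on.
                "properties": {f: {"type": "keyword"} for f in fields[:-1]}
            },
        }
    }


body = logsdb_sort_component_template(["service.name", "host.name"])
# Send as the body of a hypothetical `PUT _component_template/my-logs-sort` request.
print(json.dumps(body, indent=2))
```

Mapping the custom fields as `keyword` mirrors how `host.name` is injected; adjust the types to match your data. The new sort takes effect on the next backing index, after rollover.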

[discrete]
[[logsdb-host-name]]
==== Existing data streams

If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.
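
Before switching, you can check how `host.name` is currently mapped. This Python sketch inspects the `mappings` object returned by the get mapping API (`GET <index>/_mapping`) and flags anything other than a `keyword` mapping for review. The helper name and its return convention are hypothetical, and it doesn't cover every mapping variant (for example, runtime fields).

```python
# Sketch: inspect an index's mappings (the "mappings" object returned by
# GET <index>/_mapping) for host.name before enabling logsdb.
# The helper name and return convention are hypothetical.


def host_name_mapping(mappings: dict) -> tuple:
    """Return (mapped_type, needs_review) for host.name.

    mapped_type is None when host.name isn't mapped yet, in which case
    logsdb can inject it. needs_review is True for anything other than keyword.
    """
    props = mappings.get("properties", {})
    if "host.name" in props:
        # Flat field, as produced when subobjects is false.
        field = props["host.name"]
    else:
        host = props.get("host")
        if host is None:
            return None, False
        host_type = host.get("type", "object")
        if host_type != "object":
            # `host` is itself a leaf (for example a keyword), so there is
            # no room for a host.name child field.
            return f"host: {host_type}", True
        field = host.get("properties", {}).get("name")
        if field is None:
            return None, False
    mapped_type = field.get("type", "object")
    return mapped_type, mapped_type != "keyword"


existing = {"properties": {"host": {"properties": {"name": {"type": "text"}}}}}
print(host_name_mapping(existing))  # ('text', True)
```

A result that needs review points to one of the options below.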

To avoid mapping conflicts, consider these options:
@@ -114,7 +106,29 @@ To avoid mapping conflicts, consider these options:
* **Switch to a different <<index-mode-setting,index mode>>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.

IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).

[discrete]
[[logsdb-sort-routing]]
==== Optimized routing on sort fields

To reduce the storage footprint of `logsdb` indexes, you can enable routing optimizations. A routing optimization uses the fields in the sort configuration (except for `@timestamp`) to route documents to shards.

In benchmarks, routing optimizations reduced storage requirements by 20% compared to the default `logsdb` configuration, with a negligible penalty to ingestion
performance (1-4%). Routing optimizations can benefit data streams that are expected to grow substantially over
time. Exact results depend on the sort configuration and the nature of the logged data.

To configure a routing optimization:

* Include the index setting `index.logsdb.route_on_sort_fields: true` in the data stream configuration.
* <<index-modules-index-sorting,Configure index sorting>> with two or more fields, in addition to `@timestamp`.
* Make sure the <<mapping-id-field,`_id`>> field is not populated in ingested documents. It should be
auto-generated instead.
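
The configuration steps above translate into a handful of index settings. This Python sketch builds them for a component template. `host.name` and `service.name` are example sort fields, the helper and template names are hypothetical, and the sort fields also need to be mapped (mappings are omitted here). The two-field check mirrors the requirement listed above.

```python
# Sketch: index settings for a logsdb data stream with routing on sort fields.
# Field names are examples; pick low-cardinality fields that don't co-vary.
import json


def routed_logsdb_settings(sort_fields: list) -> dict:
    """Build settings that enable route_on_sort_fields with a custom sort."""
    fields = [f for f in sort_fields if f != "@timestamp"]
    if len(fields) < 2:
        raise ValueError("configure two or more sort fields in addition to @timestamp")
    # @timestamp isn't added to custom sort settings automatically, so keep it last.
    fields.append("@timestamp")
    return {
        "index.mode": "logsdb",
        "index.logsdb.route_on_sort_fields": True,
        "index.sort.field": fields,
        "index.sort.order": ["asc"] * (len(fields) - 1) + ["desc"],
    }


settings = routed_logsdb_settings(["host.name", "service.name"])
# Body for a hypothetical `PUT _component_template/my-logs-routing` request.
# Documents indexed into the stream should omit _id so it is auto-generated.
print(json.dumps({"template": {"settings": settings}}, indent=2))
```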

A custom sort configuration is required to improve storage efficiency and to minimize hotspots
from logging spikes that may route documents to a single shard. For best results, use a few sort fields
that have a relatively low cardinality and don't co-vary (for example, `host.name` and `host.id` are not optimal together,
because each host name corresponds to a single host ID).
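
To see why co-varying fields don't help, consider a simplified routing model in Python that hashes the non-`@timestamp` sort field values to choose a shard. This is not Elasticsearch's routing algorithm: `zlib.crc32` stands in for the real hash, and the sample documents are invented. What matters is the number of distinct routing keys. Fields that co-vary add none, so a spike from one host still lands on one shard.

```python
# Simplified routing model (not Elasticsearch's algorithm): hash the values of
# the sort fields, excluding @timestamp, and map the hash onto a shard.
import zlib
from collections import Counter


def routing_key(doc: dict, sort_fields: list) -> tuple:
    """Values of the sort fields that take part in routing."""
    return tuple(doc[f] for f in sort_fields if f != "@timestamp")


def shard_for(doc: dict, sort_fields: list, num_shards: int) -> int:
    key = "|".join(str(v) for v in routing_key(doc, sort_fields))
    return zlib.crc32(key.encode("utf-8")) % num_shards


def distinct_routing_keys(docs: list, sort_fields: list) -> int:
    return len({routing_key(d, sort_fields) for d in docs})


# Invented sample: 3 hosts, each with a fixed host.id, running 4 services.
docs = [
    {"host.name": f"host-{h}", "host.id": f"id-{h}", "service.name": s}
    for h in range(3)
    for s in ("api", "auth", "db", "web")
]

print(distinct_routing_keys(docs, ["host.name", "host.id"]))       # 3
print(distinct_routing_keys(docs, ["host.name", "service.name"]))  # 12
print(Counter(shard_for(d, ["host.name", "service.name"], 4) for d in docs))
```

Adding `host.id` to `host.name` leaves 3 keys, one per host, while an independent field such as `service.name` gives 12 keys that can spread across shards.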

[discrete]
[[logsdb-specialized-codecs]]
@@ -123,7 +137,7 @@ IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-
By default, `logsdb` index mode uses the `best_compression` <<index-codec,codec>>, which applies {wikipedia}/Zstd[ZSTD]
compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint.
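
The codec is chosen with the `index.codec` index setting. This Python sketch builds the settings for a component template; the template name and helper are hypothetical. Because `index.codec` is a static setting, we assume the change only applies to backing indices created afterward, for example on rollover.

```python
# Sketch: component template settings that switch a logsdb data stream between
# the `best_compression` and `default` codecs. Names are illustrative.
import json


def logsdb_codec_settings(prefer_speed: bool) -> dict:
    """Return index settings choosing between the two stored-field codecs."""
    return {
        "index.mode": "logsdb",
        # best_compression (ZSTD) is the logsdb default; `default` trades a
        # slightly larger footprint for faster compression.
        "index.codec": "default" if prefer_speed else "best_compression",
    }


# Body for a hypothetical `PUT _component_template/my-logs-codec` request.
body = {"template": {"settings": logsdb_codec_settings(prefer_speed=True)}}
print(json.dumps(body, indent=2))
```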

The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
encoded using the following sequence of codecs:

* **Delta encoding**:
@@ -173,9 +187,9 @@ _characters._ Using UTF-8 encoding, this results in a limit of 32764 bytes, depe
The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value
defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default
behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.

If you need to customize the limit, you can override it at the mapping level or change the index-level default.
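
The precedence rule can be sketched in a few lines of Python. The index-level default of 8191 characters is an assumption here, consistent with a 32764-byte limit in UTF-8 (8191 × 4 bytes). The helper names are hypothetical, and Python's `len` counts code points, which can differ from Elasticsearch's character count for some non-ASCII text.

```python
# Sketch of ignore_above precedence: a mapping-level value overrides the
# index-level index.mapping.ignore_above setting. Helper names are hypothetical.

ASSUMED_LOGSDB_DEFAULT = 8191  # assumed index-level default, in characters


def effective_ignore_above(field_mapping: dict, index_settings: dict) -> int:
    """Return the limit that applies to one field."""
    if "ignore_above" in field_mapping:
        return field_mapping["ignore_above"]
    return index_settings.get("index.mapping.ignore_above", ASSUMED_LOGSDB_DEFAULT)


def is_indexed(value: str, field_mapping: dict, index_settings: dict) -> bool:
    """Values longer than the limit are not indexed."""
    return len(value) <= effective_ignore_above(field_mapping, index_settings)


settings = {"index.mapping.ignore_above": 1024}
print(effective_ignore_above({"type": "keyword", "ignore_above": 256}, settings))  # 256
print(effective_ignore_above({"type": "keyword"}, settings))  # 1024
print(effective_ignore_above({"type": "keyword"}, {}))  # 8191
```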

[discrete]
[[logs-db-ignore-limit]]
@@ -202,7 +216,7 @@ reconstructing the original value.
[[logsdb-settings-summary]]
=== Settings reference

The `logsdb` index mode uses the following settings: