diff --git a/docs/reference/data-streams/logs.asciidoc b/docs/reference/data-streams/logs.asciidoc
index 3af5e09889a89..7058cfe51496f 100644
--- a/docs/reference/data-streams/logs.asciidoc
+++ b/docs/reference/data-streams/logs.asciidoc
@@ -1,9 +1,9 @@
 [[logs-data-stream]]
 == Logs data stream

-IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
-and self-managed Elasticsearch as of version 8.17, and is enabled by default for
-logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].
+IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
+and self-managed Elasticsearch as of version 8.17, and is enabled by default for
+logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].

 A logs data stream is a data stream type that stores log data more efficiently.

@@ -54,57 +54,49 @@ DELETE _index_template/my-index-template
 === Synthetic source

 If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <>, which omits storing the original `_source`
-field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.
+field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.

 If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.

-Before using synthetic source, make sure to review the <>.
+Before using synthetic source, make sure to review the <>.

 When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values are preserved for <> reconstruction. In `logsdb`, the default value is `arrays`, which retains both duplicate values and the order of entries. However, the exact structure of
-array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
-log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
+array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
+log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.

 [discrete]
 [[logsdb-sort-settings]]
 === Index sort settings

-In `logsdb` index mode, the following sort settings are applied by default:
+In `logsdb` index mode, indices are sorted by the fields `host.name` and `@timestamp` by default.

-`index.sort.field`: `["host.name", "@timestamp"]`::
-Indices are sorted by `host.name` and `@timestamp` by default. The `@timestamp` field is automatically injected if it is not present.
-
-`index.sort.order`: `["desc", "desc"]`::
-Both `host.name` and `@timestamp` are sorted in descending (`desc`) order, prioritizing the latest data.
-
-`index.sort.mode`: `["min", "min"]`::
-The `min` mode sorts indices by the minimum value of multi-value fields.
-
-`index.sort.missing`: `["_first", "_first"]`::
-Missing values are sorted to appear `_first`.
-
-You can override these default sort settings. For example, to sort on different fields
-and change the order, manually configure `index.sort.field` and `index.sort.order`. For more details, see
-<>.
-
-When using the default sort settings, the `host.name` field is automatically injected into the index mappings as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and retrieved based on the `host.name` and `@timestamp` fields.
-
-NOTE: If `subobjects` is set to `true` (default), the `host` field is mapped as an object field
-named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
+* If the `@timestamp` field is not present, it is automatically injected.
+* If the `host.name` field is not present, it is automatically injected as a `keyword` field, if possible.
+** If `host.name` can't be injected (for example, `host` is a keyword field) or can't be used for sorting
+(for example, its value is an IP address), only the `@timestamp` is used for sorting.
+** If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as
+an object field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
 a single `host.name` field is mapped as a `keyword` field.
+* To prioritize the latest data, `host.name` is sorted in ascending order and `@timestamp` is sorted in
+descending order.
+
+You can override the default sort settings by manually configuring `index.sort.field`
+and `index.sort.order`. For more details, see <>.

-To apply different sort settings to an existing data stream, update the data stream's component templates, and then
-perform or wait for a <>.
+To modify the sort configuration of an existing data stream, update the data stream's
+component templates, and then perform or wait for a <>.

-NOTE: In `logsdb` mode, the `@timestamp` field is automatically injected if it's not already present. If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
-automatically added to the list of sort fields.
+NOTE: If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
+automatically added to the list of sort fields. For best results, include it manually as the last sort
+field, with `desc` ordering.

 [discrete]
 [[logsdb-host-name]]
 ==== Existing data streams

-If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.
+If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.

 To avoid mapping conflicts, consider these options:

@@ -114,7 +106,29 @@ To avoid mapping conflicts, consider these options:
 * **Switch to a different <>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.

-IMPORTANT: On existing data streams, `logsdb` mode is applied on <> (automatic or manual).
+IMPORTANT: On existing data streams, `logsdb` mode is applied on <> (automatic or manual).
+
+[discrete]
+[[logsdb-sort-routing]]
+==== Optimized routing on sort fields
+
+To reduce the storage footprint of `logsdb` indexes, you can enable routing optimizations. A routing optimization uses the fields in the sort configuration (except for `@timestamp`) to route documents to shards.
+
+In benchmarks,
+routing optimizations reduced storage requirements by 20% compared to the default `logsdb` configuration, with a negligible penalty to ingestion
+performance (1-4%). Routing optimizations can benefit data streams that are expected to grow substantially over
+time. Exact results depend on the sort configuration and the nature of the logged data.
+
+To configure a routing optimization:
+
+ * Include the index setting `[index.logsdb.route_on_sort_fields:true]` in the data stream configuration.
+ * <> with two or more fields, in addition to `@timestamp`.
+ * Make sure the <> field is not populated in ingested documents. It should be
+ auto-generated instead.
+
+A custom sort configuration is required, to improve storage efficiency and to minimize hotspots
+from logging spikes that may route documents to a single shard. For best results, use a few sort fields
+that have a relatively low cardinality and don't co-vary (for example, `host.name` and `host.id` are not optimal).

 [discrete]
 [[logsdb-specialized-codecs]]
@@ -123,7 +137,7 @@ IMPORTANT: On existing data streams, `logsdb` mode is applied on <>, which applies {wikipedia}/Zstd[ZSTD] compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint.

-The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
+The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
 encoded using the following sequence of codecs:

 * **Delta encoding**:
@@ -173,9 +187,9 @@ _characters._ Using UTF-8 encoding, this results in a limit of 32764 bytes, depe
 The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default
-behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.
+behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.

-If you need to customize the limit, you can override it at the mapping level or change the index level default.
+If you need to customize the limit, you can override it at the mapping level or change the index level default.

 [discrete]
 [[logs-db-ignore-limit]]
@@ -202,7 +216,7 @@ reconstructing the original value.
 [[logsdb-settings-summary]]
 === Settings reference

-The `logsdb` index mode uses the following settings:
+The `logsdb` index mode uses the following settings:

 * **`index.mode`**: `"logsdb"`