elastic · kkrik-es · Jan 27, 2025 · Jan 23, 2025 · Jan 23, 2025 · Jan 23, 2025
diff --git a/docs/reference/data-streams/logs.asciidoc b/docs/reference/data-streams/logs.asciidoc
@@ -1,9 +1,9 @@
 [[logs-data-stream]]
 == Logs data stream
 
-IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted 
-and self-managed Elasticsearch as of version 8.17, and is enabled by default for 
-logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}]. 
+IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
+and self-managed Elasticsearch as of version 8.17, and is enabled by default for
+logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].
 
 A logs data stream is a data stream type that stores log data more efficiently.
 
@@ -54,57 +54,48 @@ DELETE _index_template/my-index-template
 === Synthetic source
 
 If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
-field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval. 
+field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.
 
 If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.
 
-Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>. 
+Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.
 
 When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values
 are preserved for <<synthetic-source,synthetic source>> reconstruction. In `logsdb`, the default value is `arrays`,
 which retains both duplicate values and the order of entries. However, the exact structure of
-array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some 
-log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events. 
+array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
+log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
 
 [discrete]
 [[logsdb-sort-settings]]
 === Index sort settings
 
-In `logsdb` index mode, the following sort settings are applied by default:
+In `logsdb` index mode, indices are sorted by the `host.name` and `@timestamp` fields by default. The `@timestamp`
+field is automatically injected if it is not present. The `host.name` field is automatically injected as a `keyword`
+field if it is not present and injection is possible. Injection may not be possible if, for instance, `host` is
+mapped as a keyword field. If `host.name` can't be injected or can't be used for sorting (for example, it's an IP
+field), sorting is applied only to `@timestamp`.
 
-`index.sort.field`: `["host.name", "@timestamp"]`::
-Indices are sorted by `host.name` and `@timestamp` by default. The `@timestamp` field is automatically injected if it is not present.
+NOTE: If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as an object
+field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`, a single
+`host.name` field is mapped as a `keyword` field.
 
-`index.sort.order`: `["desc", "desc"]`::
-Both `host.name` and `@timestamp` are sorted in descending (`desc`) order, prioritizing the latest data.
+By default, `host.name` is sorted in ascending order and `@timestamp` in descending order, prioritizing the latest data.
 
-`index.sort.mode`: `["min", "min"]`::
-The `min` mode sorts indices by the minimum value of multi-value fields.
+You can override the default sort configuration by setting `index.sort.field` and `index.sort.order`. For details,
+see <<index-modules-index-sorting>>. To modify the sort configuration of an existing data stream, update the data
+stream's component templates, and then perform or wait for a
+<<data-streams-rollover,rollover>>.
 
-`index.sort.missing`: `["_first", "_first"]`::
-Missing values are sorted to appear `_first`.
-
-You can override these default sort settings. For example, to sort on different fields
-and change the order, manually configure `index.sort.field` and `index.sort.order`. For more details, see
-<<index-modules-index-sorting>>.
-
-When using the default sort settings, the `host.name` field is automatically injected into the index mappings as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and retrieved based on the `host.name` and `@timestamp` fields.
-
-NOTE: If `subobjects` is set to `true` (default), the `host` field is mapped as an object field
-named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
-a single `host.name` field is mapped as a `keyword` field.
-
-To apply different sort settings to an existing data stream, update the data stream's component templates, and then 
-perform or wait for a <<data-streams-rollover,rollover>>.
-
-NOTE: In `logsdb` mode, the `@timestamp` field is automatically injected if it's not already present. If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
-automatically added to the list of sort fields.
+NOTE: If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
+automatically added to the list of sort fields. It is highly recommended to include it manually, as the last sort
+field with `desc` ordering.
 
 [discrete]
 [[logsdb-host-name]]
 ==== Existing data streams
 
-If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied. 
+If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.
 
 To avoid mapping conflicts, consider these options:
 
@@ -114,7 +105,30 @@ To avoid mapping conflicts, consider these options:
 
 * **Switch to a different <<index-mode-setting,index mode>>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.
 
-IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual). 
+IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).
+
+[discrete]
+[[logsdb-sort-routing]]
+==== Optimized routing on sort fields
+
+The storage footprint of `logsdb` indices can be further reduced by enabling a routing optimization that uses the
+fields in the sort configuration (except for `@timestamp`) to route documents to shards. The storage savings depend on
+the sort configuration and the nature of the logged data. In our benchmarks, we observed storage reductions of 20%
+compared to the default configuration for `logsdb` mode, with a negligible penalty to ingest performance (1-4%).
+This makes the optimization a good option for data streams that are expected to grow substantially over
+time.
+
+Configuring the routing optimization requires the following:
+
+ * Include the index setting `index.logsdb.route_on_sort_fields: true` in the data stream configuration.
+ * <<index-modules-index-sorting, Configure index sorting>> with 2 or more fields, in addition to `@timestamp`.
+ * Make sure <<mapping-id-field, field `_id`>> is not populated in ingested documents, because it must be
+   auto-generated.
+
+Using a custom sort configuration is required to minimize the possibility of creating hotspots, in case of a
+logging spike producing documents that all get routed to a single shard. To prevent this, and to improve storage
+efficiency, it is recommended to use a few fields that have a rather low cardinality and don't co-vary
+(e.g. `host.name` and `host.id` are likely a bad choice).
-Using a custom sort configuration is required to minimize the possibility of creating hotspots, in case of a
-logging spike producing documents that all get routed to a single shard. To prevent this, and to improve storage
-efficiency, it is recommended to use a few fields that have a rather low cardinality and don't co-vary
-(e.g. `host.name` and `host.id` are likely a bad choice).
+Logging spikes can cause hotspots by producing documents that all get routed to a single 
+shard. To prevent hotspots and improve storage efficiency, your configuration should use a few sort fields that have a relatively low cardinality and don't co-vary (for example, `host.name` and `host.id` are not optimal).
-Using a custom sort configuration is required to minimize the possibility of creating hotspots, in case of a
-logging spike producing documents that all get routed to a single shard. To prevent this, and to improve storage
-efficiency, it is recommended to use a few fields that have a rather low cardinality and don't co-vary
-(e.g. `host.name` and `host.id` are likely a bad choice).
+A custom sort configuration is required, to minimize hotspots and improve storage efficiency. For best results, use a few sort fields that have a relatively low cardinality and don't co-vary
+(for example, `host.name` and `host.id` are not optimal).
 
 [discrete]
 [[logsdb-specialized-codecs]]
@@ -123,7 +137,7 @@ IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-
 By default, `logsdb` index mode uses the `best_compression` <<index-codec,codec>>, which applies {wikipedia}/Zstd[ZSTD]
 compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint.
 
-The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are 
+The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
 encoded using the following sequence of codecs:
 
 * **Delta encoding**:
@@ -173,9 +187,9 @@ _characters._ Using UTF-8 encoding, this results in a limit of 32764 bytes, depe
 
 The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value
 defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default
-behavior helps to optimize indexing performance by preventing excessively large string values from being indexed. 
+behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.
 
-If you need to customize the limit, you can override it at the mapping level or change the index level default. 
+If you need to customize the limit, you can override it at the mapping level or change the index level default.
 
 [discrete]
 [[logs-db-ignore-limit]]
@@ -202,7 +216,7 @@ reconstructing the original value.
 [[logsdb-settings-summary]]
 === Settings reference
 
-The `logsdb` index mode uses the following settings: 
+The `logsdb` index mode uses the following settings:
 
 * **`index.mode`**: `"logsdb"`