@@ -50,3 +50,123 @@ DELETE _index_template/my-index-template
50 50 ----
51 51 // TEST[continued]
52 52 ////
53+
54+ [[logsdb-default-settings]]
55+
56+ === Synthetic source
57+
58+ By default, `logsdb` mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
59+ field and synthesizes it from doc values or stored fields upon document retrieval.
60+
61+ === LogsDB for logs data streams
62+
63+ By default, Elasticsearch applies `logsdb` mode to data streams whose names match the pattern `logs-*-*`.
64+ This pattern identifies a logs data stream, which Elasticsearch automatically configures to use LogsDB.
65+
66+ Users can opt out of `logsdb` mode by overriding the `index.mode` setting in the index settings or by
67+ using component templates or index templates to customize the indexing configuration. This provides the flexibility
68+ to choose a different index mode for data streams where LogsDB is not desired.
69+
70+ For data streams not matching the pattern `logs-*-*` and for standalone indices, users can still use the `index.mode`
71+ setting to enable LogsDB.
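+
+ As a minimal sketch of opting out, the request below sets `index.mode` back to `standard` through a component
+ template. It assumes that the built-in `logs` index template composes an optional component template named
+ `logs@custom`; verify this for your version before relying on it. Template changes only affect backing indices
+ created after the change, for example on the next rollover.
+
+ [source,console]
+ ----
+ PUT _component_template/logs@custom
+ {
+   "template": {
+     "settings": {
+       "index": {
+         "mode": "standard"
+       }
+     }
+   }
+ }
+ ----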
72+
73+ === Index sort settings
74+
75+ The following index sort settings are applied by default in `logsdb` mode:
76+
77+ * `index.sort.field`: `["host.name", "@timestamp"]`
78+ In `logsdb` mode, indices are sorted by `host.name` and `@timestamp` fields by default. For data streams, the
79+ `@timestamp` field is automatically injected if it is not present in the indexed documents.
80+
81+ * `index.sort.order`: `["desc", "desc"]`
82+ The default sort order for both fields is descending (`desc`), prioritizing the latest data.
83+
84+ * `index.sort.mode`: `["min", "min"]`
85+ The default sort mode is `min`, ensuring that indices are sorted by the minimum value of multi-valued fields.
86+
87+ * `index.sort.missing`: `["_first", "_first"]`
88+ Missing values are sorted to appear first (`_first`) in `logsdb` mode.
89+
90+ `logsdb` mode allows users to override the default sort settings. For instance, users can specify their own fields
91+ and order for sorting by modifying the `index.sort.field` and `index.sort.order` settings.
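+
+ The following request is a sketch of such an override, not a recommended configuration. The template name
+ `my-sorted-logs-template`, the index pattern `my-sorted-logs-*`, the priority, and the `service.name` field are
+ hypothetical and chosen only for illustration. The sort fields are mapped explicitly so that they exist when the
+ index is created.
+
+ [source,console]
+ ----
+ PUT _index_template/my-sorted-logs-template
+ {
+   "index_patterns": ["my-sorted-logs-*"],
+   "data_stream": {},
+   "priority": 500,
+   "template": {
+     "settings": {
+       "index.mode": "logsdb",
+       "index.sort.field": ["service.name", "@timestamp"],
+       "index.sort.order": ["asc", "desc"]
+     },
+     "mappings": {
+       "properties": {
+         "@timestamp": { "type": "date" },
+         "service.name": { "type": "keyword" }
+       }
+     }
+   }
+ }
+ ----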
92+
93+ If no custom sort settings are used, the `host.name` field is automatically injected into the mappings of the
94+ index as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and
95+ retrieved based on the `host.name` and `@timestamp` fields.
96+
97+ [NOTE]
98+ ====
99+ If `subobjects` is `true` (the default), the `host.name` field will be mapped as the `host` object with a `name`
100+ child `keyword` field. If `subobjects` is `false`, a single `host.name` field will be mapped as a `keyword` field.
101+
102+ Sort settings are final and cannot be changed after an index is created. Changing settings requires creating a new
103+ index with new settings applied to it.
104+
105+ If these settings are not appropriate for your mappings, we recommend changing them. Keep in mind that sort settings
106+ affect indexing throughput and query latency.
107+ ====
108+
109+ === Specialized codecs
110+
111+ `logsdb` mode uses the `best_compression` codec by default, which applies {wikipedia}/Zstd[ZSTD] compression to stored
112+ fields.
113+
114+ Users can override the default compression codec. If desired, they can switch to the `best_speed`
115+ codec for faster compression at the expense of a slightly larger storage footprint.
116+
117+ * `index.codec`: `"best_compression"`
118+ This is the default setting, applying {wikipedia}/Zstd[ZSTD] compression to stored fields for optimal storage
119+ efficiency.
120+
121+ * `index.codec`: `"best_speed"`
122+ If faster indexing performance is required, users can opt for `best_speed` compression, which sacrifices some storage
123+ efficiency for higher indexing throughput.
124+
125+ `logsdb` mode also uses specialized codecs for `doc_values` that are designed to optimize storage usage.
126+ These specialized codecs are applied by default when using `logsdb` mode.
127+
128+ === `ignore_malformed` and `ignore_above` settings
129+
130+ By default, `logsdb` mode sets `ignore_malformed` to `true`. This setting allows documents with malformed fields to be
131+ indexed without causing indexing failures, ensuring that log data ingestion continues smoothly even when some fields
132+ contain invalid or improperly formatted data.
133+
134+ * `index.mapping.ignore_malformed`: `true`
135+ This setting ensures that malformed fields are ignored during indexing.
136+
137+ Users can override this setting by setting `ignore_malformed` to `false`. However, this is not recommended as it might
138+ result in documents with malformed fields being rejected and not indexed at all.
139+
140+ In `logsdb` mode, the `index.mapping.ignore_above` setting is applied by default at the index level to ensure efficient
141+ storage and indexing of large string values.
142+ The mapping-level `ignore_above` setting still takes precedence. If a specific field has an `ignore_above` value
143+ defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` default. The index-level
144+ default for `ignore_above` is 8191 **characters**. Because a UTF-8 character occupies at most 4 bytes, this results
145+ in a limit of at most 32764 bytes.
146+
147+ This default behavior helps to optimize indexing performance by preventing excessively large string values from being
148+ indexed, while still allowing users to customize the limit at the mapping level as needed.
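+
+ As a sketch of a mapping-level override, the request below creates a standalone `logsdb` index in which one
+ `keyword` field uses its own `ignore_above` limit instead of the index-level default. The index name
+ `my-logsdb-index`, the field name `message_id`, and the value `256` are hypothetical.
+
+ [source,console]
+ ----
+ PUT my-logsdb-index
+ {
+   "settings": {
+     "index.mode": "logsdb"
+   },
+   "mappings": {
+     "properties": {
+       "@timestamp": { "type": "date" },
+       "message_id": {
+         "type": "keyword",
+         "ignore_above": 256
+       }
+     }
+   }
+ }
+ ----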
149+
150+ [NOTE]
151+ ====
152+ Synthetic source supports retrieving ignored fields and their values, even for malformed fields.
153+ ====
154+
155+ `logsdb` mode uses a special field named `_ignored_source` that allows retrieving values for fields that have been
156+ ignored for various reasons (e.g., due to malformed data or indexing rules). This field ensures that even ignored
157+ field values can be accessed if needed.
158+
159+ The `_ignored_source` field is not returned by default. Additionally, the field is
160+ encoded, and the encoding format may change over time, so users should not rely on the encoding or the field name
161+ remaining the same.
162+
163+ To retrieve this field, explicitly request it via the fields or stored fields API, using
164+ `_ignored_source` as the field name.
165+
166+ === Fields without doc values
167+
168+ When `logsdb` mode uses synthetic `_source` and `doc_values` are disabled for a field in the mapping, Elasticsearch
169+ automatically sets the `store` setting to `true` for that field. This ensures that the field's data is still available
170+ for reconstructing the document's source when retrieving it via <<synthetic-source,synthetic `_source`>>.
171+ This automatic adjustment allows synthetic source to work correctly, even when doc values are not enabled for certain
172+ fields.