
Commit f8336f2

docs: improve logsdb docs including default values
1 parent 6be3036 commit f8336f2


1 file changed (+120, -0 lines changed)

docs/reference/data-streams/logs.asciidoc

----
// TEST[continued]
////

[[logsdb-default-settings]]

=== Synthetic source

By default, `logsdb` mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
field and synthesizes it from doc values or stored fields upon document retrieval.

=== LogsDB for logs data streams

In Elasticsearch, `logsdb` mode is applied by default to data streams whose names match the pattern `logs-*-*`.
This pattern identifies a logs data stream, and Elasticsearch automatically configures the data stream to use LogsDB.

Users can opt out of `logsdb` mode by overriding the `index.mode` setting in the index settings, or by using
composable index templates to customize the indexing configuration. This makes it possible to choose a different
indexing mode for individual data streams where LogsDB is not desired.

For data streams that do not match the `logs-*-*` pattern, and for standalone indices, users can still enable LogsDB
by setting `index.mode` to `logsdb`.
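
As a rough illustration, the sketch below checks data stream names against the `logs-*-*` pattern with Python's
`fnmatchcase` (assumed here to approximate Elasticsearch's wildcard matching) and builds index template bodies that
either opt a non-matching pattern into `logsdb` or opt a matching one out. The helper names, the `my-app-*` and
`logs-legacy-*` patterns, and the priority value are hypothetical; only the `index_patterns`, `data_stream`,
`priority`, `template.settings`, and `index.mode` keys come from the index template API.

```python
import json
from fnmatch import fnmatchcase

LOGS_PATTERN = "logs-*-*"


def uses_logsdb_by_default(data_stream_name: str) -> bool:
    """Approximate check: does the name match the logs-*-* pattern?"""
    return fnmatchcase(data_stream_name, LOGS_PATTERN)


def index_template_body(patterns, index_mode, priority=500):
    """Build a hypothetical index template body that pins index.mode.

    The priority value is illustrative: only the highest-priority matching
    template is applied, so an override must outrank the others.
    """
    return {
        "index_patterns": list(patterns),
        "data_stream": {},
        "priority": priority,
        "template": {"settings": {"index.mode": index_mode}},
    }


# Opt in for data streams outside logs-*-*.
opt_in = index_template_body(["my-app-*"], "logsdb")
# Opt out for one family of logs data streams.
opt_out = index_template_body(["logs-legacy-*"], "standard")

print(uses_logsdb_by_default("logs-nginx-default"))
print(uses_logsdb_by_default("my-app-logs"))
print(json.dumps(opt_out, indent=2))
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.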

=== Index sort settings

When `logsdb` mode is used, the following index sort settings are applied by default:

* `index.sort.field`: `["host.name", "@timestamp"]`
In `logsdb` mode, indices are sorted by the `host.name` and `@timestamp` fields by default. For data streams, the
`@timestamp` field is automatically injected if it is not present in the indexed documents.

* `index.sort.order`: `["desc", "desc"]`
The default sort order for both fields is descending (`desc`), so the most recent documents for each host come first.

* `index.sort.mode`: `["min", "min"]`
The default sort mode is `min`, so multi-valued fields are sorted by their minimum value.

* `index.sort.missing`: `["_first", "_first"]`
Documents with a missing value for a sort field are sorted first (`_first`).
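
To make the combined effect of these defaults concrete, the following sketch orders documents the way the settings
describe: descending by `host.name`, then by `@timestamp`, reducing multi-valued fields to their minimum and placing
missing values first. It assumes flattened documents (dicts keyed by dotted field names) for brevity and models only
the documented semantics; it is not how Elasticsearch implements index sorting.

```python
from functools import cmp_to_key

# Default logsdb sort settings, as listed above.
SORT_FIELDS = ["host.name", "@timestamp"]
SORT_ORDER = ["desc", "desc"]
SORT_MODE = ["min", "min"]
SORT_MISSING = ["_first", "_first"]


def sort_value(doc, field, mode):
    """Reduce a field to one sort value; None means the value is missing."""
    value = doc.get(field)
    if value is None or value == []:
        return None
    values = value if isinstance(value, list) else [value]
    return min(values) if mode == "min" else max(values)


def compare(a, b):
    """Compare two documents field by field using the sort settings."""
    for field, order, mode, missing in zip(SORT_FIELDS, SORT_ORDER, SORT_MODE, SORT_MISSING):
        va, vb = sort_value(a, field, mode), sort_value(b, field, mode)
        if va == vb:
            continue  # tie on this field: fall through to the next one
        # Missing values are placed according to `missing`, regardless of order.
        if va is None:
            return -1 if missing == "_first" else 1
        if vb is None:
            return 1 if missing == "_first" else -1
        result = -1 if va < vb else 1
        return -result if order == "desc" else result
    return 0


def logsdb_default_order(docs):
    return sorted(docs, key=cmp_to_key(compare))


docs = [
    {"host.name": "a", "@timestamp": 1},
    {"host.name": "b", "@timestamp": 1},
    {"host.name": "b", "@timestamp": 5},
    {"@timestamp": 3},                          # no host.name: sorted first
    {"host.name": ["c", "a"], "@timestamp": 2},  # multi-valued: sorts as "a"
]
for doc in logsdb_default_order(docs):
    print(doc)
```

A `cmp_to_key` comparator is used because descending order over strings cannot be expressed by negating a tuple key.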

`logsdb` mode allows users to override the default sort settings. For instance, users can specify their own fields
and order for sorting by setting `index.sort.field` and `index.sort.order`.

If no custom sort settings are used, the `host.name` field is automatically injected into the index mappings as a
`keyword` field so that sorting can be applied. This ensures that logs are efficiently sorted and retrieved based on
the `host.name` and `@timestamp` fields.
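
As a sketch of such an override, the helper below builds a hypothetical index template body that sorts by
`service.name` ascending and `@timestamp` descending instead of the defaults. The helper, the field names, and the
`my-app-*` pattern are illustrative assumptions. Because index sort fields must already be mapped when the index is
created, the sketch maps `service.name` explicitly as a `keyword`; `@timestamp` is left to the data stream's automatic
mapping.

```python
def custom_sort_template(patterns, sort_fields, sort_orders, keyword_fields):
    """Build a hypothetical logsdb index template with custom index sorting."""
    if len(sort_fields) != len(sort_orders):
        raise ValueError("index.sort.field and index.sort.order must have the same length")
    return {
        "index_patterns": list(patterns),
        "data_stream": {},
        "template": {
            "settings": {
                "index.mode": "logsdb",
                "index.sort.field": list(sort_fields),
                "index.sort.order": list(sort_orders),
            },
            "mappings": {
                # Sort fields must exist in the mapping when the index is created.
                "properties": {name: {"type": "keyword"} for name in keyword_fields}
            },
        },
    }


body = custom_sort_template(
    patterns=["my-app-*"],
    sort_fields=["service.name", "@timestamp"],
    sort_orders=["asc", "desc"],
    keyword_fields=["service.name"],
)
print(body["template"]["settings"])
```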

[NOTE]
====
If `subobjects` is `true` (the default), the `host.name` field is mapped as a `host` object with a `name`
child `keyword` field. If `subobjects` is `false`, a single `host.name` field is mapped as a `keyword` field.

Sort settings are final and cannot be changed after an index is created. Changing them requires creating a new
index with the new settings applied to it.

If these settings are not appropriate for your mappings, we recommend overriding them (for example, in an index
template) before the index is created. Keep in mind that sort settings affect both indexing throughput and query
latency.
====
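
For reference, the two mapping shapes described in the note above look roughly like this when written as mapping
JSON. The function `host_name_mapping` is a hypothetical helper that mirrors the described behavior; it is not part of
any Elasticsearch client.

```python
import json


def host_name_mapping(subobjects: bool = True) -> dict:
    """Return the mapping shape the note describes for the injected host.name field."""
    if subobjects:
        # A `host` object with a `name` keyword child.
        return {"properties": {"host": {"properties": {"name": {"type": "keyword"}}}}}
    # A single leaf field whose name contains a dot.
    return {"subobjects": False, "properties": {"host.name": {"type": "keyword"}}}


print(json.dumps(host_name_mapping(True)))
print(json.dumps(host_name_mapping(False)))
```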

=== Specialized codecs

`logsdb` mode uses the `best_compression` codec by default, which applies {wikipedia}/Zstd[ZSTD] compression to stored
fields.

Users can override the default compression codec. If desired, they can switch to the `best_speed`
codec for faster compression at the expense of a slightly larger storage footprint.

* `index.codec`: `"best_compression"`
This is the default setting, applying {wikipedia}/Zstd[ZSTD] compression to stored fields for optimal storage
efficiency.

* `index.codec`: `"best_speed"`
If faster indexing performance is required, users can opt for `best_speed` compression, which sacrifices some storage
efficiency for higher indexing throughput.

`logsdb` mode also uses specialized codecs for `doc_values` fields that are designed to optimize storage usage.
These codecs are applied by default whenever `logsdb` mode is used.

=== `ignore_malformed` and `ignore_above` settings

By default, `logsdb` mode sets `ignore_malformed` to `true`. This setting allows documents with malformed fields to be
indexed without causing indexing failures, so log data ingestion continues smoothly even when some fields
contain invalid or improperly formatted data.

* `index.mapping.ignore_malformed`: `true`
This setting ensures that malformed fields are ignored during indexing.

Users can override this setting by setting `ignore_malformed` to `false`. However, this is not recommended because it
can cause documents with malformed fields to be rejected and not indexed at all.
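
The following sketch mimics, for a single numeric field, the difference this setting makes. It is a simplified model
under stated assumptions: the helper `index_long_field` and the `DocumentRejected` exception are hypothetical, only
integer parsing is modeled, and the ignored field name is recorded in a list standing in for Elasticsearch's
`_ignored` metadata field.

```python
class DocumentRejected(Exception):
    """Hypothetical error: the malformed value was not ignored."""


def index_long_field(doc: dict, field: str, ignore_malformed: bool = True) -> dict:
    """Return what would be indexed for `field`, modeling ignore_malformed."""
    indexed = {"fields": {}, "_ignored": []}
    raw = doc.get(field)
    if raw is None:
        return indexed
    try:
        indexed["fields"][field] = int(raw)
    except (TypeError, ValueError):
        if not ignore_malformed:
            raise DocumentRejected(f"failed to parse field [{field}]: {raw!r}")
        # The malformed value is skipped instead of failing the whole document.
        indexed["_ignored"].append(field)
    return indexed


print(index_long_field({"http.response.status_code": "404"}, "http.response.status_code"))
print(index_long_field({"http.response.status_code": "n/a"}, "http.response.status_code"))
```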

In `logsdb` mode, the `index.mapping.ignore_above` setting is applied by default at the index level to ensure efficient
storage and indexing of large string values.
The mapping-level `ignore_above` setting still takes precedence. If a specific field has an `ignore_above` value
defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` default. The index-level
default for `ignore_above` is 8191 **characters**. Because a UTF-8 character takes at most 4 bytes, this corresponds
to a limit of at most 32764 bytes.

This default behavior helps optimize indexing performance by preventing excessively large string values from being
indexed, while still allowing users to customize the limit at the mapping level as needed.
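
The byte arithmetic and the precedence rule can be sketched as follows. This is an illustration under assumptions:
characters are counted the way Python's `len` counts them (Unicode code points), which may not match Elasticsearch's
internal counting exactly, and `effective_ignore_above` and `is_ignored` are hypothetical helpers. Lucene's maximum
indexed term length of 32766 bytes is included for comparison.

```python
INDEX_LEVEL_IGNORE_ABOVE = 8191   # logsdb default, in characters
MAX_UTF8_BYTES_PER_CHAR = 4       # the longest UTF-8 sequence is 4 bytes
LUCENE_MAX_TERM_BYTES = 32766     # Lucene's limit on a single indexed term

worst_case_bytes = INDEX_LEVEL_IGNORE_ABOVE * MAX_UTF8_BYTES_PER_CHAR
print(worst_case_bytes)  # 32764


def effective_ignore_above(field_mapping: dict, index_default: int = INDEX_LEVEL_IGNORE_ABOVE) -> int:
    """A mapping-level ignore_above takes precedence over the index-level default."""
    return field_mapping.get("ignore_above", index_default)


def is_ignored(value: str, field_mapping: dict) -> bool:
    """True if the value is longer than the effective limit and would not be indexed."""
    return len(value) > effective_ignore_above(field_mapping)


# 8191 four-byte characters reach the worst case exactly.
print(len(("\U0001F600" * INDEX_LEVEL_IGNORE_ABOVE).encode("utf-8")))  # 32764
print(is_ignored("x" * 8192, {"type": "keyword"}))                       # True
print(is_ignored("x" * 300, {"type": "keyword", "ignore_above": 256}))   # True
```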

[NOTE]
====
Synthetic `_source` can return the values of ignored fields, including malformed ones.
====

`logsdb` mode uses a special field named `_ignored_source` that allows retrieving values for fields that have been
ignored for various reasons (for example, malformed data or indexing rules). This field ensures that even ignored
field values can be accessed if needed.

The `_ignored_source` field is not returned by default and must be explicitly requested. Additionally, the field is
encoded, and the encoding format may change over time, so users should not rely on the encoding or the field name
remaining the same.

To retrieve this field, request it explicitly through either the fields API or the stored fields API, using
`_ignored_source` as the field name.

=== Fields without doc values

When `logsdb` mode uses synthetic `_source` and `doc_values` are disabled for a field in the mapping, Elasticsearch
automatically sets the `store` setting to `true` for that field. This ensures that the field's data is still available
for reconstructing the document’s source when retrieving it via <<synthetic-source,synthetic `_source`>>.
This automatic adjustment allows synthetic source to work correctly, even when doc values are not enabled for certain
fields.
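
A minimal sketch of this adjustment, applied to a flat `properties` mapping, follows. The function
`apply_synthetic_source_store_defaults` is a hypothetical name; the sketch models only the rule stated above (fields
with `doc_values: false` get `store: true`) and does not recurse into object fields or model any other mapping rules.

```python
import copy


def apply_synthetic_source_store_defaults(properties: dict) -> dict:
    """Return a copy of `properties` where fields without doc values are stored."""
    adjusted = copy.deepcopy(properties)
    for field_mapping in adjusted.values():
        if field_mapping.get("doc_values") is False:
            # Without doc values, the stored field is what synthetic _source reads back.
            field_mapping["store"] = True
    return adjusted


mapping = {
    "message_id": {"type": "keyword", "doc_values": False},
    "host.name": {"type": "keyword"},
}
print(apply_synthetic_source_store_defaults(mapping))
```

Copying the mapping keeps the input unchanged, which makes the adjustment easy to compare against the original.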
