Commit 1c32934

[8.9] Clarify data stream recommendations and best practices (elastic#107233) (elastic#107234)
* Clarify data stream recommendations and best practices (elastic#107233)

* Clarify data stream recommendations and best practices

Our documentation around data streams versus aliases could be interpreted in a way where someone doing *any* updates thinks they need to use an alias with indices instead of a data stream.

This commit enhances the documentation around these areas to determine the correct abstraction in a more concrete way.

It also tries to clarify that data streams still allow updates to the backing indices, and that a difference is last-write-wins versus first-write-wins.

* Remove dlm link
1 parent f0eed82 commit 1c32934

3 files changed: +44 -19 lines changed

docs/reference/data-streams/data-streams.asciidoc

Lines changed: 30 additions & 9 deletions
@@ -18,6 +18,27 @@ automate the management of these backing indices. For example, you can use
 hardware and delete unneeded indices. {ilm-init} can help you reduce costs and
 overhead as your data grows.
 
+
+[discrete]
+[[should-you-use-a-data-stream]]
+== Should you use a data stream?
+
+To determine whether you should use a data stream for your data, you should consider the format of
+the data, and your expected interaction. A good candidate for using a data stream will match the
+following criteria:
+
+* Your data contains a timestamp field, or one could be automatically generated.
+* You mostly perform indexing requests, with occasional updates and deletes.
+* You index documents without an `_id`, or when indexing documents with an explicit `_id` you expect first-write-wins behavior.
+
+For most time series data use-cases, a data stream will be a good fit. However, if you find that
+your data doesn't fit into these categories (for example, if you frequently send multiple documents
+using the same `_id` expecting last-write-wins), you may want to use an index alias with a write
+index instead. See documentation for <<manage-time-series-data-without-data-streams,managing time
+series data without a data stream>> for more information.
+
+Keep in mind that some features such as <<tsds,Time Series Data Streams (TSDS)>> require a data stream.
+
 [discrete]
 [[backing-indices]]
 == Backing indices
@@ -116,19 +137,19 @@ You should not derive any intelligence from the backing indices names.
 
 [discrete]
 [[data-streams-append-only]]
-== Append-only
+== Append-only (mostly)
 
-Data streams are designed for use cases where existing data is rarely,
-if ever, updated. You cannot send update or deletion requests for existing
-documents directly to a data stream. Instead, use the
+Data streams are designed for use cases where existing data is rarely updated. You cannot send
+update or deletion requests for existing documents directly to a data stream. However, you can still
+<<update-delete-docs-in-a-backing-index,update or delete documents>> in a data stream by submitting
+requests directly to the document's backing index.
+
+If you need to update a larger number of documents in a data stream, you can use the
 <<update-docs-in-a-data-stream-by-query,update by query>> and
 <<delete-docs-in-a-data-stream-by-query,delete by query>> APIs.
 
-If needed, you can <<update-delete-docs-in-a-backing-index,update or delete
-documents>> by submitting requests directly to the document's backing index.
-
-TIP: If you frequently update or delete existing time series data, use an index
-alias with a write index instead of a data stream. See
+TIP: If you frequently send multiple documents using the same `_id` expecting last-write-wins, you
+may want to use an index alias with a write index instead. See
 <<manage-time-series-data-without-data-streams>>.
 
 include::set-up-a-data-stream.asciidoc[]
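
Note on the first-write-wins versus last-write-wins distinction drawn in this file: the sketch below is a toy in-memory model, not Elasticsearch code. It assumes a single index represented as a Python dict, and every name in it (`create_only`, `overwrite`, `replay`, `VersionConflict`) is hypothetical and chosen only for illustration. It does not model rollover or multiple backing indices.

```python
class VersionConflict(Exception):
    """Raised when a create-only write targets an _id that already exists."""


def create_only(index: dict, doc_id: str, doc: dict) -> None:
    # First-write-wins: a second write to the same _id is rejected.
    if doc_id in index:
        raise VersionConflict(doc_id)
    index[doc_id] = doc


def overwrite(index: dict, doc_id: str, doc: dict) -> None:
    # Last-write-wins: the newest document replaces the older one.
    index[doc_id] = doc


def replay(write, events):
    """Apply each (doc_id, doc) event with `write`; count rejected writes."""
    index, rejected = {}, 0
    for doc_id, doc in events:
        try:
            write(index, doc_id, doc)
        except VersionConflict:
            rejected += 1
    return index, rejected


# Two documents sent with the same _id (illustrative data only).
events = [("a", {"status": "started"}), ("a", {"status": "finished"})]
first_wins, rejected = replay(create_only, events)
last_wins, _ = replay(overwrite, events)
```

Under this model the create-only replay keeps the first document and rejects the second, while the overwrite replay keeps the second. A workload that relies on the overwrite outcome is the case the new TIP steers toward an index alias with a write index.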

docs/reference/ilm/ilm-tutorial.asciidoc

Lines changed: 9 additions & 8 deletions
@@ -282,14 +282,15 @@ DELETE /_index_template/timeseries_template
 [[manage-time-series-data-without-data-streams]]
 === Manage time series data without data streams
 
-Even though <<data-streams, data streams>> are a convenient way to scale
-and manage time series data, they are designed to be append-only. We recognise there
-might be use-cases where data needs to be updated or deleted in place and the
-data streams don't support delete and update requests directly,
-so the index APIs would need to be used directly on the data stream's backing indices.
-
-In these cases, you can use an index alias to manage indices containing the time series data
-and periodically roll over to a new index.
+Even though <<data-streams, data streams>> are a convenient way to scale and manage time series
+data, they are designed to be append-only. We recognise there might be use-cases where data needs to
+be updated or deleted in place and the data streams don't support delete and update requests
+directly, so the index APIs would need to be used directly on the data stream's backing indices. In
+these cases we still recommend using a data stream.
+
+If you frequently send multiple documents using the same `_id` expecting last-write-wins, you can
+use an index alias instead of a data stream to manage indices containing the time series data and
+periodically roll over to a new index.
 
 To automate rollover and management of time series indices with {ilm-init} using an index
 alias, you:
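
Note on "the index APIs would need to be used directly on the data stream's backing indices": a minimal Python sketch of how such a request could be assembled. It assumes the document was first located with a search that returned `_index`, `_id`, `_seq_no`, and `_primary_term` (for example by setting `seq_no_primary_term` in the search request). The helper name `backing_index_update` and all sample values are hypothetical, and the sketch only builds the request; it does not send it.

```python
from urllib.parse import quote, urlencode


def backing_index_update(hit: dict, new_source: dict) -> dict:
    """Build an index request aimed at the backing index that holds `hit`."""
    # The sequence number and primary term guard against concurrent changes.
    params = urlencode({
        "if_seq_no": hit["_seq_no"],
        "if_primary_term": hit["_primary_term"],
    })
    path = f"/{quote(hit['_index'])}/_doc/{quote(hit['_id'])}?{params}"
    return {"method": "PUT", "path": path, "body": new_source}


# Hypothetical search hit; every value here is a placeholder.
hit = {
    "_index": ".ds-my-data-stream-2099.03.08-000003",
    "_id": "bfspvnIBr7VVZlfp2lqX",
    "_seq_no": 0,
    "_primary_term": 1,
}
request = backing_index_update(
    hit, {"@timestamp": "2099-03-08T11:06:07.000Z", "message": "corrected"}
)
```

The request targets the backing index named in the hit rather than the data stream itself, which is why occasional in-place updates do not by themselves require switching to an alias.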

docs/reference/ilm/set-up-lifecycle-policy.asciidoc

Lines changed: 5 additions & 2 deletions
@@ -76,6 +76,8 @@ To use a policy that triggers the rollover action,
 you need to configure the policy in the index template used to create each new index.
 You specify the name of the policy and the alias used to reference the rolling indices.
 
+TIP: An `index.lifecycle.rollover_alias` setting is only required if using {ilm} with an alias. It is unnecessary when using <<data-streams,Data Streams>>.
+
 You can use the {kib} Create template wizard to create a template. To access the
 wizard, open the menu and go to *Stack Management > Index Management*. In the
 *Index Templates* tab, click *Create template*.
@@ -123,8 +125,9 @@ DELETE _index_template/my_template
 [[create-initial-index]]
 ==== Create an initial managed index
 
-When you set up policies for your own rolling indices, you need to manually create the first index
-managed by a policy and designate it as the write index.
+When you set up policies for your own rolling indices, if you are not using the recommended
+<<data-streams,data streams>>, you need to manually create the first index managed by a policy and
+designate it as the write index.
 
 IMPORTANT: When you enable {ilm} for {beats} or the {ls} {es} output plugin,
 the necessary policies and configuration changes are applied automatically.
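
Note on the two setups contrasted in this file: the Python sketch below builds the request bodies as plain dicts, assuming the composable index template format. The function names and the sample policy, alias, and pattern values are hypothetical. Nothing is sent to a cluster.

```python
def alias_setup(policy: str, alias: str, pattern: str) -> dict:
    """Bodies for ILM with an index alias: template plus first write index."""
    template = {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index.lifecycle.name": policy,
                # Only needed on the alias path, per the TIP added above.
                "index.lifecycle.rollover_alias": alias,
            }
        },
    }
    # Body used when manually creating the first index behind the alias.
    bootstrap = {"aliases": {alias: {"is_write_index": True}}}
    return {"template": template, "bootstrap": bootstrap}


def data_stream_setup(policy: str, pattern: str) -> dict:
    """Template body for a data stream: no rollover alias, no bootstrap index."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        "template": {"settings": {"index.lifecycle.name": policy}},
    }


# Placeholder names for illustration only.
with_alias = alias_setup("my_policy", "timeseries", "timeseries-*")
with_stream = data_stream_setup("my_policy", "my-data-stream*")
```

The alias variant carries one extra setting and one extra manual step (the bootstrap index), which is the overhead the reworded paragraph says is avoided with data streams.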
