Commit 026c775

Add/align 'Data streams' content (#457)
This adds `Manage data` / `The Elasticsearch data store` / `Data streams` as well as a "Manage data streams" page that has the UI steps.

I've also removed the "logsdb.md" and "tsdb.md" pages (in red) since they are duplicates of the "Logs data stream" and "Time series data stream (TSDS)" pages (in green):

<img width="531" alt="screen" src="https://github.com/user-attachments/assets/a2f49459-7ba7-4ded-87d1-256e7781db79" />

**NOTE:** As proposed [here](elastic/docs-projects#324 (comment)), after this is merged I'd like to bump the "Data streams" section one level higher, rather than have it nested under "index types".

---

Previews:
- [Data streams](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/457/manage-data/data-store/index-types/data-streams)
- [Manage a data stream](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/457/manage-data/data-store/index-types/manage-data-stream)

Closes: elastic/docs-projects#379
1 parent 0d95099 commit 026c775

File tree

11 files changed

+121
-524
lines changed

manage-data/data-store/index-types/data-streams.md

Lines changed: 95 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -3,26 +3,109 @@ mapped_urls:
33
- https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html
44
- https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html#manage-data-streams
55
- https://www.elastic.co/guide/en/serverless/current/index-management.html#index-management-manage-data-streams
6+
7+
applies:
8+
stack: all
9+
serverless: all
10+
hosted: all
611
---
712

8-
# Data streams
13+
# Data streams [data-streams]
14+
15+
A data stream lets you store append-only time series data across multiple indices while giving you a single named resource for requests. Data streams are well-suited for logs, events, metrics, and other continuously generated data.
16+
17+
You can submit indexing and search requests directly to a data stream. The stream automatically routes the request to backing indices that store the stream’s data. You can use [{{ilm}} ({{ilm-init}})](../../../manage-data/lifecycle/index-lifecycle-management.md) to automate the management of these backing indices. For example, you can use {{ilm-init}} to automatically move older backing indices to less expensive hardware and delete unneeded indices. {{ilm-init}} can help you reduce costs and overhead as your data grows.
18+
19+
20+
## Should you use a data stream? [should-you-use-a-data-stream]
21+
22+
To determine whether a data stream suits your data, consider the format of the data and how you expect to interact with it. A good candidate for a data stream matches the following criteria:
23+
24+
* Your data contains a timestamp field, or one could be automatically generated.
25+
* You mostly perform indexing requests, with occasional updates and deletes.
26+
* You index documents without an `_id`, or when indexing documents with an explicit `_id` you expect first-write-wins behavior.
27+
28+
For most time series data use cases, a data stream is a good fit. However, if your data doesn’t match these criteria (for example, if you frequently send multiple documents using the same `_id` expecting last-write-wins), you may want to use an index alias with a write index instead. See [managing time series data without a data stream](../../../manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams) for more information.
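The difference between first-write-wins and last-write-wins can be illustrated with a toy in-memory store. This is only a sketch of the two terms as used in the criteria above; the function names and the dictionary-based store are hypothetical and do not reflect how {{es}} handles documents internally.

```python
def index_first_write_wins(store: dict, doc_id: str, doc: dict) -> bool:
    """Keep the first document written for an _id; ignore later writes."""
    if doc_id in store:
        return False  # a document with this _id already exists
    store[doc_id] = doc
    return True


def index_last_write_wins(store: dict, doc_id: str, doc: dict) -> None:
    """Overwrite any existing document that has the same _id."""
    store[doc_id] = doc


first_wins: dict = {}
index_first_write_wins(first_wins, "1", {"status": 200})
accepted = index_first_write_wins(first_wins, "1", {"status": 500})

last_wins: dict = {}
index_last_write_wins(last_wins, "1", {"status": 200})
index_last_write_wins(last_wins, "1", {"status": 500})

print(accepted, first_wins["1"], last_wins["1"])
```

If your workload depends on the second behavior, that is the case where an index alias with a write index may be the better fit.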
29+
30+
Keep in mind that some features such as [Time Series Data Streams (TSDS)](../../../manage-data/data-store/index-types/time-series-data-stream-tsds.md) and [data stream lifecycles](../../../manage-data/lifecycle/data-stream.md) require a data stream.
31+
32+
33+
## Backing indices [backing-indices]
34+
35+
A data stream consists of one or more [hidden](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-hidden), auto-generated backing indices.
36+
37+
:::{image} ../../../images/elasticsearch-reference-data-streams-diagram.svg
38+
:alt: data streams diagram
39+
:::
40+
41+
A data stream requires a matching [index template](../../../manage-data/data-store/templates.md). The template contains the mappings and settings used to configure the stream’s backing indices.
42+
43+
Every document indexed to a data stream must contain a `@timestamp` field, mapped as a [`date`](https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html) or [`date_nanos`](https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html) field type. If the index template doesn’t specify a mapping for the `@timestamp` field, {{es}} maps `@timestamp` as a `date` field with default options.
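As a minimal sketch, the following builds an index template body that maps `@timestamp` explicitly, plus a document that satisfies the requirement. The `my-logs-*` pattern and the `message` field are hypothetical, and the empty `data_stream` object is assumed here as the way a template is marked for use with data streams. The snippet only constructs and prints JSON; it does not send any request.

```python
import json
from datetime import datetime, timezone

# Index template body (hypothetical pattern name).
index_template = {
    "index_patterns": ["my-logs-*"],
    "data_stream": {},  # assumption: marks the template as a data stream template
    "template": {
        "mappings": {
            "properties": {
                # Explicit mapping; may also be "date_nanos".
                "@timestamp": {"type": "date"}
            }
        }
    },
}

# A document that carries the required @timestamp field.
document = {
    "@timestamp": datetime(2099, 3, 7, 11, 4, 5, tzinfo=timezone.utc).isoformat(),
    "message": "example log line",  # hypothetical field
}

print(json.dumps(index_template, indent=2))
print(json.dumps(document, indent=2))
```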
44+
45+
The same index template can be used for multiple data streams. You cannot delete an index template in use by a data stream.
46+
47+
The name pattern for the backing indices is an implementation detail, and you should not derive any meaning from it. The only invariant that holds is that each data stream generation index has a unique name.
48+
49+
50+
## Read requests [data-stream-read-requests]
51+
52+
When you submit a read request to a data stream, the stream routes the request to all its backing indices.
53+
54+
:::{image} ../../../images/elasticsearch-reference-data-streams-search-request.svg
55+
:alt: data streams search request
56+
:::
57+
58+
59+
## Write index [data-stream-write-index]
60+
61+
The most recently created backing index is the data stream’s write index. The stream adds new documents to this index only.
62+
63+
:::{image} ../../../images/elasticsearch-reference-data-streams-index-request.svg
64+
:alt: data streams index request
65+
:::
66+
67+
You cannot add new documents to other backing indices, even by sending requests directly to the index.
68+
69+
You also cannot perform operations on a write index that may hinder indexing, such as:
70+
71+
* [Clone](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-clone)
72+
* [Delete](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-delete)
73+
* [Shrink](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-shrink)
74+
* [Split](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-split)
75+
76+
77+
## Rollover [data-streams-rollover]
78+
79+
A [rollover](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover) creates a new backing index that becomes the stream’s new write index.
80+
81+
We recommend using [{{ilm-init}}](../../../manage-data/lifecycle/index-lifecycle-management.md) to automatically roll over data streams when the write index reaches a specified age or size. If needed, you can also [manually roll over](../../../manage-data/data-store/index-types/use-data-stream.md#manually-roll-over-a-data-stream) a data stream.
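The relationship between read requests, the write index, and rollover can be sketched with a toy in-memory model. This is an illustration under simplifying assumptions, not how {{es}} implements data streams: the `ToyDataStream` class, its method names, and the `-backing-<n>` index names are all hypothetical.

```python
class ToyDataStream:
    """Toy model: one named resource in front of several backing indices."""

    def __init__(self, name: str):
        self.name = name
        self.generation = 0
        self.backing_indices = []  # list of (index_name, documents)
        self.rollover()  # create the first backing index

    @property
    def write_index(self):
        # The most recently created backing index receives new documents.
        return self.backing_indices[-1]

    def rollover(self) -> None:
        # A rollover creates a new backing index that becomes the write index.
        self.generation += 1
        self.backing_indices.append((f"{self.name}-backing-{self.generation}", []))

    def index(self, doc: dict) -> str:
        # New documents go to the write index only.
        index_name, docs = self.write_index
        docs.append(doc)
        return index_name

    def search(self, predicate) -> list:
        # Read requests fan out to all backing indices.
        return [d for _, docs in self.backing_indices for d in docs if predicate(d)]


stream = ToyDataStream("web-server-logs")
first = stream.index({"@timestamp": "2099-03-07T00:00:00Z", "status": 200})
stream.rollover()
second = stream.index({"@timestamp": "2099-03-08T00:00:00Z", "status": 500})
hits = stream.search(lambda d: True)
print(first, second, len(hits))
```

After the rollover, the new document lands in a different backing index, while a search still returns documents from both.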
82+
83+
84+
## Generation [data-streams-generation]
85+
86+
Each data stream tracks its generation: a six-digit, zero-padded integer starting at `000001`.
87+
88+
When a backing index is created, the index is named using the following convention:
89+
90+
```text
91+
.ds-<data-stream>-<yyyy.MM.dd>-<generation>
92+
```
93+
94+
`<yyyy.MM.dd>` is the backing index’s creation date. Backing indices with a higher generation contain more recent data. For example, the `web-server-logs` data stream has a generation of `34`. The stream’s most recent backing index, created on 7 March 2099, is named `.ds-web-server-logs-2099.03.07-000034`.
995

10-
% What needs to be done: Align serverless/stateful
96+
Some operations, such as a [shrink](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-shrink) or [restore](../../../deploy-manage/tools/snapshot-and-restore/restore-snapshot.md), can change a backing index’s name. These name changes do not remove a backing index from its data stream.
1197

12-
% GitHub issue: docs-projects#379
98+
The generation of the data stream can change without a new index being added to the data stream (for example, when an existing backing index is shrunk). This means the backing indices for some generations will never exist. You should not derive any meaning from backing index names.
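To make the naming convention in this section concrete, here is a small sketch that formats a name the same way. The `backing_index_name` helper is hypothetical and is meant only for reading names, for example in logs. Because names can change and generations can be skipped, it should not be used to predict or parse real backing index names.

```python
from datetime import date


def backing_index_name(data_stream: str, created: date, generation: int) -> str:
    """Format .ds-<data-stream>-<yyyy.MM.dd>-<generation> with a six-digit generation."""
    return f".ds-{data_stream}-{created:%Y.%m.%d}-{generation:06d}"


name = backing_index_name("web-server-logs", date(2099, 3, 7), 34)
print(name)  # .ds-web-server-logs-2099.03.07-000034
```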
1399

14-
% Scope notes: Combine content from linked sources including aligning serverless and stateful content.
15100

16-
% Use migrated content from existing pages that map to this page:
101+
## Append-only (mostly) [data-streams-append-only]
17102

18-
% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/data-streams.md
19-
% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/index-mgmt.md
20-
% - [ ] ./raw-migrated-files/docs-content/serverless/index-management.md
103+
Data streams are designed for use cases where existing data is rarely updated. You cannot send update or deletion requests for existing documents directly to a data stream. However, you can still [update or delete documents](../../../manage-data/data-store/index-types/use-data-stream.md#update-delete-docs-in-a-backing-index) in a data stream by submitting requests directly to the document’s backing index.
21104

22-
% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
105+
If you need to update a larger number of documents in a data stream, you can use the [update by query](../../../manage-data/data-store/index-types/use-data-stream.md#update-docs-in-a-data-stream-by-query) and [delete by query](../../../manage-data/data-store/index-types/use-data-stream.md#delete-docs-in-a-data-stream-by-query) APIs.
23106

24-
$$$data-streams-append-only$$$
107+
::::{tip}
108+
If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. See [Manage time series data without data streams](../../../manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams).
109+
::::
25110

26-
$$$data-stream-write-index$$$
27111

28-
$$$data-streams-rollover$$$
