diff --git a/manage-data/data-store/index-types/data-streams.md b/manage-data/data-store/index-types/data-streams.md index 0d53e5a3e2..2e94dd59c4 100644 --- a/manage-data/data-store/index-types/data-streams.md +++ b/manage-data/data-store/index-types/data-streams.md @@ -3,26 +3,109 @@ mapped_urls: - https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html - https://www.elastic.co/guide/en/elasticsearch/reference/current/index-mgmt.html#manage-data-streams - https://www.elastic.co/guide/en/serverless/current/index-management.html#index-management-manage-data-streams + +applies: + stack: all + serverless: all + hosted: all --- -# Data streams +# Data streams [data-streams] + +A data stream lets you store append-only time series data across multiple indices while giving you a single named resource for requests. Data streams are well-suited for logs, events, metrics, and other continuously generated data. + +You can submit indexing and search requests directly to a data stream. The stream automatically routes the request to backing indices that store the stream’s data. You can use [{{ilm}} ({{ilm-init}})](../../../manage-data/lifecycle/index-lifecycle-management.md) to automate the management of these backing indices. For example, you can use {{ilm-init}} to automatically move older backing indices to less expensive hardware and delete unneeded indices. {{ilm-init}} can help you reduce costs and overhead as your data grows. + + +## Should you use a data stream? [should-you-use-a-data-stream] + +To determine whether a data stream is a good fit for your data, consider the format of the data and how you expect to interact with it. A good candidate for a data stream matches the following criteria: + +* Your data contains a timestamp field, or one could be automatically generated. +* You mostly perform indexing requests, with occasional updates and deletes.
+* You index documents without an `_id`, or, when you index documents with an explicit `_id`, you expect first-write-wins behavior. + +For most time series use cases, a data stream is a good fit. However, if your data doesn’t match these criteria (for example, if you frequently send multiple documents using the same `_id` and expect last-write-wins behavior), you may want to use an index alias with a write index instead. See [managing time series data without a data stream](../../../manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams) for more information. + +Keep in mind that some features, such as [Time Series Data Streams (TSDS)](../../../manage-data/data-store/index-types/time-series-data-stream-tsds.md) and [data stream lifecycles](../../../manage-data/lifecycle/data-stream.md), require a data stream. + + +## Backing indices [backing-indices] + +A data stream consists of one or more [hidden](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-hidden), auto-generated backing indices. + +:::{image} ../../../images/elasticsearch-reference-data-streams-diagram.svg +:alt: data streams diagram +::: + +A data stream requires a matching [index template](../../../manage-data/data-store/templates.md). The template contains the mappings and settings used to configure the stream’s backing indices. + +Every document indexed to a data stream must contain a `@timestamp` field, mapped as a [`date`](https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html) or [`date_nanos`](https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html) field type. If the index template doesn’t specify a mapping for the `@timestamp` field, {{es}} maps `@timestamp` as a `date` field with default options. + +The same index template can be used for multiple data streams.
You cannot delete an index template that is in use by a data stream. + +The naming pattern for backing indices is an implementation detail, so don’t infer anything from it. The only invariant that holds is that each data stream generation index has a unique name. + + +## Read requests [data-stream-read-requests] + +When you submit a read request to a data stream, the stream routes the request to all its backing indices. + +:::{image} ../../../images/elasticsearch-reference-data-streams-search-request.svg +:alt: data streams search request +::: + + +## Write index [data-stream-write-index] + +The most recently created backing index is the data stream’s write index. The stream adds new documents to this index only. + +:::{image} ../../../images/elasticsearch-reference-data-streams-index-request.svg +:alt: data streams index request +::: + +You cannot add new documents to other backing indices, even by sending requests directly to the index. + +You also cannot perform operations on a write index that may hinder indexing, such as: + +* [Clone](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-clone) +* [Delete](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-delete) +* [Shrink](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-shrink) +* [Split](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-split) + + +## Rollover [data-streams-rollover] + +A [rollover](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover) creates a new backing index that becomes the stream’s new write index. + +We recommend using [{{ilm-init}}](../../../manage-data/lifecycle/index-lifecycle-management.md) to automatically roll over data streams when the write index reaches a specified age or size.
If needed, you can also [manually roll over](../../../manage-data/data-store/index-types/use-data-stream.md#manually-roll-over-a-data-stream) a data stream. + + +## Generation [data-streams-generation] + +Each data stream tracks its generation: a six-digit, zero-padded integer starting at `000001`. + +When a backing index is created, the index is named using the following convention: + +```text +.ds-<data-stream>-<yyyy.MM.dd>-<generation> +``` + +`<yyyy.MM.dd>` is the backing index’s creation date. Backing indices with a higher generation contain more recent data. For example, the `web-server-logs` data stream has a generation of `34`. The stream’s most recent backing index, created on 7 March 2099, is named `.ds-web-server-logs-2099.03.07-000034`. -% What needs to be done: Align serverless/stateful +Some operations, such as a [shrink](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-shrink) or [restore](../../../deploy-manage/tools/snapshot-and-restore/restore-snapshot.md), can change a backing index’s name. These name changes do not remove a backing index from its data stream. -% GitHub issue: docs-projects#379 +The generation of a data stream can change without a new index being added to the stream (for example, when an existing backing index is shrunk). As a result, backing indices for some generations never exist. Don’t rely on backing index names to infer anything about a data stream. -% Scope notes: Combine content from linked sources including aligning serverless and stateful content. -% Use migrated content from existing pages that map to this page: +## Append-only (mostly) [data-streams-append-only] -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/data-streams.md -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/index-mgmt.md -% - [ ] ./raw-migrated-files/docs-content/serverless/index-management.md +Data streams are designed for use cases where existing data is rarely updated.
You cannot send update or deletion requests for existing documents directly to a data stream. However, you can still [update or delete documents](../../../manage-data/data-store/index-types/use-data-stream.md#update-delete-docs-in-a-backing-index) in a data stream by submitting requests directly to the document’s backing index. -% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): +If you need to update or delete a large number of documents in a data stream, you can use the [update by query](../../../manage-data/data-store/index-types/use-data-stream.md#update-docs-in-a-data-stream-by-query) and [delete by query](../../../manage-data/data-store/index-types/use-data-stream.md#delete-docs-in-a-data-stream-by-query) APIs. -$$$data-streams-append-only$$$ +::::{tip} +If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. See [Manage time series data without data streams](../../../manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams). +:::: -$$$data-stream-write-index$$$ -$$$data-streams-rollover$$$ diff --git a/manage-data/data-store/index-types/logsdb.md b/manage-data/data-store/index-types/logsdb.md deleted file mode 100644 index 422b2ef679..0000000000 --- a/manage-data/data-store/index-types/logsdb.md +++ /dev/null @@ -1,181 +0,0 @@ ---- -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/logs-data-stream.html ---- - -# logsdb [logs-data-stream] - -::::{important} -The {{es}} `logsdb` index mode is generally available in Elastic Cloud Hosted and self-managed Elasticsearch as of version 8.17, and is enabled by default for logs in [{{serverless-full}}](https://www.elastic.co/elasticsearch/serverless). -:::: - - -A logs data stream is a data stream type that stores log data more efficiently.
- -In benchmarks, log data stored in a logs data stream used ~2.5 times less disk space than a regular data stream. The exact impact varies by data set. - - -## Create a logs data stream [how-to-use-logsds] - -To create a logs data stream, set your [template](../templates.md) `index.mode` to `logsdb`: - -```console -PUT _index_template/my-index-template -{ - "index_patterns": ["logs-*"], - "data_stream": { }, - "template": { - "settings": { - "index.mode": "logsdb" <1> - } - }, - "priority": 101 <2> -} -``` - -1. The index mode setting. -2. The index template priority. By default, Elasticsearch ships with a `logs-*-*` index template with a priority of 100. To make sure your index template takes priority over the default `logs-*-*` template, set its `priority` to a number higher than 100. For more information, see [Avoid index pattern collisions](../templates.md#avoid-index-pattern-collisions). - - -After the index template is created, new indices that use the template will be configured as a logs data stream. You can start indexing data and [using the data stream](use-data-stream.md). - -You can also set the index mode and adjust other template settings in [the Elastic UI](../../lifecycle/index-lifecycle-management/index-management-in-kibana.md). - - -## Synthetic source [logsdb-synthetic-source] - -If you have the required [subscription](https://www.elastic.co/subscriptions), `logsdb` index mode uses [synthetic `_source`](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source), which omits storing the original `_source` field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval. - -If you don’t have the required [subscription](https://www.elastic.co/subscriptions), `logsdb` mode uses the original `_source` field. 
- -Before using synthetic source, make sure to review the [restrictions](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source-restrictions). - -When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values are preserved for [synthetic source](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source) reconstruction. In `logsdb`, the default value is `arrays`, which retains both duplicate values and the order of entries. However, the exact structure of array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events. - - -## Index sort settings [logsdb-sort-settings] - -In `logsdb` index mode, indices are sorted by the fields `host.name` and `@timestamp` by default. - -* If the `@timestamp` field is not present, it is automatically injected. -* If the `host.name` field is not present, it is automatically injected as a `keyword` field, if possible. - - * If `host.name` can’t be injected (for example, `host` is a keyword field) or can’t be used for sorting (for example, its value is an IP address), only the `@timestamp` is used for sorting. - * If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as an object field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`, a single `host.name` field is mapped as a `keyword` field. - -* To prioritize the latest data, `host.name` is sorted in ascending order and `@timestamp` is sorted in descending order. - -You can override the default sort settings by manually configuring `index.sort.field` and `index.sort.order`. 
For more details, see [*Index Sorting*](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-index-sorting.html). - -To modify the sort configuration of an existing data stream, update the data stream’s component templates, and then perform or wait for a [rollover](data-streams.md#data-streams-rollover). - -::::{note} -If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not automatically added to the list of sort fields. For best results, include it manually as the last sort field, with `desc` ordering. -:::: - - - -### Existing data streams [logsdb-host-name] - -If you’re enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it’s included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied. - -To avoid mapping conflicts, consider these options: - -* **Adjust mappings:** Check your existing mappings to ensure that `host.name` is mapped as a keyword. -* **Change sorting:** If needed, you can remove `host.name` from the sort settings and use a different set of fields. Sorting by `@timestamp` can be a good fallback. -* **Switch to a different [index mode](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-mode-setting)**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode. - -::::{important} -On existing data streams, `logsdb` mode is applied on [rollover](data-streams.md#data-streams-rollover) (automatic or manual). -:::: - - - -### Optimized routing on sort fields [logsdb-sort-routing] - -To reduce the storage footprint of `logsdb` indexes, you can enable routing optimizations. 
A routing optimization uses the fields in the sort configuration (except for `@timestamp`) to route documents to shards. - -In benchmarks, routing optimizations reduced storage requirements by 20% compared to the default `logsdb` configuration, with a negligible penalty to ingestion performance (1-4%). Routing optimizations can benefit data streams that are expected to grow substantially over time. Exact results depend on the sort configuration and the nature of the logged data. - -To configure a routing optimization: - -* Include the index setting `[index.logsdb.route_on_sort_fields:true]` in the data stream configuration. -* [Configure index sorting](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-index-sorting.html) with two or more fields, in addition to `@timestamp`. -* Make sure the [`_id`](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html) field is not populated in ingested documents. It should be auto-generated instead. - -A custom sort configuration is required, to improve storage efficiency and to minimize hotspots from logging spikes that may route documents to a single shard. For best results, use a few sort fields that have a relatively low cardinality and don’t co-vary (for example, `host.name` and `host.id` are not optimal). - - -## Specialized codecs [logsdb-specialized-codecs] - -By default, `logsdb` index mode uses the `best_compression` [codec](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-codec), which applies [ZSTD](https://en.wikipedia.org/wiki/Zstd) compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint. - -The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. 
Numeric fields are encoded using the following sequence of codecs: - -* **Delta encoding**: Stores the difference between consecutive values instead of the actual values. -* **Offset encoding**: Stores the difference from a base value rather than between consecutive values. -* **Greatest Common Divisor (GCD) encoding**: Finds the greatest common divisor of a set of values and stores the differences as multiples of the GCD. -* **Frame Of Reference (FOR) encoding**: Determines the smallest number of bits required to encode a block of values and uses bit-packing to fit such values into larger 64-bit blocks. - -Each encoding is evaluated according to heuristics determined by the data distribution. For example, the algorithm checks whether the data is monotonically non-decreasing or non-increasing. If so, delta encoding is applied; otherwise, the process continues with the next encoding method (offset). - -Encoding is specific to each Lucene segment and is reapplied when segments are merged. The merged Lucene segment might use a different encoding than the original segments, depending on the characteristics of the merged data. - -For keyword fields, **Run Length Encoding (RLE)** is applied to the ordinals, which represent positions in the Lucene segment-level keyword dictionary. This compression is used when multiple consecutive documents share the same keyword. - - -## `ignore` settings [logsdb-ignored-settings] - -The `logsdb` index mode uses the following `ignore` settings. You can override these settings as needed. - - -### `ignore_malformed` [logsdb-ignore-malformed] - -By default, `logsdb` index mode sets `ignore_malformed` to `true`. With this setting, documents with malformed fields can be indexed without causing ingestion failures. 
- - -### `ignore_above` [logs-db-ignore-above] - -In `logsdb` index mode, the `index.mapping.ignore_above` setting is applied by default at the index level to ensure efficient storage and indexing of large keyword fields.The index-level default for `ignore_above` is 8191 *characters.* Using UTF-8 encoding, this results in a limit of 32764 bytes, depending on character encoding. - -The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default behavior helps to optimize indexing performance by preventing excessively large string values from being indexed. - -If you need to customize the limit, you can override it at the mapping level or change the index level default. - - -### `ignore_dynamic_beyond_limit` [logs-db-ignore-limit] - -In `logsdb` index mode, the setting `index.mapping.total_fields.ignore_dynamic_beyond_limit` is set to `true` by default. This setting allows dynamically mapped fields to be added on top of statically defined fields, even when the total number of fields exceeds the `index.mapping.total_fields.limit`. Instead of triggering an index failure, additional dynamically mapped fields are ignored so that ingestion can continue. - -::::{note} -When automatically injected, `host.name` and `@timestamp` count toward the limit of mapped fields. If `host.name` is mapped with `subobjects: true`, it has two fields. When mapped with `subobjects: false`, `host.name` has only one field. -:::: - - - -## Fields without `doc_values` [logsdb-nodocvalue-fields] - -When the `logsdb` index mode uses synthetic `_source` and `doc_values` are disabled for a field in the mapping, {{es}} might set the `store` setting to `true` for that field. 
This ensures that the field’s data remains accessible for reconstructing the document’s source when using [synthetic source](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source). - -For example, this adjustment occurs with text fields when `store` is `false` and no suitable multi-field is available for reconstructing the original value. - - -## Settings reference [logsdb-settings-summary] - -The `logsdb` index mode uses the following settings: - -* **`index.mode`**: `"logsdb"` -* **`index.mapping.synthetic_source_keep`**: `"arrays"` -* **`index.sort.field`**: `["host.name", "@timestamp"]` -* **`index.sort.order`**: `["desc", "desc"]` -* **`index.sort.mode`**: `["min", "min"]` -* **`index.sort.missing`**: `["_first", "_first"]` -* **`index.codec`**: `"best_compression"` -* **`index.mapping.ignore_malformed`**: `true` -* **`index.mapping.ignore_above`**: `8191` -* **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: `true` - - -## Notes about upgrading to Logsdb [upgrade-to-logsdb-notes] - -TODO: add notes. diff --git a/manage-data/data-store/index-types/manage-data-stream.md b/manage-data/data-store/index-types/manage-data-stream.md new file mode 100644 index 0000000000..309d14f64f --- /dev/null +++ b/manage-data/data-store/index-types/manage-data-stream.md @@ -0,0 +1,18 @@ +# Manage a data stream [index-management-manage-data-streams] + +Investigate your data streams and address lifecycle management needs in the **Data Streams** view. + +The value in the **Indices** column indicates the number of backing indices. Click this number to drill down into details. + +A value in the data retention column indicates that the data stream is managed by a data stream lifecycle policy. This value is the time period for which your data is guaranteed to be stored. Data older than this period can be deleted by {{es}} at a later time. 
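+
+You can also check or change a data stream’s retention period with the data stream lifecycle APIs. The following requests are a minimal sketch that assumes a data stream named `my-data-stream` (a placeholder; use your own stream’s name). The first request returns the stream’s lifecycle, including its `data_retention` value if one is set. The second request sets the retention period to `7d`, which is only an example value; choose a period that matches how long you need to keep the data.
+
+```console
+GET _data_stream/my-data-stream/_lifecycle
+
+PUT _data_stream/my-data-stream/_lifecycle
+{
+  "data_retention": "7d"
+}
+```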
+ +In {{es-serverless}}, indices matching the `logs-*-*` pattern use the logsDB index mode by default. The logsDB index mode creates a [logs data stream](../../../manage-data/data-store/index-types/logs-data-stream.md). + +:::{image} ../../../images/serverless-management-data-stream.png +:alt: Data stream details +:class: screenshot +::: + +* To view more information about a data stream, such as its generation or its current index lifecycle policy, click the stream’s name. From this view, you can navigate to **Discover** to further explore data within the data stream. +* To view information about the stream’s backing indices, click the number in the **Indices** column. +* [preview] To modify the data retention value, select a data stream, open the **Manage** menu, and click **Edit data retention**. diff --git a/manage-data/data-store/index-types/set-up-data-stream.md b/manage-data/data-store/index-types/set-up-data-stream.md index c0fbfbfb7d..5c5e23e4cb 100644 --- a/manage-data/data-store/index-types/set-up-data-stream.md +++ b/manage-data/data-store/index-types/set-up-data-stream.md @@ -197,6 +197,7 @@ You can also manually create the stream using the [create data stream API](https PUT _data_stream/my-data-stream ``` +After you create the data stream, you can view and manage it, along with your other data streams, from the **Stack Management > Index Management** view. Refer to [Manage a data stream](./manage-data-stream.md) for details. ## Secure the data stream [secure-data-stream] diff --git a/manage-data/data-store/index-types/tsdb.md b/manage-data/data-store/index-types/tsdb.md deleted file mode 100644 index d25af6caef..0000000000 --- a/manage-data/data-store/index-types/tsdb.md +++ /dev/null @@ -1,218 +0,0 @@ ---- -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html ---- - -# tsdb [tsds] - -A time series data stream (TSDS) models timestamped metrics data as one or more time series. - -You can use a TSDS to store metrics data more efficiently.
In our benchmarks, metrics data stored in a TSDS used 70% less disk space than a regular data stream. The exact impact will vary per data set. - - -## When to use a TSDS [when-to-use-tsds] - -Both a [regular data stream](data-streams.md) and a TSDS can store timestamped metrics data. Only use a TSDS if you typically add metrics data to {{es}} in near real-time and `@timestamp` order. - -A TSDS is only intended for metrics data. For other timestamped data, such as logs or traces, use a [logs data stream](logs-data-stream.md) or regular data stream. - - -## Differences from a regular data stream [differences-from-regular-data-stream] - -A TSDS works like a regular data stream with some key differences: - -* The matching index template for a TSDS requires a `data_stream` object with the [`index.mode: time_series`](time-series-data-stream-tsds.md#time-series-mode) option. This option enables most TSDS-related functionality. -* In addition to a `@timestamp`, each document in a TSDS must contain one or more [dimension fields](time-series-data-stream-tsds.md#time-series-dimension). The matching index template for a TSDS must contain mappings for at least one `keyword` dimension. - - TSDS documents also typically contain one or more [metric fields](time-series-data-stream-tsds.md#time-series-metric). - -* {{es}} generates a hidden [`_tsid`](time-series-data-stream-tsds.md#tsid) metadata field for each document in a TSDS. -* A TSDS uses [time-bound backing indices](time-series-data-stream-tsds.md#time-bound-indices) to store data from the same time period in the same backing index. -* The matching index template for a TSDS must contain the `index.routing_path` index setting. A TSDS uses this setting to perform [dimension-based routing](time-series-data-stream-tsds.md#dimension-based-routing). 
-* A TSDS uses internal [index sorting](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-index-sorting.html) to order shard segments by `_tsid` and `@timestamp`. -* TSDS documents only support auto-generated document `_id` values. For TSDS documents, the document `_id` is a hash of the document’s dimensions and `@timestamp`. A TSDS doesn’t support custom document `_id` values. -* A TSDS uses [synthetic `_source`](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source), and as a result is subject to some [restrictions](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source-restrictions) and [modifications](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source-modifications) applied to the `_source` field. - -::::{note} -A time series index can contain fields other than dimensions or metrics. -:::: - - - -## What is a time series? [time-series] - -A time series is a sequence of observations for a specific entity. Together, these observations let you track changes to the entity over time. For example, a time series can track: - -* CPU and disk usage for a computer -* The price of a stock -* Temperature and humidity readings from a weather sensor. - -:::{image} ../../../images/elasticsearch-reference-time-series-chart.svg -:alt: time series chart -:title: Time series of weather sensor readings plotted as a graph -::: - -In a TSDS, each {{es}} document represents an observation, or data point, in a specific time series. Although a TSDS can contain multiple time series, a document can only belong to one time series. A time series can’t span multiple data streams. - - -### Dimensions [time-series-dimension] - -Dimensions are field names and values that, in combination, identify a document’s time series. In most cases, a dimension describes some aspect of the entity you’re measuring. 
For example, documents related to the same weather sensor may always have the same `sensor_id` and `location` values. - -A TSDS document is uniquely identified by its time series and timestamp, both of which are used to generate the document `_id`. So, two documents with the same dimensions and the same timestamp are considered to be duplicates. When you use the `_bulk` endpoint to add documents to a TSDS, a second document with the same timestamp and dimensions overwrites the first. When you use the `PUT //_create/<_id>` format to add an individual document and a document with the same `_id` already exists, an error is generated. - -You mark a field as a dimension using the boolean `time_series_dimension` mapping parameter. The following field types support the `time_series_dimension` parameter: - -* [`keyword`](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html#keyword-field-type) -* [`ip`](https://www.elastic.co/guide/en/elasticsearch/reference/current/ip.html) -* [`byte`](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) -* [`short`](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) -* [`integer`](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) -* [`long`](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) -* [`unsigned_long`](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) -* [`boolean`](https://www.elastic.co/guide/en/elasticsearch/reference/current/boolean.html) - -For a flattened field, use the `time_series_dimensions` parameter to configure an array of fields as dimensions. For details refer to [`flattened`](https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html#flattened-params). - -Dimension definitions can be simplified through [pass-through](https://www.elastic.co/guide/en/elasticsearch/reference/current/passthrough.html#passthrough-dimensions) fields. 
- - -### Metrics [time-series-metric] - -Metrics are fields that contain numeric measurements, as well as aggregations and/or downsampling values based off of those measurements. While not required, documents in a TSDS typically contain one or more metric fields. - -Metrics differ from dimensions in that while dimensions generally remain constant, metrics are expected to change over time, even if rarely or slowly. - -To mark a field as a metric, you must specify a metric type using the `time_series_metric` mapping parameter. The following field types support the `time_series_metric` parameter: - -* [`aggregate_metric_double`](https://www.elastic.co/guide/en/elasticsearch/reference/current/aggregate-metric-double.html) -* [`histogram`](https://www.elastic.co/guide/en/elasticsearch/reference/current/histogram.html) -* All [numeric field types](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html) - -Accepted metric types vary based on the field type: - -:::::{dropdown} Valid values for `time_series_metric` -`counter` -: A cumulative metric that only monotonically increases or resets to `0` (zero). For example, a count of errors or completed tasks. - - A counter field has additional semantic meaning, because it represents a cumulative counter. This works well with the `rate` aggregation, since a rate can be derived from a cumulative monotonically increasing counter. However a number of aggregations (for example `sum`) compute results that don’t make sense for a counter field, because of its cumulative nature. - - Only numeric and `aggregate_metric_double` fields support the `counter` metric type. - - -::::{note} -Due to the cumulative nature of counter fields, the following aggregations are supported and expected to provide meaningful results with the `counter` field: `rate`, `histogram`, `range`, `min`, `max`, `top_metrics` and `variable_width_histogram`. 
In order to prevent issues with existing integrations and custom dashboards, we also allow the following aggregations, even if the result might be meaningless on counters: `avg`, `box plot`, `cardinality`, `extended stats`, `median absolute deviation`, `percentile ranks`, `percentiles`, `stats`, `sum` and `value count`. -:::: - - -`gauge` -: A metric that represents a single numeric that can arbitrarily increase or decrease. For example, a temperature or available disk space. - - Only numeric and `aggregate_metric_double` fields support the `gauge` metric type. - - -`null` (Default) -: Not a time series metric. - -::::: - - - -## Time series mode [time-series-mode] - -The matching index template for a TSDS must contain a `data_stream` object with the `index_mode: time_series` option. This option ensures the TSDS creates backing indices with an [`index.mode`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds-index-settings.html#index-mode) setting of `time_series`. This setting enables most TSDS-related functionality in the backing indices. - -If you convert an existing data stream to a TSDS, only backing indices created after the conversion have an `index.mode` of `time_series`. You can’t change the `index.mode` of an existing backing index. - - -### `_tsid` metadata field [tsid] - -When you add a document to a TSDS, {{es}} automatically generates a `_tsid` metadata field for the document. The `_tsid` is an object containing the document’s dimensions. Documents in the same TSDS with the same `_tsid` are part of the same time series. - -The `_tsid` field is not queryable or updatable. You also can’t retrieve a document’s `_tsid` using a [get document](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-get) request. 
However, you can use the `_tsid` field in aggregations and retrieve the `_tsid` value in searches using the [`fields` parameter](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-fields.html#search-fields-param). - -::::{warning} -The format of the `_tsid` field shouldn’t be relied upon. It may change from version to version. -:::: - - - -### Time-bound indices [time-bound-indices] - -In a TSDS, each backing index, including the most recent backing index, has a range of accepted `@timestamp` values. This range is defined by the [`index.time_series.start_time`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds-index-settings.html#index-time-series-start-time) and [`index.time_series.end_time`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds-index-settings.html#index-time-series-end-time) index settings. - -When you add a document to a TSDS, {{es}} adds the document to the appropriate backing index based on its `@timestamp` value. As a result, a TSDS can add documents to any TSDS backing index that can receive writes. This applies even if the index isn’t the most recent backing index. - -:::{image} ../../../images/elasticsearch-reference-time-bound-indices.svg -:alt: time bound indices -::: - -::::{tip} -Some {{ilm-init}} actions mark the source index as read-only, or expect the index to not be actively written anymore in order to provide good performance. 
These actions are: - [Delete](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-delete.html) - [Downsample](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-downsample.html) - [Force merge](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-forcemerge.html) - [Read only](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-readonly.html) - [Searchable snapshot](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-searchable-snapshot.html) - [Shrink](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-shrink.html) {{ilm-cap}} will **not** proceed with executing these actions until the upper time-bound for accepting writes, represented by the [`index.time_series.end_time`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds-index-settings.html#index-time-series-end-time) index setting, has lapsed. -:::: - - -If no backing index can accept a document’s `@timestamp` value, {{es}} rejects the document. - -{{es}} automatically configures `index.time_series.start_time` and `index.time_series.end_time` settings as part of the index creation and rollover process. - - -### Look-ahead time [tsds-look-ahead-time] - -Use the [`index.look_ahead_time`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds-index-settings.html#index-look-ahead-time) index setting to configure how far into the future you can add documents to an index. When you create a new write index for a TSDS, {{es}} calculates the index’s `index.time_series.end_time` value as: - -`now + index.look_ahead_time` - -At the time series poll interval (controlled via `time_series.poll_interval` setting), {{es}} checks if the write index has met the rollover criteria in its index lifecycle policy. 
If not, {{es}} refreshes the `now` value and updates the write index’s `index.time_series.end_time` to: - -`now + index.look_ahead_time + time_series.poll_interval` - -This process continues until the write index rolls over. When the index rolls over, {{es}} sets a final `index.time_series.end_time` value for the index. This value borders the `index.time_series.start_time` for the new write index. This ensures the `@timestamp` ranges for neighboring backing indices always border but never overlap. - - -### Look-back time [tsds-look-back-time] - -Use the [`index.look_back_time`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds-index-settings.html#index-look-back-time) index setting to configure how far in the past you can add documents to an index. When you create a data stream for a TSDS, {{es}} calculates the index’s `index.time_series.start_time` value as: - -`now - index.look_back_time` - -This setting is only used when a data stream gets created and controls the `index.time_series.start_time` index setting of the first backing index. Configuring this index setting can be useful to accept documents with `@timestamp` field values that are older than 2 hours (the `index.look_back_time` default). - - -### Accepted time range for adding data [tsds-accepted-time-range] - -A TSDS is designed to ingest current metrics data. When the TSDS is first created the initial backing index has: - -* an `index.time_series.start_time` value set to `now - index.look_back_time` -* an `index.time_series.end_time` value set to `now + index.look_ahead_time` - -Only data that falls inside that range can be indexed. - -You can use the [get data stream API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-get-data-stream) to check the accepted time range for writing to any TSDS. 
- - -### Dimension-based routing [dimension-based-routing] - -Within each TSDS backing index, {{es}} uses the [`index.routing_path`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds-index-settings.html#index-routing-path) index setting to route documents with the same dimensions to the same shards. - -When you create the matching index template for a TSDS, you must specify one or more dimensions in the `index.routing_path` setting. Each document in a TSDS must contain one or more dimensions that match the `index.routing_path` setting. - -The `index.routing_path` setting accepts wildcard patterns (for example `dim.*`) and can dynamically match new fields. However, {{es}} will reject any mapping updates that add scripted, runtime, or non-dimension fields that match the `index.routing_path` value. - -[Pass-through](https://www.elastic.co/guide/en/elasticsearch/reference/current/passthrough.html#passthrough-dimensions) fields may be configured as dimension containers. In this case, their sub-fields get included to the routing path automatically. - -TSDS documents don’t support a custom `_routing` value. Similarly, you can’t require a `_routing` value in mappings for a TSDS. - - -### Index sorting [tsds-index-sorting] - -{{es}} uses [compression algorithms](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-codec) to compress repeated values. This compression works best when repeated values are stored near each other — in the same index, on the same shard, and side-by-side in the same shard segment. - -Most time series data contains repeated values. Dimensions are repeated across documents in the same time series. The metric values of a time series may also change slowly over time. - -Internally, each TSDS backing index uses [index sorting](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-index-sorting.html) to order its shard segments by `_tsid` and `@timestamp`. 
This makes it more likely that these repeated values are stored near each other for better compression. A TSDS doesn’t support any [`index.sort.*`](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-index-sorting.html) index settings. - - -## What’s next? [tsds-whats-next] - -Now that you know the basics, you’re ready to [create a TSDS](set-up-tsds.md) or [convert an existing data stream to a TSDS](set-up-tsds.md). diff --git a/manage-data/data-store/templates/index-template-management.md b/manage-data/data-store/templates/index-template-management.md index a3e6e8d637..8efc7faf98 100644 --- a/manage-data/data-store/templates/index-template-management.md +++ b/manage-data/data-store/templates/index-template-management.md @@ -17,7 +17,7 @@ Create, edit, clone, and delete your index templates in the **Index Templates** :class: screenshot ::: -In {{serverless-full}}, the default **logs** template uses the logsDB index mode to create a [logs data stream](../index-types/logsdb.md). +In {{serverless-full}}, the default **logs** template uses the logsDB index mode to create a [logs data stream](../index-types/logs-data-stream.md). If you don’t have any templates, you can create one using the **Create template** wizard. 
diff --git a/manage-data/toc.yml b/manage-data/toc.yml index 6d686b598a..8e1efcb511 100644 --- a/manage-data/toc.yml +++ b/manage-data/toc.yml @@ -11,6 +11,7 @@ toc: - file: data-store/index-types/set-up-data-stream.md - file: data-store/index-types/use-data-stream.md - file: data-store/index-types/modify-data-stream.md + - file: data-store/index-types/manage-data-stream.md - file: data-store/index-types/time-series-data-stream-tsds.md children: - file: data-store/index-types/set-up-tsds.md @@ -20,8 +21,6 @@ toc: - file: data-store/index-types/run-downsampling-using-data-stream-lifecycle.md - file: data-store/index-types/reindex-tsds.md - file: data-store/index-types/logs-data-stream.md - - file: data-store/index-types/logsdb.md - - file: data-store/index-types/tsdb.md - file: data-store/index-types/vectordb.md - file: data-store/mapping.md children: diff --git a/raw-migrated-files/docs-content/serverless/index-management.md b/raw-migrated-files/docs-content/serverless/index-management.md index 01b8a850ab..78e34f259b 100644 --- a/raw-migrated-files/docs-content/serverless/index-management.md +++ b/raw-migrated-files/docs-content/serverless/index-management.md @@ -21,11 +21,11 @@ The **{{index-manage-app}}** page contains an overview of your indices. * To drill down into the index mappings, settings, and statistics, click an index name. From this view, you can navigate to **Discover** to further explore the documents in the index. -## Manage data streams [index-management-manage-data-streams] +## Manage data streams Investigate your data streams and address lifecycle management needs in the **Data Streams** view. -In {{es-serverless}}, indices matching the `logs-*-*` pattern use the logsDB index mode by default. The logsDB index mode creates a [logs data stream](../../../manage-data/data-store/index-types/logsdb.md). +In {{es-serverless}}, indices matching the `logs-*-*` pattern use the logsDB index mode by default. 
The logsDB index mode creates a [logs data stream](../../../manage-data/data-store/index-types/logs-data-stream.md). The value in the **Indices** column indicates the number of backing indices. Click this number to drill down into details. @@ -53,7 +53,7 @@ Create, edit, clone, and delete your index templates in the **Index Templates** :class: screenshot ::: -The default **logs** template uses the logsDB index mode to create a [logs data stream](../../../manage-data/data-store/index-types/logsdb.md). +The default **logs** template uses the logsDB index mode to create a [logs data stream](../../../manage-data/data-store/index-types/logs-data-stream.md). If you don’t have any templates, you can create one using the **Create template** wizard. diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/data-streams.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/data-streams.md deleted file mode 100644 index f648c339ff..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/data-streams.md +++ /dev/null @@ -1,104 +0,0 @@ ---- -navigation_title: "Data streams" ---- - -# Data streams [data-streams] - - -A data stream lets you store append-only time series data across multiple indices while giving you a single named resource for requests. Data streams are well-suited for logs, events, metrics, and other continuously generated data. - -You can submit indexing and search requests directly to a data stream. The stream automatically routes the request to backing indices that store the stream’s data. You can use [{{ilm}} ({{ilm-init}})](../../../manage-data/lifecycle/index-lifecycle-management.md) to automate the management of these backing indices. For example, you can use {{ilm-init}} to automatically move older backing indices to less expensive hardware and delete unneeded indices. {{ilm-init}} can help you reduce costs and overhead as your data grows. - - -## Should you use a data stream? 
[should-you-use-a-data-stream] - -To determine whether you should use a data stream for your data, you should consider the format of the data, and your expected interaction. A good candidate for using a data stream will match the following criteria: - -* Your data contains a timestamp field, or one could be automatically generated. -* You mostly perform indexing requests, with occasional updates and deletes. -* You index documents without an `_id`, or when indexing documents with an explicit `_id` you expect first-write-wins behavior. - -For most time series data use-cases, a data stream will be a good fit. However, if you find that your data doesn’t fit into these categories (for example, if you frequently send multiple documents using the same `_id` expecting last-write-wins), you may want to use an index alias with a write index instead. See documentation for [managing time series data without a data stream](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams) for more information. - -Keep in mind that some features such as [Time Series Data Streams (TSDS)](../../../manage-data/data-store/index-types/tsdb.md) and [data stream lifecycles](../../../manage-data/lifecycle/data-stream.md) require a data stream. - - -## Backing indices [backing-indices] - -A data stream consists of one or more [hidden](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-hidden), auto-generated backing indices. - -:::{image} ../../../images/elasticsearch-reference-data-streams-diagram.svg -:alt: data streams diagram -::: - -A data stream requires a matching [index template](../../../manage-data/data-store/templates.md). The template contains the mappings and settings used to configure the stream’s backing indices. 
- -Every document indexed to a data stream must contain a `@timestamp` field, mapped as a [`date`](https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html) or [`date_nanos`](https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html) field type. If the index template doesn’t specify a mapping for the `@timestamp` field, {{es}} maps `@timestamp` as a `date` field with default options. - -The same index template can be used for multiple data streams. You cannot delete an index template in use by a data stream. - -The name pattern for the backing indices is an implementation detail and no intelligence should be derived from it. The only invariant the holds is that each data stream generation index will have a unique name. - - -## Read requests [data-stream-read-requests] - -When you submit a read request to a data stream, the stream routes the request to all its backing indices. - -:::{image} ../../../images/elasticsearch-reference-data-streams-search-request.svg -:alt: data streams search request -::: - - -## Write index [data-stream-write-index] - -The most recently created backing index is the data stream’s write index. The stream adds new documents to this index only. - -:::{image} ../../../images/elasticsearch-reference-data-streams-index-request.svg -:alt: data streams index request -::: - -You cannot add new documents to other backing indices, even by sending requests directly to the index. 
- -You also cannot perform operations on a write index that may hinder indexing, such as: - -* [Clone](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-clone) -* [Delete](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-delete) -* [Shrink](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-shrink) -* [Split](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-split) - - -## Rollover [data-streams-rollover] - -A [rollover](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover) creates a new backing index that becomes the stream’s new write index. - -We recommend using [{{ilm-init}}](../../../manage-data/lifecycle/index-lifecycle-management.md) to automatically roll over data streams when the write index reaches a specified age or size. If needed, you can also [manually roll over](../../../manage-data/data-store/index-types/use-data-stream.md#manually-roll-over-a-data-stream) a data stream. - - -## Generation [data-streams-generation] - -Each data stream tracks its generation: a six-digit, zero-padded integer starting at `000001`. - -When a backing index is created, the index is named using the following convention: - -```text -.ds--- -``` - -`` is the backing index’s creation date. Backing indices with a higher generation contain more recent data. For example, the `web-server-logs` data stream has a generation of `34`. The stream’s most recent backing index, created on 7 March 2099, is named `.ds-web-server-logs-2099.03.07-000034`. - -Some operations, such as a [shrink](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-shrink) or [restore](../../../deploy-manage/tools/snapshot-and-restore/restore-snapshot.md), can change a backing index’s name. These name changes do not remove a backing index from its data stream. 
- -The generation of the data stream can change without a new index being added to the data stream (e.g. when an existing backing index is shrunk). This means the backing indices for some generations will never exist. You should not derive any intelligence from the backing indices names. - - -## Append-only (mostly) [data-streams-append-only] - -Data streams are designed for use cases where existing data is rarely updated. You cannot send update or deletion requests for existing documents directly to a data stream. However, you can still [update or delete documents](../../../manage-data/data-store/index-types/use-data-stream.md#update-delete-docs-in-a-backing-index) in a data stream by submitting requests directly to the document’s backing index. - -If you need to update a larger number of documents in a data stream, you can use the [update by query](../../../manage-data/data-store/index-types/use-data-stream.md#update-docs-in-a-data-stream-by-query) and [delete by query](../../../manage-data/data-store/index-types/use-data-stream.md#delete-docs-in-a-data-stream-by-query) APIs. - -::::{tip} -If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. See [Manage time series data without data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams). 
-:::: - - diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml index 7004b43b8f..2939e81a8e 100644 --- a/raw-migrated-files/toc.yml +++ b/raw-migrated-files/toc.yml @@ -387,7 +387,7 @@ toc: - file: docs-content/serverless/security-dashboards-overview.md - file: docs-content/serverless/security-data-quality-dash.md - file: docs-content/serverless/security-data-views-in-sec.md - - file: docs-content/serverless/security-detection-engine-overview.md + - file: docs-content/serverless/security-detection-engine-overview.md - file: docs-content/serverless/security-detection-entity-dashboard.md - file: docs-content/serverless/security-detection-response-dashboard.md - file: docs-content/serverless/security-detections-requirements.md @@ -476,7 +476,6 @@ toc: - file: elasticsearch/elasticsearch-reference/change-passwords-native-users.md - file: elasticsearch/elasticsearch-reference/configuring-stack-security.md - file: elasticsearch/elasticsearch-reference/data-management.md - - file: elasticsearch/elasticsearch-reference/data-streams.md - file: elasticsearch/elasticsearch-reference/defining-roles.md - file: elasticsearch/elasticsearch-reference/document-level-security.md - file: elasticsearch/elasticsearch-reference/documents-indices.md diff --git a/solutions/security/detect-and-alert.md b/solutions/security/detect-and-alert.md index 516972dddb..b4dca2c01e 100644 --- a/solutions/security/detect-and-alert.md +++ b/solutions/security/detect-and-alert.md @@ -141,5 +141,5 @@ Depending on your privileges and whether detection system indices have already b ## Using logsdb index mode [detections-logsdb-index-mode] -To learn how your rules and alerts are affected by using the [logsdb index mode](/manage-data/data-store/index-types/logsdb.md), refer to [*Using logsdb index mode with {{elastic-sec}}*](/solutions/security/detect-and-alert/using-logsdb-index-mode-with-elastic-security.md). 
+To learn how your rules and alerts are affected by using the [logsdb index mode](/manage-data/data-store/index-types/logs-data-stream.md), refer to [*Using logsdb index mode with {{elastic-sec}}*](/solutions/security/detect-and-alert/using-logsdb-index-mode-with-elastic-security.md).