diff --git a/explore-analyze/find-and-organize/data-views.md b/explore-analyze/find-and-organize/data-views.md index 220b01f334..5f6cb7c6bd 100644 --- a/explore-analyze/find-and-organize/data-views.md +++ b/explore-analyze/find-and-organize/data-views.md @@ -119,7 +119,7 @@ A {{data-source}} can match one rollup index. For a combination rollup {{data-s rollup_logstash,kibana_sample_data_logs ``` -For an example, refer to [Create and visualize rolled up data](../../manage-data/lifecycle/rollup.md#rollup-data-tutorial). +For an example, refer to [Create and visualize rolled up data](/manage-data/lifecycle/rollup/getting-started-kibana.md#rollup-data-tutorial). ### Use {{data-sources}} with {{ccs}} [management-cross-cluster-search] diff --git a/manage-data/lifecycle/rollup.md b/manage-data/lifecycle/rollup.md index f240328db6..bec79054f6 100644 --- a/manage-data/lifecycle/rollup.md +++ b/manage-data/lifecycle/rollup.md @@ -2,23 +2,56 @@ mapped_urls: - https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-rollup.html - https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-overview.html - - https://www.elastic.co/guide/en/kibana/current/data-rollups.html --- # Rollup -% What needs to be done: Refine +::::{admonition} Deprecated in 8.11.0. +:class: warning -% GitHub issue: docs-projects#377 +Rollups will be removed in a future version. Please [migrate](/manage-data/lifecycle/rollup/migrating-from-rollup-to-downsampling.md) to [downsampling](/manage-data/data-store/index-types/downsampling-time-series-data-stream.md) instead. +:::: -% Scope notes: Combine linked resources. +Keeping historical data around for analysis is extremely useful but often avoided due to the financial cost of archiving massive amounts of data. For example, your system may be generating 500 documents every second. That will generate 43 million documents per day, and nearly 16 billion documents a year. 
Retention periods are thus driven by financial realities rather than by the usefulness of extensive historical data. -% Use migrated content from existing pages that map to this page: +While your analysts and data scientists may wish you stored that data indefinitely for analysis, time is never-ending and so your storage requirements will continue to grow without bound. Retention policies are therefore often dictated by the simple calculation of storage costs over time, and what the organization is willing to pay to retain historical data. Often these policies start deleting data after a few months or years. -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/xpack-rollup.md -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/rollup-overview.md -% - [ ] ./raw-migrated-files/kibana/kibana/data-rollups.md +Storage cost is a fixed quantity. It takes X money to store Y data. But the utility of a piece of data often changes with time. Sensor data gathered at millisecond granularity is extremely useful right now, reasonably useful if from a few weeks ago, and only marginally useful if older than a few months. -% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): +So while the cost of storing a millisecond of sensor data from ten years ago is fixed, the value of that individual sensor reading often diminishes with time. It’s not useless — it could easily contribute to a useful analysis — but its reduced value often leads to deletion rather than paying the fixed storage cost. -$$$rollup-data-tutorial$$$ \ No newline at end of file + +## Rollup stores historical data at reduced granularity [_rollup_stores_historical_data_at_reduced_granularity] + +That’s where Rollup comes into play. The Rollup functionality summarizes old, high-granularity data into a reduced granularity format for long-term storage. 
By "rolling" the data up into a single summary document, historical data can be compressed greatly compared to the raw data. + +For example, consider the system that’s generating 43 million documents every day. The second-by-second data is useful for real-time analysis, but historical analysis looking over ten years of data is likely to be working at a larger interval such as hourly or daily trends. + +If we compress the 43 million documents into hourly summaries, we can save vast amounts of space. The Rollup feature automates this process of summarizing historical data. + +Details about setting up and configuring Rollup are covered in [Create Job API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-rollup-put-job). + + +## Rollup uses standard Query DSL [_rollup_uses_standard_query_dsl] + +The Rollup feature exposes a new search endpoint (`/_rollup_search` vs the standard `/_search`) which knows how to search over rolled-up data. Importantly, this endpoint accepts 100% normal {{es}} Query DSL. Your application does not need to learn a new DSL to inspect historical data; it can simply reuse existing queries and dashboards. + +There are some limitations to the functionality available; not all queries and aggregations are supported, certain search features (highlighting, etc) are disabled, and available fields depend on how the rollup was configured. These limitations are covered more in [Rollup Search limitations](/manage-data/lifecycle/rollup/rollup-search-limitations.md). + +But if your queries, aggregations and dashboards only use the available functionality, redirecting them to historical data is trivial. + + +## Rollup merges "live" and "rolled" data [_rollup_merges_live_and_rolled_data] + +A useful feature of Rollup is the ability to query both "live", realtime data and historical "rolled" data in a single query. + +For example, your system may keep a month of raw data. 
After a month, it is rolled up into historical summaries using Rollup and the raw data is deleted. + +If you were to query the raw data, you’d only see the most recent month. And if you were to query the rolled up data, you would only see data older than a month. The RollupSearch endpoint, however, supports querying both at the same time. It will take the results from both data sources and merge them together. If there is overlap between the "live" and "rolled" data, live data is preferred to increase accuracy. + + +## Rollup is multi-interval aware [_rollup_is_multi_interval_aware] + +Finally, Rollup is capable of intelligently utilizing the best interval available. If you’ve worked with summarizing features of other products, you’ll find that they can be limiting. If you configure rollups at daily intervals… your queries and charts can only work with daily intervals. If you need a monthly interval, you have to create another rollup that explicitly stores monthly averages, etc. + +The Rollup feature stores data in such a way that queries can identify the smallest available interval and use that for their processing. If you store rollups at a daily interval, queries can be executed on daily or longer intervals (weekly, monthly, etc) without the need to explicitly configure a new rollup job. This helps alleviate one of the major disadvantages of a rollup system: reduced flexibility relative to raw data. 
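The merged "live" plus "rolled" querying described above can be sketched with a single `_rollup_search` request that names both a raw index and a rollup index. This is a hedged illustration only; the `sensor-2025-01`, `sensor_rollup`, and `temperature` names are assumptions, not taken from this changeset:

```console
GET sensor-2025-01,sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "max_temperature": {
      "max": {
        "field": "temperature"
      }
    }
  }
}
```

Because the endpoint accepts ordinary Query DSL, the same body could be sent to `/_search` against the raw index alone; `_rollup_search` resolves it against the summary documents as well, merging overlapping results and preferring the live data.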
diff --git a/manage-data/lifecycle/rollup/getting-started-with-rollups.md b/manage-data/lifecycle/rollup/getting-started-api.md similarity index 96% rename from manage-data/lifecycle/rollup/getting-started-with-rollups.md rename to manage-data/lifecycle/rollup/getting-started-api.md index 858bf30fbd..2680b3e903 100644 --- a/manage-data/lifecycle/rollup/getting-started-with-rollups.md +++ b/manage-data/lifecycle/rollup/getting-started-api.md @@ -1,22 +1,18 @@ --- -navigation_title: "Getting started" +navigation_title: "Get started using the API" mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-getting-started.html --- - - -# Getting started with rollups [rollup-getting-started] - +# Get started with rollups using the API ::::{admonition} Deprecated in 8.11.0. :class: warning -Rollups will be removed in a future version. Please [migrate](migrating-from-rollup-to-downsampling.md) to [downsampling](../../data-store/index-types/downsampling-time-series-data-stream.md) instead. +Rollups will be removed in a future version. Please [migrate](migrating-from-rollup-to-downsampling.md) to [downsampling](/manage-data/data-store/index-types/downsampling-time-series-data-stream.md) instead. :::: - -::::{warning} +::::{warning} From 8.15.0 invoking the put job API in a cluster with no rollup usage will fail with a message about Rollup’s deprecation and planned removal. A cluster either needs to contain a rollup job or a rollup index in order for the put job API to be allowed to execute. :::: @@ -35,7 +31,7 @@ Imagine you have a series of daily indices that hold sensor data (`sensor-2017-0 ``` -## Creating a rollup job [_creating_a_rollup_job] +## Creating a rollup job [_creating_a_rollup_job] We’d like to rollup these documents into hourly summaries, which will allow us to generate reports and dashboards with any time interval one hour or greater. 
A rollup job might look like this: @@ -109,7 +105,7 @@ After you execute the above command and create the job, you’ll receive the fol ``` -## Starting the job [_starting_the_job] +## Starting the job [_starting_the_job] After the job is created, it will be sitting in an inactive state. Jobs need to be started before they begin processing data (this allows you to stop them later as a way to temporarily pause, without deleting the configuration). @@ -120,7 +116,7 @@ POST _rollup/job/sensor/_start ``` -## Searching the rolled results [_searching_the_rolled_results] +## Searching the rolled results [_searching_the_rolled_results] After the job has run and processed some data, we can use the [Rollup search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-rollup-rollup-search) endpoint to do some searching. The Rollup feature is designed so that you can use the same Query DSL syntax that you are accustomed to…​ it just happens to run on the rolled up data instead. @@ -275,7 +271,7 @@ Which returns a corresponding response: In addition to being more complicated (date histogram and a terms aggregation, plus an additional average metric), you’ll notice the date_histogram uses a `7d` interval instead of `60m`. -## Conclusion [_conclusion] +## Conclusion [_conclusion] This quickstart should have provided a concise overview of the core functionality that Rollup exposes. There are more tips and things to consider when setting up Rollups, which you can find throughout the rest of this section. You may also explore the [REST API](https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-api-quickref.html) for an overview of what is available. 
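The job body elided by the hunks above ("A rollup job might look like this:") follows the create rollup job API shape. A minimal hedged sketch, reusing the tutorial's `sensor` naming — the exact fields, cron schedule, and intervals here are assumptions rather than content copied from this diff:

```console
PUT _rollup/job/sensor
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "60m"
    },
    "terms": {
      "fields": ["node"]
    }
  },
  "metrics": [
    {
      "field": "temperature",
      "metrics": ["min", "max", "avg"]
    }
  ]
}
```

A `60m` date histogram group is what would let later `_rollup_search` queries run at hourly or any coarser interval, per the multi-interval behavior described in the overview.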
diff --git a/raw-migrated-files/kibana/kibana/data-rollups.md b/manage-data/lifecycle/rollup/getting-started-kibana.md similarity index 92% rename from raw-migrated-files/kibana/kibana/data-rollups.md rename to manage-data/lifecycle/rollup/getting-started-kibana.md index 3713a71b00..34c27309d6 100644 --- a/raw-migrated-files/kibana/kibana/data-rollups.md +++ b/manage-data/lifecycle/rollup/getting-started-kibana.md @@ -1,12 +1,17 @@ -# Rollup Jobs [data-rollups] +--- +navigation_title: "Get started in Kibana" +mapped_pages: + - https://www.elastic.co/guide/en/kibana/current/data-rollups.html +--- + +# Get started with rollups in {{kib}} ::::{admonition} Deprecated in 8.11.0. :class: warning -Rollups are deprecated and will be removed in a future version. Use [downsampling](../../../manage-data/data-store/index-types/downsampling-time-series-data-stream.md) instead. +Rollups are deprecated and will be removed in a future version. Use [downsampling](/manage-data/data-store/index-types/downsampling-time-series-data-stream.md) instead. :::: - A rollup job is a periodic task that aggregates data from indices specified by an index pattern, and then rolls it into a new index. Rollup indices are a good way to compactly store months or years of historical data for use in visualizations and reports. You can go to the **Rollup Jobs** page using the navigation menu or the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). @@ -16,16 +21,12 @@ You can go to the **Rollup Jobs** page using the navigation menu or the [global :class: screenshot ::: -Before using this feature, you should be familiar with how rollups work. [Rolling up historical data](../../../manage-data/lifecycle/rollup.md) is a good source for more detailed information. - - ## Required permissions [_required_permissions_4] The `manage_rollup` cluster privilege is required to access **Rollup jobs**. 
To add the privilege, go to the **Roles** management page using the navigation menu or the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). - ## Create a rollup job [create-and-manage-rollup-job] {{kib}} makes it easy for you to create a rollup job by walking you through the process. You fill in the name, data flow, and how often you want to roll up the data. Then you define a date histogram aggregation for the rollup job and optionally define terms, histogram, and metrics aggregations. @@ -37,7 +38,6 @@ When defining the index pattern, you must enter a name that is different than th :class: screenshot ::: - ## Start, stop, and delete rollup jobs [manage-rollup-job] Once you’ve saved a rollup job, you’ll see it the **Rollup Jobs** overview page, where you can drill down for further investigation. The **Manage** menu enables you to start, stop, and delete the rollup job. You must first stop a rollup job before deleting it. @@ -79,11 +79,11 @@ As you walk through the **Create rollup job** UI, enter the data: | Histogram interval | 1000 | | Metrics | bytes (average) | -On the **Review and save*** page, click ***Start job now*** and ***Save**. +On the **Review and save** page, click **Start job now** and **Save**. The terms, histogram, and metrics fields reflect the key information to retain in the rolled up data: where visitors are from (geo.src), what operating system they are using (machine.os.keyword), and how much data is being sent (bytes). -You can now use the rolled up data for analysis at a fraction of the storage cost of the original index. The original data can live side by side with the new rollup index, or you can remove or archive it using [{{ilm}} ({{ilm-init}})](../../../manage-data/lifecycle/index-lifecycle-management.md). +You can now use the rolled up data for analysis at a fraction of the storage cost of the original index. 
The original data can live side by side with the new rollup index, or you can remove or archive it using [{{ilm}} ({{ilm-init}})](/manage-data/lifecycle/index-lifecycle-management.md). ### Visualize the rolled up data [_visualize_the_rolled_up_data] diff --git a/manage-data/toc.yml b/manage-data/toc.yml index 6d9436a006..2fd971e5df 100644 --- a/manage-data/toc.yml +++ b/manage-data/toc.yml @@ -141,7 +141,8 @@ toc: - file: lifecycle/curator.md - file: lifecycle/rollup.md children: - - file: lifecycle/rollup/getting-started-with-rollups.md + - file: lifecycle/rollup/getting-started-api.md + - file: lifecycle/rollup/getting-started-kibana.md - file: lifecycle/rollup/understanding-groups.md - file: lifecycle/rollup/rollup-aggregation-limitations.md - file: lifecycle/rollup/rollup-search-limitations.md diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/rollup-overview.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/rollup-overview.md deleted file mode 100644 index 5a2b5fb0da..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/rollup-overview.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -navigation_title: "Overview" ---- - -# {{rollup-cap}} overview [rollup-overview] - - -::::{admonition} Deprecated in 8.11.0. -:class: warning - -Rollups will be removed in a future version. Please [migrate](../../../manage-data/lifecycle/rollup/migrating-from-rollup-to-downsampling.md) to [downsampling](../../../manage-data/data-store/index-types/downsampling-time-series-data-stream.md) instead. -:::: - - -Time-based data (documents that are predominantly identified by their timestamp) often have associated retention policies to manage data growth. For example, your system may be generating 500 documents every second. That will generate 43 million documents per day, and nearly 16 billion documents a year. 
- -While your analysts and data scientists may wish you stored that data indefinitely for analysis, time is never-ending and so your storage requirements will continue to grow without bound. Retention policies are therefore often dictated by the simple calculation of storage costs over time, and what the organization is willing to pay to retain historical data. Often these policies start deleting data after a few months or years. - -Storage cost is a fixed quantity. It takes X money to store Y data. But the utility of a piece of data often changes with time. Sensor data gathered at millisecond granularity is extremely useful right now, reasonably useful if from a few weeks ago, and only marginally useful if older than a few months. - -So while the cost of storing a millisecond of sensor data from ten years ago is fixed, the value of that individual sensor reading often diminishes with time. It’s not useless — it could easily contribute to a useful analysis — but it’s reduced value often leads to deletion rather than paying the fixed storage cost. - - -## Rollup stores historical data at reduced granularity [_rollup_stores_historical_data_at_reduced_granularity] - -That’s where Rollup comes into play. The Rollup functionality summarizes old, high-granularity data into a reduced granularity format for long-term storage. By "rolling" the data up into a single summary document, historical data can be compressed greatly compared to the raw data. - -For example, consider the system that’s generating 43 million documents every day. The second-by-second data is useful for real-time analysis, but historical analysis looking over ten years of data are likely to be working at a larger interval such as hourly or daily trends. - -If we compress the 43 million documents into hourly summaries, we can save vast amounts of space. The Rollup feature automates this process of summarizing historical data. 
- -Details about setting up and configuring Rollup are covered in [Create Job API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-rollup-put-job). - - -## Rollup uses standard Query DSL [_rollup_uses_standard_query_dsl] - -The Rollup feature exposes a new search endpoint (`/_rollup_search` vs the standard `/_search`) which knows how to search over rolled-up data. Importantly, this endpoint accepts 100% normal {{es}} Query DSL. Your application does not need to learn a new DSL to inspect historical data, it can simply reuse existing queries and dashboards. - -There are some limitations to the functionality available; not all queries and aggregations are supported, certain search features (highlighting, etc) are disabled, and available fields depend on how the rollup was configured. These limitations are covered more in [Rollup Search limitations](../../../manage-data/lifecycle/rollup/rollup-search-limitations.md). - -But if your queries, aggregations and dashboards only use the available functionality, redirecting them to historical data is trivial. - - -## Rollup merges "live" and "rolled" data [_rollup_merges_live_and_rolled_data] - -A useful feature of Rollup is the ability to query both "live", realtime data in addition to historical "rolled" data in a single query. - -For example, your system may keep a month of raw data. After a month, it is rolled up into historical summaries using Rollup and the raw data is deleted. - -If you were to query the raw data, you’d only see the most recent month. And if you were to query the rolled up data, you would only see data older than a month. The RollupSearch endpoint, however, supports querying both at the same time. It will take the results from both data sources and merge them together. If there is overlap between the "live" and "rolled" data, live data is preferred to increase accuracy. 
- - -## Rollup is multi-interval aware [_rollup_is_multi_interval_aware] - -Finally, Rollup is capable of intelligently utilizing the best interval available. If you’ve worked with summarizing features of other products, you’ll find that they can be limiting. If you configure rollups at daily intervals…​ your queries and charts can only work with daily intervals. If you need a monthly interval, you have to create another rollup that explicitly stores monthly averages, etc. - -The Rollup feature stores data in such a way that queries can identify the smallest available interval and use that for their processing. If you store rollups at a daily interval, queries can be executed on daily or longer intervals (weekly, monthly, etc) without the need to explicitly configure a new rollup job. This helps alleviate one of the major disadvantages of a rollup system; reduced flexibility relative to raw data. - diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/xpack-rollup.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/xpack-rollup.md deleted file mode 100644 index 9e75fc69c4..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/xpack-rollup.md +++ /dev/null @@ -1,28 +0,0 @@ -# Rolling up historical data [xpack-rollup] - -::::{admonition} Deprecated in 8.11.0. -:class: warning - -Rollups will be removed in a future version. Please [migrate](../../../manage-data/lifecycle/rollup/migrating-from-rollup-to-downsampling.md) to [downsampling](../../../manage-data/data-store/index-types/downsampling-time-series-data-stream.md) instead. -:::: - - -Keeping historical data around for analysis is extremely useful but often avoided due to the financial cost of archiving massive amounts of data. Retention periods are thus driven by financial realities rather than by the usefulness of extensive historical data. 
- -The {{stack}} {{rollup-features}} provide a means to summarize and store historical data so that it can still be used for analysis, but at a fraction of the storage cost of raw data. - -* [Overview](../../../manage-data/lifecycle/rollup.md) -* [Getting started](../../../manage-data/lifecycle/rollup/getting-started-with-rollups.md) -* [API quick reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/rollup-api-quickref.html) -* [Understanding rollup grouping](../../../manage-data/lifecycle/rollup/understanding-groups.md) -* [Rollup aggregation limitations](../../../manage-data/lifecycle/rollup/rollup-aggregation-limitations.md) -* [Rollup search limitations](../../../manage-data/lifecycle/rollup/rollup-search-limitations.md) -* [Migrating to downsampling](../../../manage-data/lifecycle/rollup/migrating-from-rollup-to-downsampling.md) - - - - - - - - diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml index 60b4da22d2..2866dc78a2 100644 --- a/raw-migrated-files/toc.yml +++ b/raw-migrated-files/toc.yml @@ -557,7 +557,6 @@ toc: - file: elasticsearch/elasticsearch-reference/overview-index-lifecycle-management.md - file: elasticsearch/elasticsearch-reference/recovery-prioritization.md - file: elasticsearch/elasticsearch-reference/role-mapping-resources.md - - file: elasticsearch/elasticsearch-reference/rollup-overview.md - file: elasticsearch/elasticsearch-reference/saml-guide-stack.md - file: elasticsearch/elasticsearch-reference/saml-realm.md - file: elasticsearch/elasticsearch-reference/scalability.md @@ -580,7 +579,6 @@ toc: - file: elasticsearch/elasticsearch-reference/starting-elasticsearch.md - file: elasticsearch/elasticsearch-reference/stopping-elasticsearch.md - file: elasticsearch/elasticsearch-reference/xpack-autoscaling.md - - file: elasticsearch/elasticsearch-reference/xpack-rollup.md - file: ingest-docs/fleet/index.md children: - file: ingest-docs/fleet/beats-agent-comparison.md @@ -593,7 +591,6 @@ toc: - file: 
kibana/kibana/apm-settings-kb.md - file: kibana/kibana/connect-to-elasticsearch.md - file: kibana/kibana/console-kibana.md - - file: kibana/kibana/data-rollups.md - file: kibana/kibana/elasticsearch-mutual-tls.md - file: kibana/kibana/esql.md - file: kibana/kibana/install.md