diff --git a/explore-analyze/transforms.md b/explore-analyze/transforms.md index cf26d282bf..c0ac128b93 100644 --- a/explore-analyze/transforms.md +++ b/explore-analyze/transforms.md @@ -4,15 +4,19 @@ mapped_urls: - https://www.elastic.co/guide/en/serverless/current/transforms.html - https://www.elastic.co/guide/en/elasticsearch/reference/current/data-rollup-transform.html --- +# Transforming data [transforms] -# Transforms +{{transforms-cap}} enable you to convert existing {{es}} indices into summarized indices, which provide opportunities for new insights and analytics. For example, you can use {{transforms}} to pivot your data into entity-centric indices that summarize the behavior of users or sessions or other entities in your data. Or you can use {{transforms}} to find the latest document among all the documents that have a certain unique key. -% What needs to be done: Align serverless/stateful - -% Scope notes: views in last 6 months: ~90/week - -% Use migrated content from existing pages that map to this page: - -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/transforms.md -% - [ ] ./raw-migrated-files/docs-content/serverless/transforms.md -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/data-rollup-transform.md \ No newline at end of file +* [Overview](transforms/transform-overview.md) +* [Setup](transforms/transform-setup.md) +* [When to use {{transforms}}](transforms/transform-usage.md) +* [Generating alerts for {{transforms}}](transforms/transform-alerts.md) +* [{{transforms-cap}} at scale](transforms/transform-scale.md) +* [How checkpoints work](transforms/transform-checkpoints.md) +* [API quick reference](transforms/transform-api-quickref.md) +* [Tutorial: Transforming the eCommerce sample data](transforms/ecommerce-transforms.md) +* [Examples](transforms/transform-examples.md) +* [Painless examples](transforms/transform-painless-examples.md) +* [Troubleshooting 
{{transforms}}](../troubleshoot/elasticsearch/transform-troubleshooting.md) +* [Limitations](transforms/transform-limitations.md) diff --git a/explore-analyze/transforms/ecommerce-transforms.md b/explore-analyze/transforms/ecommerce-transforms.md index 155c5409ac..f7f7c3774a 100644 --- a/explore-analyze/transforms/ecommerce-transforms.md +++ b/explore-analyze/transforms/ecommerce-transforms.md @@ -14,34 +14,32 @@ mapped_pages: 3. Choose the pivot type of {{transform}} and play with various options for grouping and aggregating the data. - There are two types of {{transforms}}, but first we’ll try out *pivoting* your data, which involves using at least one field to group it and applying at least one aggregation. You can preview what the transformed data will look like, so go ahead and play with it! You can also enable histogram charts to get a better understanding of the distribution of values in your data. + There are two types of {{transforms}}, but first we’ll try out *pivoting* your data, which involves using at least one field to group it and applying at least one aggregation. You can preview what the transformed data will look like, so go ahead and play with it! You can also enable histogram charts to get a better understanding of the distribution of values in your data. - For example, you might want to group the data by product ID and calculate the total number of sales for each product and its average price. Alternatively, you might want to look at the behavior of individual customers and calculate how much each customer spent in total and how many different categories of products they purchased. Or you might want to take the currencies or geographies into consideration. What are the most interesting ways you can transform and interpret this data? + For example, you might want to group the data by product ID and calculate the total number of sales for each product and its average price. 
Alternatively, you might want to look at the behavior of individual customers and calculate how much each customer spent in total and how many different categories of products they purchased. Or you might want to take the currencies or geographies into consideration. What are the most interesting ways you can transform and interpret this data? - Go to **Management** > **Stack Management** > **Data** > **Transforms** in {{kib}} and use the wizard to create a {{transform}}: + Go to **Management** > **Stack Management** > **Data** > **Transforms** in {{kib}} and use the wizard to create a {{transform}}: + :::{image} ../../images/elasticsearch-reference-ecommerce-pivot1.png + :alt: Creating a simple {{transform}} in {kib} + :class: screenshot + ::: - :::{image} ../../images/elasticsearch-reference-ecommerce-pivot1.png - :alt: Creating a simple {{transform}} in {kib} - :class: screenshot - ::: + Group the data by customer ID and add one or more aggregations to learn more about each customer’s orders. For example, let’s calculate the sum of products they purchased, the total price of their purchases, the maximum number of products that they purchased in a single order, and their total number of orders. We’ll accomplish this by using the [`sum` aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html) on the `total_quantity` and `taxless_total_price` fields, the [`max` aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-max-aggregation.html) on the `total_quantity` field, and the [`cardinality` aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html) on the `order_id` field: - Group the data by customer ID and add one or more aggregations to learn more about each customer’s orders. 
For example, let’s calculate the sum of products they purchased, the total price of their purchases, the maximum number of products that they purchased in a single order, and their total number of orders. We’ll accomplish this by using the [`sum` aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html) on the `total_quantity` and `taxless_total_price` fields, the [`max` aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-max-aggregation.html) on the `total_quantity` field, and the [`cardinality` aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html) on the `order_id` field: + :::{image} ../../images/elasticsearch-reference-ecommerce-pivot2.png + :alt: Adding multiple aggregations to a {{transform}} in {kib} + :class: screenshot + ::: - :::{image} ../../images/elasticsearch-reference-ecommerce-pivot2.png - :alt: Adding multiple aggregations to a {{transform}} in {kib} - :class: screenshot - ::: + ::::{tip} + If you’re interested in a subset of the data, you can optionally include a [query](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#request-body-search-query) element. In this example, we’ve filtered the data so that we’re only looking at orders with a `currency` of `EUR`. Alternatively, we could group the data by that field too. If you want to use more complex queries, you can create your {{dataframe}} from a [saved search](../discover/save-open-search.md). + :::: - ::::{tip} - If you’re interested in a subset of the data, you can optionally include a [query](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#request-body-search-query) element. In this example, we’ve filtered the data so that we’re only looking at orders with a `currency` of `EUR`. 
Alternatively, we could group the data by that field too. If you want to use more complex queries, you can create your {{dataframe}} from a [saved search](../discover/save-open-search.md). - :::: + If you prefer, you can use the [preview {{transforms}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/preview-transform.html). - - If you prefer, you can use the [preview {{transforms}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/preview-transform.html). - - ::::{dropdown} API example - ```console - POST _transform/_preview + ::::{dropdown} API example + ```console + POST _transform/_preview { "source": { "index": "kibana_sample_data_ecommerce", @@ -85,33 +83,30 @@ mapped_pages: } } } - ``` + ``` - :::: + :::: 4. When you are satisfied with what you see in the preview, create the {{transform}}. - - 1. Supply a {{transform}} ID, the name of the destination index and optionally a description. If the destination index does not exist, it will be created automatically when you start the {{transform}}. - 2. Decide whether you want the {{transform}} to run once or continuously. Since this sample data index is unchanging, let’s use the default behavior and just run the {{transform}} once. If you want to try it out, however, go ahead and click on **Continuous mode**. You must choose a field that the {{transform}} can use to check which entities have changed. In general, it’s a good idea to use the ingest timestamp field. In this example, however, you can use the `order_date` field. - 3. Optionally, you can configure a retention policy that applies to your {{transform}}. Select a date field that is used to identify old documents in the destination index and provide a maximum age. Documents that are older than the configured value are removed from the destination index. 
- - :::{image} ../../images/elasticsearch-reference-ecommerce-pivot3.png - :alt: Adding transfrom ID and retention policy to a {{transform}} in {kib} - :class: screenshot - ::: - - In {{kib}}, before you finish creating the {{transform}}, you can copy the preview {{transform}} API request to your clipboard. This information is useful later when you’re deciding whether you want to manually create the destination index. - - :::{image} ../../images/elasticsearch-reference-ecommerce-pivot4.png - :alt: Copy the Dev Console statement of the transform preview to the clipboard - :class: screenshot - ::: - - If you prefer, you can use the [create {{transforms}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-transform.html). - - ::::{dropdown} API example - ```console - PUT _transform/ecommerce-customer-transform + 1. Supply a {{transform}} ID, the name of the destination index and optionally a description. If the destination index does not exist, it will be created automatically when you start the {{transform}}. + 2. Decide whether you want the {{transform}} to run once or continuously. Since this sample data index is unchanging, let’s use the default behavior and just run the {{transform}} once. If you want to try it out, however, go ahead and click on **Continuous mode**. You must choose a field that the {{transform}} can use to check which entities have changed. In general, it’s a good idea to use the ingest timestamp field. In this example, however, you can use the `order_date` field. + 3. Optionally, you can configure a retention policy that applies to your {{transform}}. Select a date field that is used to identify old documents in the destination index and provide a maximum age. Documents that are older than the configured value are removed from the destination index. 
+ :::{image} ../../images/elasticsearch-reference-ecommerce-pivot3.png + :alt: Adding transform ID and retention policy to a {{transform}} in {kib} + :class: screenshot + ::: + + In {{kib}}, before you finish creating the {{transform}}, you can copy the preview {{transform}} API request to your clipboard. This information is useful later when you’re deciding whether you want to manually create the destination index. + :::{image} ../../images/elasticsearch-reference-ecommerce-pivot4.png + :alt: Copy the Dev Console statement of the transform preview to the clipboard + :class: screenshot + ::: + + If you prefer, you can use the [create {{transforms}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-transform.html). + + ::::{dropdown} API example + ```console + PUT _transform/ecommerce-customer-transform { "source": { "index": [ @@ -168,21 +163,18 @@ mapped_pages: } } } - ``` + ``` - :::: + :::: 5. Optional: Create the destination index. + If the destination index does not exist, it is created the first time you start your {{transform}}. A pivot transform deduces the mappings for the destination index from the source indices and the transform aggregations. If there are fields in the destination index that are derived from scripts (for example, if you use [`scripted_metrics`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html) or [`bucket_scripts`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html) aggregations), they’re created with [dynamic mappings](../../manage-data/data-store/mapping/dynamic-mapping.md). You can use the preview {{transform}} API to preview the mappings it will use for the destination index. In {{kib}}, if you copied the API request to your clipboard, paste it into the console, then refer to the `generated_dest_index` object in the API response.
+ ::::{note} + {{transforms-cap}} might have more configuration options provided by the APIs than the options available in {{kib}}. For example, you can set an ingest pipeline for `dest` by calling the [Create {{transform}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-transform.html). For all the {{transform}} configuration options, refer to the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-apis.html). + :::: - If the destination index does not exist, it is created the first time you start your {{transform}}. A pivot transform deduces the mappings for the destination index from the source indices and the transform aggregations. If there are fields in the destination index that are derived from scripts (for example, if you use [`scripted_metrics`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html) or [`bucket_scripts`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html) aggregations), they’re created with [dynamic mappings](../../manage-data/data-store/mapping/dynamic-mapping.md). You can use the preview {{transform}} API to preview the mappings it will use for the destination index. In {{kib}}, if you copied the API request to your clipboard, paste it into the console, then refer to the `generated_dest_index` object in the API response. - - ::::{note} - {{transforms-cap}} might have more configuration options provided by the APIs than the options available in {{kib}}. For example, you can set an ingest pipeline for `dest` by calling the [Create {{transform}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-transform.html). For all the {{transform}} configuration options, refer to the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-apis.html). 
- :::: - - - ::::{dropdown} API example - ```console-result + ::::{dropdown} API example + ```console-result { "preview" : [ { @@ -248,18 +240,17 @@ mapped_pages: "aliases" : { } } } - ``` - - :::: + ``` + :::: - In some instances the deduced mappings might be incompatible with the actual data. For example, numeric overflows might occur or dynamically mapped fields might contain both numbers and strings. To avoid this problem, create your destination index before you start the {{transform}}. For more information, see the [create index API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html). + In some instances the deduced mappings might be incompatible with the actual data. For example, numeric overflows might occur or dynamically mapped fields might contain both numbers and strings. To avoid this problem, create your destination index before you start the {{transform}}. For more information, see the [create index API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html). - ::::{dropdown} API example - You can use the information from the {{transform}} preview to create the destination index. For example: + ::::{dropdown} API example + You can use the information from the {{transform}} preview to create the destination index. For example: - ```console - PUT /ecommerce-customers + ```console + PUT /ecommerce-customers { "mappings": { "properties": { @@ -290,61 +281,53 @@ mapped_pages: } } } - ``` + ``` - :::: + :::: 6. Start the {{transform}}. + ::::{tip} + Even though resource utilization is automatically adjusted based on the cluster load, a {{transform}} increases search and indexing load on your cluster while it runs. If you’re experiencing an excessive load, however, you can stop it. + :::: - ::::{tip} - Even though resource utilization is automatically adjusted based on the cluster load, a {{transform}} increases search and indexing load on your cluster while it runs. 
If you’re experiencing an excessive load, however, you can stop it. - :::: + You can start, stop, reset, and manage {{transforms}} in {{kib}}: + :::{image} ../../images/elasticsearch-reference-manage-transforms.png + :alt: Managing {{transforms}} in {kib} + :class: screenshot + ::: + Alternatively, you can use the [start {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-transform.html), [stop {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/stop-transform.html) and [reset {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/reset-transform.html) APIs. - You can start, stop, reset, and manage {{transforms}} in {{kib}}: + If you reset a {{transform}}, all checkpoints, states, and the destination index (if it was created by the {{transform}}) are deleted. The {{transform}} is ready to start again as if it had just been created. - :::{image} ../../images/elasticsearch-reference-manage-transforms.png - :alt: Managing {{transforms}} in {kib} - :class: screenshot - ::: + ::::{dropdown} API example + ```console + POST _transform/ecommerce-customer-transform/_start + ``` - Alternatively, you can use the [start {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-transform.html), [stop {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/stop-transform.html) and [reset {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/reset-transform.html) APIs. + :::: - If you reset a {{transform}}, all checkpoints, states, and the destination index (if it was created by the {{transform}}) are deleted. The {{transform}} is ready to start again as if it had just been created. - - ::::{dropdown} API example - ```console - POST _transform/ecommerce-customer-transform/_start - ``` - - :::: - - - ::::{tip} - If you chose a batch {{transform}}, it is a single operation that has a single checkpoint. 
You cannot restart it when it’s complete. {{ctransforms-cap}} differ in that they continually increment and process checkpoints as new source data is ingested. - :::: + ::::{tip} + If you chose a batch {{transform}}, it is a single operation that has a single checkpoint. You cannot restart it when it’s complete. {{ctransforms-cap}} differ in that they continually increment and process checkpoints as new source data is ingested. + :::: 7. Explore the data in your new index. - - For example, use the **Discover** application in {{kib}}: - - :::{image} ../../images/elasticsearch-reference-ecommerce-results.png - :alt: Exploring the new index in {kib} - :class: screenshot - ::: + For example, use the **Discover** application in {{kib}}: + :::{image} ../../images/elasticsearch-reference-ecommerce-results.png + :alt: Exploring the new index in {kib} + :class: screenshot + ::: 8. Optional: Create another {{transform}}, this time using the `latest` method. - - This method populates the destination index with the latest documents for each unique key value. For example, you might want to find the latest orders (sorted by the `order_date` field) for each customer or for each country and region. - - :::{image} ../../images/elasticsearch-reference-ecommerce-latest1.png - :alt: Creating a latest {{transform}} in {kib} - :class: screenshot - ::: - - ::::{dropdown} API example - ```console - POST _transform/_preview + This method populates the destination index with the latest documents for each unique key value. For example, you might want to find the latest orders (sorted by the `order_date` field) for each customer or for each country and region. 
+ :::{image} ../../images/elasticsearch-reference-ecommerce-latest1.png + :alt: Creating a latest {{transform}} in {kib} + :class: screenshot + ::: + + ::::{dropdown} API example + ```console + POST _transform/_preview { "source": { "index": "kibana_sample_data_ecommerce", @@ -361,16 +344,14 @@ mapped_pages: "sort": "order_date" } } - ``` + ``` - :::: + :::: - - ::::{tip} - If the destination index does not exist, it is created the first time you start your {{transform}}. Unlike pivot {{transforms}}, however, latest {{transforms}} do not deduce mapping definitions when they create the index. Instead, they use dynamic mappings. To use explicit mappings, create the destination index before you start the {{transform}}. - :::: + ::::{tip} + If the destination index does not exist, it is created the first time you start your {{transform}}. Unlike pivot {{transforms}}, however, latest {{transforms}} do not deduce mapping definitions when they create the index. Instead, they use dynamic mappings. To use explicit mappings, create the destination index before you start the {{transform}}. + :::: 9. If you do not want to keep a {{transform}}, you can delete it in {{kib}} or use the [delete {{transform}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-transform.html). By default, when you delete a {{transform}}, its destination index and {{kib}} index patterns remain. Now that you’ve created simple {{transforms}} for {{kib}} sample data, consider possible use cases for your own data. For more ideas, see [When to use {{transforms}}](transform-usage.md) and [Examples](transform-examples.md). - diff --git a/explore-analyze/transforms/transform-alerts.md b/explore-analyze/transforms/transform-alerts.md index 33f1d5c574..e8e3100be2 100644 --- a/explore-analyze/transforms/transform-alerts.md +++ b/explore-analyze/transforms/transform-alerts.md @@ -14,11 +14,10 @@ You can create {{transform}} rules under **{{stack-manage-app}} > {{rules-ui}}** 1. 
Click **Create rule** and select the {{transform}} health rule type. 2. Give a name to the rule and optionally provide tags. 3. Select the {{transform}} or {{transforms}} to include. You can also use a special character (`*`) to apply the rule to all your {{transforms}}. {{transforms-cap}} created after the rule are automatically included. - - :::{image} ../../images/elasticsearch-reference-transform-check-config.png - :alt: Selecting health check - :class: screenshot - ::: + :::{image} ../../images/elasticsearch-reference-transform-check-config.png + :alt: Selecting health check + :class: screenshot + ::: 4. The following health checks are available and enabled by default: @@ -33,7 +32,6 @@ You can create {{transform}} rules under **{{stack-manage-app}} > {{rules-ui}}** As the last step in the rule creation process, define its actions. - ## Defining actions [defining-actions] You can add one or more actions to your rule to generate notifications when its conditions are met and when they are no longer met. In particular, this rule type supports: @@ -55,7 +53,6 @@ After you select a connector, you must set the action frequency. You can choose If you choose a custom action interval, it cannot be shorter than the rule’s check interval. :::: - Alternatively, you can set the action frequency such that actions run for each alert. Choose how often the action runs (at each check interval, only when the alert status changes, or at a custom action interval). You must also choose an action group, which indicates whether the action runs when the issue is detected or when it is recovered. You can further refine the conditions under which actions run by specifying that actions only run when they match a KQL query or when an alert occurs within a specific time frame. @@ -71,7 +68,6 @@ After you save the configurations, the rule appears in the **{{rules-ui}}** list The name of an alert is always the same as the {{transform}} ID of the associated {{transform}} that triggered it. 
You can mute the notifications for a particular {{transform}} on the page of the rule that lists the individual alerts. You can open it via **{{rules-ui}}** by selecting the rule name. - ## Action variables [transform-action-variables] The following variables are specific to the {{transform}} health rule type. You can also specify [variables common to all rules](../alerts/kibana/rule-action-variables.md). @@ -103,5 +99,4 @@ The following variables are specific to the {{transform}} health rule type. You {{/context.results}} ``` - For more examples, refer to [Rule action variables](../alerts/kibana/rule-action-variables.md). diff --git a/explore-analyze/transforms/transform-api-quickref.md b/explore-analyze/transforms/transform-api-quickref.md index f1a884b96a..e84a76974b 100644 --- a/explore-analyze/transforms/transform-api-quickref.md +++ b/explore-analyze/transforms/transform-api-quickref.md @@ -23,4 +23,3 @@ _transform/ * [Update {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/update-transform.html) For the full list, see [*{{transform-cap}} APIs*](https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-apis.html). - diff --git a/explore-analyze/transforms/transform-checkpoints.md b/explore-analyze/transforms/transform-checkpoints.md index 3d872cb1af..fe5127f488 100644 --- a/explore-analyze/transforms/transform-checkpoints.md +++ b/explore-analyze/transforms/transform-checkpoints.md @@ -4,11 +4,8 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-checkpoints.html --- - - # How checkpoints work [transform-checkpoints] - Each time a {{transform}} examines the source indices and creates or updates the destination index, it generates a *checkpoint*. If your {{transform}} runs only once, there is logically only one checkpoint. If your {{transform}} runs continuously, however, it creates checkpoints as it ingests and transforms new source data. 
The `sync` property of the {{transform}} configures checkpointing by specifying a time field. @@ -31,15 +28,12 @@ To create a checkpoint, the {{ctransform}}: The {{transform}} applies changes related to either new or changed entities or time buckets to the destination index. The set of changes can be paginated. The {{transform}} performs a composite aggregation similarly to the batch {{transform}} operation, however it also injects query filters based on the previous step to reduce the amount of work. After all changes have been applied, the checkpoint is complete. - This checkpoint process involves both search and indexing activity on the cluster. We have attempted to favor control over performance while developing {{transforms}}. We decided it was preferable for the {{transform}} to take longer to complete, rather than to finish quickly and take precedence in resource consumption. That being said, the cluster still requires enough resources to support both the composite aggregation search and the indexing of its results. ::::{tip} If the cluster experiences unsuitable performance degradation due to the {{transform}}, stop the {{transform}} and refer to [Performance considerations](transform-overview.md#transform-performance). :::: - - ## Using the ingest timestamp for syncing the {{transform}} [sync-field-ingest-timestamp] In most cases, it is strongly recommended to use the ingest timestamp of the source indices for syncing the {{transform}}. This is the most optimal way for {{transforms}} to be able to identify new changes. If your data source follows the [ECS standard](https://www.elastic.co/guide/en/ecs/{{ecs_version}}/ecs-reference.html), you might already have an [`event.ingested`](https://www.elastic.co/guide/en/ecs/{{ecs_version}}/ecs-event.html#field-event-ingested) field. In this case, use `event.ingested` as the `sync`.`time`.`field` property of your {{transform}}. 
@@ -65,7 +59,6 @@ After you created the ingest pipeline, apply it to the source indices of your {{ Refer to [Add a pipeline to an indexing request](../../manage-data/ingest/transform-enrich/ingest-pipelines.md#add-pipeline-to-indexing-request) and [Ingest pipelines](../../manage-data/ingest/transform-enrich/ingest-pipelines.md) to learn more about how to use an ingest pipeline. - ## Change detection heuristics [ml-transform-checkpoint-heuristics] When the {{transform}} runs in continuous mode, it updates the documents in the destination index as new data comes in. The {{transform}} uses a set of heuristics called change detection to update the destination index with fewer operations. @@ -74,7 +67,6 @@ In this example, the data is grouped by host names. Change detection detects whi Another heuristic can be applied for time buckets when a `date_histogram` is used to group by time buckets. Change detection detects which time buckets have changed and only update those. - ## Error handling [ml-transform-checkpoint-errors] Failures in {{transforms}} tend to be related to searching or indexing. To increase the resiliency of {{transforms}}, the cursor positions of the aggregated search and the changed entities search are tracked in memory and persisted periodically. diff --git a/explore-analyze/transforms/transform-examples.md b/explore-analyze/transforms/transform-examples.md index 2474dfa683..9292838f76 100644 --- a/explore-analyze/transforms/transform-examples.md +++ b/explore-analyze/transforms/transform-examples.md @@ -4,11 +4,8 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-examples.html --- - - # Examples [transform-examples] - These examples demonstrate how to use {{transforms}} to derive useful insights from your data. All the examples use one of the [{{kib}} sample datasets](https://www.elastic.co/guide/en/kibana/current/add-sample-data.html). 
For a more detailed, step-by-step example, see [Tutorial: Transforming the eCommerce sample data](ecommerce-transforms.md). * [Finding your best customers](#example-best-customers) @@ -58,12 +55,10 @@ POST _transform/_preview 1. The destination index for the {{transform}}. It is ignored by `_preview`. 2. Two `group_by` fields is selected. This means the {{transform}} contains a unique row per `user` and `customer_id` combination. Within this data set, both these fields are unique. By including both in the {{transform}}, it gives more context to the final results. - ::::{note} In the example above, condensed JSON formatting is used for easier readability of the pivot object. :::: - The preview {{transforms}} API enables you to see the layout of the {{transform}} in advance, populated with some sample values. For example: ```js @@ -85,7 +80,6 @@ The preview {{transforms}} API enables you to see the layout of the {{transform} ::::: - This {{transform}} makes it easier to answer questions such as: * Which customers spend the most? @@ -95,7 +89,6 @@ This {{transform}} makes it easier to answer questions such as: It’s possible to answer these questions using aggregations alone, however {{transforms}} allow us to persist this data as a customer centric index. This enables us to analyze data at scale and gives more flexibility to explore and navigate data from a customer centric perspective. In some cases, it can even make creating visualizations much simpler. - ## Finding air carriers with the most delays [example-airline] This example uses the Flights sample data set to find out which air carrier had the most delays. First, filter the source data such that it excludes all the cancelled flights by using a query filter. Then transform the data to contain the distinct number of flights, the sum of delayed minutes, and the sum of the flight minutes by air carrier. 
Finally, use a [`bucket_script`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html) to determine what percentage of the flight time was actually delay. @@ -143,7 +136,6 @@ POST _transform/_preview 3. The data is grouped by the `Carrier` field which contains the airline name. 4. This `bucket_script` performs calculations on the results that are returned by the aggregation. In this particular example, it calculates what percentage of travel time was taken up by delays. - The preview shows you that the new index would contain data like this for each carrier: ```js @@ -169,8 +161,6 @@ This {{transform}} makes it easier to answer questions such as: This data is fictional and does not reflect actual delays or flight stats for any of the featured destination or origin airports. :::: - - ## Finding suspicious client IPs [example-clientips] This example uses the web log sample data set to identify suspicious client IPs. It transforms the data such that the new index contains the sum of bytes and the number of distinct URLs, agents, incoming requests by location, and geographic destinations for each client IP. It also uses filter aggregations to count the specific types of HTTP responses that each client IP receives. Ultimately, the example below transforms web log data into an entity centric index where the entity is `clientip`. @@ -235,7 +225,6 @@ PUT _transform/suspicious_client_ips 4. Filter aggregation that counts the occurrences of successful (`200`) responses in the `response` field. The following two aggregations (`error404` and `error5xx`) count the error responses by error codes, matching an exact value or a range of response codes. 5. This `bucket_script` calculates the duration of the `clientip` access based on the results of the aggregation. 
- After you create the {{transform}}, you must start it: ```console @@ -285,7 +274,6 @@ The search result shows you data like this for each client IP: Like other Kibana sample data sets, the web log sample dataset contains timestamps relative to when you installed it, including timestamps in the future. The {{ctransform}} will pick up the data points once they are in the past. If you installed the web log sample dataset some time ago, you can uninstall and reinstall it and the timestamps will change. :::: - This {{transform}} makes it easier to answer questions such as: * Which client IPs are transferring the most amounts of data? @@ -293,7 +281,6 @@ This {{transform}} makes it easier to answer questions such as: * Which client IPs have high error rates? * Which client IPs are interacting with a high number of destination countries? - ## Finding the last log event for each IP address [example-last-log] This example uses the web log sample data set to find the last log from an IP address. Let’s use the `latest` type of {{transform}} in continuous mode. It copies the most recent document for each unique key from the source index to the destination index and updates the destination index as new data comes into the source index. @@ -357,7 +344,6 @@ PUT _transform/last-log-from-clientip 4. Contains the time field and delay settings used to synchronize the source and destination indices. 5. Specifies the retention policy for the transform. Documents that are older than the configured value will be removed from the destination index. - After you create the {{transform}}, start it: ```console @@ -366,7 +352,6 @@ POST _transform/last-log-from-clientip/_start :::: - After the {{transform}} processes the data, search the destination index: ```console @@ -425,7 +410,6 @@ This {{transform}} makes it easier to answer questions such as: * What was the most recent log event associated with a specific IP address? 
- ## Finding client IPs that sent the most bytes to the server [example-bytes] This example uses the web log sample data set to find the client IP that sent the most bytes to the server in every hour. The example uses a `pivot` {{transform}} with a [`top_metrics`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-metrics.html) aggregation. @@ -477,7 +461,6 @@ POST _transform/_preview 2. Calculates the maximum value of the `bytes` field. 3. Specifies the fields (`clientip` and `geo.src`) of the top document to return and the sorting method (document with the highest `bytes` value). - The API call above returns a response similar to this: ```js @@ -518,7 +501,6 @@ The API call above returns a response similar to this: } ``` - ## Getting customer name and email address by customer ID [example-customer-names] This example uses the ecommerce sample data set to create an entity-centric index based on customer ID, and to get the customer name and email address by using the `top_metrics` aggregation. @@ -566,7 +548,6 @@ POST _transform/_preview 1. The data is grouped by a `terms` aggregation on the `customer_id` field. 2. Specifies the fields to return (email and name fields) in a descending order by the order date. - The API returns a response that is similar to this: ```js @@ -600,5 +581,3 @@ The API returns a response that is similar to this: ] } ``` - - diff --git a/explore-analyze/transforms/transform-limitations.md b/explore-analyze/transforms/transform-limitations.md index ff8af87e48..20081600fe 100644 --- a/explore-analyze/transforms/transform-limitations.md +++ b/explore-analyze/transforms/transform-limitations.md @@ -4,31 +4,24 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-limitations.html --- - - # Limitations [transform-limitations] - The following limitations and known problems apply to the 9.0.0-beta1 release of the Elastic {{transform}} feature. 
The limitations are grouped into the following categories: * [Configuration limitations](#transform-config-limitations) apply to the configuration process of the {{transforms}}. * [Operational limitations](#transform-operational-limitations) affect the behavior of the {{transforms}} that are running. * [Limitations in {{kib}}](#transform-ui-limitations) only apply to {{transforms}} managed via the user interface. - ## Configuration limitations [transform-config-limitations] - ### Field names prefixed with underscores are omitted from latest {{transforms}} [transforms-underscore-limitation] If you use the `latest` type of {{transform}} and the source index has field names that start with an underscore (_) character, they are assumed to be internal fields. Those fields are omitted from the documents in the destination index. - ### {{transforms-cap}} support {{ccs}} if the remote cluster is configured properly [transforms-ccs-limitation] If you use [{{ccs}}](../../solutions/search/cross-cluster-search.md), the remote cluster must support the search and aggregations you use in your {{transforms}}. {{transforms-cap}} validate their configuration; if you use {{ccs}} and the validation fails, make sure that the remote cluster supports the query and aggregations you use. - ### Using scripts in {{transforms}} [transform-painless-limitation] {{transforms-cap}} support scripting in every case when aggregations support them. However, there are certain factors you might want to consider when using scripts in {{transforms}}: @@ -37,25 +30,20 @@ If you use [{{ccs}}](../../solutions/search/cross-cluster-search.md), the remote * Scripted fields may increase the runtime of the {{transform}}. * {{transforms-cap}} cannot optimize queries when you use scripts for all the groupings defined in `group_by`, you will receive a warning message when you use scripts this way. 
- ### Deprecation warnings for Painless scripts in {{transforms}} [transform-painless-warning-limitation] If a {{transform}} contains Painless scripts that use deprecated syntax, deprecation warnings are displayed when the {{transform}} is previewed or started. However, it is not possible to check for deprecation warnings across all {{transforms}} as a bulk action because running the required queries might be a resource intensive process. Therefore any deprecation warnings due to deprecated Painless syntax are not available in the Upgrade Assistant. - ### {{transforms-cap}} perform better on indexed fields [transform-runtime-field-limitation] {{transforms-cap}} sort data by a user-defined time field, which is frequently accessed. If the time field is a [runtime field](../../manage-data/data-store/mapping/runtime-fields.md), the performance impact of calculating field values at query time can significantly slow the {{transform}}. Use an indexed field as a time field when using {{transforms}}. - ### {{ctransform-cap}} scheduling limitations [transform-scheduling-limitations] A {{ctransform}} periodically checks for changes to source data. The functionality of the scheduler is currently limited to a basic periodic timer which can be within the `frequency` range from 1s to 1h. The default is 1m. This is designed to run little and often. When choosing a `frequency` for this timer consider your ingest rate along with the impact that the {{transform}} search/index operations has other users in your cluster. Also note that retries occur at `frequency` interval. - ## Operational limitations [transform-operational-limitations] - ### Aggregation responses may be incompatible with destination index mappings [transform-aggresponse-limitations] When a pivot {{transform}} is first started, it will deduce the mappings required for the destination index. This process is based on the field types of the source index and the aggregations used. 
If the fields are derived from [`scripted_metrics`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html) or [`bucket_scripts`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html), [dynamic mappings](../../manage-data/data-store/mapping/dynamic-mapping.md) will be used. In some instances the deduced mappings may be incompatible with the actual data. For example, numeric overflows might occur or dynamically mapped fields might contain both numbers and strings. Please check {{es}} logs if you think this may have occurred. @@ -64,12 +52,10 @@ You can view the deduced mappings by using the [preview transform API](https://w If it’s required, you may define custom mappings prior to starting the {{transform}} by creating a custom destination index using the [create index API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html). As deduced mappings cannot be overwritten by an index template, use the create index API to define custom mappings. The index templates only apply to fields derived from scripts that use dynamic mappings. - ### Batch {{transforms}} may not account for changed documents [transform-batch-limitations] A batch {{transform}} uses a [composite aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html) which allows efficient pagination through all buckets. Composite aggregations do not yet support a search context, therefore if the source data is changed (deleted, updated, added) while the batch {{dataframe}} is in progress, then the results may not include these changes. 
- ### {{ctransform-cap}} consistency does not account for deleted or updated documents [transform-consistency-limitations] While the process for {{transforms}} allows the continual recalculation of the {{transform}} as new data is being ingested, it does also have some limitations. @@ -80,12 +66,10 @@ If the indices that fall within the scope of the source index pattern are remove Depending on your use case, you may wish to recreate the {{transform}} entirely after deletions. Alternatively, if your use case is tolerant to historical archiving, you may wish to include a max ingest timestamp in your aggregation. This will allow you to exclude results that have not been recently updated when viewing the destination index. - ### Deleting a {{transform}} does not delete the destination index or {{kib}} index pattern [transform-deletion-limitations] When deleting a {{transform}} using `DELETE _transform/index` neither the destination index nor the {{kib}} index pattern, should one have been created, are deleted. These objects must be deleted separately. - ### Handling dynamic adjustment of aggregation page size [transform-aggregation-page-limitations] During the development of {{transforms}}, control was favoured over performance. In the design considerations, it is preferred for the {{transform}} to take longer to complete quietly in the background rather than to finish quickly and take precedence in resource consumption. @@ -96,7 +80,6 @@ For a batch {{transform}}, the number of buckets requested is only ever adjusted The {{transform}} retrieves data in batches which means it calculates several buckets at once. Per default this is 500 buckets per search/index operation. The default can be changed using `max_page_search_size` and the minimum value is 10. If failures still occur once the number of buckets requested has been reduced to its minimum, then the {{transform}} will be set to a failed state. 
- ### Handling dynamic adjustments for many terms [transform-dynamic-adjustments-limitations] For each checkpoint, entities are identified that have changed since the last time the check was performed. This list of changed entities is supplied as a [terms query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html) to the {{transform}} composite aggregation, one page at a time. Then updates are applied to the destination index for each page of entities. @@ -107,14 +90,12 @@ The index setting [`index.max_terms_count`](https://www.elastic.co/guide/en/elas Using smaller values for `max_page_search_size` may result in a longer duration for the {{transform}} checkpoint to complete. - ### Handling of failed {{transforms}} [transform-failed-limitations] Failed {{transforms}} remain as a persistent task and should be handled appropriately, either by deleting it or by resolving the root cause of the failure and re-starting. When using the API to delete a failed {{transform}}, first stop it using `_stop?force=true`, then delete it. - ### {{ctransforms-cap}} may give incorrect results if documents are not yet available to search [transform-availability-limitations] After a document is indexed, there is a very small delay until it is available to search. @@ -123,37 +104,30 @@ A {{ctransform}} periodically checks for changed entities between the time since If using a `sync.time.field` that represents the data ingest time and using a zero second or very small `sync.time.delay`, then it is more likely that this issue will occur. - ### Support for date nanoseconds data type [transform-date-nanos] If your data uses the [date nanosecond data type](https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html), aggregations are nonetheless on millisecond resolution. This limitation also affects the aggregations in your {{transforms}}. 
- ### Data streams as destination indices are not supported [transform-data-streams-destination] {{transforms-cap}} update data in the destination index which requires writing into the destination. [Data streams](../../manage-data/data-store/index-types/data-streams.md) are designed to be append-only, which means you cannot send update or delete requests directly to a data stream. For this reason, data streams are not supported as destination indices for {{transforms}}. - ### ILM as destination index may cause duplicated documents [transform-ilm-destination] [ILM](../../manage-data/lifecycle/index-lifecycle-management.md) is not recommended to use as a {{transform}} destination index. {{transforms-cap}} update documents in the current destination, and cannot delete documents in the indices previously used by ILM. This may lead to duplicated documents when you use {{transforms}} combined with ILM in case of a rollover. If you use ILM to have time-based indices, please consider using the [Date index name](https://www.elastic.co/guide/en/elasticsearch/reference/current/date-index-name-processor.html) instead. The processor works without duplicated documents if your {{transform}} contains a `group_by` based on `date_histogram`. - ## Limitations in {{kib}} [transform-ui-limitations] - ### {{transforms-cap}} are visible in all {{kib}} spaces [transform-space-limitations] [Spaces](../../deploy-manage/manage-spaces.md) enable you to organize your source and destination indices and other saved objects in {{kib}} and to see only the objects that belong to your space. However, a {{transform}} is a long running task which is managed on cluster level and therefore not limited in scope to certain spaces. Space awareness can be implemented for a {{data-source}} under **Stack Management > Kibana** which allows privileges to the {{transform}} destination index. 
- ### Up to 1,000 {{transforms}} are listed in {{kib}} [transform-kibana-limitations] The {{transforms}} management page in {{kib}} lists up to 1000 {{transforms}}. - ### {{kib}} might not support every {{transform}} configuration option [transform-ui-support] There might be configuration options available via the {{transform}} APIs that are not supported in {{kib}}. For an exhaustive list of configuration options, refer to the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-apis.html). diff --git a/explore-analyze/transforms/transform-overview.md b/explore-analyze/transforms/transform-overview.md index dfc8cc164c..5deaf634f1 100644 --- a/explore-analyze/transforms/transform-overview.md +++ b/explore-analyze/transforms/transform-overview.md @@ -4,20 +4,17 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-overview.html --- - - # Overview [transform-overview] - You can choose either of the following methods to transform your data: [pivot](#pivot-transform-overview) or [latest](#latest-transform-overview). ::::{important} + * All {{transforms}} leave your source index intact. They create a new index that is dedicated to the transformed data. * {{transforms-cap}} might have more configuration options provided by the APIs than the options available in {{kib}}. For all the {{transform}} configuration options, refer to the [API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-apis.html). :::: - {{transforms-cap}} are persistent tasks; they are stored in cluster state which makes them resilient for node failures. Refer to [How checkpoints work](transform-checkpoints.md) and [Error handling](transform-checkpoints.md#ml-transform-checkpoint-errors) to learn more about the machinery behind {{transforms}}. 
## Pivot {{transforms}} [pivot-transform-overview] @@ -45,7 +42,6 @@ If you want to check the sales in the different categories in your last fiscal y :class: screenshot ::: - ## Latest {{transforms}} [latest-transform-overview] You can use the `latest` type of {{transform}} to copy the most recent documents into a new index. You must identify one or more fields as the unique key for grouping your data, as well as a date field that sorts the data chronologically. For example, you can use this type of {{transform}} to keep track of the latest purchase for each customer or the latest event for each host. @@ -57,7 +53,6 @@ You can use the `latest` type of {{transform}} to copy the most recent documents As in the case of a pivot, a latest {{transform}} can run once or continuously. It performs a composite aggregation on the data in the source index and stores the output in the destination index. If the {{transform}} runs continuously, new unique key values are automatically added to the destination index and the most recent documents for existing key values are automatically updated at each checkpoint. - ## Performance considerations [transform-performance] {{transforms-cap}} perform search aggregations on the source indices then index the results into the destination index. Therefore, a {{transform}} never takes less time or uses less resources than the aggregation and indexing processes. 
@@ -71,5 +66,3 @@ If you prefer to spread out the impact on your cluster (at the cost of a slower ``` documents_processed / search_time_in_ms * 1000 ``` - - diff --git a/explore-analyze/transforms/transform-painless-examples.md b/explore-analyze/transforms/transform-painless-examples.md index 54ebbb506c..28d203cc12 100644 --- a/explore-analyze/transforms/transform-painless-examples.md +++ b/explore-analyze/transforms/transform-painless-examples.md @@ -4,16 +4,12 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-painless-examples.html --- - - # Painless examples [transform-painless-examples] - ::::{important} The examples that use the `scripted_metric` aggregation are not supported on {{es}} Serverless. :::: - These examples demonstrate how to use Painless in {{transforms}}. You can learn more about the Painless scripting language in the [Painless guide](https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-guide.html). * [Getting top hits by using scripted metric aggregation](#painless-top-hits) @@ -23,22 +19,21 @@ These examples demonstrate how to use Painless in {{transforms}}. You can learn * [Comparing indices by using scripted metric aggregations](#painless-compare) * [Getting web session details by using scripted metric aggregation](#painless-web-session) -::::{note} +::::{note} + * While the context of the following examples is the {{transform}} use case, the Painless scripts in the snippets below can be used in other {{es}} search aggregations, too. * All the following examples use scripts, {{transforms}} cannot deduce mappings of output fields when the fields are created by a script. {{transforms-cap}} don’t create any mappings in the destination index for these fields, which means they get dynamically mapped. Create the destination index prior to starting the {{transform}} in case you want explicit mappings. 
:::: - ## Getting top hits by using scripted metric aggregation [painless-top-hits] This snippet shows how to find the latest document, in other words the document with the latest timestamp. From a technical perspective, it helps to achieve the function of a [Top hits](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html) by using scripted metric aggregation in a {{transform}}, which provides a metric output. -::::{important} +::::{important} This example uses a `scripted_metric` aggregation which is not supported on {{es}} Serverless. :::: - ```js "aggregations": { "latest_doc": { @@ -68,7 +63,6 @@ This example uses a `scripted_metric` aggregation which is not supported on {{es 3. The `combine_script` returns `state` from each shard. 4. The `reduce_script` iterates through the value of `s.timestamp_latest` returned by each shard and returns the document with the latest timestamp (`last_doc`). In the response, the top hit (in other words, the `latest_doc`) is nested below the `latest_doc` field. - Check the [scope of scripts](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html#scripted-metric-aggregation-scope) for detailed explanation on the respective scripts. You can retrieve the last value in a similar way: @@ -97,8 +91,7 @@ You can retrieve the last value in a similar way: } ``` - -#### Getting top hits by using stored scripts [top-hits-stored-scripts] +### Getting top hits by using stored scripts [top-hits-stored-scripts] You can also use the power of [stored scripts](https://www.elastic.co/guide/en/elasticsearch/reference/current/create-stored-script-api.html) to get the latest value. Stored scripts are updatable, enable collaboration, and avoid duplication across queries. @@ -178,8 +171,6 @@ You can also use the power of [stored scripts](https://www.elastic.co/guide/en/e 1. 
The parameter `field_with_last_value` can be set any field that you want the latest value for. - - ## Getting time features by using aggregations [painless-time-features] This snippet shows how to extract time based features by using Painless in a {{transform}}. The snippet uses an index where `@timestamp` is defined as a `date` type field. @@ -219,8 +210,6 @@ This snippet shows how to extract time based features by using Painless in a {{t 7. Sets `date` based on the timestamp of the document. 8. Returns the month value from `date`. - - ## Getting duration by using bucket script [painless-bucket-script] This example shows you how to get the duration of a session by client IP from a data log by using [bucket script](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html). The example uses the {{kib}} sample web logs dataset. @@ -268,19 +257,16 @@ PUT _transform/data_log 2. The bucket path is a map of script variables and their associated path to the buckets you want to use for the variable. In this particular case, `min` and `max` are variables mapped to `time_frame.gte.value` and `time_frame.lte.value`. 3. Finally, the script substracts the start date of the session from the end date which results in the duration of the session. - - ## Counting HTTP responses by using scripted metric aggregation [painless-count-http] You can count the different HTTP response types in a web log data set by using scripted metric aggregation as part of the {{transform}}. You can achieve a similar function with filter aggregations, check the [Finding suspicious client IPs](transform-examples.md#example-clientips) example for details. The example below assumes that the HTTP response codes are stored as keywords in the `response` field of the documents. -::::{important} +::::{important} This example uses a `scripted_metric` aggregation which is not supported on {{es}} Serverless. 
:::: - ```js "aggregations": { <1> "responses.counts": { <2> @@ -320,17 +306,14 @@ This example uses a `scripted_metric` aggregation which is not supported on {{es 6. The `combine_script` returns `state.responses` from each shard. 7. The `reduce_script` creates a `counts` array with the `error`, `success`, and `other` properties, then iterates through the value of `responses` returned by each shard and assigns the different response types to the appropriate properties of the `counts` object; error responses to the error counts, success responses to the success counts, and other responses to the other counts. Finally, returns the `counts` array with the response counts. - - ## Comparing indices by using scripted metric aggregations [painless-compare] This example shows how to compare the content of two indices by a {{transform}} that uses a scripted metric aggregation. -::::{important} +::::{important} This example uses a `scripted_metric` aggregation which is not supported on {{es}} Serverless. :::: - ```console POST _transform/_preview { @@ -385,8 +368,6 @@ POST _transform/_preview 6. The `combine_script` returns `state` from each shard. 7. The `reduce_script` checks if the size of the indices are equal. If they are not equal, than it reports back a `count_mismatch`. Then it iterates through all the values of the two indices and compare them. If the values are equal, then it returns a `match`, otherwise returns a `mismatch`. - - ## Getting web session details by using scripted metric aggregation [painless-web-session] This example shows how to derive multiple features from a single transaction. Let’s take a look on the example source document from the data: @@ -430,14 +411,12 @@ This example shows how to derive multiple features from a single transaction. Le :::: - By using the `sessionid` as a group-by field, you are able to enumerate events through the session and get more details of the session by using scripted metric aggregation. 
-::::{important} +::::{important} This example uses a `scripted_metric` aggregation which is not supported on {{es}} Serverless. :::: - ```js POST _transform/_preview { @@ -513,7 +492,6 @@ POST _transform/_preview 5. The `combine_script` returns `state.docs` from each shard. 6. The `reduce_script` defines various objects like `min_time`, `max_time`, and `duration` based on the document fields, then declares a `ret` object, and copies the source document by using `new HashMap ()`. Next, the script defines `first_time`, `last_time`, `duration` and other fields inside the `ret` object based on the corresponding object defined earlier, finally returns `ret`. - The API call results in a similar response: ```js @@ -545,5 +523,3 @@ The API call results in a similar response: } ... ``` - - diff --git a/explore-analyze/transforms/transform-scale.md b/explore-analyze/transforms/transform-scale.md index 93cd50cde6..569a23c5c5 100644 --- a/explore-analyze/transforms/transform-scale.md +++ b/explore-analyze/transforms/transform-scale.md @@ -4,11 +4,8 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-scale.html --- - - # Transforms at scale [transform-scale] - {{transforms-cap}} convert existing {{es}} indices into summarized indices, which provide opportunities for new insights and analytics. The search and index operations performed by {{transforms}} use standard {{es}} features so similar considerations for working with {{es}} at scale are often applicable to {{transforms}}. If you experience performance issues, start by identifying the bottleneck areas (search, indexing, processing, or storage) then review the relevant considerations in this guide to improve performance. It also helps to understand how {{transforms}} work as different considerations apply depending on whether or not your transform is running in continuous mode or in batch. 
In this guide, you’ll learn how to: @@ -27,17 +24,14 @@ The following considerations are not sequential – the numbers help to navigate The keywords in parenthesis at the end of each recommendation title indicates the bottleneck area that may be improved by following the given recommendation. - ## Measure {{transforms}} performance [measure-performance] In order to optimize {{transform}} performance, start by identifying the areas where most work is being done. The **Stats** interface of the **{{transforms-cap}}** page in {{kib}} contains information that covers three main areas: indexing, searching, and processing time (alternatively, you can use the [{{transforms}} stats API](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-transform-stats.html)). If, for example, the results show that the highest proportion of time is spent on search, then prioritize efforts on optimizing the search query of the {{transform}}. {{transforms-cap}} also has [Rally support](https://esrally.readthedocs.io) that makes it possible to run performance checks on {{transforms}} configurations if it is required. If you optimized the crucial factors and you still experience performance issues, you may also want to consider improving your hardware. - ## 1. Optimize `frequency` (index) [frequency] In a {{ctransform}}, the `frequency` configuration option sets the interval between checks for changes in the source indices. If changes are detected, then the source data is searched and the changes are applied to the destination index. Depending on your use case, you may wish to reduce the frequency at which changes are applied. By setting `frequency` to a higher value (maximum is one hour), the workload can be spread over time at the cost of less up-to-date data. - ## 2. Increase the number of shards of the destination index (index) [increase-shards-dest-index] Depending on the size of the destination index, you may consider increasing its shard count. 
{{transforms-cap}} use one shard by default when creating the destination index. To override the index settings, create the destination index before starting the {{transform}}. For more information about how the number of shards affects scalability and resilience, refer to [Get ready for production](../../deploy-manage/index.md) @@ -46,15 +40,12 @@ Depending on the size of the destination index, you may consider increasing its Use the [Preview {{transform}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/preview-transform.html) to check the settings that the {{transform}} would use to create the destination index. You can copy and adjust these in order to create the destination index prior to starting the {{transform}}. :::: - - ## 3. Profile and optimize your search queries (search) [search-queries] If you have defined a {{transform}} source index `query`, ensure it is as efficient as possible. Use the **Search Profiler** under **Dev Tools** in {{kib}} to get detailed timing information about the execution of individual components in the search request. Alternatively, you can use the [Profile](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html). The results give you insight into how search requests are executed at a low level so that you can understand why certain requests are slow, and take steps to improve them. {{transforms-cap}} execute standard {{es}} search requests. There are different ways to write {{es}} queries, and some of them are more efficient than others. Consult [*Tune for search speed*](../../deploy-manage/production-guidance/optimize-performance/search-speed.md) to learn more about {{es}} performance tuning. - ## 4. Limit the scope of the source query (search) [limit-source-query] Imagine your {{ctransform}} is configured to group by `IP` and calculate the sum of `bytes_sent`. 
For each checkpoint, a {{ctransform}} detects changes in the source data since the previous checkpoint, identifying the IPs for which new data has been ingested. Then it performs a second search, filtered for this group of IPs, in order to calculate the total `bytes_sent`. If this second search matches many shards, then this could be resource intensive. Consider limiting the scope that the source index pattern and query will match. @@ -72,31 +63,26 @@ Consider using [date math](https://www.elastic.co/guide/en/elasticsearch/referen }, ``` - ## 5. Optimize the sharding strategy for the source index (search) [optimize-shading-strategy] There is no one-size-fits-all sharding strategy. A strategy that works in one environment may not scale in another. A good sharding strategy must account for your infrastructure, use case, and performance expectations. Too few shards may mean that the benefits of distributing the workload cannot be realised; however too many shards may impact your cluster health. To learn more about sizing your shards, read this [guide](../../deploy-manage/production-guidance/optimize-performance/size-shards.md). - ## 6. Tune `max_page_search_size` (search) [tune-max-page-search-size] The `max_page_search_size` {{transform}} configuration option defines the number of buckets that are returned for each search request. The default value is 500. If you increase this value, you get better throughput at the cost of higher latency and memory usage. The ideal value of this parameter is highly dependent on your use case. If your {{transform}} executes memory-intensive aggregations – for example, cardinality or percentiles – then increasing `max_page_search_size` requires more available memory. If memory limits are exceeded, a circuit breaker exception occurs. - ## 7. Use indexed fields in your source indices (search) [indexed-fields-in-source] Runtime fields and scripted fields are not indexed fields; their values are only extracted or computed at search time. 
While these fields provide flexibility in how you access your data, they increase performance costs at search time. If {{transform}} performance using runtime fields or scripted fields is a concern, you may wish to consider using indexed fields instead. For performance reasons, we do not recommend using a runtime field as the time field that synchronizes a {{ctransform}}. - ## 8. Use index sorting (search, process) [index-sorting-group-by-ordering] Index sorting enables you to store documents on disk in a specific order which can improve query efficiency. The ideal sorting logic depends on your use case, but the rule of thumb may be to sort the fields in descending order (high to low cardinality) starting with the time-based fields. Index sorting can be defined only once at index creation. If you don’t already have index sorting on the index that you want to use as a source, consider reindexing it to a new, sorted index. - ## 9. Disable the `_source` field on the destination index (storage) [disable-source-dest] The [`_source` field](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html) contains the original JSON document body that was passed at index time. The `_source` field itself is not indexed (and thus is not searchable), but it is still stored in the index and incurs a storage overhead. Consider disabling `_source` to save storage space if you have a large destination index. Disabling `_source` is only possible during index creation. @@ -105,8 +91,6 @@ The [`_source` field](https://www.elastic.co/guide/en/elasticsearch/reference/cu When the `_source` field is disabled, a number of features are not supported. Consult [Disabling the `_source` field](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#disable-source-field) to understand the consequences before disabling it. 
:::: - - ## Further reading [_further_reading] * [*Tune for search speed*](../../deploy-manage/production-guidance/optimize-performance/search-speed.md) diff --git a/explore-analyze/transforms/transform-setup.md b/explore-analyze/transforms/transform-setup.md index 30d8873f11..6ddb69c9d0 100644 --- a/explore-analyze/transforms/transform-setup.md +++ b/explore-analyze/transforms/transform-setup.md @@ -4,12 +4,8 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-setup.html --- - - # Setup [transform-setup] - - ## Requirements overview [requirements-overview] To use {{transforms}}, you must have: @@ -17,11 +13,8 @@ To use {{transforms}}, you must have: * at least one [{{transform}} node](../../deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#transform-node-role), * management features visible in the {{kib}} space, and * security privileges that: - - * grant use of {{transforms}}, and - * grant access to source and destination indices - - + * grant use of {{transforms}}, and + * grant access to source and destination indices ## Security privileges [transform-privileges] @@ -30,7 +23,6 @@ Assigning security privileges affects how users access {{transforms}}. Consider * **[{{es}} API user](#transform-es-security-privileges)**: uses an {{es}} client, cURL, or {{kib}} **{{dev-tools-app}}** to access {{transforms}} via {{es}} APIs. This scenario requires {{es}} security privileges. * **[{{kib}} user](#transform-kib-security-privileges)**: uses {{transforms}} in {{kib}}. This scenario requires {{kib}} feature privileges *and* {{es}} security privileges. 
- ### {{es}} API user [transform-es-security-privileges] To *manage* {{transforms}}, you must meet all of the following requirements: @@ -45,7 +37,6 @@ To view only the configuration and status of {{transforms}}, you must have: For more information about {{es}} roles and privileges, refer to [Built-in roles](../../deploy-manage/users-roles/cluster-or-deployment-auth/built-in-roles.md) and [Security privileges](../../deploy-manage/users-roles/cluster-or-deployment-auth/elasticsearch-privileges.md). - ### {{kib}} user [transform-kib-security-privileges] Within a {{kib}} space, for full access to {{transforms}}, you must meet all of the following requirements: @@ -70,7 +61,6 @@ Within a {{kib}} space, for read-only access to {{transforms}}, you must meet al For more information and {{kib}} security features, see [{{kib}} role management](../../deploy-manage/users-roles/cluster-or-deployment-auth/defining-roles.md) and [{{kib}} privileges](../../deploy-manage/users-roles/cluster-or-deployment-auth/kibana-privileges.md). - ## {{kib}} spaces [transform-kib-spaces] [Spaces](../../deploy-manage/manage-spaces.md) enable you to organize your source and destination indices and other saved objects in {{kib}} and to see only the objects that belong to your space. However, a {{transform}} is a long running task which is managed on cluster level and therefore not limited in scope to certain spaces. Space awareness can be implemented for a {{data-source}} under **Stack Management > Kibana** which allows privileges to the {{transform}} destination index. diff --git a/explore-analyze/transforms/transform-usage.md b/explore-analyze/transforms/transform-usage.md index d9927491d6..9d3646365a 100644 --- a/explore-analyze/transforms/transform-usage.md +++ b/explore-analyze/transforms/transform-usage.md @@ -24,5 +24,3 @@ You might want to consider using {{transforms}} instead of aggregations when: * You want to create summary tables to optimize queries. 
For example, if you have a high level dashboard that is accessed by a large number of users and it uses a complex aggregation over a large dataset, it may be more efficient to create a {{transform}} to cache results. Thus, each user doesn’t need to run the aggregation query. - - diff --git a/raw-migrated-files/docs-content/serverless/transforms.md b/raw-migrated-files/docs-content/serverless/transforms.md deleted file mode 100644 index 3541a5391a..0000000000 --- a/raw-migrated-files/docs-content/serverless/transforms.md +++ /dev/null @@ -1,30 +0,0 @@ -# {{transforms-app}} [transforms] - -This content applies to: [![Elasticsearch](../../../images/serverless-es-badge.svg "")](../../../solutions/search.md) [![Observability](../../../images/serverless-obs-badge.svg "")](../../../solutions/observability.md) [![Security](../../../images/serverless-sec-badge.svg "")](../../../solutions/security/elastic-security-serverless.md) - -{{transforms-cap}} enable you to convert existing {{es}} indices into summarized indices, which provide opportunities for new insights and analytics. - -For example, you can use {{transforms}} to pivot your data into entity-centric indices that summarize the behavior of users or sessions or other entities in your data. Or you can use {{transforms}} to find the latest document among all the documents that have a certain unique key. 
- -For more information, check out: - -* [When to use transforms](../../../explore-analyze/transforms/transform-usage.md) -* [Generating alerts for transforms](../../../explore-analyze/transforms/transform-alerts.md) -* [Transforms at scale](../../../explore-analyze/transforms/transform-scale.md) -* [How checkpoints work](../../../explore-analyze/transforms/transform-checkpoints.md) -* [Examples](../../../explore-analyze/transforms/transform-examples.md) -* [Painless examples](../../../explore-analyze/transforms/transform-painless-examples.md) -* [Troubleshooting transforms](../../../troubleshoot/elasticsearch/transform-troubleshooting.md) -* [Limitations](../../../explore-analyze/transforms/transform-limitations.md) - - -## Create and manage {{transforms}} [transforms-create-and-manage-transforms] - -In **{{project-settings}} → {{manage-app}} → {{transforms-app}}**, you can create, edit, stop, start, reset, and delete {{transforms}}: - -:::{image} ../../../images/serverless-transform-management.png -:alt: {{transforms-app}} app -:class: screenshot -::: - -When you create a {{transform}}, you must choose between two types: *pivot* and *latest*. You must also decide whether you want the {{transform}} to run once or continuously. For more information, go to [{{transforms-cap}} overview](../../../explore-analyze/transforms/transform-overview.md). diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/data-rollup-transform.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/data-rollup-transform.md deleted file mode 100644 index 3aaf02c150..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/data-rollup-transform.md +++ /dev/null @@ -1,20 +0,0 @@ -# Roll up or transform your data [data-rollup-transform] - -{{es}} offers the following methods for manipulating your data: - -* [Rolling up your historical data](../../../manage-data/lifecycle/rollup.md) - - ::::{admonition} Deprecated in 8.11.0. 
- :class: warning - - Rollups will be removed in a future version. Use [downsampling](../../../manage-data/data-store/index-types/downsampling-time-series-data-stream.md) instead. - :::: - - - The {{stack}} {rollup-features} provide a means to summarize and store historical data so that it can still be used for analysis, but at a fraction of the storage cost of raw data. - -* [Transforming your data](../../../explore-analyze/transforms.md) - - {{transforms-cap}} enable you to convert existing {{es}} indices into summarized indices, which provide opportunities for new insights and analytics. - - diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/transforms.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/transforms.md deleted file mode 100644 index e68f7a855b..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/transforms.md +++ /dev/null @@ -1,28 +0,0 @@ -# Transforming data [transforms] - -{{transforms-cap}} enable you to convert existing {{es}} indices into summarized indices, which provide opportunities for new insights and analytics. For example, you can use {{transforms}} to pivot your data into entity-centric indices that summarize the behavior of users or sessions or other entities in your data. Or you can use {{transforms}} to find the latest document among all the documents that have a certain unique key. 
- -* [Overview](../../../explore-analyze/transforms/transform-overview.md) -* [Setup](../../../explore-analyze/transforms/transform-setup.md) -* [When to use {{transforms}}](../../../explore-analyze/transforms/transform-usage.md) -* [Generating alerts for {{transforms}}](../../../explore-analyze/transforms/transform-alerts.md) -* [{{transforms-cap}} at scale](../../../explore-analyze/transforms/transform-scale.md) -* [How checkpoints work](../../../explore-analyze/transforms/transform-checkpoints.md) -* [API quick reference](../../../explore-analyze/transforms/transform-api-quickref.md) -* [Tutorial: Transforming the eCommerce sample data](../../../explore-analyze/transforms/ecommerce-transforms.md) -* [Examples](../../../explore-analyze/transforms/transform-examples.md) -* [Painless examples](../../../explore-analyze/transforms/transform-painless-examples.md) -* [*Troubleshooting {{transforms}}*](../../../troubleshoot/elasticsearch/transform-troubleshooting.md) -* [Limitations](../../../explore-analyze/transforms/transform-limitations.md) - - - - - - - - - - - - diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml index 0393e2ae7b..0dad9a8002 100644 --- a/raw-migrated-files/toc.yml +++ b/raw-migrated-files/toc.yml @@ -569,7 +569,6 @@ toc: - file: docs-content/serverless/slo-troubleshoot-slos.md - file: docs-content/serverless/spaces.md - file: docs-content/serverless/tags.md - - file: docs-content/serverless/transforms.md - file: docs-content/serverless/what-is-observability-serverless.md - file: elasticsearch-hadoop/elasticsearch-hadoop/index.md children: @@ -595,7 +594,6 @@ toc: - file: elasticsearch/elasticsearch-reference/change-passwords-native-users.md - file: elasticsearch/elasticsearch-reference/configuring-stack-security.md - file: elasticsearch/elasticsearch-reference/data-management.md - - file: elasticsearch/elasticsearch-reference/data-rollup-transform.md - file: elasticsearch/elasticsearch-reference/data-streams.md - file: 
elasticsearch/elasticsearch-reference/data-tiers.md - file: elasticsearch/elasticsearch-reference/defining-roles.md @@ -658,7 +656,6 @@ toc: - file: elasticsearch/elasticsearch-reference/snapshots-restore-snapshot.md - file: elasticsearch/elasticsearch-reference/starting-elasticsearch.md - file: elasticsearch/elasticsearch-reference/stopping-elasticsearch.md - - file: elasticsearch/elasticsearch-reference/transforms.md - file: elasticsearch/elasticsearch-reference/xpack-alerting.md - file: elasticsearch/elasticsearch-reference/xpack-autoscaling.md - file: elasticsearch/elasticsearch-reference/xpack-rollup.md