Merged
24 changes: 14 additions & 10 deletions explore-analyze/transforms.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,19 @@ mapped_urls:
- https://www.elastic.co/guide/en/serverless/current/transforms.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/data-rollup-transform.html
---
# Transforming data [transforms]

# Transforms
{{transforms-cap}} enable you to convert existing {{es}} indices into summarized indices, which provide opportunities for new insights and analytics. For example, you can use {{transforms}} to pivot your data into entity-centric indices that summarize the behavior of users or sessions or other entities in your data. Or you can use {{transforms}} to find the latest document among all the documents that have a certain unique key.

% What needs to be done: Align serverless/stateful

% Scope notes: views in last 6 months: ~90/week

% Use migrated content from existing pages that map to this page:

% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/transforms.md
% - [ ] ./raw-migrated-files/docs-content/serverless/transforms.md
% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/data-rollup-transform.md
* [Overview](transforms/transform-overview.md)
* [Setup](transforms/transform-setup.md)
* [When to use {{transforms}}](transforms/transform-usage.md)
* [Generating alerts for {{transforms}}](transforms/transform-alerts.md)
* [{{transforms-cap}} at scale](transforms/transform-scale.md)
* [How checkpoints work](transforms/transform-checkpoints.md)
* [API quick reference](transforms/transform-api-quickref.md)
* [Tutorial: Transforming the eCommerce sample data](transforms/ecommerce-transforms.md)
* [Examples](transforms/transform-examples.md)
* [Painless examples](transforms/transform-painless-examples.md)
* [Troubleshooting {{transforms}}](../troubleshoot/elasticsearch/transform-troubleshooting.md)
* [Limitations](transforms/transform-limitations.md)
207 changes: 94 additions & 113 deletions explore-analyze/transforms/ecommerce-transforms.md

Large diffs are not rendered by default.

13 changes: 4 additions & 9 deletions explore-analyze/transforms/transform-alerts.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,10 @@ You can create {{transform}} rules under **{{stack-manage-app}} > {{rules-ui}}**
1. Click **Create rule** and select the {{transform}} health rule type.
2. Give a name to the rule and optionally provide tags.
3. Select the {{transform}} or {{transforms}} to include. You can also use a special character (`*`) to apply the rule to all your {{transforms}}. {{transforms-cap}} created after the rule are automatically included.

:::{image} ../../images/elasticsearch-reference-transform-check-config.png
:alt: Selecting health check
:class: screenshot
:::
:::{image} ../../images/elasticsearch-reference-transform-check-config.png
:alt: Selecting health check
:class: screenshot
:::

4. The following health checks are available and enabled by default:

Expand All @@ -33,7 +32,6 @@ You can create {{transform}} rules under **{{stack-manage-app}} > {{rules-ui}}**

As the last step in the rule creation process, define its actions.


## Defining actions [defining-actions]

You can add one or more actions to your rule to generate notifications when its conditions are met and when they are no longer met. In particular, this rule type supports:
Expand All @@ -55,7 +53,6 @@ After you select a connector, you must set the action frequency. You can choose
If you choose a custom action interval, it cannot be shorter than the rule’s check interval.
::::


Alternatively, you can set the action frequency such that actions run for each alert. Choose how often the action runs (at each check interval, only when the alert status changes, or at a custom action interval). You must also choose an action group, which indicates whether the action runs when the issue is detected or when it is recovered.

You can further refine the conditions under which actions run by specifying that actions only run when they match a KQL query or when an alert occurs within a specific time frame.
Expand All @@ -71,7 +68,6 @@ After you save the configurations, the rule appears in the **{{rules-ui}}** list

The name of an alert is always the same as the {{transform}} ID of the associated {{transform}} that triggered it. You can mute the notifications for a particular {{transform}} on the page of the rule that lists the individual alerts. You can open it via **{{rules-ui}}** by selecting the rule name.


## Action variables [transform-action-variables]

The following variables are specific to the {{transform}} health rule type. You can also specify [variables common to all rules](../alerts/kibana/rule-action-variables.md).
Expand Down Expand Up @@ -103,5 +99,4 @@ The following variables are specific to the {{transform}} health rule type. You
{{/context.results}}
```


For more examples, refer to [Rule action variables](../alerts/kibana/rule-action-variables.md).
1 change: 0 additions & 1 deletion explore-analyze/transforms/transform-api-quickref.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,4 +23,3 @@ _transform/
* [Update {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/update-transform.html)

For the full list, see [*{{transform-cap}} APIs*](https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-apis.html).

8 changes: 0 additions & 8 deletions explore-analyze/transforms/transform-checkpoints.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,8 @@ mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-checkpoints.html
---



# How checkpoints work [transform-checkpoints]


Each time a {{transform}} examines the source indices and creates or updates the destination index, it generates a *checkpoint*.

If your {{transform}} runs only once, there is logically only one checkpoint. If your {{transform}} runs continuously, however, it creates checkpoints as it ingests and transforms new source data. The `sync` property of the {{transform}} configures checkpointing by specifying a time field.
Expand All @@ -31,15 +28,12 @@ To create a checkpoint, the {{ctransform}}:

The {{transform}} applies changes related to either new or changed entities or time buckets to the destination index. The set of changes can be paginated. The {{transform}} performs a composite aggregation similarly to the batch {{transform}} operation; however, it also injects query filters based on the previous step to reduce the amount of work. After all changes have been applied, the checkpoint is complete.


This checkpoint process involves both search and indexing activity on the cluster. We have attempted to favor control over performance while developing {{transforms}}. We decided it was preferable for the {{transform}} to take longer to complete, rather than to finish quickly and take precedence in resource consumption. That being said, the cluster still requires enough resources to support both the composite aggregation search and the indexing of its results.

::::{tip}
If the cluster experiences unacceptable performance degradation due to the {{transform}}, stop the {{transform}} and refer to [Performance considerations](transform-overview.md#transform-performance).
::::



## Using the ingest timestamp for syncing the {{transform}} [sync-field-ingest-timestamp]

In most cases, it is strongly recommended to use the ingest timestamp of the source indices for syncing the {{transform}}. This is the most efficient way for {{transforms}} to identify new changes. If your data source follows the [ECS standard](https://www.elastic.co/guide/en/ecs/{{ecs_version}}/ecs-reference.html), you might already have an [`event.ingested`](https://www.elastic.co/guide/en/ecs/{{ecs_version}}/ecs-event.html#field-event-ingested) field. In this case, use `event.ingested` as the `sync`.`time`.`field` property of your {{transform}}.
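
As a rough illustration of where `sync`.`time`.`field` sits in a {{transform}} configuration, the following Python sketch assembles a request body of the kind you might send to `PUT _transform/<transform_id>`. The index names, the grouping field, the `doc_count` aggregation name, and the `60s` delay are assumptions chosen for the example; only the `event.ingested` sync field comes from the text above.

```python
import json


def continuous_transform_body(source_index, dest_index, group_field,
                              sync_field="event.ingested", delay="60s"):
    """Build a request body for a continuous pivot transform.

    All argument values are illustrative; adapt them to your own indices.
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "pivot": {
            # One destination document per distinct value of group_field.
            "group_by": {group_field: {"terms": {"field": group_field}}},
            # Count the source documents that carry the sync field.
            "aggregations": {"doc_count": {"value_count": {"field": sync_field}}},
        },
        # The sync block makes the transform continuous and names the
        # time field that is used to detect new data for each checkpoint.
        "sync": {"time": {"field": sync_field, "delay": delay}},
    }


# Hypothetical index and field names.
body = continuous_transform_body("my-logs", "my-logs-by-host", "host.name")
print(json.dumps(body, indent=2))
```

The sketch only builds and prints the JSON body; it does not contact a cluster.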
Expand All @@ -65,7 +59,6 @@ After you created the ingest pipeline, apply it to the source indices of your {{

Refer to [Add a pipeline to an indexing request](../../manage-data/ingest/transform-enrich/ingest-pipelines.md#add-pipeline-to-indexing-request) and [Ingest pipelines](../../manage-data/ingest/transform-enrich/ingest-pipelines.md) to learn more about how to use an ingest pipeline.


## Change detection heuristics [ml-transform-checkpoint-heuristics]

When the {{transform}} runs in continuous mode, it updates the documents in the destination index as new data comes in. The {{transform}} uses a set of heuristics called change detection to update the destination index with fewer operations.
Expand All @@ -74,7 +67,6 @@ In this example, the data is grouped by host names. Change detection detects whi

Another heuristic can be applied for time buckets when a `date_histogram` is used to group by time buckets. Change detection detects which time buckets have changed and updates only those.
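
The idea behind both heuristics can be modeled in a few lines of Python. This is a simplified sketch and not how {{transforms}} are implemented internally: it assumes each document carries a numeric `ingested` time and a numeric `timestamp` (both in seconds), and it treats a checkpoint as a single cut-off value. The field names and sample data are made up for the example.

```python
def changed_entities(docs, last_checkpoint, key="host"):
    """Return the group-by keys that received documents after the last checkpoint."""
    return {d[key] for d in docs if d["ingested"] > last_checkpoint}


def changed_time_buckets(docs, last_checkpoint, interval):
    """Return the start times of histogram buckets touched since the last checkpoint."""
    return {
        d["timestamp"] - d["timestamp"] % interval
        for d in docs
        if d["ingested"] > last_checkpoint
    }


# Hypothetical source documents.
docs = [
    {"host": "a", "timestamp": 100, "ingested": 110},
    {"host": "b", "timestamp": 3700, "ingested": 3710},
    {"host": "a", "timestamp": 3800, "ingested": 3810},
    {"host": "c", "timestamp": 200, "ingested": 210},
]

entities = changed_entities(docs, last_checkpoint=3000)
buckets = changed_time_buckets(docs, last_checkpoint=3000, interval=3600)
print(sorted(entities))  # hosts whose destination documents need updating
print(sorted(buckets))   # hourly buckets that need updating
```

In this model, host `c` and the first hourly bucket are left alone because nothing new arrived for them after the checkpoint, which is the saving that change detection aims for.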


## Error handling [ml-transform-checkpoint-errors]

Failures in {{transforms}} tend to be related to searching or indexing. To increase the resiliency of {{transforms}}, the cursor positions of the aggregated search and the changed entities search are tracked in memory and persisted periodically.
Expand Down
21 changes: 0 additions & 21 deletions explore-analyze/transforms/transform-examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,8 @@ mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-examples.html
---



# Examples [transform-examples]


These examples demonstrate how to use {{transforms}} to derive useful insights from your data. All the examples use one of the [{{kib}} sample datasets](https://www.elastic.co/guide/en/kibana/current/add-sample-data.html). For a more detailed, step-by-step example, see [Tutorial: Transforming the eCommerce sample data](ecommerce-transforms.md).

* [Finding your best customers](#example-best-customers)
Expand Down Expand Up @@ -58,12 +55,10 @@ POST _transform/_preview
1. The destination index for the {{transform}}. It is ignored by `_preview`.
2. Two `group_by` fields are selected. This means the {{transform}} contains a unique row per `user` and `customer_id` combination. Within this data set, both these fields are unique. Including both in the {{transform}} gives more context to the final results.


::::{note}
In the example above, condensed JSON formatting is used for easier readability of the pivot object.
::::


The preview {{transforms}} API enables you to see the layout of the {{transform}} in advance, populated with some sample values. For example:

```js
Expand All @@ -85,7 +80,6 @@ The preview {{transforms}} API enables you to see the layout of the {{transform}

:::::


This {{transform}} makes it easier to answer questions such as:

* Which customers spend the most?
Expand All @@ -95,7 +89,6 @@ This {{transform}} makes it easier to answer questions such as:

It’s possible to answer these questions using aggregations alone; however, {{transforms}} allow us to persist this data as a customer-centric index. This enables us to analyze data at scale and gives more flexibility to explore and navigate data from a customer-centric perspective. In some cases, it can even make creating visualizations much simpler.


## Finding air carriers with the most delays [example-airline]

This example uses the Flights sample data set to find out which air carrier had the most delays. First, filter the source data such that it excludes all the cancelled flights by using a query filter. Then transform the data to contain the distinct number of flights, the sum of delayed minutes, and the sum of the flight minutes by air carrier. Finally, use a [`bucket_script`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html) to determine what percentage of the flight time was actually spent on delays.
Expand Down Expand Up @@ -143,7 +136,6 @@ POST _transform/_preview
3. The data is grouped by the `Carrier` field which contains the airline name.
4. This `bucket_script` performs calculations on the results that are returned by the aggregation. In this particular example, it calculates what percentage of travel time was taken up by delays.
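
The arithmetic behind such a `bucket_script` is small enough to check by hand. The Python sketch below assumes the script divides the summed delay minutes by the summed flight minutes and multiplies by 100; the function name, the zero-division guard, and the sample totals are illustrative and not taken from the request above.

```python
def delay_time_percentage(delay_mins_total, flight_mins_total):
    """Percentage of total flight time that was taken up by delays.

    Returns None when there is no flight time, to avoid dividing by zero.
    """
    if flight_mins_total == 0:
        return None
    return delay_mins_total / flight_mins_total * 100


# Hypothetical per-carrier totals: 2,500 delayed minutes out of 10,000 flown.
percentage = delay_time_percentage(2500, 10000)
print(percentage)
```

A carrier with a quarter of its flight minutes lost to delays would therefore get a value of 25 in the destination index.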


The preview shows you that the new index would contain data like this for each carrier:

```js
Expand All @@ -169,8 +161,6 @@ This {{transform}} makes it easier to answer questions such as:
This data is fictional and does not reflect actual delays or flight stats for any of the featured destination or origin airports.
::::



## Finding suspicious client IPs [example-clientips]

This example uses the web log sample data set to identify suspicious client IPs. It transforms the data such that the new index contains the sum of bytes and the number of distinct URLs, agents, incoming requests by location, and geographic destinations for each client IP. It also uses filter aggregations to count the specific types of HTTP responses that each client IP receives. Ultimately, the example below transforms web log data into an entity-centric index where the entity is `clientip`.
Expand Down Expand Up @@ -235,7 +225,6 @@ PUT _transform/suspicious_client_ips
4. Filter aggregation that counts the occurrences of successful (`200`) responses in the `response` field. The following two aggregations (`error404` and `error5xx`) count the error responses by error codes, matching an exact value or a range of response codes.
5. This `bucket_script` calculates the duration of the `clientip` access based on the results of the aggregation.
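
To make the last two callouts concrete, here is a minimal Python model of what they compute for a single client IP. It assumes that `error5xx` covers response codes from 500 up to, but not including, 600, and that the duration is the difference between the latest and earliest timestamps in milliseconds, converted to seconds. The function, the field names, and the sample events are hypothetical.

```python
def summarize_client(events):
    """Count response types and measure the access duration for one client IP.

    Each event is a dict with 'timestamp_ms' (epoch milliseconds) and
    'response' (HTTP status code as an int).
    """
    timestamps = [e["timestamp_ms"] for e in events]
    return {
        "success": sum(1 for e in events if e["response"] == 200),
        "error404": sum(1 for e in events if e["response"] == 404),
        "error5xx": sum(1 for e in events if 500 <= e["response"] < 600),
        # Seconds between the first and the last request.
        "duration_s": (max(timestamps) - min(timestamps)) / 1000,
    }


# Hypothetical events for one client IP.
events = [
    {"timestamp_ms": 1_000, "response": 200},
    {"timestamp_ms": 4_000, "response": 404},
    {"timestamp_ms": 9_000, "response": 503},
    {"timestamp_ms": 61_000, "response": 200},
]

summary = summarize_client(events)
print(summary)
```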


After you create the {{transform}}, you must start it:

```console
Expand Down Expand Up @@ -285,15 +274,13 @@ The search result shows you data like this for each client IP:
Like other Kibana sample data sets, the web log sample dataset contains timestamps relative to when you installed it, including timestamps in the future. The {{ctransform}} will pick up the data points once they are in the past. If you installed the web log sample dataset some time ago, you can uninstall and reinstall it and the timestamps will change.
::::


This {{transform}} makes it easier to answer questions such as:

* Which client IPs are transferring the most data?
* Which client IPs are interacting with a high number of different URLs?
* Which client IPs have high error rates?
* Which client IPs are interacting with a high number of destination countries?


## Finding the last log event for each IP address [example-last-log]

This example uses the web log sample data set to find the last log from an IP address. Let’s use the `latest` type of {{transform}} in continuous mode. It copies the most recent document for each unique key from the source index to the destination index and updates the destination index as new data comes into the source index.
Expand Down Expand Up @@ -357,7 +344,6 @@ PUT _transform/last-log-from-clientip
4. Contains the time field and delay settings used to synchronize the source and destination indices.
5. Specifies the retention policy for the transform. Documents that are older than the configured value will be removed from the destination index.
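
Conceptually, a `latest` {{transform}} with a retention policy behaves like the following Python sketch. It is a simplified in-memory model and not the actual implementation: it assumes numeric timestamps in seconds, and the helper names, field names, and sample documents are invented for the example.

```python
def latest_per_key(docs, key, sort_field):
    """Keep only the most recent document for each unique key."""
    latest = {}
    for doc in docs:
        current = latest.get(doc[key])
        if current is None or doc[sort_field] > current[sort_field]:
            latest[doc[key]] = doc
    return latest


def apply_retention(latest, sort_field, now, max_age):
    """Drop documents whose sort field is older than max_age."""
    return {k: d for k, d in latest.items() if now - d[sort_field] <= max_age}


# Hypothetical log documents.
docs = [
    {"clientip": "10.0.0.1", "timestamp": 100, "url": "/a"},
    {"clientip": "10.0.0.2", "timestamp": 900, "url": "/b"},
    {"clientip": "10.0.0.1", "timestamp": 950, "url": "/c"},
    {"clientip": "10.0.0.3", "timestamp": 50, "url": "/d"},
]

latest = latest_per_key(docs, key="clientip", sort_field="timestamp")
retained = apply_retention(latest, sort_field="timestamp", now=1000, max_age=500)
print(latest["10.0.0.1"]["url"])
print(sorted(retained))
```

In this model, `10.0.0.3` disappears from the destination because its most recent document is older than the retention window.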


After you create the {{transform}}, start it:

```console
Expand All @@ -366,7 +352,6 @@ POST _transform/last-log-from-clientip/_start

::::


After the {{transform}} processes the data, search the destination index:

```console
Expand Down Expand Up @@ -425,7 +410,6 @@ This {{transform}} makes it easier to answer questions such as:

* What was the most recent log event associated with a specific IP address?


## Finding client IPs that sent the most bytes to the server [example-bytes]

This example uses the web log sample data set to find the client IP that sent the most bytes to the server in every hour. The example uses a `pivot` {{transform}} with a [`top_metrics`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-metrics.html) aggregation.
Expand Down Expand Up @@ -477,7 +461,6 @@ POST _transform/_preview
2. Calculates the maximum value of the `bytes` field.
3. Specifies the fields (`clientip` and `geo.src`) of the top document to return and the sorting method (document with the highest `bytes` value).
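
The combination of hourly grouping and `top_metrics` can be pictured with a short Python model. This sketch is an approximation of the behavior described above, not a reimplementation of the aggregation: it assumes timestamps in epoch seconds and flat field names, and the function name and sample documents are hypothetical.

```python
def top_client_per_hour(docs):
    """For each hourly bucket, return the fields of the document with the most bytes."""
    top = {}
    for doc in docs:
        hour = doc["timestamp"] - doc["timestamp"] % 3600  # start of the hour
        best = top.get(hour)
        if best is None or doc["bytes"] > best["bytes"]:
            top[hour] = doc
    return {
        hour: {
            "bytes": doc["bytes"],
            "clientip": doc["clientip"],
            "geo.src": doc["geo.src"],
        }
        for hour, doc in top.items()
    }


# Hypothetical web log documents.
docs = [
    {"timestamp": 10, "bytes": 500, "clientip": "10.0.0.1", "geo.src": "US"},
    {"timestamp": 20, "bytes": 900, "clientip": "10.0.0.2", "geo.src": "DE"},
    {"timestamp": 3700, "bytes": 300, "clientip": "10.0.0.3", "geo.src": "IN"},
]

result = top_client_per_hour(docs)
print(result)
```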


The API call above returns a response similar to this:

```js
Expand Down Expand Up @@ -518,7 +501,6 @@ The API call above returns a response similar to this:
}
```


## Getting customer name and email address by customer ID [example-customer-names]

This example uses the ecommerce sample data set to create an entity-centric index based on customer ID, and to get the customer name and email address by using the `top_metrics` aggregation.
Expand Down Expand Up @@ -566,7 +548,6 @@ POST _transform/_preview
1. The data is grouped by a `terms` aggregation on the `customer_id` field.
2. Specifies the fields to return (email and name fields) in a descending order by the order date.


The API returns a response that is similar to this:

```js
Expand Down Expand Up @@ -600,5 +581,3 @@ The API returns a response that is similar to this:
]
}
```

