Commit bc171aa

[E&A] Refines transforms section (#351)
* [E&A] Refines transforms section.
* [E&A] Reviews all pages in transforms section.
1 parent 29e0374 commit bc171aa

16 files changed

+122
-338
lines changed

explore-analyze/transforms.md

Lines changed: 14 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -4,15 +4,19 @@ mapped_urls:
44
- https://www.elastic.co/guide/en/serverless/current/transforms.html
55
- https://www.elastic.co/guide/en/elasticsearch/reference/current/data-rollup-transform.html
66
---
7+
# Transforming data [transforms]
78

8-
# Transforms
9+
{{transforms-cap}} enable you to convert existing {{es}} indices into summarized indices, which provide opportunities for new insights and analytics. For example, you can use {{transforms}} to pivot your data into entity-centric indices that summarize the behavior of users or sessions or other entities in your data. Or you can use {{transforms}} to find the latest document among all the documents that have a certain unique key.
910

10-
% What needs to be done: Align serverless/stateful
11-
12-
% Scope notes: views in last 6 months: ~90/week
13-
14-
% Use migrated content from existing pages that map to this page:
15-
16-
% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/transforms.md
17-
% - [ ] ./raw-migrated-files/docs-content/serverless/transforms.md
18-
% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/data-rollup-transform.md
11+
* [Overview](transforms/transform-overview.md)
12+
* [Setup](transforms/transform-setup.md)
13+
* [When to use {{transforms}}](transforms/transform-usage.md)
14+
* [Generating alerts for {{transforms}}](transforms/transform-alerts.md)
15+
* [{{transforms-cap}} at scale](transforms/transform-scale.md)
16+
* [How checkpoints work](transforms/transform-checkpoints.md)
17+
* [API quick reference](transforms/transform-api-quickref.md)
18+
* [Tutorial: Transforming the eCommerce sample data](transforms/ecommerce-transforms.md)
19+
* [Examples](transforms/transform-examples.md)
20+
* [Painless examples](transforms/transform-painless-examples.md)
21+
* [Troubleshooting {{transforms}}](../troubleshoot/elasticsearch/transform-troubleshooting.md)
22+
* [Limitations](transforms/transform-limitations.md)

explore-analyze/transforms/ecommerce-transforms.md

Lines changed: 94 additions & 113 deletions
Large diffs are not rendered by default.

explore-analyze/transforms/transform-alerts.md

Lines changed: 4 additions & 9 deletions
@@ -14,11 +14,10 @@ You can create {{transform}} rules under **{{stack-manage-app}} > {{rules-ui}}**
1414
1. Click **Create rule** and select the {{transform}} health rule type.
1515
2. Give a name to the rule and optionally provide tags.
1616
3. Select the {{transform}} or {{transforms}} to include. You can also use a special character (`*`) to apply the rule to all your {{transforms}}. {{transforms-cap}} created after the rule are automatically included.
17-
18-
:::{image} ../../images/elasticsearch-reference-transform-check-config.png
19-
:alt: Selecting health check
20-
:class: screenshot
21-
:::
17+
:::{image} ../../images/elasticsearch-reference-transform-check-config.png
18+
:alt: Selecting health check
19+
:class: screenshot
20+
:::
2221

2322
4. The following health checks are available and enabled by default:
2423

@@ -33,7 +32,6 @@ You can create {{transform}} rules under **{{stack-manage-app}} > {{rules-ui}}**
3332

3433
As the last step in the rule creation process, define its actions.
3534

36-
3735
## Defining actions [defining-actions]
3836

3937
You can add one or more actions to your rule to generate notifications when its conditions are met and when they are no longer met. In particular, this rule type supports:
@@ -55,7 +53,6 @@ After you select a connector, you must set the action frequency. You can choose
5553
If you choose a custom action interval, it cannot be shorter than the rule’s check interval.
5654
::::
5755

58-
5956
Alternatively, you can set the action frequency such that actions run for each alert. Choose how often the action runs (at each check interval, only when the alert status changes, or at a custom action interval). You must also choose an action group, which indicates whether the action runs when the issue is detected or when it is recovered.
6057

6158
You can further refine the conditions under which actions run by specifying that actions only run when they match a KQL query or when an alert occurs within a specific time frame.
@@ -71,7 +68,6 @@ After you save the configurations, the rule appears in the **{{rules-ui}}** list
7168

7269
The name of an alert is always the same as the {{transform}} ID of the associated {{transform}} that triggered it. You can mute the notifications for a particular {{transform}} on the page of the rule that lists the individual alerts. You can open it via **{{rules-ui}}** by selecting the rule name.
7370

74-
7571
## Action variables [transform-action-variables]
7672

7773
The following variables are specific to the {{transform}} health rule type. You can also specify [variables common to all rules](../alerts/kibana/rule-action-variables.md).
@@ -103,5 +99,4 @@ The following variables are specific to the {{transform}} health rule type. You
10399
{{/context.results}}
104100
```
105101

106-
107102
For more examples, refer to [Rule action variables](../alerts/kibana/rule-action-variables.md).

explore-analyze/transforms/transform-api-quickref.md

Lines changed: 0 additions & 1 deletion
@@ -23,4 +23,3 @@ _transform/
2323
* [Update {{transforms}}](https://www.elastic.co/guide/en/elasticsearch/reference/current/update-transform.html)
2424

2525
For the full list, see [*{{transform-cap}} APIs*](https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-apis.html).
26-

explore-analyze/transforms/transform-checkpoints.md

Lines changed: 0 additions & 8 deletions
@@ -4,11 +4,8 @@ mapped_pages:
44
- https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-checkpoints.html
55
---
66

7-
8-
97
# How checkpoints work [transform-checkpoints]
108

11-
129
Each time a {{transform}} examines the source indices and creates or updates the destination index, it generates a *checkpoint*.
1310

1411
If your {{transform}} runs only once, there is logically only one checkpoint. If your {{transform}} runs continuously, however, it creates checkpoints as it ingests and transforms new source data. The `sync` property of the {{transform}} configures checkpointing by specifying a time field.
@@ -31,15 +28,12 @@ To create a checkpoint, the {{ctransform}}:
3128

3229
The {{transform}} applies changes related to either new or changed entities or time buckets to the destination index. The set of changes can be paginated. The {{transform}} performs a composite aggregation similar to the batch {{transform}} operation, but it also injects query filters based on the previous step to reduce the amount of work. After all changes have been applied, the checkpoint is complete.
3330

34-
3531
This checkpoint process involves both search and indexing activity on the cluster. We have attempted to favor control over performance while developing {{transforms}}. We decided it was preferable for the {{transform}} to take longer to complete, rather than to finish quickly and take precedence in resource consumption. That being said, the cluster still requires enough resources to support both the composite aggregation search and the indexing of its results.
3632

3733
::::{tip}
3834
If the cluster experiences unsuitable performance degradation due to the {{transform}}, stop the {{transform}} and refer to [Performance considerations](transform-overview.md#transform-performance).
3935
::::
4036

41-
42-
4337
## Using the ingest timestamp for syncing the {{transform}} [sync-field-ingest-timestamp]
4438

4539
In most cases, it is strongly recommended to use the ingest timestamp of the source indices for syncing the {{transform}}. This is the most optimal way for {{transforms}} to be able to identify new changes. If your data source follows the [ECS standard](https://www.elastic.co/guide/en/ecs/{{ecs_version}}/ecs-reference.html), you might already have an [`event.ingested`](https://www.elastic.co/guide/en/ecs/{{ecs_version}}/ecs-event.html#field-event-ingested) field. In this case, use `event.ingested` as the `sync`.`time`.`field` property of your {{transform}}.
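As a minimal sketch of the recommendation above, the following Python snippet builds the `sync` section of a {{ctransform}} configuration with `event.ingested` as the time field. The helper name `continuous_sync_settings` is hypothetical, and the `delay` value is only illustrative; adjust both to your own setup.

```python
def continuous_sync_settings(time_field="event.ingested", delay="60s"):
    """Build the `sync` section of a continuous transform configuration.

    `time_field` is the date field the transform uses to detect new data.
    `delay` is an illustrative value, not a recommendation.
    """
    return {"sync": {"time": {"field": time_field, "delay": delay}}}


# The resulting dictionary can be merged into the body of a create-transform request.
settings = continuous_sync_settings()
print(settings)
```

The returned dictionary contains only the `sync` section; the rest of the {{transform}} body (source, destination, and the pivot or latest definition) still has to be supplied separately.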
@@ -65,7 +59,6 @@ After you created the ingest pipeline, apply it to the source indices of your {{
6559

6660
Refer to [Add a pipeline to an indexing request](../../manage-data/ingest/transform-enrich/ingest-pipelines.md#add-pipeline-to-indexing-request) and [Ingest pipelines](../../manage-data/ingest/transform-enrich/ingest-pipelines.md) to learn more about how to use an ingest pipeline.
6761

68-
6962
## Change detection heuristics [ml-transform-checkpoint-heuristics]
7063

7164
When the {{transform}} runs in continuous mode, it updates the documents in the destination index as new data comes in. The {{transform}} uses a set of heuristics called change detection to update the destination index with fewer operations.
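The idea behind change detection can be illustrated with a small Python sketch. It is a conceptual model only, not how {{es}} implements the feature: the field names `host` and `bytes`, the sum aggregation, and the function names are all assumptions made for the illustration.

```python
from collections import defaultdict


def changed_entities(new_docs, group_by="host"):
    """Return the set of entities that received new documents since the last checkpoint."""
    return {doc[group_by] for doc in new_docs}


def update_destination(dest, source_docs, new_docs, group_by="host"):
    """Recompute the aggregate only for entities that changed.

    `dest` maps each entity to its summed `bytes` value (a hypothetical metric).
    Entities without new documents are left untouched.
    """
    changed = changed_entities(new_docs, group_by)
    totals = defaultdict(int)
    for doc in source_docs:
        if doc[group_by] in changed:
            totals[doc[group_by]] += doc["bytes"]
    dest.update(totals)
    return changed


source = [
    {"host": "a", "bytes": 1},
    {"host": "b", "bytes": 2},
    {"host": "a", "bytes": 3},  # arrived after the previous checkpoint
]
dest = {"a": 1, "b": 2}
changed = update_destination(dest, source, new_docs=[source[2]])
print(changed, dest)
```

Only host `a` is re-aggregated here; the entry for host `b` is not rewritten, which is where the reduction in operations comes from.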
@@ -74,7 +67,6 @@ In this example, the data is grouped by host names. Change detection detects whi
7467

7568
Another heuristic can be applied for time buckets when a `date_histogram` is used to group by time buckets. Change detection detects which time buckets have changed and updates only those.
7669

77-
7870
## Error handling [ml-transform-checkpoint-errors]
7971

8072
Failures in {{transforms}} tend to be related to searching or indexing. To increase the resiliency of {{transforms}}, the cursor positions of the aggregated search and the changed entities search are tracked in memory and persisted periodically.

explore-analyze/transforms/transform-examples.md

Lines changed: 0 additions & 21 deletions
@@ -4,11 +4,8 @@ mapped_pages:
44
- https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-examples.html
55
---
66

7-
8-
97
# Examples [transform-examples]
108

11-
129
These examples demonstrate how to use {{transforms}} to derive useful insights from your data. All the examples use one of the [{{kib}} sample datasets](https://www.elastic.co/guide/en/kibana/current/add-sample-data.html). For a more detailed, step-by-step example, see [Tutorial: Transforming the eCommerce sample data](ecommerce-transforms.md).
1310

1411
* [Finding your best customers](#example-best-customers)
@@ -58,12 +55,10 @@ POST _transform/_preview
5855
1. The destination index for the {{transform}}. It is ignored by `_preview`.
5956
2. Two `group_by` fields are selected. This means the {{transform}} contains a unique row per `user` and `customer_id` combination. Within this data set, both these fields are unique. Including both in the {{transform}} gives more context to the final results.
6057

61-
6258
::::{note}
6359
In the example above, condensed JSON formatting is used for easier readability of the pivot object.
6460
::::
6561

66-
6762
The preview {{transforms}} API enables you to see the layout of the {{transform}} in advance, populated with some sample values. For example:
6863

6964
```js
@@ -85,7 +80,6 @@ The preview {{transforms}} API enables you to see the layout of the {{transform}
8580

8681
:::::
8782

88-
8983
This {{transform}} makes it easier to answer questions such as:
9084

9185
* Which customers spend the most?
@@ -95,7 +89,6 @@ This {{transform}} makes it easier to answer questions such as:
9589

9690
It’s possible to answer these questions using aggregations alone. However, {{transforms}} allow us to persist this data as a customer-centric index. This enables us to analyze data at scale and gives more flexibility to explore and navigate data from a customer-centric perspective. In some cases, it can even make creating visualizations much simpler.
9791

98-
9992
## Finding air carriers with the most delays [example-airline]
10093

10194
This example uses the Flights sample data set to find out which air carrier had the most delays. First, filter the source data such that it excludes all the cancelled flights by using a query filter. Then transform the data to contain the distinct number of flights, the sum of delayed minutes, and the sum of the flight minutes by air carrier. Finally, use a [`bucket_script`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html) to determine what percentage of the flight time was actually delay.
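The arithmetic that the `bucket_script` step performs can be sketched in Python as follows. The function and parameter names are hypothetical and chosen for the illustration; the actual script in the {{transform}} may be written differently.

```python
def delay_time_percentage(delay_minutes, flight_minutes):
    """Return the share of flight time taken up by delays, as a percentage.

    Returns None when there is no flight time, to avoid dividing by zero.
    """
    if flight_minutes == 0:
        return None
    return delay_minutes / flight_minutes * 100


# A carrier with 50 delayed minutes out of 200 flight minutes.
percentage = delay_time_percentage(50, 200)
print(percentage)
```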
@@ -143,7 +136,6 @@ POST _transform/_preview
143136
3. The data is grouped by the `Carrier` field which contains the airline name.
144137
4. This `bucket_script` performs calculations on the results that are returned by the aggregation. In this particular example, it calculates what percentage of travel time was taken up by delays.
145138

146-
147139
The preview shows you that the new index would contain data like this for each carrier:
148140

149141
```js
@@ -169,8 +161,6 @@ This {{transform}} makes it easier to answer questions such as:
169161
This data is fictional and does not reflect actual delays or flight stats for any of the featured destination or origin airports.
170162
::::
171163

172-
173-
174164
## Finding suspicious client IPs [example-clientips]
175165

176166
This example uses the web log sample data set to identify suspicious client IPs. It transforms the data such that the new index contains the sum of bytes and the number of distinct URLs, agents, incoming requests by location, and geographic destinations for each client IP. It also uses filter aggregations to count the specific types of HTTP responses that each client IP receives. Ultimately, the example below transforms web log data into an entity centric index where the entity is `clientip`.
@@ -235,7 +225,6 @@ PUT _transform/suspicious_client_ips
235225
4. Filter aggregation that counts the occurrences of successful (`200`) responses in the `response` field. The following two aggregations (`error404` and `error5xx`) count the error responses by error codes, matching an exact value or a range of response codes.
236226
5. This `bucket_script` calculates the duration of the `clientip` access based on the results of the aggregation.
237227

238-
239228
After you create the {{transform}}, you must start it:
240229

241230
```console
@@ -285,15 +274,13 @@ The search result shows you data like this for each client IP:
285274
Like other Kibana sample data sets, the web log sample dataset contains timestamps relative to when you installed it, including timestamps in the future. The {{ctransform}} will pick up the data points once they are in the past. If you installed the web log sample dataset some time ago, you can uninstall and reinstall it and the timestamps will change.
286275
::::
287276

288-
289277
This {{transform}} makes it easier to answer questions such as:
290278

291279
* Which client IPs are transferring the largest amounts of data?
292280
* Which client IPs are interacting with a high number of different URLs?
293281
* Which client IPs have high error rates?
294282
* Which client IPs are interacting with a high number of destination countries?
295283

296-
297284
## Finding the last log event for each IP address [example-last-log]
298285

299286
This example uses the web log sample data set to find the last log from an IP address. Let’s use the `latest` type of {{transform}} in continuous mode. It copies the most recent document for each unique key from the source index to the destination index and updates the destination index as new data comes into the source index.
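To make the behavior of a `latest` {{transform}} concrete, here is a minimal Python sketch of the same idea: keep the most recent document for each unique key. It is an in-memory illustration rather than the {{es}} implementation, and the `timestamp` field name and sample documents are assumptions.

```python
def latest_per_key(docs, unique_key="clientip", sort_field="timestamp"):
    """Return a mapping from each unique key to its most recent document."""
    latest = {}
    for doc in docs:
        key = doc[unique_key]
        # Replace the stored document only if this one is newer.
        if key not in latest or doc[sort_field] > latest[key][sort_field]:
            latest[key] = doc
    return latest


logs = [
    {"clientip": "10.0.0.1", "timestamp": "2024-01-01T10:00:00Z", "response": 200},
    {"clientip": "10.0.0.2", "timestamp": "2024-01-01T11:00:00Z", "response": 404},
    {"clientip": "10.0.0.1", "timestamp": "2024-01-01T12:00:00Z", "response": 503},
]
result = latest_per_key(logs)
print(result["10.0.0.1"]["response"])
```

The timestamps are compared as strings, which works here only because they share the same ISO 8601 format and time zone.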
@@ -357,7 +344,6 @@ PUT _transform/last-log-from-clientip
357344
4. Contains the time field and delay settings used to synchronize the source and destination indices.
358345
5. Specifies the retention policy for the transform. Documents that are older than the configured value will be removed from the destination index.
359346

360-
361347
After you create the {{transform}}, start it:
362348

363349
```console
@@ -366,7 +352,6 @@ POST _transform/last-log-from-clientip/_start
366352

367353
::::
368354

369-
370355
After the {{transform}} processes the data, search the destination index:
371356

372357
```console
@@ -425,7 +410,6 @@ This {{transform}} makes it easier to answer questions such as:
425410

426411
* What was the most recent log event associated with a specific IP address?
427412

428-
429413
## Finding client IPs that sent the most bytes to the server [example-bytes]
430414

431415
This example uses the web log sample data set to find the client IP that sent the most bytes to the server in every hour. The example uses a `pivot` {{transform}} with a [`top_metrics`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-metrics.html) aggregation.
@@ -477,7 +461,6 @@ POST _transform/_preview
477461
2. Calculates the maximum value of the `bytes` field.
478462
3. Specifies the fields (`clientip` and `geo.src`) of the top document to return and the sorting method (document with the highest `bytes` value).
479463

480-
481464
The API call above returns a response similar to this:
482465

483466
```js
@@ -518,7 +501,6 @@ The API call above returns a response similar to this:
518501
}
519502
```
520503

521-
522504
## Getting customer name and email address by customer ID [example-customer-names]
523505

524506
This example uses the ecommerce sample data set to create an entity-centric index based on customer ID, and to get the customer name and email address by using the `top_metrics` aggregation.
@@ -566,7 +548,6 @@ POST _transform/_preview
566548
1. The data is grouped by a `terms` aggregation on the `customer_id` field.
567549
2. Specifies the fields to return (email and name fields) in a descending order by the order date.
568550

569-
570551
The API returns a response that is similar to this:
571552

572553
```js
@@ -600,5 +581,3 @@ The API returns a response that is similar to this:
600581
]
601582
}
602583
```
603-
604-
