Commit 004da8c

[DOCS] Expands transforms docs with persistent tasks and related links. (#68582) (#68637)
1 parent 2774416 commit 004da8c

2 files changed

+39
-37
lines changed

docs/reference/transform/checkpoints.asciidoc

Lines changed: 33 additions & 37 deletions
@@ -5,48 +5,46 @@
 <titleabbrev>How checkpoints work</titleabbrev>
 ++++
 
-Each time a {transform} examines the source indices and creates or
-updates the destination index, it generates a _checkpoint_.
+Each time a {transform} examines the source indices and creates or updates the
+destination index, it generates a _checkpoint_.
 
-If your {transform} runs only once, there is logically only one
-checkpoint. If your {transform} runs continuously, however, it creates
-checkpoints as it ingests and transforms new source data.
+If your {transform} runs only once, there is logically only one checkpoint. If
+your {transform} runs continuously, however, it creates checkpoints as it
+ingests and transforms new source data.
 
 To create a checkpoint, the {ctransform}:
 
 . Checks for changes to source indices.
 +
-Using a simple periodic timer, the {transform} checks for changes to
-the source indices. This check is done based on the interval defined in the
-transform's `frequency` property.
+Using a simple periodic timer, the {transform} checks for changes to the source
+indices. This check is done based on the interval defined in the transform's
+`frequency` property.
 +
 If the source indices remain unchanged or if a checkpoint is already in progress
 then it waits for the next timer.
 
 . Identifies which entities have changed.
 +
-The {transform} searches to see which entities have changed since the
-last time it checked. The `sync` configuration object in the {transform}
-identifies a time field in the source indices. The {transform} uses the values
-in that field to synchronize the source and destination indices.
+The {transform} searches to see which entities have changed since the last time
+it checked. The `sync` configuration object in the {transform} identifies a time
+field in the source indices. The {transform} uses the values in that field to
+synchronize the source and destination indices.
 
 . Updates the destination index (the {dataframe}) with the changed entities.
 +
 --
-The {transform} applies changes related to either new or changed
-entities to the destination index. The set of changed entities is paginated. For
-each page, the {transform} performs a composite aggregation using a
-`terms` query. After all the pages of changes have been applied, the checkpoint
-is complete.
+The {transform} applies changes related to either new or changed entities to the
+destination index. The set of changed entities is paginated. For each page, the
+{transform} performs a composite aggregation using a `terms` query. After all
+the pages of changes have been applied, the checkpoint is complete.
 --
 
 This checkpoint process involves both search and indexing activity on the
 cluster. We have attempted to favor control over performance while developing
-{transforms}. We decided it was preferable for the
-{transform} to take longer to complete, rather than to finish quickly
-and take precedence in resource consumption. That being said, the cluster still
-requires enough resources to support both the composite aggregation search and
-the indexing of its results.
+{transforms}. We decided it was preferable for the {transform} to take longer to
+complete, rather than to finish quickly and take precedence in resource
+consumption. That being said, the cluster still requires enough resources to
+support both the composite aggregation search and the indexing of its results.
 
 TIP: If the cluster experiences unsuitable performance degradation due to the
 {transform}, stop the {transform} and refer to <<transform-performance>>.
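
Reader's note on the hunk above: the three checkpoint steps can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the Elasticsearch implementation. The function `run_checkpoint`, the document fields `entity`, `ts`, and `value`, the sum aggregation, and the `page_size` parameter are all hypothetical names chosen for the example; the real {transform} uses the time field named in `sync` and a composite aggregation.

```python
from collections import defaultdict


def run_checkpoint(source_docs, dest, last_checkpoint_time, now, page_size=2):
    """Simulate one checkpoint; return the time of the latest completed checkpoint.

    source_docs: list of dicts with hypothetical keys "entity", "ts", "value".
    dest: dict mapping entity -> aggregated value (stands in for the destination index).
    """
    # Steps 1 and 2: use the time field ("ts" here) to find entities that have
    # documents newer than the previous checkpoint.
    changed = sorted(
        {d["entity"] for d in source_docs if last_checkpoint_time < d["ts"] <= now}
    )
    if not changed:
        # Source unchanged: do nothing and wait for the next timer tick.
        return last_checkpoint_time

    # Step 3: page through the changed entities and recompute each aggregate
    # from all source documents visible at this checkpoint.
    for start in range(0, len(changed), page_size):
        page = set(changed[start:start + page_size])
        totals = defaultdict(int)
        for d in source_docs:
            if d["entity"] in page and d["ts"] <= now:
                totals[d["entity"]] += d["value"]
        dest.update(totals)

    # All pages applied: the checkpoint is complete.
    return now
```

Calling `run_checkpoint` repeatedly with the returned time as the next `last_checkpoint_time` mimics the periodic timer; only entities touched since the previous call are recomputed.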
@@ -63,20 +61,18 @@ persisted periodically.
 Checkpoint failures can be categorized as follows:
 
 * Temporary failures: The checkpoint is retried. If 10 consecutive failures
-occur, the {transform} has a failed status. For example, this
-situation might occur when there are shard failures and queries return only
-partial results.
-* Irrecoverable failures: The {transform} immediately fails. For
-example, this situation occurs when the source index is not found.
-* Adjustment failures: The {transform} retries with adjusted settings.
-For example, if a parent circuit breaker memory errors occur during the
-composite aggregation, the {transform} receives partial results. The aggregated
-search is retried with a smaller number of buckets. This retry is performed at
-the interval defined in the `frequency` property for the {transform}. If the
-search is retried to the point where it reaches a minimal number of buckets, an
+occur, the {transform} has a failed status. For example, this situation might
+occur when there are shard failures and queries return only partial results.
+* Irrecoverable failures: The {transform} immediately fails. For example, this
+situation occurs when the source index is not found.
+* Adjustment failures: The {transform} retries with adjusted settings. For
+example, if a parent circuit breaker memory errors occur during the composite
+aggregation, the {transform} receives partial results. The aggregated search is
+retried with a smaller number of buckets. This retry is performed at the
+interval defined in the `frequency` property for the {transform}. If the search
+is retried to the point where it reaches a minimal number of buckets, an
 irrecoverable failure occurs.
 
-If the node running the {transforms} fails, the {transform} restarts
-from the most recent persisted cursor position. This recovery process might
-repeat some of the work the {transform} had already done, but it ensures data
-consistency.
+If the node running the {transforms} fails, the {transform} restarts from the
+most recent persisted cursor position. This recovery process might repeat some
+of the work the {transform} had already done, but it ensures data consistency.
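
Reader's note on the hunk above: the three failure categories can be modelled as a small state update that runs once per timer tick. This is a rough sketch under assumptions, not the actual {transform} code. The exception classes, the `state` dictionary, the `MIN_PAGE_SIZE` floor, and the choice to halve the page size are all hypothetical; the text only says that the search is retried with "a smaller number of buckets" and that 10 consecutive temporary failures lead to a failed status.

```python
class TemporaryFailure(Exception):
    """For example, shard failures that return partial results."""


class IrrecoverableFailure(Exception):
    """For example, the source index is not found."""


class AdjustmentFailure(Exception):
    """For example, a circuit breaker trips during the aggregation."""


MAX_CONSECUTIVE_FAILURES = 10  # stated in the documentation text
MIN_PAGE_SIZE = 10             # hypothetical floor, chosen for the example


def attempt_checkpoint(search, state):
    """Run one checkpoint attempt and update `state` in place.

    search: callable taking the page size; raises one of the failures above.
    state: dict with keys "page_size", "failures", and "status".
    """
    try:
        search(state["page_size"])
    except TemporaryFailure:
        # Retry on the next tick; give up after too many failures in a row.
        state["failures"] += 1
        if state["failures"] >= MAX_CONSECUTIVE_FAILURES:
            state["status"] = "failed"
    except AdjustmentFailure:
        if state["page_size"] <= MIN_PAGE_SIZE:
            # Already at the minimum: treat as irrecoverable.
            state["status"] = "failed"
        else:
            # Assumption: halve the page size for the next attempt.
            state["page_size"] = max(MIN_PAGE_SIZE, state["page_size"] // 2)
    except IrrecoverableFailure:
        state["status"] = "failed"
    else:
        # A successful attempt resets the consecutive-failure counter.
        state["failures"] = 0
    return state
```

In this sketch the caller is expected to invoke `attempt_checkpoint` once per `frequency` interval, so an adjusted page size takes effect on the following tick rather than immediately.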

docs/reference/transform/overview.asciidoc

Lines changed: 6 additions & 0 deletions
@@ -11,6 +11,12 @@ You can choose either of the following methods to transform your data:
 IMPORTANT: All {transforms} leave your source index intact. They create a new
 index that is dedicated to the transformed data.
 
+{transforms-cap} are persistent tasks; they are stored in cluster state which
+makes them resilient for node failures. Refer to <<transform-checkpoints>> and
+<<ml-transform-checkpoint-errors>> to learn more about the machinery behind
+{transforms}.
+
+
 [[pivot-transform-overview]]
 == Pivot {transforms}
 