<titleabbrev>How checkpoints work</titleabbrev>
++++

- Each time a {transform} examines the source indices and creates or
- updates the destination index, it generates a _checkpoint_.
+ Each time a {transform} examines the source indices and creates or updates the
+ destination index, it generates a _checkpoint_.

- If your {transform} runs only once, there is logically only one
- checkpoint. If your {transform} runs continuously, however, it creates
- checkpoints as it ingests and transforms new source data.
+ If your {transform} runs only once, there is logically only one checkpoint. If
+ your {transform} runs continuously, however, it creates checkpoints as it
+ ingests and transforms new source data.

To create a checkpoint, the {ctransform}:

. Checks for changes to source indices.
+
- Using a simple periodic timer, the {transform} checks for changes to
- the source indices. This check is done based on the interval defined in the
- transform's `frequency` property.
+ Using a simple periodic timer, the {transform} checks for changes to the source
+ indices. This check is done based on the interval defined in the transform's
+ `frequency` property.
+
If the source indices remain unchanged or if a checkpoint is already in progress,
then it waits for the next timer.

. Identifies which entities have changed.
+
- The {transform} searches to see which entities have changed since the
- last time it checked. The `sync` configuration object in the {transform}
- identifies a time field in the source indices. The {transform} uses the values
- in that field to synchronize the source and destination indices.
+ The {transform} searches to see which entities have changed since the last time
+ it checked. The `sync` configuration object in the {transform} identifies a time
+ field in the source indices. The {transform} uses the values in that field to
+ synchronize the source and destination indices.

. Updates the destination index (the {dataframe}) with the changed entities.
+
--
- The {transform} applies changes related to either new or changed
- entities to the destination index. The set of changed entities is paginated. For
- each page, the {transform} performs a composite aggregation using a
- `terms` query. After all the pages of changes have been applied, the checkpoint
- is complete.
+ The {transform} applies changes related to either new or changed entities to the
+ destination index. The set of changed entities is paginated. For each page, the
+ {transform} performs a composite aggregation using a `terms` query. After all
+ the pages of changes have been applied, the checkpoint is complete.
--
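Taken together, the three steps above amount to a poll-detect-apply loop. The following is a minimal Python sketch of one checkpoint cycle, not the actual {transforms} implementation; the in-memory document structure, the `entity` and `timestamp` field names, and the sum aggregation are illustrative assumptions.

```python
# Minimal sketch of one checkpoint cycle. The `timestamp` field plays the role
# of the `sync` time field, and PAGE_SIZE stands in for the composite
# aggregation page size; all names here are hypothetical.

PAGE_SIZE = 2


def checkpoint(source_docs, destination, last_sync_value):
    """Run one checkpoint and return the new sync cursor value."""
    # Step 1: check for changes to the source indices since the last run.
    changed = [d for d in source_docs if d["timestamp"] > last_sync_value]
    if not changed:
        return last_sync_value  # source unchanged: wait for the next timer

    # Step 2: identify which entities have changed, using the sync time field.
    entities = sorted({d["entity"] for d in changed})

    # Step 3: update the destination page by page; each page stands in for one
    # composite aggregation over a `terms` query on the entity field.
    for start in range(0, len(entities), PAGE_SIZE):
        for entity in entities[start:start + PAGE_SIZE]:
            docs = [d for d in source_docs if d["entity"] == entity]
            destination[entity] = sum(d["value"] for d in docs)

    # The checkpoint is complete; advance the cursor past the applied changes.
    return max(d["timestamp"] for d in changed)


source = [
    {"entity": "a", "timestamp": 1, "value": 10},
    {"entity": "b", "timestamp": 2, "value": 5},
    {"entity": "a", "timestamp": 3, "value": 7},
]
dest = {}
cursor = checkpoint(source, dest, last_sync_value=0)
```

Note that in this sketch a changed entity is recomputed from all of its source documents rather than patched incrementally, which keeps the destination consistent across checkpoints.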

This checkpoint process involves both search and indexing activity on the
cluster. We have attempted to favor control over performance while developing
- {transforms}. We decided it was preferable for the
- {transform} to take longer to complete, rather than to finish quickly
- and take precedence in resource consumption. That being said, the cluster still
- requires enough resources to support both the composite aggregation search and
- the indexing of its results.
+ {transforms}. We decided it was preferable for the {transform} to take longer to
+ complete, rather than to finish quickly and take precedence in resource
+ consumption. That being said, the cluster still requires enough resources to
+ support both the composite aggregation search and the indexing of its results.

TIP: If the cluster experiences unsuitable performance degradation due to the
{transform}, stop the {transform} and refer to <<transform-performance>>.
@@ -63,20 +61,18 @@ persisted periodically.
Checkpoint failures can be categorized as follows:

* Temporary failures: The checkpoint is retried. If 10 consecutive failures
- occur, the {transform} has a failed status. For example, this
- situation might occur when there are shard failures and queries return only
- partial results.
- * Irrecoverable failures: The {transform} immediately fails. For
- example, this situation occurs when the source index is not found.
- * Adjustment failures: The {transform} retries with adjusted settings.
- For example, if a parent circuit breaker memory errors occur during the
- composite aggregation, the {transform} receives partial results. The aggregated
- search is retried with a smaller number of buckets. This retry is performed at
- the interval defined in the `frequency` property for the {transform}. If the
- search is retried to the point where it reaches a minimal number of buckets, an
+ occur, the {transform} has a failed status. For example, this situation might
+ occur when there are shard failures and queries return only partial results.
+ * Irrecoverable failures: The {transform} immediately fails. For example, this
+ situation occurs when the source index is not found.
+ * Adjustment failures: The {transform} retries with adjusted settings. For
+ example, if parent circuit breaker memory errors occur during the composite
+ aggregation, the {transform} receives partial results. The aggregated search is
+ retried with a smaller number of buckets. This retry is performed at the
+ interval defined in the `frequency` property for the {transform}. If the search
+ is retried to the point where it reaches a minimal number of buckets, an
irrecoverable failure occurs.

- If the node running the {transforms} fails, the {transform} restarts
- from the most recent persisted cursor position. This recovery process might
- repeat some of the work the {transform} had already done, but it ensures data
- consistency.
+ If the node running the {transforms} fails, the {transform} restarts from the
+ most recent persisted cursor position. This recovery process might repeat some
+ of the work the {transform} had already done, but it ensures data consistency.
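The failure categories above imply a simple retry policy: count consecutive temporary failures up to a limit, shrink the number of buckets on adjustment failures, and give up when either limit is hit. A hypothetical Python sketch follows; the exception names, the helpers, and the bucket-halving strategy are illustrative assumptions, not the real implementation.

```python
# Hypothetical sketch of the retry policy described above.

MAX_CONSECUTIVE_FAILURES = 10  # after this, the transform has a failed status
MIN_BUCKETS = 1                # below this, the failure is irrecoverable


class TemporaryFailure(Exception):
    """e.g. shard failures causing queries to return only partial results"""


class AdjustmentFailure(Exception):
    """e.g. circuit breaker memory errors during the composite aggregation"""


class IrrecoverableFailure(Exception):
    """e.g. source index not found; the transform fails immediately"""


def run_with_retries(search, num_buckets):
    """Retry `search`, adjusting settings on the way; each retry would happen
    at the interval defined by the transform's `frequency` property."""
    failures = 0
    while True:
        try:
            return search(num_buckets)
        except TemporaryFailure:
            failures += 1
            if failures >= MAX_CONSECUTIVE_FAILURES:
                raise IrrecoverableFailure("too many consecutive failures")
        except AdjustmentFailure:
            if num_buckets <= MIN_BUCKETS:
                raise IrrecoverableFailure("minimal number of buckets reached")
            num_buckets //= 2  # retry the aggregated search with fewer buckets
```

In this sketch the adjustment path keeps shrinking the page of buckets until the search succeeds or the minimum is reached, mirroring the degradation into an irrecoverable failure described above.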