@@ -10,7 +10,8 @@ destination index, it generates a _checkpoint_.
If your {transform} runs only once, there is logically only one checkpoint. If
your {transform} runs continuously, however, it creates checkpoints as it
- ingests and transforms new source data.
+ ingests and transforms new source data. The `sync` property of the {transform}
+ configures checkpointing by specifying a time field.

To create a checkpoint, the {ctransform}:
@@ -22,21 +23,25 @@ indices. This check is done based on the interval defined in the transform's
+
If the source indices remain unchanged or if a checkpoint is already in progress
then it waits for the next timer.
+ +
+ If changes are found, a checkpoint is created.
- . Identifies which entities have changed.
+ . Identifies which entities and/or time buckets have changed.
+
- The {transform} searches to see which entities have changed since the last time
- it checked. The `sync` configuration object in the {transform} identifies a time
- field in the source indices. The {transform} uses the values in that field to
- synchronize the source and destination indices.
+ The {transform} searches to see which entities or time buckets have changed
+ between the last and the new checkpoint. The {transform} uses the values to
+ synchronize the source and destination indices with fewer operations than a
+ full re-run.

- . Updates the destination index (the {dataframe}) with the changed entities.
+ . Updates the destination index (the {dataframe}) with the changes.
+
--
- The {transform} applies changes related to either new or changed entities to the
- destination index. The set of changed entities is paginated. For each page, the
- {transform} performs a composite aggregation using a `terms` query. After all
- the pages of changes have been applied, the checkpoint is complete.
+ The {transform} applies changes related to either new or changed entities or
+ time buckets to the destination index. The set of changes can be paginated. The
+ {transform} performs a composite aggregation similar to the batch {transform}
+ operation; however, it also injects query filters based on the previous step to
+ reduce the amount of work. After all changes have been applied, the checkpoint
+ is complete.
--

This checkpoint process involves both search and indexing activity on the
@@ -49,6 +54,25 @@ support both the composite aggregation search and the indexing of its results.
TIP: If the cluster experiences unsuitable performance degradation due to the
{transform}, stop the {transform} and refer to <<transform-performance>>.
+
+ [discrete]
+ [[ml-transform-checkpoint-heuristics]]
+ == Change detection heuristics
+
+ When the {transform} runs in continuous mode, it updates the documents in the
+ destination index as new data comes in. The {transform} uses a set of heuristics
+ called change detection to update the destination index with fewer operations.
+
+ For example, suppose the data is grouped by host names. Change detection
+ identifies which host names have changed (for example, hosts `A`, `C`, and `G`)
+ and only updates documents for those hosts. It does not update documents that
+ store information about hosts `B`, `D`, or any other host that has not changed.
+
+ Another heuristic can be applied for time buckets when a `date_histogram` is
+ used to group by time buckets. Change detection detects which time buckets have
+ changed and only updates those.
+
+
[discrete]
[[ml-transform-checkpoint-errors]]
== Error handling
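The entity-based change detection heuristic described in the added section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the {transform} is implemented: the document fields `host`, `timestamp`, and `bytes`, the helper names `changed_hosts` and `apply_checkpoint`, and the use of a plain dictionary as the destination index are all hypothetical.

```python
from collections import defaultdict


def changed_hosts(docs, last_checkpoint, new_checkpoint):
    """Return hosts with at least one document in (last_checkpoint, new_checkpoint]."""
    return {
        d["host"]
        for d in docs
        if last_checkpoint < d["timestamp"] <= new_checkpoint
    }


def apply_checkpoint(docs, dest, last_checkpoint, new_checkpoint):
    """Recompute the per-host total only for hosts that changed.

    `dest` stands in for the destination index (hypothetical: a dict that
    maps host name to the summed `bytes` value). Entries for unchanged
    hosts are left untouched.
    """
    hosts = changed_hosts(docs, last_checkpoint, new_checkpoint)
    totals = defaultdict(int)
    for d in docs:
        # Filter on the changed hosts, mirroring the injected query filter.
        if d["host"] in hosts and d["timestamp"] <= new_checkpoint:
            totals[d["host"]] += d["bytes"]
    dest.update(totals)
    return hosts
```

Calling `apply_checkpoint` twice with consecutive checkpoint bounds rewrites only the entries for hosts that received new documents in the second interval. A time-bucket variant would filter on the affected `date_histogram` buckets instead of on host names.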