Commit 5bf7249

kosabogi and szabosteve authored
Document delay subparameter in transform checkpoints and usage guide (elastic#125280) (elastic#125496)
* Adds explanation on the delay parameter
* Attribute fixes
* Update docs/reference/transform/usage.asciidoc

---------

Co-authored-by: István Zoltán Szabó <[email protected]>
1 parent 4a5494d commit 5bf7249

2 files changed: 12 additions, 0 deletions

docs/reference/transform/checkpoints.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,8 @@ Using a simple periodic timer, the {transform} checks for changes to the source
 indices. This check is done based on the interval defined in the transform's
 `frequency` property.
 +
+If new data is ingested with a slight delay, it might not be immediately available when the {transform} runs. To prevent missing documents, you can use the `delay` parameter in the `sync` configuration. This shifts the search window backward, ensuring that late-arriving data is included before a checkpoint processes it. Adjusting this value based on your data ingestion patterns can help ensure completeness.
++
 If the source indices remain unchanged or if a checkpoint is already in progress
 then it waits for the next timer.
 +
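
Reviewer note: the paragraph added above names the `delay` parameter in the `sync` configuration but does not show where it sits in a transform definition. The sketch below builds a request body for a continuous transform with `sync.time.delay` set. The helper name `build_transform_body`, the index names, the field names, and the pivot definition are all hypothetical and chosen only for illustration; only the `sync.time.field` and `sync.time.delay` layout is what the documentation change refers to.

```python
import json


def build_transform_body(source_index, dest_index, time_field,
                         delay="60s", frequency="5m"):
    """Build a request body for a continuous transform (hypothetical helper).

    The pivot below is a placeholder; replace it with the grouping and
    aggregations that fit your data.
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        # How often the transform checks the source indices for changes.
        "frequency": frequency,
        # Time-based sync: `delay` shifts the search window backward so that
        # late-arriving documents are searchable before a checkpoint runs.
        "sync": {"time": {"field": time_field, "delay": delay}},
        "pivot": {
            "group_by": {"host": {"terms": {"field": "host.name"}}},
            "aggregations": {"avg_cpu": {"avg": {"field": "cpu.pct"}}},
        },
    }


# Assumed index and field names, for illustration only.
body = build_transform_body("metrics-raw", "metrics-by-host", "@timestamp",
                            delay="120s")
print(json.dumps(body, indent=2))
```

A body like this would be sent when creating the transform; raising `delay` trades freshness for completeness when ingestion lags.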

docs/reference/transform/usage.asciidoc

Lines changed: 10 additions & 0 deletions
@@ -53,3 +53,13 @@ have a high level dashboard that is accessed by a large number of users and it
 uses a complex aggregation over a large dataset, it may be more efficient to
 create a {transform} to cache results. Thus, each user doesn't need to run the
 aggregation query.
+
+* You need to account for late-arriving data.
++
+In some cases, data might not be immediately available when a {transform} runs, leading to missing records in the destination index. This can happen due to ingestion delays, where documents take a few seconds or minutes to become searchable after being indexed.
+To handle this, the `delay` parameter in the {transform}'s sync configuration allows you to postpone processing new data. Instead of always querying the most recent records, the {transform} will skip a short period of time (for example, 60 seconds) to ensure all relevant data has arrived before processing.
++
+For example, if a {transform} runs every 5 minutes, it usually processes data from 5 minutes ago up to the current time. However, if you set `delay` to 60 seconds, the {transform} will instead process data from 6 minutes ago up to 1 minute ago, making sure that any documents that arrived late are included.
+By adjusting the `delay` parameter, you can improve the accuracy of transformed data while still maintaining near real-time results.
+
+
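
Reviewer note: the 5-minute example in the added text can be checked with a few lines of arithmetic. The sketch below is a simplified model that mirrors only that example: it assumes each run covers exactly one `frequency` interval ending `delay` before the current time. The function name `checkpoint_window` is hypothetical, and an actual {transform} tracks its own checkpoint boundaries, so treat this as an illustration of the shift rather than the real scheduling logic.

```python
from datetime import datetime, timedelta, timezone


def checkpoint_window(now, frequency, delay):
    """Return (lower, upper) bounds of the data window for one run.

    Simplified model: the window is one `frequency` long and ends
    `delay` before `now`.
    """
    upper = now - delay          # newest data considered safe to process
    lower = upper - frequency    # start of the interval covered by this run
    return lower, upper


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# No delay: from 5 minutes ago up to the current time.
no_delay = checkpoint_window(now, timedelta(minutes=5), timedelta(0))

# 60-second delay: from 6 minutes ago up to 1 minute ago.
with_delay = checkpoint_window(now, timedelta(minutes=5), timedelta(seconds=60))

print(no_delay)    # window 11:55:00 to 12:00:00 UTC
print(with_delay)  # window 11:54:00 to 11:59:00 UTC
```

The window length stays the same; only its position moves back by `delay`, which is why results remain near real-time.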
