From 9dd40877b6467eedcad023459889c345e3608ecb Mon Sep 17 00:00:00 2001
From: kosabogi <105062005+kosabogi@users.noreply.github.com>
Date: Mon, 24 Mar 2025 14:15:03 +0100
Subject: [PATCH] Document delay subparameter in transform checkpoints and
 usage guide (#125280)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Adds an explanation of the delay parameter

* Attribute fixes

* Update docs/reference/transform/usage.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
---
 docs/reference/transform/checkpoints.asciidoc |  2 ++
 docs/reference/transform/usage.asciidoc       | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/docs/reference/transform/checkpoints.asciidoc b/docs/reference/transform/checkpoints.asciidoc
index 77e1eae318327..8a08f483f94ff 100644
--- a/docs/reference/transform/checkpoints.asciidoc
+++ b/docs/reference/transform/checkpoints.asciidoc
@@ -21,6 +21,8 @@ Using a simple periodic timer, the {transform} checks for changes to the source
 indices. This check is done based on the interval defined in the transform's 
 `frequency` property.
 +
+If new data is ingested with a delay, it might not yet be searchable when the {transform} runs. To avoid missing documents, set the `delay` parameter in the `sync` configuration. This moves the end of each checkpoint's search window back from the current time by the configured amount, so that late-arriving documents have time to become searchable before the checkpoint that covers their timestamps runs. Set this value according to your data ingestion patterns to help ensure completeness.
++
 If the source indices remain unchanged or if a checkpoint is already in progress
 then it waits for the next timer.
 +
diff --git a/docs/reference/transform/usage.asciidoc b/docs/reference/transform/usage.asciidoc
index 2153ee63aa510..0fd3822a22fc3 100644
--- a/docs/reference/transform/usage.asciidoc
+++ b/docs/reference/transform/usage.asciidoc
@@ -53,3 +53,13 @@ have a high level dashboard that is accessed by a large number of users and it
 uses a complex aggregation over a large dataset, it may be more efficient to
 create a {transform} to cache results. Thus, each user doesn't need to run the
 aggregation query.
+
+* You need to account for late-arriving data.
++
+In some cases, data might not yet be available when a {transform} runs, which leads to missing documents in the destination index. This can happen because of ingestion delays: documents can be indexed, and become searchable, seconds or minutes after the time recorded in their timestamp field.
+To handle this, use the `delay` parameter in the `sync` configuration of the {transform} to postpone the processing of new data. Instead of querying up to the current time, the {transform} leaves out the most recent period (for example, 60 seconds, which is the default) so that late-arriving data has time to become searchable before it is processed.
++
+For example, if a {transform} checks for changes every 5 minutes (`frequency` set to `5m`) and has no delay, each checkpoint processes data from 5 minutes ago up to the current time. If you set `delay` to `60s`, each checkpoint instead processes data from 6 minutes ago up to 1 minute ago, so documents that arrive up to 60 seconds late are still included.
+A larger `delay` captures more late-arriving data but makes the destination index lag further behind real time. Choose a value that covers your typical ingestion delay while still providing near real-time results.
+
+
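The following sketch shows how the 5-minute `frequency` and 60-second `delay` from the usage guide example could be expressed in a create transform request. Only `frequency` and `sync.time` (`field` and `delay`) correspond to the settings discussed above. The transform ID `late-data-example`, the index names, the `@timestamp` and `host.name` fields, and the pivot are hypothetical placeholders, so adapt them to your own data.

[source,console]
----
PUT _transform/late-data-example
{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" },
  "frequency": "5m", <1>
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s" <2>
    }
  },
  "pivot": {
    "group_by": {
      "host": { "terms": { "field": "host.name" } }
    },
    "aggregations": {
      "event_count": { "value_count": { "field": "@timestamp" } }
    }
  }
}
----
<1> The transform checks the (hypothetical) source index for changes every 5 minutes.
<2> Each checkpoint only considers documents whose `@timestamp` is at least 60 seconds older than the time the checkpoint is created. This gives late-arriving documents time to become searchable.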