[Pull-based Ingestion] Update pull-based semantics and offset based lag metric (#11352)

varunbharadwaj · kolchfa-aws · web-flow · commit 10d778228f66 · 2025-10-21T10:25:32.000-04:00
* update pull-based semantics and offset based lag metric

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

* Update _api-reference/document-apis/pull-based-ingestion.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

* Update _api-reference/document-apis/pull-based-ingestion.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

---------

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
diff --git a/_api-reference/document-apis/pull-based-ingestion.md b/_api-reference/document-apis/pull-based-ingestion.md
@@ -13,7 +13,7 @@ nav_order: 90
 This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).    
 {: .warning}
 
-Pull-based ingestion enables OpenSearch to ingest data from streaming sources such as Apache Kafka or Amazon Kinesis. Unlike traditional ingestion methods where clients actively push data to OpenSearch through REST APIs, pull-based ingestion allows OpenSearch to control the data flow by retrieving data directly from streaming sources. This approach provides exactly-once ingestion semantics and native backpressure handling, helping prevent server overload during traffic spikes.
+Pull-based ingestion enables OpenSearch to ingest data from streaming sources such as Apache Kafka or Amazon Kinesis. Unlike traditional ingestion methods where clients actively push data to OpenSearch through REST APIs, pull-based ingestion allows OpenSearch to control the data flow by retrieving data directly from streaming sources. This approach provides native backpressure handling, helping prevent server overload during traffic spikes. Pull-based ingestion guarantees at-least-once ingestion semantics and uses external versioning to ensure data consistency.
 
 ## Prerequisites
 
@@ -199,8 +199,8 @@ The following table lists the available `polling_ingest_stats` metrics.
 | `consumer_stats.total_consumer_error_count` | The total number of fatal consumer read errors. |
 | `consumer_stats.total_poller_message_failure_count` | The total number of failed messages on the poller. |
 | `consumer_stats.total_poller_message_dropped_count` | The total number of failed messages on the poller that were dropped. |
-| `consumer_stats.total_duplicate_message_skipped_count` | The total number of skipped messages that were previously processed. |
 | `consumer_stats.lag_in_millis` | Lag in milliseconds, computed as the time elapsed since the last processed message timestamp. |
+| `consumer_stats.pointer_based_lag` | The Apache Kafka offset-based lag, calculated as the difference between the latest available offset and the current message offset. This metric applies only when Apache Kafka is used as the streaming source. |
 
 To retrieve shard-level pull-based ingestion metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/update-settings/):