_api-reference/document-apis/pull-based-ingestion.md
Lines changed: 17 additions & 1 deletion
```diff
@@ -76,9 +76,18 @@ The `ingestion_source` parameters control how OpenSearch pulls data from the str
 |`internal_queue_size`| The size of the internal blocking queue for advanced tuning. Valid values are from 1 to 100,000, inclusive. Optional. Default is 100. |
 |`all_active`| Whether to enable the all-active ingestion mode. Cannot be enabled for indexes that use segment replication mode. Default is `false`. See [Ingestion modes](#ingestion-modes). |
 |`pointer_based_lag_update_interval`| The interval at which pointer-based lag is calculated. Accepts time units. Default is `10s`. Setting this value to `0` disables pointer-based lag calculation. |
+|`mapper_type`| Defines the mapper for the input message format. Valid values are `default` and `raw_payload`. See [Message format](#message-format). |
 |`param`| Source-specific configuration parameters. Required. <br> • The `ingest-kafka` plugin requires:<br>  - `topic`: The Kafka topic to consume from<br>  - `bootstrap_servers`: The Kafka server addresses<br>  Optionally, you can provide additional standard Kafka consumer parameters (such as `fetch.min.bytes`). These parameters are passed directly to the Kafka consumer. <br> • The `ingest-kinesis` plugin requires:<br>  - `stream`: The Kinesis stream name<br>  - `region`: The AWS Region<br>  - `access_key`: The AWS access key<br>  - `secret_key`: The AWS secret key<br>  Optionally, you can provide an `endpoint_override`. |
 
 
+### Other parameters
+
+Pull-based ingestion supports the following OpenSearch parameters.
+
+| Parameter | Description |
+| :--- | :--- |
+|`index.periodic_flush_interval`| The interval at which OpenSearch will trigger a flush. Default for pull-based ingestion indexes is `10m`. See [Index settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/). |
+
 ### Ingestion modes
 
 Pull-based ingestion supports the following modes.
```
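The hunk above adds `mapper_type` as an `ingestion_source` parameter and documents `index.periodic_flush_interval` as a supported index setting. The following create index request is a rough sketch of how the two might appear together for a Kafka source. Several values are illustrative assumptions and do not come from this change: the index name `my-index`, the `type` key, the topic `my-topic`, the broker address `localhost:9092`, and the `5m` interval.

```json
PUT /my-index
{
  "settings": {
    "ingestion_source": {
      "type": "kafka",
      "mapper_type": "default",
      "param": {
        "topic": "my-topic",
        "bootstrap_servers": "localhost:9092"
      }
    },
    "index.periodic_flush_interval": "5m"
  }
}
```

The sketch sets `mapper_type` to `default` explicitly. You can omit it if the message format described in the Message format section is what the stream already produces.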
```diff
@@ -173,7 +182,7 @@ To be correctly processed by OpenSearch, messages in the streaming source must h
-Each data unit in the streaming source (Kafka message or Kinesis record) must include the following fields that specify how to create or modify an OpenSearch document.
+Each data unit in the streaming source (Kafka message or Kinesis record) must include the following fields that specify how to create or modify an OpenSearch document. This is the default format supported by pull-based ingestion.
 
 | Field | Data type | Required | Description |
 | :--- | :--- | :--- | :--- |
@@ -182,6 +191,13 @@ Each data unit in the streaming source (Kafka message or Kinesis record) must in
 |`_op_type`| String | No | The operation to perform. Valid values are:<br>- `index`: Creates a new document or updates an existing one.<br>- `create`: Creates a new document in append mode. Note that this will not update existing documents. <br>- `delete`: Soft deletes a document. |
 |`_source`| Object | Yes | The message payload containing the document data. |
 
+Alternatively, pull-based ingestion supports indexing raw payloads in append-only mode without transformations. To enable this behavior, set `index.ingestion_source.mapper_type` to `raw_payload`. Note that in this mode, the index mappings must conform to the message structure because dynamic mapping is not supported. When using `raw_payload`, you must provide raw JSON objects exactly as they appear in the incoming data stream, as shown in the following example:
+
+```json
+{"name": "alice", "age": 30}
+{"name": "bob", "age": 30}
+```
+
 ## Pull-based ingestion metrics
 
 Pull-based ingestion provides metrics that can be used to monitor the ingestion process. The `polling_ingest_stats` metric is currently supported and is available at the shard level.
```
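In `raw_payload` mode, dynamic mapping is not available. This means the index needs explicit mappings that match the message structure. The two example records carry a `name` and an `age`, and a minimal sketch of a matching create index request might look like the following. Several choices are assumptions made for illustration: the index name `raw-events`, the `type` key, the Kafka `param` values, and the `keyword` and `integer` field types.

```json
PUT /raw-events
{
  "settings": {
    "ingestion_source": {
      "type": "kafka",
      "mapper_type": "raw_payload",
      "param": {
        "topic": "raw-events",
        "bootstrap_servers": "localhost:9092"
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "age": { "type": "integer" }
    }
  }
}
```

Because this mode is append-only and the records carry no `_op_type`, each incoming record is expected to be added as a new document.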
_install-and-configure/configuring-opensearch/index-settings.md
Lines changed: 2 additions & 0 deletions
```diff
@@ -295,6 +295,8 @@ OpenSearch supports the following dynamic index-level index settings:
 
 - `index.derived_source.translog.enabled` (Boolean): Controls how documents are read from the translog for an index with derived source enabled. Defaults to the `index.derived_source.enabled` value. For more information, see [Derived source]({{site.url}}{{site.baseurl}}/mappings/metadata-fields/source/#derived-source).
 
+- `index.periodic_flush_interval` (Time unit): Triggers a flush periodically at the configured interval, storing all in-memory operations to segments on disk. OpenSearch automatically performs flush operations in the background based on conditions such as transaction log size. Default is `-1`, which disables periodic flush. You can configure this setting if your workload requires predictable, time-based flush intervals.
+
 ### Updating a dynamic index setting
 
 You can update a dynamic index setting at any time through the API. For example, to update the refresh interval, use the following request:
```
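`index.periodic_flush_interval` is listed as a dynamic setting, so you can change it on an existing index through the update settings API. The following request is a sketch that assumes an index named `my-index` and an illustrative interval of `5m`. According to the added text, a value of `-1` disables periodic flush.

```json
PUT /my-index/_settings
{
  "index.periodic_flush_interval": "5m"
}
```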