Commit a99f176

Update periodic flush and message mapper settings (#11562)
* update periodic flush and message mapper settings

Signed-off-by: Varun Bharadwaj <[email protected]>

* Update _api-reference/document-apis/pull-based-ingestion.md

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Varun Bharadwaj <[email protected]>

* Update _api-reference/document-apis/pull-based-ingestion.md

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Varun Bharadwaj <[email protected]>

* Update _api-reference/document-apis/pull-based-ingestion.md

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Varun Bharadwaj <[email protected]>

* Update _api-reference/document-apis/pull-based-ingestion.md

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Varun Bharadwaj <[email protected]>

* Update _install-and-configure/configuring-opensearch/index-settings.md

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Varun Bharadwaj <[email protected]>

* add description from suggestion

Signed-off-by: Varun Bharadwaj <[email protected]>

---------

Signed-off-by: Varun Bharadwaj <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
1 parent 6560cd9 commit a99f176

2 files changed: 19 additions and 1 deletion

_api-reference/document-apis/pull-based-ingestion.md

Lines changed: 17 additions & 1 deletion
@@ -76,9 +76,18 @@ The `ingestion_source` parameters control how OpenSearch pulls data from the str
 | `internal_queue_size` | The size of the internal blocking queue for advanced tuning. Valid values are from 1 to 100,000, inclusive. Optional. Default is 100. |
 | `all_active` | Whether to enable the all-active ingestion mode. Cannot be enabled for indexes that use segment replication mode. Default is `false`. See [Ingestion modes](#ingestion-modes). |
 | `pointer_based_lag_update_interval` | The interval at which pointer-based lag is calculated. Accepts time units. Default is `10s`. Setting this value to `0` disables pointer-based lag calculation. |
+| `mapper_type` | Defines the mapper for the input message format. Valid values are `default` and `raw_payload`. See [Message format](#message-format). |
 | `param` | Source-specific configuration parameters. Required. <br>&ensp;&#x2022; The `ingest-kafka` plugin requires:<br>&ensp;&ensp;- `topic`: The Kafka topic to consume from<br>&ensp;&ensp;- `bootstrap_servers`: The Kafka server addresses<br>&ensp;&ensp;Optionally, you can provide additional standard Kafka consumer parameters (such as `fetch.min.bytes`). These parameters are passed directly to the Kafka consumer. <br>&ensp;&#x2022; The `ingest-kinesis` plugin requires:<br>&ensp;&ensp;- `stream`: The Kinesis stream name<br>&ensp;&ensp;- `region`: The AWS Region<br>&ensp;&ensp;- `access_key`: The AWS access key<br>&ensp;&ensp;- `secret_key`: The AWS secret key<br>&ensp;&ensp;Optionally, you can provide an `endpoint_override`. |


+### Other parameters
+
+Pull-based ingestion supports the following OpenSearch parameters.
+
+| Parameter | Description |
+| :--- | :--- |
+| `index.periodic_flush_interval` | The interval at which OpenSearch will trigger a flush. Default for pull-based ingestion indexes is `10m`. See [Index settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/). |
+
 ### Ingestion modes

 Pull-based ingestion supports the following modes.
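
As an illustration of the new `mapper_type` row, the following is a minimal sketch of how the parameter might sit alongside `param` in an index creation body for the `ingest-kafka` plugin. The helper name `build_ingestion_index_body`, the topic, and the server address are hypothetical, and the `"type": "kafka"` entry and the `settings.ingestion_source` nesting are assumptions that this diff does not show. The sketch only builds the request body and does not contact a cluster.

```python
import json

# Values listed for `mapper_type` in the parameter table.
VALID_MAPPER_TYPES = {"default", "raw_payload"}


def build_ingestion_index_body(topic, bootstrap_servers, mapper_type):
    """Return a create-index body for a Kafka-backed pull-based ingestion index.

    Hypothetical helper: only `mapper_type`, `param`, `topic`, and
    `bootstrap_servers` are names taken from the documented table.
    """
    if mapper_type not in VALID_MAPPER_TYPES:
        raise ValueError(f"unsupported mapper_type: {mapper_type!r}")
    return {
        "settings": {
            "ingestion_source": {
                "type": "kafka",  # assumption: source type name for ingest-kafka
                "mapper_type": mapper_type,
                "param": {
                    "topic": topic,
                    "bootstrap_servers": bootstrap_servers,
                },
            },
        },
    }


if __name__ == "__main__":
    body = build_ingestion_index_body("my-topic", "localhost:9092", "raw_payload")
    print(json.dumps(body, indent=2))
```

Rejecting unknown values on the client side is a design choice for the sketch; the server performs its own validation.
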
@@ -173,7 +182,7 @@ To be correctly processed by OpenSearch, messages in the streaming source must h
 {"_id":"2", "_version":"2", "_source":{"name": "alice", "age": 30}, "_op_type": "delete"}
 ```

-Each data unit in the streaming source (Kafka message or Kinesis record) must include the following fields that specify how to create or modify an OpenSearch document.
+Each data unit in the streaming source (Kafka message or Kinesis record) must include the following fields that specify how to create or modify an OpenSearch document. This is the default format supported by pull-based ingestion.

 | Field | Data type | Required | Description |
 | :--- | :--- | :--- | :--- |
@@ -182,6 +191,13 @@ Each data unit in the streaming source (Kafka message or Kinesis record) must in
 | `_op_type` | String | No | The operation to perform. Valid values are:<br>- `index`: Creates a new document or updates an existing one.<br>- `create`: Creates a new document in append mode. Note that this will not update existing documents. <br>- `delete`: Soft deletes a document. |
 | `_source` | Object | Yes | The message payload containing the document data. |

+Alternatively, pull-based ingestion supports indexing raw payloads in append-only mode without transformations. To enable this behavior, set `index.ingestion_source.mapper_type` to `raw_payload`. Note that in this mode, the index mappings must conform to the message structure because dynamic mapping is not supported. When using `raw_payload`, you must provide raw JSON objects exactly as they appear in the incoming data stream, as shown in the following example:
+
+```json
+{"name": "alice", "age": 30}
+{"name": "bob", "age": 30}
+```
+
 ## Pull-based ingestion metrics

 Pull-based ingestion provides metrics that can be used to monitor the ingestion process. The `polling_ingest_stats` metric is currently supported and is available at the shard level.
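
To make the difference between the two message formats concrete, the following sketch converts a default-format message into the shape that `raw_payload` expects by keeping only the `_source` object. The function `to_raw_payload` is a hypothetical producer-side helper, not part of OpenSearch. Treating a missing `_op_type` as `index` is an assumption, as is rejecting `delete` messages on the grounds that append-only mode has no equivalent operation.

```python
import json


def to_raw_payload(default_message: str) -> str:
    """Convert one default-format message into a raw-payload message.

    The envelope fields (`_id`, `_version`, `_op_type`) are dropped and only
    the `_source` object is kept, matching the raw JSON example in the diff.
    """
    envelope = json.loads(default_message)
    # Assumption: a message without `_op_type` is treated as an index operation.
    op_type = envelope.get("_op_type", "index")
    if op_type == "delete":
        # Assumption: append-only ingestion cannot express a delete.
        raise ValueError("raw_payload mode is append-only; a delete has no equivalent")
    return json.dumps(envelope["_source"])


if __name__ == "__main__":
    message = '{"_id":"1", "_version":"1", "_source":{"name": "alice", "age": 30}, "_op_type": "index"}'
    print(to_raw_payload(message))
```

Because dynamic mapping is not supported in this mode, the fields that remain after conversion (`name` and `age` here) would need to exist in the index mappings beforehand.
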

_install-and-configure/configuring-opensearch/index-settings.md

Lines changed: 2 additions & 0 deletions
@@ -295,6 +295,8 @@ OpenSearch supports the following dynamic index-level index settings:

 - `index.derived_source.translog.enabled` (Boolean): Controls how documents are read from the translog for an index with derived source enabled. Defaults to the `index.derived_source.enabled` value. For more information, see [Derived source]({{site.url}}{{site.baseurl}}/mappings/metadata-fields/source/#derived-source).

+- `index.periodic_flush_interval` (Time unit): Triggers a flush periodically at the configured interval, storing all in-memory operations to segments on disk. OpenSearch automatically performs flush operations in the background based on conditions such as transaction log size. Default is `-1`, which disables periodic flush. You can configure this setting if your workload requires predictable, time-based flush intervals.
+
 ### Updating a dynamic index setting

 You can update a dynamic index setting at any time through the API. For example, to update the refresh interval, use the following request:
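
Because `index.periodic_flush_interval` is added to the dynamic settings list, it can presumably be changed in the same way as the refresh interval. The following sketch builds a settings-update body for it. The helper `periodic_flush_settings` is hypothetical, the nested `{"index": {...}}` body shape is assumed from the usual settings-update convention, and the accepted unit suffixes are an assumption about OpenSearch time units. The value `-1` is passed through because the setting description says it disables periodic flush.

```python
import re

# Assumption: OpenSearch time units are d, h, m, s, ms, micros, and nanos.
_TIME_VALUE = re.compile(r"^\d+(d|h|m|s|ms|micros|nanos)$")


def periodic_flush_settings(interval: str) -> dict:
    """Build a settings-update body for `index.periodic_flush_interval`.

    `interval` is either a time value such as "10m" or "-1" to disable
    periodic flush.
    """
    if interval != "-1" and not _TIME_VALUE.match(interval):
        raise ValueError(f"not a time value: {interval!r}")
    return {"index": {"periodic_flush_interval": interval}}


if __name__ == "__main__":
    # Body that could be sent to the index settings endpoint of an existing index.
    print(periodic_flush_settings("10m"))
    print(periodic_flush_settings("-1"))
```
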
