
[Feature Request] Support configurable document mapping in pull-based ingestion #20721

@imRishN

Description

Is your feature request related to a problem? Please describe

Pull-based ingestion currently requires messages to be in a specific envelope format:

{"_id": "doc1", "_source": {"field": "value"}, "_op_type": "index"}

Real-world streaming sources typically produce raw data that doesn't conform to this format. Users must therefore run an external preprocessing layer (e.g., Flink, custom consumers) to transform raw messages into the expected envelope before publishing them to an intermediate topic.

This issue tracks adding configurable document mapping so users can tell OpenSearch which raw message fields map to _id, _version, and _op_type:

  PUT /my-index
  {
    "settings": {
      "ingestion_source": {
        "type": "kafka",
        "mapper_type": "field_mapping",
        "mapper_settings.id_field": "user_id",
        "mapper_settings.version_field": "timestamp",
        "mapper_settings.op_type_field": "is_deleted"
      }
    }
  }
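Because all mapper-specific options share the `mapper_settings.` prefix, the ingestion source only needs to hand the selected mapper that group, with the prefix stripped. The sketch below illustrates the grouping in Python, assuming the `ingestion_source` settings arrive as a flat key/value map. `extract_mapper_settings` and `MAPPER_SETTINGS_PREFIX` are hypothetical names, not OpenSearch APIs; the real (Java) implementation would likely rely on OpenSearch's own prefix/group setting support instead.

```python
from typing import Dict

# Hypothetical constant: the prefix proposed in this issue for mapper-specific options.
MAPPER_SETTINGS_PREFIX = "mapper_settings."


def extract_mapper_settings(ingestion_source: Dict[str, str]) -> Dict[str, str]:
    """Return only the 'mapper_settings.*' entries, with the prefix removed.

    Hypothetical helper for illustration; not part of OpenSearch.
    """
    return {
        key[len(MAPPER_SETTINGS_PREFIX):]: value
        for key, value in ingestion_source.items()
        if key.startswith(MAPPER_SETTINGS_PREFIX)
    }


if __name__ == "__main__":
    # Flat view of the ingestion_source settings from the request above.
    source = {
        "type": "kafka",
        "mapper_type": "field_mapping",
        "mapper_settings.id_field": "user_id",
        "mapper_settings.version_field": "timestamp",
        "mapper_settings.op_type_field": "is_deleted",
    }
    print(extract_mapper_settings(source))
    # {'id_field': 'user_id', 'version_field': 'timestamp', 'op_type_field': 'is_deleted'}
```

Keeping the options under one prefix means new mapper types can add their own keys without each one needing a dedicated top-level `ingestion_source` setting.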

Describe the solution you'd like

Scope

  • mapper_settings.* prefix setting on IngestionSource for mapper-specific options
  • field_mapping mapper type that extracts _id, _version, and _op_type from configurable top-level fields of the raw message
  • Extracted fields are removed from the message; the remaining fields become the document's _source
  • Backward compatible: existing default and raw_payload mappers unchanged
  • Version compatibility check to prevent use of the new mapper in mixed-version clusters
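The field_mapping behavior described in the scope can be sketched as a function that turns one raw JSON message into the envelope shown at the top of this issue. This is an illustrative Python sketch, not the proposed Java implementation, and it rests on assumptions the issue does not settle: the raw payload is a JSON object; `_id` is stringified; a configured field that is absent from the message is simply omitted; and the op-type field is treated as a JSON boolean delete flag (`true` means `delete`, anything else means `index`), which matches the `is_deleted` example but is a guess. `map_raw_message` is a hypothetical name.

```python
import json
from typing import Any, Dict, Optional


def map_raw_message(
    raw: bytes,
    id_field: Optional[str] = None,
    version_field: Optional[str] = None,
    op_type_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical field_mapping sketch: raw JSON -> {_id, _version, _op_type, _source}."""
    source = json.loads(raw)
    if not isinstance(source, dict):
        raise ValueError("raw message must be a JSON object")

    envelope: Dict[str, Any] = {}

    # Pop each configured top-level field so it no longer appears in _source.
    if id_field is not None and id_field in source:
        envelope["_id"] = str(source.pop(id_field))  # assumption: _id is stringified
    if version_field is not None and version_field in source:
        envelope["_version"] = source.pop(version_field)

    # Assumption: the op-type field is a JSON boolean delete flag;
    # only a literal true maps to "delete", everything else to "index".
    op_type = "index"
    if op_type_field is not None and op_type_field in source:
        op_type = "delete" if source.pop(op_type_field) is True else "index"
    envelope["_op_type"] = op_type

    # Whatever remains becomes the document source.
    envelope["_source"] = source
    return envelope


if __name__ == "__main__":
    msg = b'{"user_id": 42, "timestamp": 1700000000, "is_deleted": false, "name": "alice"}'
    print(map_raw_message(msg, "user_id", "timestamp", "is_deleted"))
    # {'_id': '42', '_version': 1700000000, '_op_type': 'index', '_source': {'name': 'alice'}}
```

Comparing with `is True` rather than plain truthiness keeps a string such as `"false"` from being misread as a delete; how non-boolean values should actually be handled is an open design question for the mapper.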

Related component

No response

Describe alternatives you've considered

No response

Additional context

Related
