OpenSearch connector: Add support for match_only_text field type #29165

@pedroluislopez

Feature Request / Improvement

The OpenSearch connector does not recognize the match_only_text field type introduced in OpenSearch 2.12. Fields with this type are silently excluded from the table schema, making them invisible to Trino queries:

DESCRIBE opensearch.default.my_index;

-- only shows keyword, float, etc. fields
-- match_only_text fields are missing

The match_only_text type is a storage-optimized variant of text that omits positions, frequencies, and norms. From a read perspective, it is identical to text — both return a plain string value from _source.
OpenSearch documentation: https://opensearch.org/blog/Optimize-storage-and-performance-using-MatchOnlyText-field

The text type is already mapped to VARCHAR in the connector. The same mapping should apply to match_only_text.

Use case: match_only_text is commonly used for large text fields such as log messages and document bodies, where storage savings of up to 25% over text are significant. Users of these fields are currently forced to use raw_query to access them, which returns a single JSON blob instead of individual rows and is capped at the max_result_window limit (10,000 by default). This prevents standard SQL operations such as JOINs and aggregations over these fields.

Proposed approach:

  • Add match_only_text alongside text in the type mapping logic that resolves OpenSearch field types to Trino types (likely a one-line change in the decoder/metadata layer).
  • Predicate pushdown is not expected for this type, consistent with the existing text behavior.
  • Add a test case with a match_only_text field to confirm it is exposed as VARCHAR and readable via standard SQL.
  • Update the type mapping table in the OpenSearch connector docs.
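
A minimal sketch of the first bullet, to make the intended change concrete. This is not the connector's actual code: the class name FieldTypeMappingSketch, the method toTrinoType, and the use of plain strings for Trino types are all hypothetical, and the real type resolution may be structured differently. An empty result stands in for the current behavior, where an unrecognized field type is dropped from the table schema.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the connector's type resolution logic.
// Names and structure are illustrative assumptions, not Trino's real classes.
final class FieldTypeMappingSketch {
    // OpenSearch field type -> Trino type name (only a few entries shown).
    private static final Map<String, String> TRINO_TYPE_BY_OPENSEARCH_TYPE = Map.of(
            "keyword", "VARCHAR",
            "text", "VARCHAR",
            "match_only_text", "VARCHAR", // proposed addition, same mapping as text
            "float", "REAL");

    private FieldTypeMappingSketch() {}

    // Returns the Trino type name, or empty when the field type is not
    // recognized (modeling a column that is left out of the schema).
    static Optional<String> toTrinoType(String openSearchType) {
        return Optional.ofNullable(TRINO_TYPE_BY_OPENSEARCH_TYPE.get(openSearchType));
    }
}
```

With the extra entry, toTrinoType("match_only_text") resolves to the same type as toTrinoType("text"), while a type that is still unmapped keeps returning an empty result.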
