Feature Request / Improvement
The OpenSearch connector does not recognize the match_only_text field type introduced in OpenSearch 2.12. Fields with this type are silently excluded from the table schema, making them invisible to Trino queries:
DESCRIBE opensearch.default.my_index;
-- only shows keyword, float, etc. fields
-- match_only_text fields are missing
The match_only_text type is a storage-optimized variant of text that omits positions, frequencies, and norms. From a read perspective, it is identical to text: both return a plain string value from _source.
OpenSearch documentation: https://opensearch.org/blog/Optimize-storage-and-performance-using-MatchOnlyText-field
The text type is already mapped to VARCHAR in the connector. The same mapping should apply to match_only_text.
Use case: match_only_text is commonly used for large text fields such as log messages and document bodies, where storage savings of up to 25% over text are significant. Users of these fields are currently forced to use raw_query to access them, which returns a single JSON blob instead of individual rows and is capped at the max_result_window limit (10,000 by default). This prevents standard SQL operations like JOINs and aggregations over these fields.
Proposed approach:
- Add match_only_text alongside text in the type mapping logic that resolves OpenSearch field types to Trino types (likely a one-line change in the decoder/metadata layer).
- Predicate pushdown is not expected for this type, consistent with the existing text behavior.
- Add a test case with a match_only_text field to confirm it is exposed as VARCHAR and readable via standard SQL.
- Update the type mapping table in the OpenSearch connector docs.
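The first item above could look roughly like the following. This is a minimal sketch only: the class OpenSearchTypeSketch and the method toTrinoType are hypothetical names made up for illustration and are not taken from the connector source, and Trino types are represented as plain strings to keep the example self-contained. The point is that match_only_text resolves to the same VARCHAR result as text, while unrecognized types still yield an empty result (which is how a field ends up missing from the schema today).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the connector's type resolution logic.
// Names and structure are assumptions for illustration, not Trino's actual code.
final class OpenSearchTypeSketch {
    // OpenSearch field type -> Trino type name (a small illustrative subset)
    private static final Map<String, String> SCALAR_TYPES = Map.of(
            "keyword", "VARCHAR",
            "text", "VARCHAR",
            "match_only_text", "VARCHAR", // proposed addition: same mapping as text
            "float", "REAL");

    private OpenSearchTypeSketch() {}

    // Returns the Trino type name, or empty when the field type is not recognized.
    // An empty result models a field being left out of the table schema.
    static Optional<String> toTrinoType(String openSearchType) {
        return Optional.ofNullable(SCALAR_TYPES.get(openSearchType));
    }
}
```

With this shape, the new test case reduces to checking that a match_only_text field resolves to VARCHAR exactly as a text field does, and that a DESCRIBE on the index lists it.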