
Add MOR snapshot reads, metadata index pruning, and table statistics to Hudi connector #28511

Open
voonhous wants to merge 6 commits into trinodb:master from voonhous:oss-upstream-2

Conversation


@voonhous voonhous commented Mar 3, 2026

Description

Significantly expands the capabilities of the Hudi connector in three main areas:

Merge-On-Read (MOR) snapshot reads
Previously the connector could only read the read-optimised view of MOR tables (base Parquet files only). This adds a full snapshot read path using HoodieFileGroupReader that merges base files with delta log files at query time, returning an up-to-date view of the table. TrinoHudiReaderContext and HudiPageSource bridge the Hudi merging infrastructure to Trino pages via bidirectional Avro <-> Trino type conversion (HudiAvroSerializer).

Metadata index-based file and partition pruning
Introduces an extensible HudiIndexSupport strategy hierarchy that uses the Hudi metadata table to skip file slices at split-planning time, without reading data files. Four strategies are implemented and selected in priority order via IndexSupportFactory:

| Strategy | Prunes by |
| --- | --- |
| Record-level index | Exact record key equality / `IN` predicates |
| Secondary index | Arbitrary column equality predicates |
| Column stats index | Domain range overlap per file (async, configurable timeout) |
| Partition stats index | Column stats extended with `IS NULL` / `IS NOT NULL` on partition columns |

All strategies are independently toggleable via session properties and connector config.
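The priority-ordered selection can be sketched as follows. This is an illustrative stand-in, not the actual `IndexSupportFactory` code: the class and record names here are hypothetical, but the idea matches the description above — walk the strategies in priority order and pick the first one that is both enabled (session/config toggle) and applicable to the query's predicates.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of IndexSupportFactory-style strategy selection.
class IndexSelection
{
    // A strategy has a name, an on/off toggle, and an applicability test
    // over the columns referenced by the query's predicates.
    record IndexStrategy(String name, boolean enabled, Predicate<List<String>> applicable) {}

    // Strategies are supplied in priority order (record-level, secondary,
    // column stats, partition stats); the first enabled + applicable one wins.
    static Optional<IndexStrategy> select(List<IndexStrategy> strategies, List<String> predicateColumns)
    {
        return strategies.stream()
                .filter(IndexStrategy::enabled)
                .filter(strategy -> strategy.applicable().test(predicateColumns))
                .findFirst();
    }

    public static void main(String[] args)
    {
        List<IndexStrategy> strategies = List.of(
                new IndexStrategy("record-level", true, cols -> cols.contains("_hoodie_record_key")),
                new IndexStrategy("secondary", true, cols -> !cols.isEmpty()),
                new IndexStrategy("column-stats", true, cols -> true));

        // A predicate on the record key selects the highest-priority strategy;
        // a predicate on an ordinary column falls through to the secondary index.
        System.out.println(select(strategies, List.of("_hoodie_record_key")).get().name());
        System.out.println(select(strategies, List.of("price")).get().name());
    }
}
```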

Table statistics for the cost-based optimizer (CBO)
Reads column-level statistics (row counts, null fractions, data sizes) from the Hudi metadata table's COLUMN_STATS partition. Refresh runs asynchronously in the background so statistics never block query planning; a stale-but-close-enough cached value is returned immediately if available.
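The non-blocking refresh pattern described above can be sketched in a few lines. This is a simplified illustration (the class and method names are hypothetical, not the connector's actual API): the accessor never blocks planning; it returns whatever is cached and, if nothing is cached yet, triggers a background load.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of an asynchronously refreshed statistics cache.
class AsyncStatsCache
{
    private final AtomicReference<Optional<Long>> cachedRowCount = new AtomicReference<>(Optional.empty());
    private volatile CompletableFuture<Void> pending = CompletableFuture.completedFuture(null);

    // Never blocks: returns the cached value (possibly empty on the first
    // call) and kicks off at most one background refresh at a time.
    Optional<Long> getRowCount()
    {
        Optional<Long> cached = cachedRowCount.get();
        if (cached.isEmpty() && pending.isDone()) {
            pending = CompletableFuture.runAsync(
                    () -> cachedRowCount.set(Optional.of(loadFromMetadataTable())));
        }
        return cached;
    }

    // Test hook: wait for any in-flight refresh to finish.
    void awaitRefresh()
    {
        pending.join();
    }

    // Stand-in for reading the COLUMN_STATS partition of the metadata table.
    private long loadFromMetadataTable()
    {
        return 1_000_000L;
    }
}
```

The first call returns empty statistics (the planner falls back to its defaults); later calls see the refreshed value without ever having waited on metadata-table I/O.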

Additional improvements

  • HudiSnapshotDirectoryLister replaces HudiReadOptimizedDirectoryLister, unifying the directory listing path for both COW and MOR tables
  • resolveColumnNameCasing config/session option handles column name mismatches between the Hive metastore schema and the Hudi table schema (common when tables are created by case-sensitive Spark jobs)
  • Inline file I/O (InlineSeekableDataInputStream, TrinoHudiInlineStorage) supports log files embedded via InLineFS URI scheme
  • TupleDomainUtils provides shared predicate-extraction helpers used across index implementations and split generation
  • 18 new test datasets covering: COW/MOR, partitioned/non-partitioned, multi-file-group, custom and timestamp-based key generators, v6/v8 table versions, and mixed field-name casing
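The kind of helper `TupleDomainUtils` provides can be illustrated with a reduced model. The real class operates on Trino's `TupleDomain`; here a column domain is reduced to either a finite value set or a range, which is the distinction the index strategies care about — a key-based index lookup can only answer `=` / `IN` predicates.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for TupleDomainUtils-style helpers.
class PredicateShapeSketch
{
    // A column domain: either an enumerable value set (= / IN) or a range.
    sealed interface Domain permits ValueSet, Range {}
    record ValueSet(Set<String> values) implements Domain {}
    record Range(String low, String high) implements Domain {}

    // True only when the column carries an equality / IN predicate,
    // i.e. a domain an index lookup can answer directly.
    static boolean isEqualityOrIn(Map<String, Domain> domains, String column)
    {
        return domains.get(column) instanceof ValueSet;
    }
}
```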

Additional context and related issues

Hudi tables come in two storage types:

  • COW (Copy-On-Write): writes produce new Parquet files; reads are simple file scans. Already worked before this PR.
  • MOR (Merge-On-Read): writes append delta records to Avro/HFile log files; reads must merge the log files on top of the base Parquet file for each file group to get the latest view. This PR enables that merge.
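The merge semantics can be shown with a toy sketch. This is not the connector's code — the real path delegates to Hudi's `HoodieFileGroupReader` — but it captures the idea: records from the base Parquet file are overlaid, in commit order, with updates and deletes from the delta log files, keyed by record key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of Merge-On-Read snapshot semantics.
class MorMergeSketch
{
    // A simplified log entry: a record key, a new value, or a delete marker.
    record LogEntry(String key, String value, boolean delete) {}

    static Map<String, String> snapshot(Map<String, String> baseFile, List<LogEntry> logEntries)
    {
        Map<String, String> merged = new LinkedHashMap<>(baseFile);
        for (LogEntry entry : logEntries) {             // applied in commit order
            if (entry.delete()) {
                merged.remove(entry.key());             // delete tombstones the key
            }
            else {
                merged.put(entry.key(), entry.value()); // insert or update wins over base
            }
        }
        return merged;
    }

    public static void main(String[] args)
    {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("k1", "v1");
        base.put("k2", "v2");
        List<LogEntry> log = List.of(
                new LogEntry("k2", "v2-updated", false),
                new LogEntry("k3", "v3", false),
                new LogEntry("k1", null, true));
        System.out.println(snapshot(base, log)); // prints {k2=v2-updated, k3=v3}
    }
}
```

A read-optimized query would return only the base-file view (`k1=v1, k2=v2`); the snapshot view above reflects the later update, insert, and delete.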

The metadata index pruning strategies require the Hudi metadata table to be enabled on the table (hoodie.metadata.enable=true, the default since Hudi 0.11). When the metadata table or a specific index partition is absent, the connector falls back gracefully to a full listing.
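The core pruning test behind the column stats index is a simple range-overlap check, sketched below. This is an illustration of the idea, not the connector's implementation: per-column min/max values stored in the metadata table let a file be skipped when its value range cannot overlap the query's range.

```java
import java.util.List;

// Illustrative sketch of column-stats-based file pruning.
class ColumnStatsPruning
{
    // Per-file min/max for one column, as recorded in the metadata table.
    record FileStats(String fileName, long min, long max) {}

    // Keep a file only if [queryMin, queryMax] overlaps [file.min, file.max];
    // everything else is pruned without ever opening the data file.
    static List<FileStats> prune(List<FileStats> files, long queryMin, long queryMax)
    {
        return files.stream()
                .filter(file -> file.max() >= queryMin && file.min() <= queryMax)
                .toList();
    }
}
```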

Statistics are similarly gated: if the COLUMN_STATS metadata partition is unavailable the connector returns empty statistics and the planner uses its defaults.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

## Hudi connector
* Add support for reading Merge-On-Read (MOR) tables using snapshot reads that merge base files with delta log files.
* Improve query performance by using Hudi metadata table indexes (column stats, partition stats, record-level, secondary) to skip irrelevant file slices at split-planning time.
* Add support for table statistics read from the Hudi metadata table, enabling the cost-based optimizer to generate better query plans.
* Add `hudi.resolve-column-name-casing` config property and `hudi_resolve_column_name_casing` session property to handle column name mismatches between the metastore schema and the Hudi table schema.

voonhous added 6 commits March 3, 2026 16:49
- HudiFile/HudiBaseFile/HudiLogFile: Trino-native wrappers for Hudi
  base files and log files
- TupleDomainUtils: predicate helpers for domain-based index lookups
- HudiAvroSerializer: bidirectional Avro <-> Trino type conversion
- HudiTableTypeUtils: COW/MOR input format detection
- InlineSeekableDataInputStream / TrinoHudiInlineStorage: support for
  reading log files embedded via InLineFS URI scheme

Introduces TrinoHudiReaderContext and HudiPageSource to support reading
MOR tables by merging base files with delta log files using
HoodieFileGroupReader. HudiAvroSerializer bridges the Avro record
representation used internally by the merger back to Trino pages.

Adds an extensible index strategy hierarchy (HudiIndexSupport) backed by
four implementations: column stats, partition stats, record-level, and
secondary indexes. IndexSupportFactory selects the best applicable
strategy per query based on session config. Also adds HudiSplitColumns
for materialising virtual columns ($path, $file_size, partition keys).

Adds TableStatisticsReader and TableMetadataReader to pull column-level
statistics (row counts, null fractions, data sizes) from the Hudi
metadata table's column stats partition. HudiExecutorModule provides a
dedicated async executor for background statistics refresh.

- HudiConfig/HudiSessionProperties: add knobs for index types (column
  stats, partition stats, record-level, secondary), statistics, async
  timeouts, and column-name casing resolution
- HudiBackgroundSplitLoader: integrate IndexSupportFactory for metadata
  index-driven file-slice pruning
- HudiSplitFactory: use HudiBaseFile/HudiLogFile abstractions
- HudiSnapshotDirectoryLister: replaces HudiReadOptimizedDirectoryLister
- HudiMetadata: wire async statistics refresh via HudiExecutorModule
- pom.xml: updated Hudi dependency versions

Adds 18 new Hudi test datasets (COW/MOR, partitioned/non-partitioned,
multi-file-group, custom keygen, v6/v8 table versions) along with
HudiTableUnzipper for loading zip-archived test tables.
Expands TestHudiSmokeTest to cover the new table variants including
MOR snapshot reads, column-name casing, timestamp keygens, and
file-operation counts (alluxio/memory/no-cache).
cla-bot bot commented Mar 3, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the hudi Hudi connector and lakehouse labels Mar 3, 2026
@voonhous voonhous changed the title from "Oss upstream 2" to "Upstream Hudi features to open source" Mar 3, 2026
@voonhous voonhous changed the title from "Upstream Hudi features to open source" to "Add MOR snapshot reads, metadata index pruning, and table statistics to Hudi connector" Mar 3, 2026

voonhous commented Mar 3, 2026

@ebyhr I tried breaking this PR up into 6 smaller commits, grouping them by the components/features they touch, but they cannot be independent of each other as they are stacked on top of each other. So, I think having them in one PR is the best way to go.

Will it be possible for us to iterate on this PR commit-by-commit? I'm open to suggestions.

After everything is done, I can squash the commits into one so that the CI doesn't throw any more errors.
