Add MOR snapshot reads, metadata index pruning, and table statistics to Hudi connector #28511
voonhous wants to merge 6 commits into trinodb:master
Conversation
- HudiFile/HudiBaseFile/HudiLogFile: Trino-native wrappers for Hudi base files and log files
- TupleDomainUtils: predicate helpers for domain-based index lookups
- HudiAvroSerializer: bidirectional Avro <-> Trino type conversion
- HudiTableTypeUtils: COW/MOR input format detection
- InlineSeekableDataInputStream / TrinoHudiInlineStorage: support for reading log files embedded via the InLineFS URI scheme
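The InLineFS idea in the last bullet can be sketched without Trino's filesystem APIs: an embedded log file is just a byte range inside an outer file, exposed as its own seekable stream whose position 0 maps to the range's start offset. Everything below (class name, method names, the byte-array backing) is an illustrative stand-in, not the connector's implementation.

```java
import java.io.InputStream;

// Illustrative sketch of an InLineFS-style stream: reads are confined to a
// [startOffset, startOffset + length) range of an outer file's content, and
// seek() is relative to the embedded range, not the outer file.
public class InlineRangeInputStream extends InputStream
{
    private final byte[] outerContent;   // stand-in for the outer file
    private final int startOffset;
    private final int length;
    private int position;                // position within the inline range

    public InlineRangeInputStream(byte[] outerContent, int startOffset, int length)
    {
        this.outerContent = outerContent;
        this.startOffset = startOffset;
        this.length = length;
    }

    public void seek(long pos)
    {
        position = (int) pos;            // seek is relative to the inline range
    }

    @Override
    public int read()
    {
        if (position >= length) {
            return -1;                   // EOF at the end of the range, not the outer file
        }
        return outerContent[startOffset + position++] & 0xFF;
    }
}
```

The key property is the bounded EOF: a log-block reader handed this stream can never read past the embedded region into unrelated bytes of the outer file.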
Introduces TrinoHudiReaderContext and HudiPageSource to support reading MOR tables by merging base files with delta log files using HoodieFileGroupReader. HudiAvroSerializer bridges the Avro record representation used internally by the merger back to Trino pages.
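The snapshot-read merge described above can be illustrated in miniature. The sketch below is not the HoodieFileGroupReader API; it only shows the merge semantics: log records overlay base-file records by record key, later log entries win, and deletes (modelled here as null payloads) drop rows.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a MOR snapshot read for one file group: start from
// the base Parquet file's records, then apply the delta log entries in order.
public class FileGroupMergeSketch
{
    public static Map<String, String> mergeSnapshot(
            Map<String, String> baseFileRecords,          // recordKey -> payload from the base file
            List<Map.Entry<String, String>> logRecords)   // ordered log entries; null payload = delete
    {
        Map<String, String> merged = new LinkedHashMap<>(baseFileRecords);
        for (Map.Entry<String, String> entry : logRecords) {
            if (entry.getValue() == null) {
                merged.remove(entry.getKey());            // delete marker removes the row
            }
            else {
                merged.put(entry.getKey(), entry.getValue()); // upsert: latest write wins
            }
        }
        return merged;
    }
}
```

A read-optimised query returns only `baseFileRecords`; the snapshot path returns the merged view, which is why it reflects updates and deletes that have not yet been compacted into a new base file.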
Adds an extensible index strategy hierarchy (HudiIndexSupport) backed by four implementations: column stats, partition stats, record-level, and secondary indexes. IndexSupportFactory selects the best applicable strategy per query based on session config. Also adds HudiSplitColumns for materialising virtual columns ($path, $file_size, partition keys).
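The per-query selection done by IndexSupportFactory follows an ordinary strategy pattern; a minimal sketch with invented names and deliberately simplified applicability rules (the real strategies inspect the query's tuple domain and the metadata table's available index partitions):

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of priority-ordered index strategy selection: each
// strategy declares when it applies, and the factory returns the first
// enabled, applicable one.
public class IndexSelectionSketch
{
    record IndexStrategy(String name, boolean enabledInSession, Predicate<Set<String>> applicableTo) {}

    // predicateColumns: columns constrained by the query's predicates
    public static Optional<String> chooseStrategy(
            List<IndexStrategy> strategiesInPriorityOrder,
            Set<String> predicateColumns)
    {
        return strategiesInPriorityOrder.stream()
                .filter(IndexStrategy::enabledInSession)              // session toggles gate each index
                .filter(s -> s.applicableTo().test(predicateColumns)) // e.g. a record index needs the key column
                .map(IndexStrategy::name)
                .findFirst();                                         // empty -> no index pruning, full listing
    }
}
```

Returning `Optional.empty()` here mirrors the graceful fallback described later: when no index applies, split planning proceeds with a full file listing.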
Adds TableStatisticsReader and TableMetadataReader to pull column-level statistics (row counts, null fractions, data sizes) from the Hudi metadata table's column stats partition. HudiExecutorModule provides a dedicated async executor for background statistics refresh.
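The background-refresh behaviour can be sketched as a small cache that never blocks the caller. The names below are hypothetical, not the connector's classes; the point is the shape: planning reads whatever value is cached (possibly none), and a dedicated executor loads fresh statistics off the planning thread.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of non-blocking statistics access: getIfPresent() returns instantly,
// refreshAsync() loads on a background executor (as HudiExecutorModule's
// dedicated executor does in this PR's design).
public class AsyncStatsCache<T>
{
    private final AtomicReference<T> cached = new AtomicReference<>();
    private final ExecutorService executor;
    private final Callable<T> loader;

    public AsyncStatsCache(ExecutorService executor, Callable<T> loader)
    {
        this.executor = executor;
        this.loader = loader;
    }

    // Called from planning: never blocks; empty means "use planner defaults"
    public Optional<T> getIfPresent()
    {
        return Optional.ofNullable(cached.get());
    }

    // Triggers a background load of fresh statistics
    public Future<?> refreshAsync()
    {
        return executor.submit(() -> {
            cached.set(loader.call());
            return null;
        });
    }
}
```
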
- HudiConfig/HudiSessionProperties: add knobs for index types (column stats, partition stats, record-level, secondary), statistics, async timeouts, and column-name casing resolution
- HudiBackgroundSplitLoader: integrate IndexSupportFactory for metadata index-driven file-slice pruning
- HudiSplitFactory: use HudiBaseFile/HudiLogFile abstractions
- HudiSnapshotDirectoryLister: replaces HudiReadOptimizedDirectoryLister
- HudiMetadata: wire async statistics refresh via HudiExecutorModule
- pom.xml: updated Hudi dependency versions
Adds 18 new Hudi test datasets (COW/MOR, partitioned/non-partitioned, multi-field-group, custom keygen, v6/v8 table versions) along with HudiTableUnzipper for loading zip-archived test tables. Expands TestHudiSmokeTest to cover the new table variants including MOR snapshot reads, column-name casing, timestamp keygens, and file-operation counts (alluxio/memory/no-cache).
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
@ebyhr I tried breaking this PR up into 6 smaller commits, grouping them by the components/features they touch, but they cannot be independent of each other as they are stacked on top of each other. So, I think having them in one PR is the best way to go. Will it be possible for us to iterate on this PR commit-by-commit? I'm open to suggestions. After everything is done, I can squash the commits into one so that the CI doesn't throw any more errors.
Description
This PR significantly expands the capabilities of the Hudi connector in three main areas:
Merge-On-Read (MOR) snapshot reads
Previously the connector could only read the read-optimised view of MOR tables (base Parquet files only). This adds a full snapshot read path using HoodieFileGroupReader that merges base files with delta log files at query time, returning an up-to-date view of the table. TrinoHudiReaderContext and HudiPageSource bridge the Hudi merging infrastructure to Trino pages via bidirectional Avro <-> Trino type conversion (HudiAvroSerializer).

Metadata index-based file and partition pruning
Introduces an extensible HudiIndexSupport strategy hierarchy that uses the Hudi metadata table to skip file slices at split-planning time, without reading data files. Four strategies are implemented (column stats, partition stats, record-level, and secondary indexes) and selected in priority order via IndexSupportFactory. All strategies are independently toggleable via session properties and connector config.
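Whether a given index applies depends on the shape of the query's predicates, which is where the shared TupleDomainUtils helpers come in. Below is an illustrative sketch only: the "tuple domain" is simplified to a map from column to a set of allowed discrete values (null standing in for a range constraint), and the rule shown (key-style lookups need discrete-value constraints) is an assumption for illustration, not the connector's exact applicability logic.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for predicate helpers of the kind TupleDomainUtils
// provides to the index implementations.
public class DomainHelpersSketch
{
    // True if every reference column is constrained at all by the query
    public static boolean areAllFieldsReferenced(Map<String, Set<String>> domains, List<String> referenceColumns)
    {
        return domains.keySet().containsAll(referenceColumns);
    }

    // True if every reference column is constrained to discrete values (= or IN),
    // the shape a key-based index lookup can serve directly; a null entry models
    // a range predicate such as ts > 100, which a key lookup cannot serve
    public static boolean areDomainsInOrEqualOnly(Map<String, Set<String>> domains, List<String> referenceColumns)
    {
        return referenceColumns.stream()
                .allMatch(column -> domains.get(column) != null);
    }
}
```
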
Table statistics for the cost-based optimizer (CBO)
Reads column-level statistics (row counts, null fractions, data sizes) from the Hudi metadata table's COLUMN_STATS partition. Refresh runs asynchronously in the background so statistics never block query planning; a stale-but-close-enough cached value is returned immediately if available.

Additional improvements
- HudiSnapshotDirectoryLister replaces HudiReadOptimizedDirectoryLister, unifying the directory listing path for both COW and MOR tables
- The resolveColumnNameCasing config/session option handles column name mismatches between the Hive metastore schema and the Hudi table schema (common when tables are created by case-sensitive Spark jobs)
- Inline storage support (InlineSeekableDataInputStream, TrinoHudiInlineStorage) enables reading log files embedded via the InLineFS URI scheme
- TupleDomainUtils provides shared predicate-extraction helpers used across index implementations and split generation

Additional context and related issues
Hudi tables come in two storage types: Copy-on-Write (COW) and Merge-on-Read (MOR).
The metadata index pruning strategies require the Hudi metadata table to be enabled on the table (hoodie.metadata.enable=true, the default since Hudi 0.11). When the metadata table or a specific index partition is absent, the connector falls back gracefully to a full listing.

Statistics are similarly gated: if the COLUMN_STATS metadata partition is unavailable, the connector returns empty statistics and the planner uses its defaults.

Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text: