Skip to content
Discussion options

You must be logged in to vote

Unlike Hive, in Iceberg there isn't a separate metadata call for fetching partitions of a table that this functionality can be built on top of. Since we enumerate splits lazily and get them through iceberg library, we find out about partitions at execution time as the files are processed.
Just the number of partitions doesn't seem like a great metric for blocking things as it doesn't take into account the amount of data per partition, the number of columns queried and the effect of predicate pushdown in orc/parquet on the worker. It also ignores unpartitioned tables completely. I think query.max-scan-physical-bytes is a good substitute here. Maybe we can add an SPI in ConnectorMetadata (D…

Replies: 2 comments

Comment options

You must be logged in to vote
0 replies
Answer selected by okayhooni
Comment options

You must be logged in to vote
0 replies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Q&A
Labels
None yet
2 participants
Converted from issue

This discussion was converted from issue #20543 on February 15, 2024 04:49.