DuckLake looks really amazing, and I'm looking forward to migrating our Iceberg stack to it.
However, we make intensive use of bucket transform partitioning in Iceberg, and I was wondering whether this is something you’re planning to support in DuckLake as well.
Our use case is essentially to partition ~6M elements identified by a business ID (with thousands of rows per ID). We need to batch‑access these elements, but there’s no particular business logic involved. So we currently use 200 buckets in Iceberg and are able to target a specific bucket when retrieving multiple elements.
A possible workaround would be to add a bucket column to the data (`hash(business_id) % number_of_buckets`) and use an `IdentityTransform` on it. But this feels a bit hacky and hard to maintain, especially if the number of buckets needs to change.
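To make the workaround concrete, here is a minimal Python sketch of the bucket-column idea. It is an illustration under assumptions, not DuckLake or Iceberg API: `bucket_for` and `group_by_bucket` are hypothetical helper names, and SHA-256 is only a stand-in for a stable hash. Iceberg's bucket transform uses a 32-bit Murmur3 hash, so the bucket numbers produced here will not match existing Iceberg buckets.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 200  # bucket count mentioned in the post


def bucket_for(business_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a business ID to a bucket in [0, num_buckets).

    Assumption: SHA-256 is used as a stable stand-in hash. Python's
    built-in hash() is salted per process for strings, so it would not
    give the same bucket across runs.
    """
    digest = hashlib.sha256(business_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def group_by_bucket(business_ids, num_buckets: int = NUM_BUCKETS) -> dict:
    """Group IDs by bucket so a batch read can target one bucket at a time."""
    groups = defaultdict(list)
    for business_id in business_ids:
        groups[bucket_for(business_id, num_buckets)].append(business_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    wanted = ["cust-001", "cust-002", "cust-003"]
    for bucket, ids in sorted(group_by_bucket(wanted).items()):
        # Each group maps to one filter of the form:
        #   bucket = <bucket> AND business_id IN (<ids>)
        print(bucket, ids)
```

The maintenance concern is visible here: the bucket is stored with each row, so changing `NUM_BUCKETS` means recomputing the column for existing data, and every reader has to use the same hash function and bucket count as the writer.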
Thank you for the help!