Partition question #679
Unanswered
orennia-scott-wang asked this question in Q&A
Replies: 1 comment 1 reply
-
Sorry that you encountered this! Can you provide an example dataset and a script that will reproduce your issue? Also, what settings / config options are you using in your DuckLake? Thank you!
-
Hello,
I am encountering a strange issue with DuckLake partitioning and could use some insight.
My use case involves a large volume of time-series data that I am trying to partition by both year and month. When I partition by year only, the process works perfectly. In S3, it creates a folder (e.g., year=2024) containing a single, large data file (5GB–10GB).
However, when I use the exact same SQL but change the partitioning to year and month, the behavior changes significantly. While the folder structure is correct (e.g., year=2024/month=03), the data is split into thousands of tiny files (~250KB each) instead of a single large one. This fragmentation is drastically reducing query performance.
Why does adding a second partition column cause the data to splinter into small files, and how can I ensure it writes a single larger file per partition?
Configuration Comparison:
Working: ALTER TABLE my_table SET PARTITIONED BY ("year");
Issue: ALTER TABLE my_table SET PARTITIONED BY ("year", "month");
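In case a self-contained script helps, here is a minimal reproduction sketch. All bucket paths, the metadata file, and the table name are placeholders, and the `ATTACH 'ducklake:...'` form is assumed from the DuckLake extension; the last query just counts data files per partition folder so the fragmentation is easy to show alongside a dataset:

```sql
-- Sketch only: paths and names below are placeholders.
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 's3://my-bucket/lake/');
USE lake;

-- Synthetic time-series data at one-minute granularity for 2024.
CREATE TABLE my_table AS
SELECT ts,
       year(ts)  AS "year",
       month(ts) AS "month",
       random()  AS value
FROM range(TIMESTAMP '2024-01-01', TIMESTAMP '2025-01-01', INTERVAL 1 MINUTE) t(ts);

-- Working case: one large file per year=YYYY folder.
ALTER TABLE my_table SET PARTITIONED BY ("year");

-- Problem case: thousands of ~250KB files per year=YYYY/month=MM folder.
ALTER TABLE my_table SET PARTITIONED BY ("year", "month");

-- Quantify the fragmentation: count data files per partition folder.
SELECT regexp_extract(file, 'year=\d+/month=\d+') AS part,
       count(*) AS file_count
FROM glob('s3://my-bucket/lake/**/*.parquet')
GROUP BY ALL
ORDER BY part;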