[Bug] get_partition_batches exceeds ICEBERG_TOO_MANY_OPEN_PARTITIONS limit with bucket + non-bucket partitions #1783
Description
Is this a new bug?
- I believe this is a new bug
- I have searched the existing issues, and I could not find an existing issue for this bug
Which packages are affected?
- dbt-adapters
- dbt-tests-adapter
- dbt-athena
- dbt-athena-community
- dbt-bigquery
- dbt-postgres
- dbt-redshift
- dbt-snowflake
- dbt-spark
Current Behavior
When using `get_partition_batches` with both non-bucket partitions (e.g., `month(date_col)`) and bucket partitions (e.g., `bucket(user_id, 10)`), the macro can generate batches that exceed Athena's `ICEBERG_TOO_MANY_OPEN_PARTITIONS` limit.
The macro chunks non-bucket partitions and bucket `IN` clause values independently, both using `athena_partitions_limit` as the chunk size. Since each final batch is a cross-product of one chunk from each dimension, a single batch can open up to limit × limit partitions.
For example, with 35 months and 100 bucket values at the default limit of 100:
- Non-bucket chunk: 100 partitions
- Bucket chunk: 100 values in the `IN` clause
- Actual open partitions per batch: 100 × 100 = 10,000 (limit is 100)
This causes ICEBERG_TOO_MANY_OPEN_PARTITIONS errors at query execution time.
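The independent chunking can be sketched in Python (a toy model with illustrative names, not the adapter's actual Jinja macro):

```python
# Minimal sketch: chunking each dimension by the same limit lets one batch
# open far more than `limit` partitions. Helper names are hypothetical.

def chunk(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

limit = 100
months = [f"month_{m}" for m in range(35)]   # 35 non-bucket partition values
buckets = list(range(100))                   # 100 bucket values

month_chunks = chunk(months, limit)          # one chunk of 35
bucket_chunks = chunk(buckets, limit)        # one chunk of 100

# Each batch ANDs one month chunk with one bucket chunk, so the partitions it
# opens are the cross-product of the two chunks:
worst = max(len(m) * len(b) for m in month_chunks for b in bucket_chunks)
print(worst)  # 3500 open partitions in a single batch, vs. the limit of 100
```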
Expected Behavior
Each batch should open at most partitions_limit partitions (default 100). The cross-product of non-bucket partitions and bucket IN values within a single batch should never exceed this limit.
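One way to satisfy this invariant (an illustrative sketch only, not a proposed patch to the adapter): derive the non-bucket chunk size from the limit divided by the bucket chunk size, so the cross-product stays bounded.

```python
def bounded_chunk_sizes(n_bucket_values: int, limit: int) -> tuple[int, int]:
    """Hypothetical helper: pick chunk sizes whose product never exceeds `limit`.

    Cap the bucket IN-clause chunk at `limit`, then size the non-bucket
    chunk so that (non_bucket_chunk * bucket_chunk) <= limit.
    """
    bucket_chunk = min(n_bucket_values, limit)
    non_bucket_chunk = max(1, limit // bucket_chunk)
    return non_bucket_chunk, bucket_chunk

non_bucket, bucket = bounded_chunk_sizes(n_bucket_values=100, limit=100)
assert non_bucket * bucket <= 100
print(non_bucket, bucket)  # 1 100 -> each batch opens at most 100 partitions
```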
Steps To Reproduce
- Create an Iceberg table with both non-bucket and bucket partitions:
```yaml
config:
  materialized: incremental
  incremental_strategy: merge
  table_type: iceberg
  partitioned_by:
    - "month(date_col)"
    - "bucket(user_id, 10)"
  partitions_limit: 100
```
- Insert data spanning many months with many distinct `user_id` values (e.g., 35 months × 100+ users)
- Run an incremental model refresh
- Observe `ICEBERG_TOO_MANY_OPEN_PARTITIONS` error from Athena
Relevant log output
```
ICEBERG_TOO_MANY_OPEN_PARTITIONS: Too many open partitions: N. The limit is 100.
```
Environment
- OS: macOS
- Python: 3.12.8
- dbt-adapters: 1.22.8
- dbt-athena: 1.9.4
Additional Context
This was introduced by #1636, which restructured batch generation from OR-of-ANDs to AND-of-ORs to solve a query size problem (`IN` clause duplication exceeding Athena's 256KB limit). That PR correctly solved the size issue, but the new AND structure creates an implicit cross-product between non-bucket partition chunks and bucket `IN` clause chunks. Both use `athena_partitions_limit` as their chunk size independently, so a single batch can open up to limit × limit partitions instead of limit.
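For illustration, the two predicate shapes can be contrasted with a toy Python model (column names `month` and `user_bucket` are hypothetical, and this is a simplification of what the macro emits):

```python
# Simplified illustration of the two predicate shapes discussed above.

months = ["2024-01", "2024-02"]
buckets = [0, 1]

# AND-of-ORs (post-#1636): compact SQL, but one predicate touches every
# (month, bucket) combination -> len(months) * len(buckets) open partitions.
and_of_ors = (
    f"month IN ({', '.join(repr(m) for m in months)}) "
    f"AND user_bucket IN ({', '.join(str(b) for b in buckets)})"
)

# OR-of-ANDs (pre-#1636): the bucket IN list is repeated once per month,
# which is the duplication that exceeded Athena's 256KB query size limit
# on wide partition sets.
or_of_ands = " OR ".join(
    f"(month = {m!r} AND user_bucket IN ({', '.join(str(b) for b in buckets)}))"
    for m in months
)

print(and_of_ors)
print(or_of_ands)
```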