[Bug] get_partition_batches exceeds ICEBERG_TOO_MANY_OPEN_PARTITIONS limit with bucket + non-bucket partitions #1783

@dtaniwaki

Description

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Which packages are affected?

  • dbt-adapters
  • dbt-tests-adapter
  • dbt-athena
  • dbt-athena-community
  • dbt-bigquery
  • dbt-postgres
  • dbt-redshift
  • dbt-snowflake
  • dbt-spark

Current Behavior

When using get_partition_batches with both non-bucket partitions (e.g., month(date_col)) and bucket partitions (e.g., bucket(user_id, 10)), the macro can generate batches that exceed Athena's ICEBERG_TOO_MANY_OPEN_PARTITIONS limit.

The macro chunks the non-bucket partition values and the bucket IN-clause values independently, each using athena_partitions_limit as its chunk size. Since the final batches are the cross-product of these two chunkings, a single batch can open up to limit × limit partitions.

For example, with 35 months and 100 bucket values at the default limit of 100:

  • Non-bucket chunk: all 35 month partitions (within the 100 limit, so a single chunk)
  • Bucket chunk: all 100 values in a single IN clause
  • Actual open partitions per batch: 35 × 100 = 3,500, while the limit is 100 (in the worst case a batch can reach 100 × 100 = 10,000)

This causes ICEBERG_TOO_MANY_OPEN_PARTITIONS errors at query execution time.
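The multiplication can be seen with a short sketch (illustrative only, not the adapter's actual code):

```python
# A minimal sketch of how chunking the non-bucket partitions and the
# bucket IN-clause values independently multiplies the open-partition
# count per batch.

def chunk(values, size):
    """Split values into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

athena_partitions_limit = 100
months = [f"month_{i}" for i in range(35)]   # 35 month partitions (stand-ins)
bucket_values = list(range(100))             # 100 distinct bucket values

month_chunks = chunk(months, athena_partitions_limit)          # one chunk of 35
bucket_chunks = chunk(bucket_values, athena_partitions_limit)  # one chunk of 100

# Each batch ANDs one month chunk with one bucket chunk, so the partitions
# it opens are the cross-product of the two chunk sizes.
worst = max(len(m) * len(b) for m in month_chunks for b in bucket_chunks)
print(worst)  # 3500 here -- 35x the limit; up to 100 * 100 = 10,000 in general
```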

Expected Behavior

Each batch should open at most partitions_limit partitions (default 100). The cross-product of non-bucket partitions and bucket IN values within a single batch should never exceed this limit.
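One way to enforce this (a sketch with a hypothetical helper, not a proposed patch to the macro) is to budget the bucket IN-clause chunk size against the size of the non-bucket chunk it is paired with, so the product stays within the limit:

```python
# Sketch: keep len(non_bucket_chunk) * len(bucket_chunk) <= limit.

def chunk(values, size):
    return [values[i:i + size] for i in range(0, len(values), size)]

def safe_batches(non_bucket_parts, bucket_values, limit=100):
    """Yield (non_bucket_chunk, bucket_chunk) pairs that each open at
    most `limit` partitions."""
    for nb in chunk(non_bucket_parts, limit):
        # Budget the IN-clause size so len(nb) * bucket_size <= limit.
        bucket_size = max(1, limit // len(nb))
        for b in chunk(bucket_values, bucket_size):
            yield nb, b

months = [f"month_{i}" for i in range(35)]
buckets = list(range(100))
batches = list(safe_batches(months, buckets))
assert all(len(nb) * len(b) <= 100 for nb, b in batches)
```

With 35 months and 100 bucket values this yields 50 batches of at most 70 open partitions each, instead of one batch of 3,500.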

Steps To Reproduce

  1. Create an Iceberg table with both non-bucket and bucket partitions:
    config:
      materialized: incremental
      incremental_strategy: merge
      table_type: iceberg
      partitioned_by:
        - "month(date_col)"
        - "bucket(user_id, 10)"
      partitions_limit: 100
  2. Insert data spanning many months with many distinct user_id values (e.g., 35 months x 100+ users)
  3. Run an incremental model refresh
  4. Observe ICEBERG_TOO_MANY_OPEN_PARTITIONS error from Athena

Relevant log output

ICEBERG_TOO_MANY_OPEN_PARTITIONS: Too many open partitions: N. The limit is 100.

Environment

- OS: macOS
- Python: 3.12.8
- dbt-adapters: 1.22.8
- dbt-athena: 1.9.4

Additional Context

This was introduced by #1636, which restructured batch generation from OR-of-ANDs to AND-of-ORs to solve a query size problem (IN clause duplication exceeding Athena's 256KB limit). That PR correctly solved the size issue, but the new AND structure creates an implicit cross-product between non-bucket partition chunks and bucket IN-clause chunks. Both use athena_partitions_limit as their chunk size independently, so a single batch can open up to limit × limit partitions instead of at most limit.
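To make the structural difference concrete, here is an illustrative comparison of the two batch shapes (the predicates and the column names month_col / user_id_bucket are made up for the example; the real macro's SQL differs):

```python
# Illustrative WHERE-clause shapes for one batch.

months = ["2024-01", "2024-02"]
buckets = [0, 1, 2]

bucket_in = "user_id_bucket IN (" + ", ".join(map(str, buckets)) + ")"

# Pre-#1636 shape: OR of per-value ANDs -- the IN clause is repeated for
# every non-bucket partition value, which is what blew past Athena's
# 256KB query-size limit.
or_of_ands = " OR ".join(f"(month_col = '{m}' AND {bucket_in})" for m in months)

# Post-#1636 shape: AND of two ORs -- the IN clause appears only once,
# but the batch now covers every (month, bucket) combination.
and_of_ors = (
    "(" + " OR ".join(f"month_col = '{m}'" for m in months) + ") AND " + bucket_in
)

assert or_of_ands.count("IN (") == len(months)  # duplicated per month
assert and_of_ors.count("IN (") == 1            # written once
```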

Metadata

Assignees

No one assigned

    Labels

    triage:product (In Product's queue), type:bug (Something isn't working as documented)
