
[Bug] If the number of objects in the schema = max_results_per_iter - dbt errors with Tables contain columns with the same names, but different types #1827

@jeremyyeo

Description


Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Which packages are affected?

  • dbt-adapters
  • dbt-tests-adapter
  • dbt-athena
  • dbt-athena-community
  • dbt-bigquery
  • dbt-postgres
  • dbt-redshift
  • dbt-snowflake
  • dbt-spark

Current Behavior

  1. dbt's list_relations_without_caching macro paginates once a schema hits the page size (1000 tables).
  2. When the object count lands exactly on a page boundary, the follow-up page returns zero rows and dbt errors with the message in the title.
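The failure mode can be sketched in plain Python (standing in for agate, which dbt uses to hold query results; the function names here are made up for illustration): a zero-row page gives type inference no values to inspect, so every column falls back to a default type, and the merge step rejects columns whose names match but whose inferred types differ.

```python
def infer_type(values):
    """Crude stand-in for per-column type inference."""
    if not values:
        return "Integer"  # matches the all-Integer table printed for the empty page
    return "Text" if all(isinstance(v, str) for v in values) else "Integer"

def merge(pages):
    """Reject same-named columns with different inferred types, as the real merge does."""
    seen = {}
    for page in pages:
        for name, values in page.items():
            col_type = infer_type(values)
            if seen.get(name, col_type) != col_type:
                raise ValueError(
                    "Tables contain columns with the same names, but different types."
                )
            seen[name] = col_type
    return seen

first_page = {"name": ["BAR"]}  # SHOW OBJECTS ... LIMIT 1 -> one row, typed Text
empty_page = {"name": []}       # follow-up page -> zero rows, falls back to Integer

try:
    merge([first_page, empty_page])
except ValueError as exc:
    print(exc)  # Tables contain columns with the same names, but different types.
```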

Expected Behavior

No error.

Steps To Reproduce

  1. Project setup:
# dbt_project.yml
name: dbt_basic_5
profile: sf
version: "1.0.0"

models:
  dbt_basic_5:
    +materialized: table
  2. Create an empty schema:
drop schema db.sch;
create schema db.sch;
  3. Modify the list_relations_without_caching macro:
{% macro snowflake__list_relations_without_caching(schema_relation, max_iter=10000, max_results_per_iter=1) %}

    {%- if schema_relation is string -%}
        {%- set schema = schema_relation -%}
    {%- else -%}
        {%- set schema = schema_relation.include(identifier=False) -%}
    {%- endif -%}

    {%- set max_results_per_iter = adapter.config.flags.get('list_relations_per_page', max_results_per_iter) -%}
    {%- set max_iter = adapter.config.flags.get('list_relations_page_limit', max_iter) -%}
    {%- set too_many_relations_msg -%}
        dbt is currently configured to list a maximum of {{ max_results_per_iter * max_iter }} objects per schema.
        {{ schema }} exceeds this limit. If this is expected, you may configure this limit
        by setting list_relations_per_page and list_relations_page_limit in your project flags.
        It is recommended to start by increasing list_relations_page_limit.
    {%- endset -%}

    {%- set paginated_state = namespace(paginated_results=[], watermark=none) -%}

    {#-
        loop an extra time to catch the breach of max iterations
        Note: while range is 0-based, loop.index starts at 1
    -#}
    {%- for _ in range(max_iter + 1) -%}

        {#-
            raise a warning and break if we still didn't exit and we're beyond the max iterations limit
            Note: while range is 0-based, loop.index starts at 1
        -#}
        {%- if loop.index == max_iter + 1 -%}
            {%- do exceptions.warn(too_many_relations_msg) -%}
            {%- break -%}
        {%- endif -%}

        {%- set show_objects_sql = snowflake__show_objects_sql(schema, max_results_per_iter, paginated_state.watermark) -%}
        {%- set paginated_result = run_query(show_objects_sql) -%}
        {% do print(schema) %}
        {% do print(paginated_result) %}
        {%- do paginated_state.paginated_results.append(paginated_result) -%}
        {%- set paginated_state.watermark = paginated_result.columns.get('name').values()[-1] -%}

        {#- we got less results than the max_results_per_iter (includes 0), meaning we reached the end -#}
        {%- if (paginated_result | length) < max_results_per_iter -%}
            {%- break -%}
        {%- endif -%}

    {%- endfor -%}

    {#- grab the first table in the paginated results to access the `merge` method -#}
    {%- set agate_table = paginated_state.paginated_results[0] -%}
    {%- do return(agate_table.merge(paginated_state.paginated_results)) -%}

{% endmacro %}

^ I'm setting max_results_per_iter to 1, which effectively means: if the schema contains even one table, we're going to paginate (and issue a follow-up query that returns zero rows).
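The pagination arithmetic behind that (a sketch, not dbt code; `pages_issued` is a name introduced here): the loop only stops after a page comes back smaller than max_results_per_iter, so any schema whose object count is an exact multiple of the page size always costs one extra, zero-row query.

```python
import math

def pages_issued(n_objects, per_iter):
    """Queries the loop issues: it stops only after a page returns fewer
    than per_iter rows, so an exact multiple costs one extra, empty query."""
    if n_objects % per_iter == 0:
        return n_objects // per_iter + 1  # the final page is the zero-row one
    return math.ceil(n_objects / per_iter)

print(pages_issued(1, 1))        # 2: second query returns 0 rows (the bad page)
print(pages_issued(3, 2))        # 2: last page has 1 row, loop exits cleanly
print(pages_issued(1000, 1000))  # 2: same failure at the 1000-object boundary
```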

  4. Do a double run:
$ dbt run && dbt run --debug
23:02:19  Running with dbt=1.11.7
23:02:20  Registered adapter: snowflake=1.11.3
23:02:20  Found 1 model, 523 macros
23:02:20  
23:02:20  Concurrency: 4 threads (target='ci')
23:02:20  
db.sch
| column          | data_type |
| --------------- | --------- |
| created_on      | Integer   |
| name            | Integer   |
| database_name   | Integer   |
| schema_name     | Integer   |
| kind            | Integer   |
| comment         | Integer   |
| cluster_by      | Integer   |
| rows            | Integer   |
| bytes           | Integer   |
| owner           | Integer   |
| retention_time  | Integer   |
| owner_role_type | Integer   |
| is_hybrid       | Integer   |
| is_dynamic      | Integer   |
| is_iceberg      | Integer   |
| is_interactive  | Integer   |

23:02:21  1 of 1 START sql table model sch.bar ........................................... [RUN]
23:02:22  1 of 1 OK created sql table model sch.bar ...................................... [SUCCESS 1 in 1.80s]
23:02:23  
23:02:23  Finished running 1 table model in 0 hours 0 minutes and 2.86 seconds (2.86s).
23:02:23  
23:02:23  Completed successfully
23:02:23  
23:02:23  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 NO-OP=0 TOTAL=1
23:02:24  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1142bf010>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11429b650>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1142d71d0>]}



23:02:24  Running with dbt=1.11.7
23:02:24  running dbt with arguments {'empty': 'False', 'version_check': 'True', 'no_print': 'None', 'profiles_dir': '/Users/jeremy/.dbt', 'log_format': 'default', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'warn_error_options': 'WarnErrorOptionsV2(error=[], warn=[], silence=[])', 'log_cache_events': 'False', 'write_json': 'True', 'indirect_selection': 'eager', 'use_colors': 'True', 'static_parser': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'fail_fast': 'False', 'introspect': 'True', 'target_path': 'None', 'warn_error': 'None', 'invocation_command': 'dbt run --debug', 'use_experimental_parser': 'False', 'send_anonymous_usage_stats': 'True', 'debug': 'True', 'quiet': 'False', 'printer_width': '80'}
23:02:24  Snowflake adapter: Setting snowflake.connector to ERROR (file logging only)
23:02:24  Snowflake adapter: Setting botocore to ERROR (file logging only)
23:02:24  Snowflake adapter: Setting boto3 to ERROR (file logging only)
23:02:25  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'a5210b3e-e2f0-4fae-9890-da8fe609d703', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1131e4250>]}
23:02:25  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'a5210b3e-e2f0-4fae-9890-da8fe609d703', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11288e1d0>]}
23:02:25  Registered adapter: snowflake=1.11.3
23:02:25  checksum: b80718358155fc17c3559352095564dd56ada1cc1a1c0ef2e62c857e71583775, vars: {}, profile: , target: , version: 1.11.7
23:02:25  Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
23:02:25  Nothing changed, skipping partial parsing.
23:02:25  Partial parsing enabled, no changes found, skipping parsing
23:02:25  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'a5210b3e-e2f0-4fae-9890-da8fe609d703', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1142c7890>]}
23:02:25  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
23:02:25  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
23:02:25  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'a5210b3e-e2f0-4fae-9890-da8fe609d703', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x124f44550>]}
23:02:25  Found 1 model, 523 macros
23:02:25  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'a5210b3e-e2f0-4fae-9890-da8fe609d703', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1142cf910>]}
23:02:25  
23:02:25  Concurrency: 4 threads (target='ci')
23:02:25  
23:02:25  Acquiring new snowflake connection 'master'
23:02:25  Acquiring new snowflake connection 'list_db'
23:02:25  Using snowflake connection "list_db"
23:02:25  On list_db: show terse schemas in database db
    limit 10000
/* {"app": "dbt", "dbt_version": "1.11.7", "profile_name": "sf", "target_name": "ci", "connection_name": "list_db"} */
23:02:25  Opening a new connection, currently in state init
23:02:25  SQL status: SUCCESS 55 in 0.583 seconds
23:02:25  Re-using an available connection from the pool (formerly list_db, now list_db_sch)
23:02:25  Using snowflake connection "list_db_sch"
23:02:25  On list_db_sch: show objects in db.sch
    limit 1
    

/* {"app": "dbt", "dbt_version": "1.11.7", "profile_name": "sf", "target_name": "ci", "connection_name": "list_db_sch"} */;
23:02:26  SQL status: SUCCESS 1 in 0.152 seconds
db.sch
| column          | data_type |
| --------------- | --------- |
| created_on      | DateTime  |
| name            | Text      |
| database_name   | Text      |
| schema_name     | Text      |
| kind            | Text      |
| comment         | Text      |
| cluster_by      | Text      |
| rows            | Integer   |
| bytes           | Integer   |
| owner           | Text      |
| retention_time  | Text      |
| owner_role_type | Text      |
| is_hybrid       | Text      |
| is_dynamic      | Text      |
| is_iceberg      | Text      |
| is_interactive  | Text      |

23:02:26  Using snowflake connection "list_db_sch"
23:02:26  On list_db_sch: show objects in db.sch
    limit 1
    from 'BAR'

/* {"app": "dbt", "dbt_version": "1.11.7", "profile_name": "sf", "target_name": "ci", "connection_name": "list_db_sch"} */;
23:02:26  SQL status: SUCCESS 0 in 0.148 seconds
db.sch
| column          | data_type |
| --------------- | --------- |
| created_on      | Integer   |
| name            | Integer   |
| database_name   | Integer   |
| schema_name     | Integer   |
| kind            | Integer   |
| comment         | Integer   |
| cluster_by      | Integer   |
| rows            | Integer   |
| bytes           | Integer   |
| owner           | Integer   |
| retention_time  | Integer   |
| owner_role_type | Integer   |
| is_hybrid       | Integer   |
| is_dynamic      | Integer   |
| is_iceberg      | Integer   |
| is_interactive  | Integer   |

23:02:26  Snowflake adapter: Error running SQL: macro list_relations_without_caching
23:02:26  Snowflake adapter: Rolling back transaction.
23:02:26  Connection 'master' was properly closed.
23:02:26  Connection 'list_db_sch' was left open.
23:02:26  On list_db_sch: Close
23:02:26  
23:02:26  Finished running  in 0 hours 0 minutes and 1.13 seconds (1.13s).
23:02:26  Encountered an error:
Compilation Error
  Tables contain columns with the same names, but different types.
  
  > in macro list_relations_without_caching (macros/adapters/metadata.sql)
  > called by <Unknown>
23:02:26  Resource report: {"command_name": "run", "command_success": false, "command_wall_clock_time": 1.8282759, "process_in_blocks": "0", "process_kernel_time": 0.235137, "process_mem_max_rss": "227311616", "process_out_blocks": "0", "process_user_time": 1.308286}
23:02:26  Command `dbt run` failed at 16:02:26.352880 after 1.83 seconds
23:02:26  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10516e4d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1265e9610>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1142a37d0>]}
23:02:26  Flushing usage events
23:02:26  An error was encountered while trying to flush usage events
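For what it's worth, one direction a fix could take (purely a sketch and an assumption here, not dbt's actual code; `merge_pages` and the page dicts are invented for illustration) is to drop zero-row pages before merging, since an empty page contributes no rows, only its misinferred column types:

```python
def merge_pages(pages):
    """Merge paginated results, skipping zero-row pages first (hypothetical
    mitigation): an empty page adds no rows, only misinferred types."""
    non_empty = [p for p in pages if p["rows"]] or pages[:1]  # keep columns if all empty
    merged = {"types": non_empty[0]["types"], "rows": []}
    for page in non_empty:
        if page["types"] != merged["types"]:
            raise ValueError(
                "Tables contain columns with the same names, but different types."
            )
        merged["rows"] += page["rows"]
    return merged

typed_page = {"types": {"name": "Text"}, "rows": [("BAR",)]}
empty_page = {"types": {"name": "Integer"}, "rows": []}  # zero rows, misinferred

print(merge_pages([typed_page, empty_page])["rows"])  # [('BAR',)] -- no error
```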

Environment

- OS: macOS
- Python: 3.11
- dbt-adapters: 1.22.9
- dbt-snowflake: 1.11.3

Additional Context

No response


    Labels

    triage:product (In Product's queue)
    type:bug (Something isn't working as documented)
