Labels: `pkg:dbt-athena` (Issue affects dbt-athena), `type:bug` (Something isn't working as documented), `type:regression` (Something used to work and is no longer working)
Description
Is this a regression?
- [x] I believe this is a regression in functionality
- [x] I have searched the existing issues, and I could not find an existing issue for this regression
Which packages are affected?
- [ ] dbt-adapters
- [ ] dbt-tests-adapter
- [x] dbt-athena
- [ ] dbt-athena-community
- [ ] dbt-bigquery
- [ ] dbt-postgres
- [ ] dbt-redshift
- [ ] dbt-snowflake
- [ ] dbt-spark
Current Behavior
The table type for AWS AppFlow tables is not parsed correctly, even though it used to work.
Expected/Previous Behavior
The table type for AWS AppFlow tables is parsed correctly. They should resolve to type `TableType.TABLE`.
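For context, AppFlow-created Glue tables omit the `TableType` field entirely, and (as noted in the additional context below) the adapter previously fell back to `EXTERNAL_TABLE` in that case. A minimal sketch of that old fallback behavior, with illustrative names rather than the adapter's actual code:

```python
# Sketch of the pre-regression fallback (illustrative, not the adapter's real code).
RELATION_TYPE_MAP = {"EXTERNAL_TABLE": "table", "VIRTUAL_VIEW": "view"}


def get_table_type_old(table: dict) -> str:
    # Old behavior: a missing TableType defaulted to EXTERNAL_TABLE,
    # so AppFlow-created tables resolved to a plain table.
    input_table_type = table.get("TableType", "EXTERNAL_TABLE")
    return RELATION_TYPE_MAP[input_table_type]


# An AppFlow-generated Glue table carries no TableType key:
appflow_table = {"Name": "sf_appflow_sf_foo_bar_1739369457_latest"}
print(get_table_type_old(appflow_table))  # -> table
```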
Steps To Reproduce
1. Create a Glue table using AppFlow (Salesforce integration).
2. Use that table as a dbt source:

   ```yaml
   - name: salesforce
     schema: landing_zone
     tables:
       - name: foo_bar
         identifier: sf_appflow_sf_foo_bar_1739369457_latest
   ```

3. Run `dbt docs generate`.
4. Expected error:

   ```
   21:17:54 dbt encountered 1 failure while writing the catalog
   ```
Relevant log output
```
21:17:08 Running with dbt=1.9.3
21:17:08 Registered adapter: athena=1.9.2
21:17:09 Found 435 models, 382 data tests, 54 seeds, 155 sources, 21 exposures, 643 macros
21:17:09
21:17:09 Concurrency: 16 threads (target='dev')
21:17:09
21:17:40 Building catalog
21:17:54 Encountered an error while generating catalog: Table type cannot be None for table redacted.landing_zone.sf_appflow_sf_foo_bar_1739369457_latest
21:17:54 dbt encountered 1 failure while writing the catalog
21:17:54 Catalog written to /Users/redacted/repos/redacted/dbt_elt/target/catalog.json
```

Environment
```
❯ dbt debug
21:21:20 Running with dbt=1.9.3
21:21:20 dbt version: 1.9.3
21:21:20 python version: 3.10.16
21:21:20 python path: /Users/redacted/repos/redacted/dbt_elt/.venv/bin/python3
21:21:20 os info: macOS-15.4-arm64-arm-64bit
21:21:20 Using profiles dir at /Users/redacted/repos/redacted/dbt_elt
21:21:20 Using profiles.yml file at /Users/redacted/repos/redacted/dbt_elt/profiles.yml
21:21:20 Using dbt_project.yml file at /Users/redacted/repos/redacted/dbt_elt/dbt_project.yml
21:21:20 adapter type: athena
21:21:20 adapter version: 1.9.2
21:21:20 Configuration:
21:21:20   profiles.yml file [OK found and valid]
21:21:20   dbt_project.yml file [OK found and valid]
21:21:20 Required dependencies:
21:21:20   - git [OK found]
21:21:20 Connection:
21:21:20   s3_staging_dir: s3://redacted/staging_curated_zones/redacted/athena_query_results
21:21:20   work_group: None
21:21:20   skip_workgroup_check: False
21:21:20   region_name: us-east-1
21:21:20   database: awsdatacatalog
21:21:20   schema: curated_zone_redacted
21:21:20   poll_interval: 1.0
21:21:20   aws_profile_name: None
21:21:20   aws_access_key_id: None
21:21:20   endpoint_url: None
21:21:20   s3_data_dir: s3://redacted/staging_curated_zones/redacted
21:21:20   s3_data_naming: schema_table_unique
21:21:20   s3_tmp_table_dir: None
21:21:20   debug_query_state: False
21:21:20   seed_s3_upload_args: None
21:21:20   lf_tags_database: None
21:21:20   spark_work_group: None
21:21:20 Registered adapter: athena=1.9.2
21:21:22 Connection test: [OK connection ok]
21:21:22 All checks passed!
```

Additional Context
I isolated the issue, and I think I know where the regression was introduced.
Repro code (I copy-pasted `get_table_type` from `dbt/adapters/athena/relation.py` into a script and fed it an AppFlow-generated table):
```python
import json
from enum import Enum

import boto3

glue_client = boto3.client("glue")


class TableType(Enum):
    TABLE = "table"
    VIEW = "view"
    CTE = "cte"
    MATERIALIZED_VIEW = "materializedview"
    ICEBERG = "iceberg_table"

    def is_physical(self) -> bool:
        return self in [TableType.TABLE, TableType.ICEBERG]


RELATION_TYPE_MAP = {
    "EXTERNAL_TABLE": TableType.TABLE,
    "EXTERNAL": TableType.TABLE,  # type returned by federated query tables
    "GOVERNED": TableType.TABLE,
    "MANAGED_TABLE": TableType.TABLE,
    "VIRTUAL_VIEW": TableType.VIEW,
    "table": TableType.TABLE,
    "view": TableType.VIEW,
    "cte": TableType.CTE,
    "materializedview": TableType.MATERIALIZED_VIEW,
}


def get_table_type(table):
    table_full_name = ".".join(
        filter(None, [table.get("CatalogId"), table.get("DatabaseName"), table["Name"]])
    )
    input_table_type = table.get("TableType")
    if input_table_type and input_table_type not in RELATION_TYPE_MAP:
        raise ValueError(
            f"Table type {table['TableType']} is not supported for table {table_full_name}"
        )
    if table.get("Parameters", {}).get("table_type", "").lower() == "iceberg":
        _type = TableType.ICEBERG
    elif not input_table_type:
        raise ValueError(f"Table type cannot be None for table {table_full_name}")
    else:
        _type = RELATION_TYPE_MAP[input_table_type]
    print(f"table_name : {table_full_name}")
    print(f"table type : {_type}")
    return _type


if __name__ == "__main__":
    table = glue_client.get_table(
        DatabaseName="landing_zone",
        Name="sf_appflow_sf_foo_bar_1739369457_latest",
    )
    print(json.dumps(table, indent=2, default=str))
    table_type = get_table_type(table["Table"])
    print("Table type:", table_type)
```

Running `python -m repro` produces:
```
{
  "Table": {
    "Name": "sf_appflow_sf_foo_bar_1739369457_latest",
    "DatabaseName": "landing_zone",
    "CreateTime": "2025-02-12 09:10:58-05:00",
    "UpdateTime": "2025-04-17 16:01:01-04:00",
    "Retention": 0,
    "StorageDescriptor": {
      "Columns": [
        {
          "Name": "id",
          "Type": "string",
          "Parameters": {
            "AppFlowLabel": "Record ID",
            "AppFlowDescription": "Record ID"
          }
        },
        {
          "Name": "ownerid",
          "Type": "string",
          "Parameters": {
            "AppFlowLabel": "Owner ID",
            "AppFlowDescription": "Owner ID"
          }
        },
        ...
      ],
      "Location": "s3://redacted/landing_zone/sf/appflow/sf_foo_bar/schemaVersion_2/acb94e86-a77a-33a8-97d4-84765bcd4739/",
      "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
      "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
      "Compressed": false,
      "NumberOfBuckets": 0,
      "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        "Parameters": {
          "skip.header.line.count": "1"
        }
      },
      "SortColumns": [],
      "StoredAsSubDirectories": false
    },
    "PartitionKeys": [],
    "Parameters": {
      "sourceConnectorObject": "foo_bar__c",
      "sourceConnectorName": "salesforce",
      "appflowName": "sf_foo_bar",
      "appflowDescription": "",
      "createdBy": "appflow.amazonaws.com",
      "appflowARN": "",
      "classification": "PARQUET"
    },
    "CreatedBy": "arn:aws:sts::redacted:assumed-role/appflow-glue-catalog/SandstoneMRS-455df3aa-c9c5-48bd-86b4-6757fcb6d95f",
    "IsRegisteredWithLakeFormation": false,
    "CatalogId": "redacted",
    "VersionId": "1541",
    "IsMultiDialectView": false
  },
  "ResponseMetadata": {
    "RequestId": "0311a58f-bbfb-49ff-8870-ecbd3a3da715",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 17 Apr 2025 20:59:41 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "22350",
      "connection": "keep-alive",
      "x-amzn-requestid": "0311a58f-bbfb-49ff-8870-ecbd3a3da715",
      "cache-control": "no-cache"
    },
    "RetryAttempts": 0
  }
}
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.16/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.16/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/redacted/repos/redacted/dbt_elt/repro.py", line 61, in <module>
    table_type = get_table_type(table["Table"])
  File "/Users/redacted/repos/redacted/dbt_elt/repro.py", line 45, in get_table_type
    raise ValueError(f"Table type cannot be None for table {table_full_name}")
ValueError: Table type cannot be None for table 854713690974.landing_zone.sf_appflow_sf_foo_bar_1739369457_latest
```

The code used to default to `EXTERNAL_TABLE`, and in consequence the mapping used to yield `TableType.TABLE`. Related PR: dbt-labs/dbt-athena#661.
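A possible fix (my suggestion, not an official patch) would be to restore the `EXTERNAL_TABLE` fallback before the `None` check. A trimmed, self-contained sketch of that idea, reusing only the names that already appear in the repro script:

```python
# Hypothetical patch sketch: restore the EXTERNAL_TABLE fallback for tables
# that carry no TableType (e.g. AppFlow-created Glue tables).
from enum import Enum


class TableType(Enum):
    TABLE = "table"
    ICEBERG = "iceberg_table"


RELATION_TYPE_MAP = {"EXTERNAL_TABLE": TableType.TABLE}


def get_table_type(table: dict) -> TableType:
    # Fall back to EXTERNAL_TABLE when Glue returns no TableType,
    # matching the pre-regression behavior.
    input_table_type = table.get("TableType") or "EXTERNAL_TABLE"
    if table.get("Parameters", {}).get("table_type", "").lower() == "iceberg":
        return TableType.ICEBERG
    if input_table_type not in RELATION_TYPE_MAP:
        raise ValueError(f"Table type {input_table_type} is not supported")
    return RELATION_TYPE_MAP[input_table_type]


# The AppFlow table from this report has no TableType key:
print(get_table_type({"Name": "sf_appflow_sf_foo_bar_1739369457_latest"}))
```

With this change the AppFlow table resolves to `TableType.TABLE` instead of raising `ValueError`, while Iceberg detection via `Parameters.table_type` still takes precedence.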
cc: @svdimchenko