Description
Affected module
Ingestion Framework (BigQuery sample data ingestion).
Describe the bug
When running BigQuery sample data ingestion in OpenMetadata 1.11.2.0, sample data is generated correctly for some tables, but fails for many others.
The failure happens when the sampler generates a SELECT statement that explicitly lists columns which BigQuery reports as missing, invalid, or not resolvable at query time.
To Reproduce
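Run BigQuery sample data ingestion against a project that contains tables or views whose columns BigQuery cannot resolve at query time. The errors listed in this report appear during the run.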
Expected behavior
Sample data ingestion should not fail entirely for a table when:
- A column no longer exists
- A view has an invalid or outdated column
- BigQuery cannot resolve a column at runtime
The sampler should either:
- Skip invalid columns automatically, or
- Fall back to SELECT * where supported, or
- Gracefully skip sample data for the table without repeated stack traces
Currently, valid tables and views are skipped unnecessarily due to strict column-level SELECT generation.
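To make the three proposed fallbacks concrete, here is a minimal sketch of the order in which they could be tried. It is not the OpenMetadata sampler code: `fetch_sample` and `_run` are hypothetical names, and the standard-library `sqlite3` module stands in for BigQuery so the example runs on its own. The real error types and quoting rules for BigQuery would differ.

```python
import logging
import sqlite3
from typing import List, Optional, Sequence, Tuple

logger = logging.getLogger("sampler_sketch")  # hypothetical logger name

Sample = Tuple[List[str], List[tuple]]  # (column names, rows)


def _run(conn: sqlite3.Connection, select_list: str, table: str, limit: int) -> Sample:
    """Execute one sampling query and return column names plus rows."""
    cur = conn.execute(f"SELECT {select_list} FROM {table} LIMIT {int(limit)}")
    names = [d[0] for d in cur.description]
    return names, cur.fetchall()


def fetch_sample(
    conn: sqlite3.Connection,
    table: str,
    columns: Sequence[str],
    limit: int = 50,
) -> Optional[Sample]:
    """Try explicit columns, then only the resolvable ones, then SELECT *."""
    # Names are interpolated unquoted, so this sketch accepts plain identifiers only.
    if not table.isidentifier() or not all(c.isidentifier() for c in columns):
        raise ValueError("sketch supports plain identifiers only")

    # 1. The strict column-level SELECT described in this report.
    if columns:
        try:
            return _run(conn, ", ".join(columns), table, limit)
        except sqlite3.Error:
            pass

    # 2. Skip invalid columns: probe each one and keep those that resolve.
    valid = []
    for col in columns:
        try:
            conn.execute(f"SELECT {col} FROM {table} LIMIT 0")
            valid.append(col)
        except sqlite3.Error:
            continue
    if valid:
        try:
            return _run(conn, ", ".join(valid), table, limit)
        except sqlite3.Error:
            pass

    # 3. Fall back to SELECT *; if that also fails, log once and skip the table.
    try:
        return _run(conn, "*", table, limit)
    except sqlite3.Error as exc:
        logger.warning("Skipping sample data for %s: %s", table, exc)
        return None
```

With a table `t(a, b)`, calling `fetch_sample(conn, "t", ["a", "gone", "b"])` returns data for `a` and `b` only, and a missing table yields `None` with a single warning instead of a stack trace. Probing column by column costs one extra query per column, so a real implementation might prefer to parse the column name out of the error message instead.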
I couldn’t upload the full logs because the file size exceeds GitHub’s 65 MB limit. However, the errors below clearly show the cause of the issue and were consistently observed during the ingestion run:
Not found: Table warehouse-390509:nwre_warehouse_bigQ.delivery_valorisation_nwre was not found in location EU
Name interco_import_cross_border_opening_mw_50Hz not found inside dbt_intraday_analysis_nordpool
Unrecognized name: Updated_date; failed to parse view
No matching signature for operator = for argument types: STRING, INT64
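The four messages above fall into different failure classes, and a sampler could use that to decide whether to retry with fewer columns or skip the table outright. The sketch below groups them with regular expressions. The category names and the `classify_bigquery_error` function are hypothetical, and the patterns are derived only from the messages quoted in this report, not from a documented BigQuery error catalogue.

```python
import re

# Hypothetical categories, checked in order; the first matching pattern wins.
_PATTERNS = [
    ("invalid_view", re.compile(r"failed to parse view", re.IGNORECASE)),
    ("table_not_found", re.compile(r"Not found: Table .+ was not found", re.IGNORECASE)),
    (
        "column_not_found",
        re.compile(r"Name \S+ not found inside \S+|Unrecognized name: \S+", re.IGNORECASE),
    ),
    ("type_mismatch", re.compile(r"No matching signature for operator", re.IGNORECASE)),
]


def classify_bigquery_error(message: str) -> str:
    """Map an error message to a coarse category, or 'unknown' if nothing matches."""
    for category, pattern in _PATTERNS:
        if pattern.search(message):
            return category
    return "unknown"
```

Under this grouping, only `column_not_found` would benefit from retrying with a reduced column list. A missing table, a view that fails to parse, or a type mismatch inside a view definition would fail for any column selection, so skipping the table with one log line is the more useful response there.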
Version:
- OS: not specified
- Python version: not specified
- OpenMetadata version: 1.11.2
- OpenMetadata Ingestion package version: 1.11.2.0
Additional context
- This does not affect all tables
- The same ingestion run successfully generates sample data for other BigQuery tables
- The issue appears only when the sampler enumerates columns that BigQuery rejects at query time
- The errors originate from:
  - metadata/sampler/sqlalchemy/sampler.py
  - metadata/sampler/sampler_interface.py
- The problem is reproducible across multiple datasets and schemas