
BigQuery sampler fails on tables and views due to invalid columns in generated SELECT statements #24932

@aimendenche


Affected module
Ingestion Framework (the sample data sampler).

Describe the bug
When running BigQuery sample data ingestion in OpenMetadata 1.11.2.0, sample data is generated correctly for some tables, but fails for many others.

The failure happens when the sampler generates a SELECT statement that explicitly lists columns which BigQuery reports as missing, invalid, or not resolvable at query time.

To Reproduce

Screenshots or steps to reproduce

Expected behavior

Sample data ingestion should not fail entirely for a table when:

  • A column no longer exists
  • A view has an invalid or outdated column
  • BigQuery cannot resolve a column at runtime

The sampler should either:

  • Skip invalid columns automatically, or
  • Fall back to SELECT * where supported, or
  • Gracefully skip sample data for the table without repeated stack traces

Currently, valid tables and views are skipped unnecessarily due to strict column-level SELECT generation.
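The fallback order proposed above could look roughly like the sketch below. It is only an illustration, not the sampler's actual code: `fetch_sample` and the `run_query` callable are hypothetical names, and `run_query` stands in for whatever executes SQL against BigQuery and raises when the query is rejected.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)

# Hypothetical type: any callable that executes SQL and returns rows,
# raising an exception when BigQuery rejects the query.
QueryRunner = Callable[[str], list]


def fetch_sample(
    table: str,
    columns: Sequence[str],
    run_query: QueryRunner,
    limit: int = 50,
) -> Optional[list]:
    """Try an explicit column list, then SELECT *, then give up quietly."""
    column_list = ", ".join(f"`{c}`" for c in columns)
    candidates = [
        f"SELECT {column_list} FROM `{table}` LIMIT {limit}",
        f"SELECT * FROM `{table}` LIMIT {limit}",
    ]
    for sql in candidates:
        try:
            return run_query(sql)
        except Exception as exc:  # broad on purpose: this is only a sketch
            # Log at debug level so one bad table does not flood the output.
            logger.debug("Sample query failed for %s: %s", table, exc)
    # One short warning per table instead of repeated stack traces.
    logger.warning("Skipping sample data for %s: no query succeeded", table)
    return None
```

With this shape, a table whose metadata lists a stale column still gets sample data from the `SELECT *` attempt, and a table that cannot be queried at all produces a single warning line.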

I couldn’t upload the full logs because the file size exceeds GitHub’s 65 MB limit. However, the errors below clearly show the cause of the issue and were consistently observed during the ingestion run:

Not found: Table warehouse-390509:nwre_warehouse_bigQ.delivery_valorisation_nwre was not found in location EU

Name interco_import_cross_border_opening_mw_50Hz not found inside dbt_intraday_analysis_nordpool

Unrecognized name: Updated_date; failed to parse view

No matching signature for operator = for argument types: STRING, INT64
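Two of these messages name the offending column directly, which is what would make "skip invalid columns" feasible. A minimal sketch of extracting that name, assuming the message formats stay as quoted above (the `offending_column` helper is hypothetical and not part of OpenMetadata):

```python
import re
from typing import Optional

# Patterns derived only from the two column-level messages quoted above.
_PATTERNS = [
    re.compile(r"Unrecognized name: (?P<name>\w+)"),
    re.compile(r"Name (?P<name>\w+) not found inside \w+"),
]


def offending_column(message: str) -> Optional[str]:
    """Return the column name a BigQuery error complains about, if any."""
    for pattern in _PATTERNS:
        match = pattern.search(message)
        if match:
            return match.group("name")
    return None
```

A sampler could drop the returned column from its SELECT list and retry. That would not help when the error comes from inside a view definition ("failed to parse view"), because the view itself is broken; in that case skipping the table is the only option.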

Version:

  • OS: [e.g. iOS]
  • Python version:
  • OpenMetadata version: [1.11.2]
  • OpenMetadata Ingestion package version: [1.11.2.0]

Additional context
  • This does not affect all tables
  • The same ingestion run successfully generates sample data for other BigQuery tables
  • The issue appears only when the sampler enumerates columns that BigQuery rejects at query time
  • The errors originate from metadata/sampler/sqlalchemy/sampler.py and metadata/sampler/sampler_interface.py
  • The problem is reproducible across multiple datasets and schemas
