Implement BigQueryStreamingBufferEmptySensor to handle DML operations on streaming tables #61148
radhwene wants to merge 17 commits into apache:main
- Add missing execute_complete() callback
- Support deprecated polling_interval parameter
- Pass poll_interval and hook_params to trigger for consistency
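One common way to keep a deprecated keyword working is sketched below. It is an illustration only, not the code from this PR: the class name `StreamingBufferSensorArgs`, the 30-second default, and the warning text are assumptions. Only the parameter names `polling_interval`, `poll_interval`, and `hook_params` come from the commit message.

```python
import warnings
from typing import Any, Dict, Optional


class StreamingBufferSensorArgs:
    """Hypothetical stand-in for the sensor's constructor arguments."""

    def __init__(
        self,
        poll_interval: float = 30.0,  # assumed default, not taken from the PR
        polling_interval: Optional[float] = None,  # deprecated alias
    ) -> None:
        if polling_interval is not None:
            # Accept the old name, but tell the caller to migrate.
            warnings.warn(
                "`polling_interval` is deprecated; use `poll_interval` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            poll_interval = polling_interval
        self.poll_interval = poll_interval

    def trigger_kwargs(self, hook_params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Forward the same values to the trigger so both sides stay consistent.
        return {"poll_interval": self.poll_interval, "hook_params": hook_params or {}}
```

Passing `polling_interval=5.0` sets `poll_interval` to 5.0 and emits a `DeprecationWarning`, so existing DAGs keep working while callers are nudged toward the new name.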
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
…r better readability
Thanks for your contribution!
Almost there! I wanted to merge - but I've just realized that:
- There are no unit tests for the new operators - could you please add them for each, including checks for exceptions? They are essential as a fast indicator of regressions. We currently cannot run the system tests in the automated CI, and even if we could, it would be better to detect regressions at an earlier stage.
- We try to reduce usage of `AirflowException` overall - could you please change the exceptions to Python built-in exceptions (e.g., `ValueError`)?
If you could please take care of the above, I'll happily approve and merge.
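The second request could look roughly like the sketch below. It is a standalone function rather than the PR's actual method, and the choice of `ValueError` for a missing event and `RuntimeError` for an error status is an assumption, as are the event keys `status` and `message`.

```python
from typing import Any, Dict, Optional


def execute_complete(event: Optional[Dict[str, Any]]) -> str:
    """Handle the trigger's event using built-in exceptions only."""
    if event is None:
        # Nothing came back from the trigger: treat it as invalid input.
        raise ValueError("No event received from the trigger")
    if event.get("status") == "error":
        # The trigger reported a failure while polling.
        raise RuntimeError(event.get("message", "Trigger reported an error"))
    return event.get("message", "Streaming buffer is empty")
```

Unit tests can then assert on the specific built-in type, for example with `pytest.raises(ValueError)`, instead of the broad `AirflowException`.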
Asked to add unit tests + changing AirflowException to native Python exceptions
Implement BigQueryStreamingBufferEmptySensor to handle DML operations on streaming tables.

The sensor polls BigQuery table metadata to check if the streaming buffer is empty before proceeding with DML operations.

- New sensor: BigQueryStreamingBufferEmptySensor (sync + deferrable)
- New trigger: BigQueryStreamingBufferEmptyTrigger (async polling)
- Unit tests for both sensor and trigger
- Documentation and system test examples

Fixes apache#59408
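The polling idea can be sketched without Airflow or the BigQuery client. The sketch relies on the BigQuery REST table resource, which includes a `streamingBuffer` field only while the buffer holds data. The function names, the injected `fetch_table` callable, and the defaults are assumptions for illustration; the 90-minute timeout mirrors the figure quoted in the PR description.

```python
import time
from typing import Any, Callable, Mapping


def is_streaming_buffer_empty(table_resource: Mapping[str, Any]) -> bool:
    """Return True when the table metadata carries no streaming buffer."""
    if not table_resource:
        raise ValueError("Table metadata is empty; the table may not exist")
    return "streamingBuffer" not in table_resource


def wait_for_empty_buffer(
    fetch_table: Callable[[], Mapping[str, Any]],  # hypothetical metadata getter
    poke_interval: float = 60.0,
    timeout: float = 90 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the buffer is empty; return the number of attempts made."""
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if is_streaming_buffer_empty(fetch_table()):
            return attempts
        if time.monotonic() + poke_interval > deadline:
            raise TimeoutError("Streaming buffer did not empty before the timeout")
        sleep(poke_interval)
```

Injecting `fetch_table` and `sleep` keeps the loop testable with canned responses, which is the same reason the real sensor's tests mock the BigQuery hook.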
d2e7ef8 to 66d7c2e
Hi @shahar1 , Sensor tests (
| Test | Description |
|---|---|
| test_poke_table_not_found | Raises AirflowException when the table doesn't exist |
| test_poke_raises_on_unexpected_error | Re-raises unexpected exceptions from the BigQuery client |
| test_execute_complete_no_event | Raises AirflowException when no event is received in the trigger callback |
| test_execute_complete_error_event | Raises AirflowException when trigger returns an error status |
Trigger tests (test_bigquery.py)
| Test | Description |
|---|---|
| test_run_raises_on_table_not_found | Yields error event when table returns 404 |
| test_run_raises_on_exception | Yields error event on unexpected exceptions |
| test_is_streaming_buffer_empty_table_not_exists | Raises AirflowException when table response is empty |
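The trigger behaviour these tests describe, yielding an error event instead of raising, can be sketched with plain `asyncio`. Everything named here is hypothetical: `TableNotFound` stands in for a 404 from the API, `get_table` is an injected coroutine, and the event shape is an assumption rather than the PR's actual `TriggerEvent` payload.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List


class TableNotFound(Exception):
    """Hypothetical stand-in for a 404 response from the BigQuery API."""


async def run_trigger(
    get_table: Callable[[], Awaitable[Dict[str, Any]]],
    poll_interval: float = 5.0,
) -> AsyncIterator[Dict[str, str]]:
    """Poll table metadata and yield exactly one terminal event."""
    while True:
        try:
            table = await get_table()
        except TableNotFound as exc:
            yield {"status": "error", "message": f"Table not found: {exc}"}
            return
        except Exception as exc:  # unexpected failures also become events
            yield {"status": "error", "message": str(exc)}
            return
        if "streamingBuffer" not in table:
            yield {"status": "success", "message": "Streaming buffer is empty"}
            return
        await asyncio.sleep(poll_interval)


async def collect_events(
    get_table: Callable[[], Awaitable[Dict[str, Any]]],
) -> List[Dict[str, str]]:
    # Drain the generator without waiting between polls (test helper).
    return [event async for event in run_trigger(get_table, poll_interval=0)]
```

A test can then drive the generator with a coroutine that raises `TableNotFound` and assert that the single event has `status == "error"`.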



Implement BigQueryStreamingBufferEmptySensor to handle DML operations on streaming tables
Fixes #59408
Problem
When using BigQuery DML operators (UPDATE, DELETE, MERGE) on tables with active streaming buffers, tasks fail with:
This is a documented BigQuery limitation. Currently, Airflow has no built-in mechanism to wait for the buffer to flush before executing DML operations, so tasks fail repeatedly until the buffer eventually clears (within 90 minutes per Google Cloud documentation).
Solution
This PR implements `BigQueryStreamingBufferEmptySensor` - a composable sensor that allows users to explicitly wait for a BigQuery table's streaming buffer to empty before proceeding with DML operations.
This aligns with Airflow's design philosophy by providing:
Changes
1. New Sensor: `BigQueryStreamingBufferEmptySensor`
`BigQueryTableExistenceSensor` implementation
2. New Trigger: `BigQueryStreamingBufferEmptyTrigger`
3. Documentation & Examples
`bigquery.rst`
Usage Example