A customer reported that instances of DatabricksNotebookOperator occasionally remain stuck in a running state in Airflow even though the corresponding job has already completed on Databricks.
The task logs should tell us what the operator is doing while it waits on the Databricks job, but they are empty.
While reviewing our code, I noticed that the polling implementation could be improved:
https://github.com/astronomer/astro-provider-databricks/blob/3e1ca039a024a98f9079d178478aa24702e15453/src/astro_databricks/operators/notebook.py#L235C1-L238C64
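For context, the symptom is consistent with a polling loop that waits silently. Below is a minimal, hypothetical sketch of that pattern; the linked lines above are the authoritative source, and the hook method and state object here are assumptions for illustration:

```python
import time


def wait_for_run_completion(hook, run_id: int, poll_interval: float = 30.0) -> None:
    """Block until the Databricks run reaches a terminal state.

    Nothing is logged inside the loop, so the Airflow task log stays empty
    for the whole wait; if a terminal state is ever missed, the task looks
    stuck in "running" with no clue as to why.
    """
    while True:
        # `get_run_state` stands in for whatever call fetches the run state.
        state = hook.get_run_state(run_id)
        if state.is_terminal:
            return
        time.sleep(poll_interval)  # silent wait: no progress visible in the log
```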
The implementation seems to have already been improved in our contribution to Airflow (apache/airflow#39178), specifically in:
https://github.com/astronomer/airflow/blob/20dacc7cec64d0055fad79943fd6afa453dbe775/airflow/providers/databricks/operators/databricks.py#L1038-L1063
Since this affects an Astronomer customer and we have not yet completed the migration, I suggest that:
- We give visibility into what is happening on the Airflow worker node by logging something like "Waiting for the job to complete, current status: PENDING".
- We make the job-status polling implementation consistent with what we have contributed to Airflow (a rough sketch combining both suggestions follows below).
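A minimal sketch of what both suggestions could look like together, loosely modeled on the polling loop we contributed to Airflow. The hook method (get_run_state) and the RunState-style attributes (is_terminal, life_cycle_state, result_state, state_message) are assumptions for illustration, not the exact provider API:

```python
import logging
import time

log = logging.getLogger(__name__)


def monitor_databricks_job(hook, run_id: int, poll_interval: float = 30.0) -> None:
    """Poll the Databricks run until it is terminal, logging progress as we go."""
    # Assumed: returns an object shaped like the Airflow provider's RunState.
    state = hook.get_run_state(run_id)
    while not state.is_terminal:
        # Visibility on the worker node: the task log shows what we are waiting on.
        log.info("Waiting for the job to complete, current status: %s", state.life_cycle_state)
        time.sleep(poll_interval)
        state = hook.get_run_state(run_id)
    log.info("Job finished with status: %s", state.result_state)
    if state.result_state != "SUCCESS":
        # Surface the failure instead of leaving the task silently "running".
        raise RuntimeError(f"Databricks run {run_id} did not succeed: {state.state_message}")
```

Logging on every poll keeps the worker log in step with the Databricks UI, so a stuck task immediately shows whether the loop is still polling or the run never reached a terminal state.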