Commit 9023c78
Implement AiiDA provenance tracking for Airflow via XCom backend and listeners
Add comprehensive AiiDA provenance graph creation for Airflow DAG executions:
XCom Backend (src/airflow_provider_aiida/xcom/backend.py):
- Custom XCom backend that creates AiiDA nodes during serialization/deserialization
- WorkChainNode represents entire DAG run (dag_id + run_id)
- CalcJobNode represents individual tasks (task_id + run_id + map_index)
- Data nodes store XCom values as AiiDA-typed data
- Establishes provenance links:
* CALL_CALC: WorkChain → CalcJob (workflow calls calculation)
* CREATE: CalcJob → Data (task produces output)
* INPUT_CALC: Data → CalcJob (data flows to consuming task)
- Smart link labeling:
* For PythonOperator: extracts parameter names via inspect.signature()
* For other operators: deterministic hash based on producer task info
- Handles duplicate link prevention and stored node constraints
- Monkey-patches link validation for stored nodes when necessary
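The two link-labeling strategies listed above can be sketched roughly as follows. This is a minimal illustration, not the code from backend.py: the helper names (label_from_signature, label_from_producer), the "input_" prefix, the digest length, and the assumption that the argument position is known are all hypothetical.

```python
import hashlib
import inspect


def label_from_signature(func, position):
    """Return the name of the parameter at `position`, or None if absent.

    Assumes the caller knows which argument position the value feeds.
    """
    params = list(inspect.signature(func).parameters.values())
    if position < len(params):
        return params[position].name
    return None


def label_from_producer(task_id, run_id, map_index=-1, length=8):
    """Fallback label: a deterministic hash of the producing task's identity."""
    key = f"{task_id}:{run_id}:{map_index}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:length]
    # Hypothetical prefix so the label does not start with a digit.
    return f"input_{digest}"


def add_numbers(left, right):
    """Example callable, as might be passed to a PythonOperator."""
    return left + right


print(label_from_signature(add_numbers, 1))  # right
```

Because the fallback label depends only on the producer's identifiers, repeated deserialization of the same XCom value yields the same label, which is what makes duplicate-link detection possible.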
Provenance Listener (src/airflow_provider_aiida/plugins/provenance_listener.py):
- Airflow listener plugin that updates AiiDA node states in real-time
- Hooks into DAG run lifecycle (success, failure, running)
- Hooks into task instance lifecycle (success, failure, running)
- Maps Airflow states to AiiDA ProcessStates:
* QUEUED/SCHEDULED → CREATED
* RUNNING → RUNNING
* SUCCESS → FINISHED
* FAILED → EXCEPTED
* SKIPPED → KILLED
- Creates nodes proactively if they don't exist (handles tasks starting before XCom)
- Registered as ProvenanceListenerPlugin for automatic discovery
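The state mapping listed above amounts to a lookup table. The sketch below uses plain lowercase strings as stand-ins; the listener itself presumably works with Airflow's state enums and AiiDA's ProcessState, and the name map_state as well as the choice to return None for unmapped states are assumptions.

```python
# Stand-in string values for Airflow states (keys) and AiiDA process
# states (values); the real listener likely uses the enum types instead.
AIRFLOW_TO_AIIDA_STATE = {
    "queued": "created",
    "scheduled": "created",
    "running": "running",
    "success": "finished",
    "failed": "excepted",
    "skipped": "killed",
}


def map_state(airflow_state):
    """Translate an Airflow state name; return None for unmapped states."""
    return AIRFLOW_TO_AIIDA_STATE.get(str(airflow_state).lower())
```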
Common utilities for handling AiiDA nodes (src/airflow_provider_aiida/common/utils.py):
- _get_or_create_workchain_node: Query by unique_id or create new WorkChainNode
- _get_or_create_calcjob_node: Query by unique_id or create new CalcJobNode
- _sanitize_link_label: Ensure AiiDA-compatible link labels (alphanumeric + underscore)
- All new nodes initialized with ProcessState.CREATED
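A label sanitizer of the kind described above could look roughly like this. It is a sketch of the "alphanumeric + underscore" rule only; the actual _sanitize_link_label may apply further rules (for example for empty labels or leading digits) that are not covered here.

```python
import re


def sanitize_link_label(label):
    """Replace every character that is not an ASCII letter, digit, or
    underscore with an underscore."""
    return re.sub(r"[^0-9A-Za-z_]", "_", label)


print(sanitize_link_label("my-task.output 1"))  # my_task_output_1
```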
Caveats:
- During deserialization no information about the input key is available, so
an educated guess has to be made, which currently fails when maps are used
- on_dag_run_running is not called in the test run environment, so the
WorkChain node is created in the on_task_run_running function instead
- Airflow gives no guarantee about the order of the listener callbacks
(executed by the task instance) and the XCom backend (executed by the
scheduler), so the logic is duplicated in the XCom backend and the listeners
Result: Complete AiiDA provenance graph mirroring Airflow DAG structure with
real-time state synchronization and proper data lineage tracking.
1 parent 2100993 commit 9023c78
File tree
5 files changed, +1000 -4 lines changed
- src/airflow_provider_aiida
  - common
  - plugins
  - xcom