== Patching Airflow to stress-test DAG parsing using relevant environment variables
By default, Airflow runs database initialization routines on start-up.
These check that an Admin user exists and that the database schema is up-to-date.
Since they are idempotent and incur little overhead, they can safely be run on each start-up in most environments.
If, however, it makes sense to deactivate these routines, they can be turned off by patching the running cluster with a resource definition such as this:
[source,yaml]
----
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  clusterConfig:
    databaseInitialization:
      enabled: false # <1>
----
<1> Turn off the initialization routine by setting `databaseInitialization.enabled` to `false`
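
Assuming `kubectl` access to the namespace where the cluster runs, the same change can also be applied in place with a merge patch instead of re-applying a full manifest; the resource name `airflow` is taken from the example above, and the namespace is left at the default as an assumption:

[source,shell]
----
# Merge-patch the running AirflowCluster to disable database initialization.
# The fully-qualified resource type avoids ambiguity with other CRDs.
kubectl patch airflowclusters.airflow.stackable.tech airflow \
  --type merge \
  -p '{"spec":{"clusterConfig":{"databaseInitialization":{"enabled":false}}}}'
----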
NOTE: The field `databaseInitialization.enabled` is `true` by default for backwards compatibility.
A fresh Airflow cluster cannot be created with this field set to `false` as this results in missing metadata in the Airflow database.
WARNING: Setting `databaseInitialization.enabled` to `false` is an unsupported operation, as subsequent updates to a running Airflow cluster can result in broken behaviour due to inconsistent metadata.
Only set `databaseInitialization.enabled` to `false` if you know what you are doing!
The demo also created a third DAG in the ConfigMap, called `dag_factory.py`, which was not mounted to the cluster and therefore does not appear in the UI.
This DAG can be used to create a number of individual DAGs on-the-fly, allowing a degree of stress-testing of the DAG scan/register steps (the generated DAGs themselves are trivial, so this approach will not significantly increase the burden of DAG _parsing_).
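
Although `dag_factory.py` itself is not reproduced in this demo, the dynamic-generation pattern such a file typically uses can be sketched in plain Python. The `DAG` class below is a stub standing in for `airflow.DAG` (so the sketch runs without an Airflow installation), and the DAG names and count of ten are illustrative:

[source,python]
----
# Sketch of the dynamic-DAG pattern a factory file typically uses.
# DAG is a stub for airflow.DAG; in real code the factory would also
# attach trivial operators to each generated DAG.

class DAG:
    def __init__(self, dag_id, schedule=None):
        self.dag_id = dag_id
        self.schedule = schedule

def create_dag(dag_id):
    return DAG(dag_id, schedule="@once")

# Airflow's scheduler registers any DAG object it finds in a module's
# globals, so the factory injects each generated DAG under a unique name.
for i in range(10):
    dag_id = f"generated_dag_{i}"
    globals()[dag_id] = create_dag(dag_id)
----

Because registration happens at module level, every scheduler scan of this one file yields ten DAGs to register, which is what drives the stress-test.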
To include this in the list of DAGs (without removing the existing ones), an extra volumeMount is needed, as shown below.
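
The exact definition belongs to the full demo manifest; as a hedged sketch, such a mount could look like the following, where the volume name `airflow-dags` and the `/dags` mount path are assumptions, and the field names follow the `AirflowCluster` structure used earlier:

[source,yaml]
----
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  clusterConfig:
    volumeMounts:
      - name: airflow-dags             # volume name is an assumption
        mountPath: /dags/dag_factory.py
        subPath: dag_factory.py        # mount only the extra file
----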
The scheduled job runs every minute and so an instance of it may be running while the scheduler is being re-started as a result of the patch, in which case that instance may fail.
====
== Summary
This demo showed how DAGs can be made available to Airflow, scheduled, run and then inspected with the Webserver UI.