Commit 2a0bbab: "added patch section" (1 parent: 3c5bd7b)

1 file changed: docs/modules/demos/pages/airflow-scheduled-job.adoc (69 additions, 0 deletions)
@@ -129,6 +129,75 @@ asynchronously - and another to poll the running job to report on its status.

image::airflow-scheduled-job/airflow_11.png[]

== Patching Airflow to deactivate database initialization

By default, Airflow runs database initialization routines on start-up.
These check that an Admin user exists and that the database schema is up-to-date.
Since they are idempotent and incur little overhead, they can safely be run on every start-up in most environments.
If, however, it makes sense to deactivate them, they can be turned off by patching the running cluster with a resource definition such as this:

[source,yaml]
----
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  clusterConfig:
    dbInit: false # <1>
----
<1> Turn off the initialization routine by setting `dbInit` to `false`.

NOTE: The field `dbInit` is `true` by default for backwards compatibility.
A fresh Airflow cluster cannot be created with this field set to `false`, as this results in missing metadata in the Airflow database.
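
Assuming the cluster is named `airflow` as above, one way to apply such a change to a running cluster is a merge patch, e.g. `kubectl patch airflowcluster airflow --type merge -p '{"spec": {"clusterConfig": {"dbInit": false}}}'` (a sketch, not taken from the demo; the resource short name and namespace may need adjusting for your installation).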

The demo also created a third DAG in the ConfigMap, called `dag_factory.py`, which was not mounted to the cluster and therefore does not appear in the UI.
This DAG can be used to create a number of individual DAGs on-the-fly, allowing a certain degree of stress-testing of the DAG scan/register steps (the generated DAGs themselves are trivial, so this approach will not significantly increase the burden of DAG _parsing_); a rough sketch of this pattern follows the patch below.
To include it in the list of DAGs (without removing the existing ones), an extra volumeMount is needed, as shown below.
The patch also sets some environment variables that can be used to change the frequency of certain operations; they are described in the https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html[Airflow configuration reference].

[source,yaml]
----
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  clusterConfig:
    dbInit: false
    volumeMounts:
      - name: airflow-dags
        mountPath: /dags/dag_factory.py
        subPath: dag_factory.py
      - name: airflow-dags
        mountPath: /dags/date_demo.py
        subPath: date_demo.py
      - name: airflow-dags
        mountPath: /dags/pyspark_pi.py
        subPath: pyspark_pi.py
      - name: airflow-dags
        mountPath: /dags/pyspark_pi.yaml
        subPath: pyspark_pi.yaml
  webservers:
    roleGroups:
      default:
        envOverrides: &envOverrides
          AIRFLOW__CORE__DAGS_FOLDER: "/dags"
          AIRFLOW__CORE__MIN_SERIALIZED_DAG_UPDATE_INTERVAL: "60"
          AIRFLOW__CORE__MIN_SERIALIZED_DAG_FETCH_INTERVAL: "60"
          AIRFLOW__DAG_PROCESSOR__MIN_FILE_PROCESS_INTERVAL: "60"
          AIRFLOW__DAG_PROCESSOR__PRINT_STATS_INTERVAL: "60"
          AIRFLOW_CONN_KUBERNETES_IN_CLUSTER: "kubernetes://?__extra__=%7B%22extra__kubernetes__in_cluster%22%3A+true%2C+%22extra__kubernetes__kube_config%22%3A+%22%22%2C+%22extra__kubernetes__kube_config_path%22%3A+%22%22%2C+%22extra__kubernetes__namespace%22%3A+%22%22%7D"
  kubernetesExecutors:
    envOverrides: *envOverrides
  schedulers:
    roleGroups:
      default:
        envOverrides: *envOverrides
----
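
The value of `AIRFLOW_CONN_KUBERNETES_IN_CLUSTER` is a URL-encoded Airflow connection URI. To see what its `__extra__` payload actually contains, it can be decoded with a few lines of Python (a quick inspection aid, not part of the demo):

[source,python]
----
from urllib.parse import unquote_plus

# The __extra__ query parameter from AIRFLOW_CONN_KUBERNETES_IN_CLUSTER above
encoded = (
    "%7B%22extra__kubernetes__in_cluster%22%3A+true%2C+"
    "%22extra__kubernetes__kube_config%22%3A+%22%22%2C+"
    "%22extra__kubernetes__kube_config_path%22%3A+%22%22%2C+"
    "%22extra__kubernetes__namespace%22%3A+%22%22%7D"
)

# Prints the decoded JSON: in-cluster access is enabled, and kube_config,
# kube_config_path and namespace are left empty (i.e. in-cluster defaults).
print(unquote_plus(encoded))
----
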
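The commit does not show the contents of `dag_factory.py` itself; as a rough illustration only, a DAG factory of this kind typically generates trivial DAGs in a loop and registers them in the module's global namespace, which is where Airflow's DAG processor discovers DAG objects. The DAG count, IDs and task below are placeholders, not the demo's actual code:

[source,python]
----
"""Illustrative sketch of a DAG-factory pattern: many trivial DAGs from one file."""
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

NUM_DAGS = 10  # placeholder; raise this to stress the DAG scan/register steps


def create_dag(dag_id: str) -> DAG:
    # Each generated DAG is deliberately trivial, so scanning and
    # registering dominate rather than parsing or execution.
    with DAG(
        dag_id=dag_id,
        start_date=datetime(2024, 1, 1),
        schedule=None,  # never triggered automatically
        catchup=False,
    ) as dag:
        BashOperator(task_id="noop", bash_command="true")
    return dag


# Airflow discovers DAGs by inspecting module-level globals, so each
# generated DAG is assigned to a distinct global name.
for i in range(NUM_DAGS):
    globals()[f"generated_dag_{i}"] = create_dag(f"generated_dag_{i}")
----
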
== Summary

This demo showed how DAGs can be made available for Airflow, scheduled, run and then inspected with the Webserver UI.
