stackabletech/demos #291 (Closed)
Description
Please talk to @adwk67 before starting work on this
Related to SUP-174 and SUP-199
At the moment, upgrades to the Airflow database occur every time the scheduler starts up: see here and here. Depending on the number and complexity of DAGs, this can cause significant overhead and delay the scheduler becoming available. This ticket covers the following:
- replacing `airflow db upgrade` with `airflow db migrate` (as the former has been deprecated)
- making the migration on-demand, e.g. via a new field in the resource
- We need to make sure we don't introduce problems similar to those of the AirflowDB resource in the past (e.g. not running a migration after an Airflow version bump), see Removed AirflowDB #322
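A minimal Python sketch of the on-demand behaviour, assuming a single flag gates both the migration and user creation: the database steps run only when initialization is enabled, rather than on every scheduler start. The function name `scheduler_commands` and the elided `airflow users create` arguments are hypothetical; the operator assembles its real container commands in its own way.

```python
from typing import List


def scheduler_commands(database_initialization_enabled: bool = True) -> List[str]:
    """Hypothetical sketch: the commands a scheduler container runs at start-up."""
    commands: List[str] = []
    if database_initialization_enabled:
        # `airflow db migrate` replaces the deprecated `airflow db upgrade`.
        commands.append("airflow db migrate")
        # Placeholder only: the real user-creation arguments are elided here.
        commands.append("airflow users create ...")
    # The scheduler itself starts regardless of the flag.
    commands.append("airflow scheduler")
    return commands
```

For example, `" && ".join(scheduler_commands(False))` yields just `airflow scheduler`, so a restart with initialization disabled skips the database work entirely.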
(from below)
Proposal
- add a flag to enable or disable db migration and user creation, defaulting to `true` in both cases (i.e. both still run by default, so the change is not breaking)
- extend the Airflow demo with a DAG factory and set `AIRFLOW__CORE__MIN_SERIALIZED_DAG_UPDATE_INTERVAL` to a non-default value (this is better than an integration test: it is somewhat of a corner case that is difficult to verify in a test but simple to document as part of the demo)
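Airflow reads any configuration option from an environment variable named `AIRFLOW__{SECTION}__{KEY}`, so the variable in the proposal overrides `min_serialized_dag_update_interval` in the `[core]` section (the minimum interval, in seconds, between updates of a DAG's serialized form in the metadata database). The helper `airflow_env_var` and the value `"60"` below are hypothetical illustrations; the issue does not say which value the demo should use.

```python
from typing import Dict


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name that overrides `[section] key`."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Hypothetical demo override: the value is illustrative, not taken from the issue.
DEMO_OVERRIDES: Dict[str, str] = {
    airflow_env_var("core", "min_serialized_dag_update_interval"): "60",
}
```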
CRD change
A new struct, `databaseInitialization`, has been introduced as part of the cluster config:
```yaml
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  clusterConfig:
    databaseInitialization:
      enabled: false # <1>
      # future configs here
# <1> Turn off the initialization routine by setting this to `false`
```
The field `databaseInitialization.enabled` is `true` by default, to be backwards-compatible.
A fresh Airflow cluster cannot be created with this field set to `false`, as this results in missing metadata in the Airflow database.
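The backwards-compatible default can be sketched as follows, assuming the manifest has been parsed into plain Python dictionaries: a missing `databaseInitialization` block or `enabled` field resolves to `true`, so existing manifests keep running the initialization. The function name is hypothetical and not part of any Stackable API.

```python
from typing import Any, Mapping


def database_initialization_enabled(spec: Mapping[str, Any]) -> bool:
    """Hypothetical: read spec.clusterConfig.databaseInitialization.enabled."""
    # Treat absent (or YAML-null) blocks as empty so older manifests still parse.
    cluster_config = spec.get("clusterConfig") or {}
    db_init = cluster_config.get("databaseInitialization") or {}
    # Default to True: initialization keeps running unless explicitly disabled.
    return bool(db_init.get("enabled", True))
```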
Status: Done