Description / Context
The DAG file processor runs at regular intervals to parse the Python DAG files and prepare them for execution.
Airflow 2.x
By default it is started automatically as a subprocess of `airflow scheduler`. Alternatively, it can run as a standalone process (`airflow dag-processor`) by setting `AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR` to `True`.
Airflow 3.x
It always runs as a standalone process and must be started with `airflow dag-processor`.
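The version difference above can be summarised as a small helper. It returns the processes that need to run and any extra environment they need. This is only a sketch, not operator code. The function name `dag_processor_processes` is hypothetical, and it assumes Airflow 2.3 or later, where the `airflow dag-processor` command exists.

```python
from typing import Dict, List, Tuple

STANDALONE_ENV = "AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR"


def dag_processor_processes(
    airflow_major: int, standalone: bool = False
) -> Tuple[List[List[str]], Dict[str, str]]:
    """Hypothetical helper: command lines and extra env vars needed to run the
    scheduler and the DAG processor for a given Airflow major version."""
    if airflow_major >= 3:
        # Airflow 3.x: the DAG processor is always a separate process,
        # so `standalone` has no effect here.
        return [["airflow", "scheduler"], ["airflow", "dag-processor"]], {}
    if airflow_major == 2:
        if not standalone:
            # Airflow 2.x default: the scheduler spawns the DAG processor itself.
            return [["airflow", "scheduler"]], {}
        # Airflow 2.x (assumed >= 2.3) standalone mode: the scheduler stops
        # spawning it, and `airflow dag-processor` runs as its own process.
        return (
            [["airflow", "scheduler"], ["airflow", "dag-processor"]],
            {STANDALONE_ENV: "True"},
        )
    raise ValueError(f"unsupported Airflow major version: {airflow_major}")
```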
Problem
As of release 25.7.0, the Airflow operator runs the DAG processor for Airflow 3.x as a standalone process. However, it runs inside the scheduler role (and its StatefulSet). This has several drawbacks:
- if the DAG processor crashes, the failure is not detected and no restart is triggered
- resources cannot be set independently for the scheduler and the DAG processor
- logging levels cannot be set independently for the scheduler and the DAG processor
Proposal
Introduce a new optional role to go alongside the existing ones:
- webservers
- schedulers
- celeryExecutors / kubernetesExecutors
- dagProcessors
The new role will run the `airflow dag-processor` command.
The CRD would look something like this:
```yaml
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  ...
  webservers:
    ...
  celeryExecutors:
    ...
  schedulers:
    ...
  dagProcessors:
    roleGroups:
      default:
        replicas: 1
```
Note the following:
- In Airflow 3.x, `airflow dag-processor` cannot be started as part of the scheduler. It will therefore either still run as a separate command in the scheduler pod, or move to its own pod, depending on whether a `dagProcessors` role has been specified.
- In Airflow 2.x, the DAG processor should default to running as a scheduler subprocess. Users can optionally add a `dagProcessors` role, which will cause `AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR` to be set to `True`.
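The placement rules in these notes can be expressed as a small decision function. The sketch below is written in Python for readability only and is not the operator's implementation. The names `DagProcessorPlan` and `plan_dag_processor` are hypothetical. Setting the environment variable on both the scheduler and the DAG processor roles is an assumption: in Airflow 2.x, the scheduler needs it so it stops spawning the subprocess, and the standalone process expects it to be enabled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

STANDALONE_ENV = "AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR"


@dataclass
class DagProcessorPlan:
    # Role whose pods run `airflow dag-processor`. None means the scheduler
    # spawns it as a subprocess (the Airflow 2.x default).
    host_role: Optional[str]
    # Extra environment variables, keyed by role name.
    env: Dict[str, Dict[str, str]] = field(default_factory=dict)


def plan_dag_processor(airflow_major: int, dag_processors_role: bool) -> DagProcessorPlan:
    """Hypothetical sketch of where the DAG processor should run."""
    if airflow_major >= 3:
        # Always a separate process: it either stays as an extra command in
        # the scheduler pod or moves to the dagProcessors pods.
        return DagProcessorPlan("dagProcessors" if dag_processors_role else "schedulers")
    if airflow_major == 2:
        if not dag_processors_role:
            # Default: scheduler subprocess, no extra configuration.
            return DagProcessorPlan(None)
        # Opt-in standalone mode. Setting the variable on both roles is an
        # assumption of this sketch.
        return DagProcessorPlan(
            "dagProcessors",
            {"schedulers": {STANDALONE_ENV: "True"}, "dagProcessors": {STANDALONE_ENV: "True"}},
        )
    raise ValueError(f"unsupported Airflow major version: {airflow_major}")
```

Keeping the 3.x fallback on the `schedulers` role preserves the current 25.7.0 behaviour for clusters that do not declare the new role, so existing manifests keep working.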