Airflow 3.x: split DAG-Processor service into its own role. #637

@adwk67

Description / Context

The DAG file processor runs at regular intervals to parse Python DAG files and prepare them for execution.

Airflow 2.x

By default the DAG processor is started automatically as a subprocess of the airflow scheduler command. It can instead be run as a standalone process by setting AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR to True.

Airflow 3.x

The DAG processor always runs as a standalone process, which is started with airflow dag-processor.

Problem

As of 25.7.0 and Airflow 3.x, the Airflow operator runs the dag-processor as a standalone process, but as part of the scheduler role (and statefulset). This has a number of drawbacks:

  • if the dag-processor crashes, the crash is not detected and no restart is triggered automatically
  • resources cannot be set independently for the scheduler and the dag-processor
  • logging levels cannot be set independently for the scheduler and the dag-processor

Proposal

Introduce a new optional role, dagProcessors, alongside the existing ones. The full set of roles would then be:

  • webservers
  • schedulers
  • celeryExecutors / kubernetesExecutors
  • dagProcessors (new, optional)

This new role will encapsulate a command that runs airflow dag-processor.

The CRD would look something like this:

---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  ...
  webservers:
    ...
  celeryExecutors:
    ...
  schedulers:
    ...
  dagProcessors:
    roleGroups:
      default:
        replicas: 1

Note the following:

  • In Airflow 3.x, airflow dag-processor cannot run as a subprocess of the scheduler
    • so it will either (still) be started by a separate command in the scheduler pod, or be moved to its own pod, depending on whether a dagProcessors role has been specified
  • In Airflow 2.x, airflow dag-processor should continue to default to being a scheduler subprocess
    • users can opt in to a separate pod by including a dagProcessors role, which will cause AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR to be set to True
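
The four cases in the notes above can be summarised as a small decision function. This is only an illustrative sketch of the proposed behaviour, not operator code: the names Placement, DagProcessorPlan and plan_dag_processor are hypothetical, and the only identifier taken from the issue is AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR.

```python
from dataclasses import dataclass, field
from enum import Enum


class Placement(Enum):
    """Where the DAG processor ends up running (hypothetical names)."""
    SCHEDULER_SUBPROCESS = "scheduler-subprocess"    # Airflow 2.x default
    SCHEDULER_POD_COMMAND = "scheduler-pod-command"  # separate command, scheduler pod
    OWN_POD = "own-pod"                              # dedicated dagProcessors role


@dataclass
class DagProcessorPlan:
    placement: Placement
    # Extra environment variables the proposal implies for this case.
    env: dict = field(default_factory=dict)


def plan_dag_processor(airflow_major: int, has_dag_processor_role: bool) -> DagProcessorPlan:
    """Map (Airflow major version, dagProcessors role present?) to a plan."""
    if has_dag_processor_role:
        env = {}
        if airflow_major < 3:
            # Airflow 2.x must be told not to spawn the processor itself.
            env["AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR"] = "True"
        return DagProcessorPlan(Placement.OWN_POD, env)
    if airflow_major >= 3:
        # No role given: keep the current behaviour for Airflow 3.x.
        return DagProcessorPlan(Placement.SCHEDULER_POD_COMMAND)
    # No role given on Airflow 2.x: the scheduler runs it as a subprocess.
    return DagProcessorPlan(Placement.SCHEDULER_SUBPROCESS)


if __name__ == "__main__":
    for major in (2, 3):
        for has_role in (False, True):
            print(major, has_role, plan_dag_processor(major, has_role))
```

Treating the presence of the role as the only switch keeps existing clusters unchanged: without a dagProcessors entry, both Airflow 2.x and 3.x behave as they do today.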
