This repository provides a Docker Compose configuration for running an Apache Airflow cluster with the CeleryExecutor. It includes supporting services, namely Redis, PostgreSQL, MySQL, Adminer, Jaeger, the OpenTelemetry Collector, Prometheus, and Grafana, which supply the message broker, metadata storage, and monitoring and tracing capabilities.
This setup is intended for local development and testing purposes. It is not recommended for production deployments. The configuration allows you to experiment with Airflow and its integrations with various monitoring and tracing tools.
You will need:
- Docker installed on your machine
- Docker Compose installed (if not included with your Docker installation)
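To confirm both are available before continuing, a quick check from a terminal (the plugin form `docker compose` also works if your installation bundles Compose v2):

```bash
# Verify Docker is installed and on the PATH
docker --version

# Verify Compose; fall back to the v2 plugin form if the standalone
# binary is not installed
docker-compose --version || docker compose version
```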
The Docker Compose configuration includes the following services:
- airflow-webserver: The web interface for Apache Airflow.
- airflow-scheduler: Monitors and triggers scheduled tasks.
- airflow-worker: Executes the tasks queued by the scheduler.
- airflow-triggerer: Handles deferred tasks.
- airflow-init: Initializes the Airflow environment (database migrations, user creation).
- postgres: PostgreSQL database used by Airflow for metadata storage.
- mysql: MySQL database for testing purposes (e.g., sample datasets).
- redis: In-memory data store used as a message broker for CeleryExecutor.
- adminer: Web-based database management tool for MySQL.
- jaeger: Distributed tracing system for microservices.
- hotrod: A sample application to generate tracing data for Jaeger.
- otel-collector: OpenTelemetry Collector to process and export telemetry data.
- prometheus: Monitoring system to collect metrics from the services.
- grafana: Analytics platform to visualize data collected by Prometheus.
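Once the stack is up, you can cross-check this list against the compose file and the running containers. This is a minimal sketch; the service names are taken from the list above, so check docker-compose.yaml if yours differ:

```bash
# Print the service names defined in docker-compose.yaml
docker-compose config --services

# Show container status; services with healthchecks should report "healthy"
docker-compose ps
```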
- Start the containers by running `docker-compose up` and wait for the process to complete.
- Go to the Apache Airflow dashboard at http://localhost:8080 and log in with `airflow` as both the username and the password.
- Open the `etl_pipeline` DAG and trigger a run. This should send a trace to Jaeger via OpenTelemetry.
- Go to the [Jaeger UI](http://localhost:16686/) and select `my-helloworld-service` if it is not selected already.
- Click `Find Traces`. A trace should appear. (To verify the same thing from the command line, see the sketch after this list.)
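As a quick alternative to the UI, Jaeger's query service also answers HTTP requests on the same port. This is a convenience check against its internal API rather than a stable public contract; the port and service name are the ones used above:

```bash
# List the service names Jaeger has received spans for
curl -s "http://localhost:16686/api/services"

# Fetch up to five recent traces for the demo service
curl -s "http://localhost:16686/api/traces?service=my-helloworld-service&limit=5"
```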
- The DAG `sleep_random` is scheduled to run every second, so it should already be emitting metrics to Prometheus.
- Go to the [Grafana UI](http://localhost:23000/) and click the `+` button.
- Add an empty panel.
- Change the data source to Prometheus.
- Click the `Metrics browser`.
- Select `airflow_dagrun_duration_success_sleep_random{}`.
- This plots a graph that can be added to the dashboard. (To confirm the metric is present before building the panel, see the query sketch after this list.)
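If the panel stays empty, you can query Prometheus directly to confirm the metric is being scraped. A minimal sketch, assuming Prometheus is reachable on its default port 9090 (adjust if this compose file maps it elsewhere):

```bash
# Ask Prometheus for the current value of the DAG-duration metric;
# an empty "result" array means nothing has been scraped yet
curl -s "http://localhost:9090/api/v1/query" \
  --data-urlencode "query=airflow_dagrun_duration_success_sleep_random"
```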
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.