A lightweight solution to automatically clean up old Airflow log files, helping you save disk space and keep your Airflow environment tidy. This repository provides both a standalone script and an Airflow DAG for automated log cleanup.
Airflow generates a large volume of log files over time, which can quickly consume disk space. Manually cleaning these logs is tedious and error-prone. This tool automates the process by:
- Preventing disk space issues: Automatically removes old logs before they fill up your disk
- Reducing maintenance overhead: Set it and forget it with the included Airflow DAG
- Keeping your environment clean: Removes rotated logs and empty directories automatically
- Configurable retention: Adjust how long to keep logs based on your needs
- Deletes rotated `dag_processor_manager` logs (e.g., `dag_processor_manager.log.1`, `.log.2`, etc.)
- Removes log files older than 7 days (configurable)
- Cleans up empty directories left after log deletion
- Can be run as a standalone script or as an Airflow DAG
- Safe and reliable: Includes error handling for file operations
- Python 3.8+
- Apache Airflow (any version that supports `PythonOperator`)
- `AIRFLOW_HOME` environment variable must be set
You can run `cleanup_logs.py` directly to clean up logs:

```bash
python cleanup_logs.py
```

Note: The `AIRFLOW_HOME` environment variable must be set to your Airflow home directory.
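For context, all the script needs from the environment is a way to locate the logs directory. A minimal sketch of that lookup, assuming the default `$AIRFLOW_HOME/logs` layout (the helper name `get_logs_dir` is hypothetical, not the script's actual code):

```python
# Illustrative sketch only; get_logs_dir is a hypothetical helper.
import os
import sys

def get_logs_dir() -> str:
    """Return the Airflow logs directory derived from AIRFLOW_HOME."""
    airflow_home = os.environ.get("AIRFLOW_HOME")
    if not airflow_home:
        sys.exit("AIRFLOW_HOME is not set; cannot locate the logs directory.")
    return os.path.join(airflow_home, "logs")

if __name__ == "__main__":
    print(f"Would clean logs under: {get_logs_dir()}")
```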
The cleanup_logs_dag.py file defines a DAG that runs the cleanup task daily at 3 AM.
Setup Steps:
- Copy `cleanup_logs_dag.py` to your Airflow DAGs folder (typically `$AIRFLOW_HOME/dags/`)
- Ensure the `AIRFLOW_HOME` environment variable is set for your Airflow environment
- The DAG will appear in the Airflow UI as `cleanup_logs_dag`
- Enable the DAG in the Airflow UI to start automatic cleanup
DAG Schedule: Daily at 3:00 AM (configurable in the DAG definition)
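For reference, a DAG matching that description could look roughly like the sketch below; the `cleanup_logs` callable imported from the script and the `start_date` are assumptions, since only the DAG id and schedule are specified here.

```python
# Hypothetical sketch; the shipped cleanup_logs_dag.py may differ in details.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from cleanup_logs import cleanup_logs  # assumed entry point of the cleanup script

with DAG(
    dag_id="cleanup_logs_dag",
    schedule_interval="0 3 * * *",  # daily at 3:00 AM
    start_date=datetime(2024, 1, 1),  # assumption; any past date works
    catchup=False,
) as dag:
    cleanup_task = PythonOperator(
        task_id="cleanup_logs",
        python_callable=cleanup_logs,
    )
```

Dropping a file like this into `$AIRFLOW_HOME/dags/` is all the scheduler needs to pick it up.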
By default, logs older than 7 days are deleted. You can change this by modifying the `MILLISECONDS_TO_KEEP` constant in the scripts:

```python
# In cleanup_logs.py or cleanup_logs_dag.py
MILLISECONDS_TO_KEEP = 7 * 86400 * 1000  # 7 days in milliseconds

# Example: Change to 14 days
MILLISECONDS_TO_KEEP = 14 * 86400 * 1000  # 14 days
```

To change when the cleanup runs, modify the `schedule_interval` in `cleanup_logs_dag.py`:
```python
# Current: Daily at 3 AM
schedule_interval="0 3 * * *"

# Example: Daily at midnight
schedule_interval="0 0 * * *"

# Example: Weekly on Sundays at 2 AM
schedule_interval="0 2 * * 0"
```

- Rotated Log Cleanup: Identifies and deletes rotated `dag_processor_manager` logs (`.log.1`, `.log.2`, etc.)
- Old File Cleanup: Recursively scans the logs directory and removes files older than the retention period
- Empty Directory Cleanup: Removes directories that become empty after file deletion (excluding `dag_processor_manager`)
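Put together, the three passes look roughly like this; a simplified sketch with assumed helper structure, omitting the error handling covered in the next section:

```python
# Hedged sketch of the cleanup passes described above; names and structure are assumptions.
import os
import re
import time

MILLISECONDS_TO_KEEP = 7 * 86400 * 1000  # 7 days, as in the shipped scripts

def cleanup(logs_dir: str) -> None:
    cutoff = time.time() - MILLISECONDS_TO_KEEP / 1000  # retention cutoff in seconds

    # 1. Delete rotated dag_processor_manager logs (*.log.1, *.log.2, ...).
    dpm_dir = os.path.join(logs_dir, "dag_processor_manager")
    if os.path.isdir(dpm_dir):
        for name in os.listdir(dpm_dir):
            if re.fullmatch(r".*\.log\.\d+", name):
                os.remove(os.path.join(dpm_dir, name))

    # 2. Remove any log file older than the retention period.
    for root, _dirs, files in os.walk(logs_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)

    # 3. Remove directories left empty, bottom-up, skipping dag_processor_manager.
    for root, _dirs, _files in os.walk(logs_dir, topdown=False):
        if root == logs_dir or os.path.basename(root) == "dag_processor_manager":
            continue
        if not os.listdir(root):
            os.rmdir(root)
```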
The script includes error handling for:
- Missing directories
- Permission errors
- File system errors
- Missing `AIRFLOW_HOME` environment variable
All errors are logged to the console without stopping the cleanup process.
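As an illustration of that "log and continue" pattern, each delete can be wrapped so a failure is recorded and the loop moves on; `safe_remove` is a hypothetical helper name, and the real script may log differently:

```python
# Sketch of the log-and-continue pattern; not the exact implementation.
import logging
import os

logger = logging.getLogger(__name__)

def safe_remove(path: str) -> None:
    """Attempt to delete a file, logging any failure instead of raising."""
    try:
        os.remove(path)
    except OSError as exc:  # covers missing files, permission errors, other FS errors
        logger.error("Could not delete %s: %s", path, exc)
```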
Please read our Contributing Guide for details on our code of conduct, development setup, and the process for submitting pull requests.
If you find this useful, please consider:
- ⭐ Starring the repository on GitHub to help others discover it.
- 💖 Sponsoring to support ongoing maintenance and development.
Become a Sponsor on GitHub | Support on Patreon
MIT License - see the LICENSE file for details.
Y. Siva Sai Krishna
- GitHub: @ysskrishna
- LinkedIn: ysskrishna
