This project implements time series forecasting for PM2.5 air quality data from Indore using three boosting algorithms: scikit-learn's Gradient Boosting, XGBoost, and LightGBM. The goal is to predict future PM2.5 levels to help monitor and understand air quality trends.
The project uses the indore_air_quality_date_pm2_5.csv dataset containing:
- Date: Hourly timestamps from August 2022 onwards
- PM2.5: Particulate matter concentration (μg/m³)
- Total Records: 28,730+ data points
- Notebook: gradientboost.ipynb
- Features: Time-based feature extraction (year, month, day, hour)
- Model: Sklearn's GradientBoostingRegressor
- Forecast Period: Next month prediction
- Notebook: xgboost.ipynb
- Features: Advanced gradient boosting with regularization
- Model: XGBoost Regressor
- Optimization: Hyperparameter tuning for better performance
- Notebook: lightGDM.ipynb
- Features: Fast, distributed, high-performance gradient boosting
- Model: LightGBM Regressor
- Advantages: Memory efficient and faster training
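As a rough illustration of the modelling step described above, the sketch below fits scikit-learn's GradientBoostingRegressor on calendar features (year, month, day, hour) and predicts the following 30 days. It is not taken from the notebooks: the synthetic series, the `calendar_features` helper, and the hyperparameters are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor


def calendar_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Hypothetical helper: derive year/month/day/hour columns from timestamps."""
    return pd.DataFrame(
        {
            "year": index.year,
            "month": index.month,
            "day": index.day,
            "hour": index.hour,
        },
        index=index,
    )


# Synthetic stand-in for the real CSV: 60 days of hourly values with a daily cycle.
rng = np.random.default_rng(0)
history = pd.date_range("2022-08-01", periods=24 * 60, freq="h")
hours = history.hour.to_numpy()
pm25 = 60 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, len(history))

# Fit the regressor on the calendar features (hyperparameters are illustrative).
model = GradientBoostingRegressor(n_estimators=100, random_state=0)
model.fit(calendar_features(history), pm25)

# Build the next 30 days of hourly timestamps and predict them.
future = pd.date_range(history[-1] + pd.Timedelta(hours=1), periods=24 * 30, freq="h")
forecast = pd.Series(
    model.predict(calendar_features(future)),
    index=future,
    name="PM2.5_forecast",
)
print(forecast.head())
```

XGBoost's `XGBRegressor` and LightGBM's `LGBMRegressor` expose the same `fit`/`predict` interface, so either should be swappable for the model line above, assuming those packages are installed.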
Each algorithm follows a similar workflow:
- Data Preprocessing
  - Convert date column to datetime objects
  - Set date as DataFrame index
  - Extract time-based features (year, month, day, hour)
  - Handle missing values in PM2.5 data
- Feature Engineering
  - Time-based feature extraction
  - Lag features for temporal dependencies
  - Cyclical encoding for seasonal patterns
- Model Training
  - Split data into training and testing sets
  - Train the respective boosting algorithm
  - Hyperparameter optimization
- Forecasting & Visualization
  - Generate predictions for the next month
  - Create comprehensive visualizations
  - Compare actual vs predicted values
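The preprocessing and feature engineering steps above could look roughly like the following. This is a minimal sketch, not the notebooks' code: the `prepare_features` helper, the choice of time-based interpolation for missing values, and the lag sizes (1 and 24 hours) are assumptions; only the `Date` and `PM2.5` column names come from the dataset description.

```python
import numpy as np
import pandas as pd


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn a raw (Date, PM2.5) frame into a feature table."""
    out = df.copy()

    # Convert the date column to datetime objects and use it as a sorted index.
    out["Date"] = pd.to_datetime(out["Date"])
    out = out.set_index("Date").sort_index()

    # Handle missing PM2.5 values; time-based interpolation is one possible choice.
    out["PM2.5"] = out["PM2.5"].interpolate(method="time")

    # Time-based features.
    out["year"] = out.index.year
    out["month"] = out.index.month
    out["day"] = out.index.day
    out["hour"] = out.index.hour

    # Lag features: the value 1 hour and 24 hours earlier (assumed lag sizes).
    for lag in (1, 24):
        out[f"lag_{lag}"] = out["PM2.5"].shift(lag)

    # Cyclical encoding so that hour 23 and hour 0 end up close together.
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)

    # The first 24 rows have no complete lag history, so drop them.
    return out.dropna()


# Small synthetic frame standing in for the CSV, with one missing reading.
raw = pd.DataFrame(
    {
        "Date": pd.date_range("2022-08-01", periods=72, freq="h").astype(str),
        "PM2.5": np.linspace(20.0, 91.0, 72),
    }
)
raw.loc[30, "PM2.5"] = np.nan

features = prepare_features(raw)
print(features.head())
```

Note that lag features depend on past observations, so forecasting a full month ahead with them requires feeding predictions back in step by step; calendar and cyclical features can be computed directly for any future timestamp.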
- Time Series Analysis: Comprehensive temporal feature extraction
- Multiple Algorithms: Comparison of three state-of-the-art boosting methods
- Visualization: Clear, informative forecast plots
- Scalability: Efficient handling of large time series datasets
- Python: Core programming language
- Pandas: Data manipulation and analysis
- NumPy: Numerical computations
- Scikit-learn: Machine learning library (Gradient Boosting)
- XGBoost: Extreme gradient boosting framework
- LightGBM: Microsoft's gradient boosting framework
- Matplotlib/Seaborn: Data visualization
- Jupyter Notebooks: Interactive development environment
pandas
numpy
scikit-learn
xgboost
lightgbm
matplotlib
seaborn
jupyter
- Clone the repository
- Install required dependencies
- Run the notebooks:
  - gradientboost.ipynb - Gradient Boosting implementation
  - xgboost.ipynb - XGBoost implementation
  - lightGDM.ipynb - LightGBM implementation
The forecast visualizations show:
- Historical PM2.5 trends
- Model predictions for the next month
- Confidence intervals (where applicable)
- Performance metrics for model evaluation
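The text does not say which performance metrics the notebooks report. As an assumption, the sketch below computes two common choices for regression forecasts, mean absolute error (MAE) and root mean squared error (RMSE), on a handful of made-up values; in practice they would be computed on the held-out test split.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error: average size of the forecast error in μg/m³."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs(actual - predicted)))


def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more than MAE does."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Made-up PM2.5 values, for illustration only.
actual = np.array([50.0, 60.0, 70.0, 80.0])
predicted = np.array([52.0, 58.0, 73.0, 76.0])

print(f"MAE:  {mae(actual, predicted):.2f}")   # 2.75
print(f"RMSE: {rmse(actual, predicted):.2f}")  # 2.87
```

Reporting both is useful because a large gap between RMSE and MAE suggests a few large errors, such as missed pollution spikes.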
Each algorithm demonstrates different strengths in capturing temporal patterns and providing accurate air quality forecasts for environmental monitoring and public health applications.
- Environmental Monitoring: Track air quality trends
- Public Health: Early warning systems for pollution levels
- Policy Making: Data-driven environmental regulations
- Research: Understanding air pollution patterns
This project applies gradient boosting methods to real-world environmental data, with the aim of supporting air quality management and public health protection.



