Hybrid model using XGBoost and LSTM to forecast natural gas spot prices.
This project applies a combination of machine learning and deep learning techniques to predict natural gas prices based on historical market data. It integrates ARIMA, XGBoost, and LSTM models to capture both linear and non-linear temporal dependencies in multivariate time series data.
Accurate forecasting of natural gas prices is crucial for energy market analysis, trading strategies, and risk management.
This project explores a hybrid approach that combines the strengths of different forecasting models:
- ARIMA for linear time dependencies
- XGBoost for capturing non-linear relationships
- LSTM for learning long-term temporal dependencies
The ensemble model improves prediction accuracy and generalization across test datasets.
- Develop and fine-tune ARIMA, XGBoost, and LSTM models for multivariate time series forecasting.
- Engineer lag-based temporal features to capture short- and long-term dependencies.
- Apply walk-forward validation to ensure robust model evaluation.
- Optimize hyperparameters to achieve low RMSE and MAE on the test set.
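Walk-forward validation retrains on an expanding window and predicts only the next unseen step, so the model never looks into the future. A minimal sketch on simulated data, with a naive last-value forecaster as a placeholder for ARIMA / XGBoost / LSTM:

```python
import numpy as np

def walk_forward_naive(series, min_train=30):
    """Expanding-window walk-forward validation.

    At each step t the forecaster only sees series[:t] and predicts
    series[t], mimicking live deployment.
    """
    preds = []
    for t in range(min_train, len(series)):
        # Placeholder forecaster: naive last-value carry-forward.
        # The project swaps in ARIMA / XGBoost / LSTM here.
        preds.append(series[t - 1])
    return np.asarray(preds)

# Simulated random-walk price series standing in for the real data.
prices = np.cumsum(np.random.default_rng(0).normal(0.0, 1.0, 200)) + 50.0
preds = walk_forward_naive(prices)
```

Each prediction can then be scored against the realized value, yielding out-of-sample errors for every test period rather than a single random split.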
| Category | Tools / Libraries |
|---|---|
| Language | Python |
| Data Analysis | Pandas, NumPy |
| Machine Learning | XGBoost, Scikit-learn |
| Deep Learning | TensorFlow / Keras (LSTM) |
| Time Series Modeling | ARIMA (Statsmodels) |
| Visualization | Matplotlib, Seaborn |
| Environment | Jupyter Notebook |
- File: `gasprice.xls`
- Description: historical natural gas price data, including date, demand, supply, and spot price.
- Preprocessing: missing-value handling, feature scaling, and lag feature generation were performed before model training.
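The preprocessing steps can be sketched as follows; the toy frame and its column names are assumptions standing in for `gasprice.xls`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for gasprice.xls (column names are assumptions).
df = pd.DataFrame({
    "spot_price": [2.1, np.nan, 2.4, 2.3, np.nan, 2.6],
    "demand":     [95.0, 97.0, np.nan, 99.0, 101.0, 102.0],
})

# Missing values: linear interpolation in time order, then back-fill
# any leading gaps so no NaN survives.
df = df.interpolate(method="linear").bfill()

# Min-max scaling to [0, 1] per column; scaled inputs help LSTM training.
scaled = (df - df.min()) / (df.max() - df.min())
```

In practice the scaler's min/max should be fit on the training split only and reused on the test split, to avoid leaking future information.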
- Data Preprocessing
  - Cleaning and normalization of input features
  - Lag and rolling window feature engineering
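Lag features expose recent history to the model, while rolling windows summarize local level and volatility. A small pandas sketch:

```python
import pandas as pd

prices = pd.Series([2.1, 2.2, 2.4, 2.3, 2.5, 2.6, 2.8], name="spot_price")

# Lag features give the model recent history; rolling statistics
# capture local level (mean) and volatility (std).
features = pd.DataFrame({
    "lag_1": prices.shift(1),
    "lag_2": prices.shift(2),
    "roll_mean_3": prices.rolling(3).mean(),
    "roll_std_3": prices.rolling(3).std(),
    "target": prices,
}).dropna()  # the first rows lack enough history and are dropped
```

The specific lags and window lengths here are illustrative; the project tunes them to capture both short- and long-term dependencies.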
- Exploratory Data Analysis (EDA)
  - Time series decomposition and correlation analysis
  - Visualization of seasonal and trend components
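An additive decomposition can be sketched with a centered rolling mean as the trend estimate and per-month averages as the seasonal component (statsmodels' `seasonal_decompose` does this more carefully); the synthetic series below is an assumption for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + yearly seasonality + noise.
rng = np.random.default_rng(1)
t = np.arange(120)
series = pd.Series(0.02 * t + np.sin(2 * np.pi * t / 12)
                   + rng.normal(0, 0.1, 120))

# Additive decomposition: centered rolling mean estimates the trend;
# per-month means of the detrended series estimate the seasonal part.
trend = series.rolling(12, center=True).mean()
detrended = series - trend
seasonal = detrended.groupby(t % 12).transform("mean")
residual = detrended - seasonal

# Correlation analysis: autocorrelation at the seasonal lag is high.
acf_12 = series.autocorr(lag=12)
```

A strong lag-12 autocorrelation confirms the yearly cycle, and the residual should look like noise if the decomposition captured trend and seasonality.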
- Model Development
  - Train ARIMA as a baseline forecaster
  - Implement XGBoost for feature-driven regression
  - Build LSTM networks for sequential data learning
- Hybrid Ensemble
  - Combine XGBoost and LSTM outputs into the final prediction
  - Perform weighted averaging based on validation metrics
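One common weighting scheme, sketched below, makes each model's weight proportional to the inverse of its validation RMSE; the RMSE values and predictions here are hypothetical:

```python
import numpy as np

def inverse_error_weights(val_rmse):
    """Weight each model by the inverse of its validation RMSE."""
    inv = 1.0 / np.asarray(val_rmse, dtype=float)
    return inv / inv.sum()

# Hypothetical validation RMSEs for XGBoost and LSTM.
weights = inverse_error_weights([0.20, 0.30])  # -> [0.6, 0.4]

xgb_pred = np.array([2.50, 2.62, 2.71])
lstm_pred = np.array([2.46, 2.70, 2.65])
hybrid = weights[0] * xgb_pred + weights[1] * lstm_pred
```

Because the weights come from validation (not test) errors, the ensemble stays honest: the test set is only touched once, for the final evaluation.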
- Evaluation
  - Metrics: Root Mean Square Error (RMSE) and Mean Absolute Error (MAE)
  - Walk-forward cross-validation
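The two metrics reduce to short NumPy expressions:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors quadratically."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))
```

RMSE is always at least as large as MAE; a wide gap between the two signals that a few large misses, rather than uniform error, dominate the score.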
- The hybrid XGBoost + LSTM model achieved lower RMSE and MAE than either individual model.
- Incorporating lag-based features and walk-forward validation improved temporal generalization.
- Demonstrated robust forecasting performance across multiple test periods.