A comprehensive comparison of time series forecasting techniques applied to hourly energy consumption data, from classical statistical models (ARIMA, SARIMAX) to modern approaches (Prophet, N-HiTS). This project demonstrates the evolution and performance differences between traditional and state-of-the-art forecasting methods.
The dataset used is the Kaggle Hourly Energy Consumption dataset, specifically the PJME (PJM East) region.
Dataset Characteristics:
- 145,392 hourly observations from January 2002 to August 2018
- Original columns: Datetime, PJME_MW (consumption in megawatts)
- Strong seasonality patterns: hourly, daily, weekly, and annual cycles
- Clear seasonal patterns with higher consumption during summer (air conditioning) and winter (heating) months
The dataset underwent thorough cleaning and feature engineering to create a robust foundation for modeling:
Cleaning Steps:
- Duplicate removal: Eliminated duplicate timestamps
- Missing value handling: Interpolated gaps up to 5 consecutive hours using time-based interpolation
- Reindexing: Created complete hourly grid from 2002-01-01 00:00 to 2018-08-03 05:00
- DST correction: Handled days with 23 or 25 hours due to Daylight Saving Time transitions
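A minimal pandas sketch of the cleaning steps above (column names Datetime and PJME_MW as in the raw dataset; the real pipeline additionally handles DST transitions explicitly):

```python
import pandas as pd

def clean_hourly(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["Datetime"] = pd.to_datetime(df["Datetime"])
    # 1) Duplicate removal: keep the first reading per timestamp
    #    (DST fall-back produces duplicated hours).
    df = df.drop_duplicates(subset="Datetime").set_index("Datetime").sort_index()
    # 2) Reindexing: complete hourly grid (DST spring-forward leaves gaps).
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="h")
    df = df.reindex(full_index)
    # 3) Missing value handling: time-based interpolation, gaps up to 5 hours.
    df["PJME_MW"] = df["PJME_MW"].interpolate(method="time", limit=5)
    return df
```

Longer gaps stay as NaN under the `limit=5` cap, so unrecoverable outages are not silently invented.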
Feature Engineering:
- Hour (0-23): Hour of the day
- Month (1-12): Month of the year
- DayOfWeek (0-6): Day of the week (Monday=0, Sunday=6)
- is_weekend (Boolean): Weekend indicator
- is_holiday (Boolean): US federal holiday indicator
Final Dataset: datetime, consumption, Hour, Month, DayOfWeek, DayOfWeekName, is_weekend, is_holiday
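The calendar features can be derived with pandas alone; this sketch assumes a DatetimeIndex and uses pandas' built-in US federal holiday calendar (the project may use a different holiday source):

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

def add_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["Hour"] = df.index.hour            # 0-23
    df["Month"] = df.index.month          # 1-12
    df["DayOfWeek"] = df.index.dayofweek  # Monday=0 ... Sunday=6
    df["DayOfWeekName"] = df.index.day_name()
    df["is_weekend"] = df["DayOfWeek"] >= 5
    # Mark US federal holidays within the data range
    holidays = USFederalHolidayCalendar().holidays(df.index.min(), df.index.max())
    df["is_holiday"] = df.index.normalize().isin(holidays)
    return df
```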
ARIMA is the most widely known classical statistical model for time series analysis, defined by three parameters (p, d, q).
Model Components:
- AR (AutoRegressive) - p: Regression of the series on its own past values. Parameter p specifies how many lagged observations to include in the model.
- I (Integrated) - d: Number of differencing operations needed to make the series stationary (i.e., statistical properties like mean and variance don't change over time). Parameter d specifies the order of differencing.
- MA (Moving Average) - q: Regression on past forecast errors (residuals). Parameter q specifies how many lagged forecast errors to include in the model.
Implementation Details:
- Hyperparameter tuning: Used Optuna for automated hyperparameter optimization
- Parameter selection: ACF (AutoCorrelation Function) and PACF (Partial AutoCorrelation Function) plots analyzed to guide parameter search space
- Final configuration: Order (3, 1, 0) - 3 autoregressive terms, 1st order differencing, no moving average terms
- MLflow tracking: All experiments, parameters, and metrics logged for reproducibility
- Final test MAPE: 18.15% (excluded from main comparison due to different test set partition)
SARIMAX extends ARIMA by combining two separate ARIMA processes, one for non-seasonal patterns (p, d, q) and one for seasonal patterns (P, D, Q, s), while also incorporating exogenous features.
Model Structure:
- Non-seasonal component (p, d, q): Captures short-term patterns and trends
- Seasonal component (P, D, Q, s): Captures recurring patterns at the seasonal lag s
- Exogenous variables (X): External factors that influence the target variable
Implementation Details:
- Dataset scope: Due to computational constraints, trained only on 2013-2016 data (35,064 hours)
- Test set: 2017-2018 (13,896 hours)
- Hyperparameter tuning: 70 trials using Optuna optimization
- Final configuration: (2, 0, 1) × (2, 1, 2, 24)
- Non-seasonal: 2 AR terms, no differencing, 1 MA term
- Seasonal: 2 seasonal AR terms, 1 seasonal differencing, 2 seasonal MA terms, period = 24 hours (daily seasonality)
Exogenous Features:
- Hour, Month, DayOfWeek: Temporal indicators
- is_holiday: Binary holiday indicator
- Hour_sin, Hour_cos, Month_sin, Month_cos: Cyclical encodings (capture that 23:00 and 00:00 are adjacent, as are December and January)
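The cyclical encodings map each periodic feature onto the unit circle, so boundary values end up numerically close; a sketch of how they can be computed:

```python
import numpy as np
import pandas as pd

def add_cyclical(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Hour 23 and hour 0 become neighbors on the circle
    df["Hour_sin"] = np.sin(2 * np.pi * df["Hour"] / 24)
    df["Hour_cos"] = np.cos(2 * np.pi * df["Hour"] / 24)
    # December (12) and January (1) likewise end up adjacent
    df["Month_sin"] = np.sin(2 * np.pi * (df["Month"] - 1) / 12)
    df["Month_cos"] = np.cos(2 * np.pi * (df["Month"] - 1) / 12)
    return df
```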
Prophet is an open-source forecasting library developed by Meta that uses an additive decomposition model particularly well-suited for business time series with strong seasonal patterns and missing data.
Model Equation:
y(t) = g(t) + s(t) + h(t) + εₜ
Where:
- g(t): Piecewise linear or logistic growth trend
- s(t): Multiple seasonal components (daily, weekly, yearly) modeled using Fourier series
- h(t): Holiday and special event effects with custom windows
- εₜ: Error term (normally distributed noise)
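A toy numpy reconstruction makes the role of each term concrete (synthetic coefficients, not Prophet's fitted values; Prophet builds s(t) from Fourier series in the same way):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(24 * 14, dtype=float)        # two weeks of hourly steps
g = 30000 + 2.0 * t                        # linear trend (one piece of a piecewise fit)

# Order-2 Fourier series with period 24 hours for the seasonal term s(t)
P = 24.0
s = sum(a * np.sin(2 * np.pi * n * t / P) + b * np.cos(2 * np.pi * n * t / P)
        for n, (a, b) in enumerate([(3000, 500), (800, -200)], start=1))

h = np.where((t // 24) == 7, 1500.0, 0.0)  # a one-day "holiday" bump h(t)
eps = rng.normal(0, 100, t.size)           # noise term
y = g + s + h + eps                        # y(t) = g(t) + s(t) + h(t) + eps
```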
Implementation Details:
- Dataset scope: Full training data (2002-2016: 131,496 hours)
- Test set: 2017-2018 (13,896 hours)
- Hyperparameter tuning: 70 trials with Optuna
Exogenous Features (Regressors):
- is_holiday: US federal holidays
- is_weekend: Weekend indicator
- is_night (00:00-06:00): Nighttime low-consumption periods
- is_business_hours (08:00-18:00 weekdays): Peak business activity
- is_peak_hours (07:00-09:00, 17:00-20:00): Morning and evening peaks
- is_summer, is_fall, is_winter, is_spring: Seasonal indicators
- is_monday, is_tuesday, ...: Day-of-week indicators
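The time-window regressors can be derived directly from the timestamps; this sketch assumes Prophet's "ds" column convention and treats the hour boundaries above as inclusive (an assumption where the README gives only ranges):

```python
import pandas as pd

def add_regressors(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    hour = df["ds"].dt.hour
    dow = df["ds"].dt.dayofweek  # Monday=0 ... Sunday=6
    df["is_night"] = ((hour >= 0) & (hour <= 6)).astype(int)
    df["is_business_hours"] = ((hour >= 8) & (hour <= 18) & (dow < 5)).astype(int)
    df["is_peak_hours"] = (((hour >= 7) & (hour <= 9)) |
                           ((hour >= 17) & (hour <= 20))).astype(int)
    df["is_weekend"] = (dow >= 5).astype(int)
    return df
```

Each column is then registered with `add_regressor` before fitting the Prophet model.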
N-HiTS is a state-of-the-art deep learning architecture specifically designed for long-horizon time series forecasting. It uses a hierarchical neural network structure to capture patterns at multiple time scales simultaneously.
Architecture Overview:
- Multi-rate signal processing: Uses multiple "stacks" of blocks, each operating at different temporal resolutions
- Hierarchical interpolation: Learns to decompose the forecast into different frequency components (high-frequency hourly patterns → low-frequency weekly/yearly trends)
- Backcast/Forecast structure: Each block produces both a backcast (explanation of past) and forecast (prediction of future)
- Unlike traditional methods that require manual specification of seasonal patterns, N-HiTS automatically learns the hierarchical structure of seasonality from raw data through its multi-scale architecture.
Implementation Details:
- Dataset split:
- Training: 2002-2015 (~121,000 hours)
- Validation: 2016 (8,760 hours) - used for early stopping and hyperparameter selection
- Test: 2017-2018 (13,896 hours)
- Hyperparameter tuning: 70 trials with Optuna
- Hardware acceleration: Trained on RTX 5090 GPU
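The chronological split can be sketched with plain pandas; the key point is that there is no shuffling, so future observations never leak into training:

```python
import pandas as pd

def split_by_year(df: pd.DataFrame):
    train = df[df.index < "2016-01-01"]   # 2002-2015
    val = df[(df.index >= "2016-01-01") & (df.index < "2017-01-01")]  # 2016
    test = df[df.index >= "2017-01-01"]   # 2017-2018
    return train, val, test
```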
All models were evaluated on the same test set (2017-2018, 13,896 hours) using three standard forecasting metrics:
- RMSE (Root Mean Squared Error): Square root of the mean squared error, in MW; penalizes large deviations more heavily. Lower is better.
- MAE (Mean Absolute Error): Average absolute error magnitude in MW, treating all errors equally. Lower is better.
- MAPE (Mean Absolute Percentage Error): Average percentage deviation from actual consumption; scale-independent. Lower is better.
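Written out explicitly, the three metrics are:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root of the mean squared error: large deviations dominate
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    # Mean absolute error: every error weighted equally
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mape(y_true, y_pred):
    # Mean absolute percentage error: scale-independent
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
```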
| Model | RMSE (MW) | MAE (MW) | MAPE (%) | Training Time |
|---|---|---|---|---|
| SARIMAX | 5,087 | 3,985 | 12.80% | ~18 hours |
| Prophet | 3,542 | 2,635 | 8.03% | ~12 minutes |
| N-HiTS | 2,320 | 1,568 | 4.90% | ~14 minutes |
Note: ARIMA results are excluded because that model was evaluated on a different test-set partition. Its test MAPE of 18.15% was the worst of all models, and it was also very slow to run.
SARIMAX captures the daily pattern (24-hour cycle) but misses weekly and yearly patterns, resulting in systematic under/over-prediction during certain periods.
Advantages: Handles one seasonal pattern with exogenous variables, more flexible than ARIMA
Limitations: Limited to single seasonality (daily only), extremely slow training (~18 hours), struggles with overlapping seasonal patterns
Prophet successfully models multiple seasonalities (daily, weekly, yearly), showing significant improvement over SARIMAX. However, forecasts still exhibit moderate deviations, particularly during extreme weather periods.
Advantages: Handles multiple seasonalities simultaneously, interpretable decomposition (trend/seasonal/holiday), fast training (~12 min)
Limitations: Additive model structure limits non-linear interactions, moderate errors during extreme events
N-HiTS achieves state-of-the-art performance, with forecasts that closely track actual values by capturing all hierarchical patterns and complex non-linear relationships.
Advantages: Automatically learns multi-scale patterns (hourly to yearly), captures non-linear interactions, designed for long-horizon forecasting, fast training (~14 min)
Limitations: Requires GPU for optimal performance, less interpretable than statistical models, needs larger datasets
- Python 3.8 or higher
- NVIDIA GPU with CUDA support (optional but recommended for N-HiTS)
- 8GB+ RAM
- MLflow for experiment tracking
- Clone the repository:

      git clone https://github.com/dream-19/Time_Series_PJME_hourly_consumption.git
      cd Time_Series_PJME_hourly_consumption

- Install dependencies:

      pip install -r requirements.txt

This project is licensed under the MIT License - see the LICENSE file for details.