
# Time Series Forecasting — Loss Functions

A chronological catalog of loss functions for time series forecasting, temporal prediction, and sequential modeling.


## Part I — Classical & Probabilistic Forecasting Losses


1. Mean Absolute Error (MAE) / L1 Loss (Classical) — The average absolute difference between predicted and true values; robust to outliers and widely used as a baseline point forecasting loss. 📄 Least Absolute Deviations (Wikipedia) — Classical statistical method 💻 PyTorch torch.nn.L1Loss


2. Mean Squared Error (MSE) / L2 Loss (Classical) — The average squared difference between predicted and true values; penalizes large errors disproportionately, making it sensitive to outliers. 📄 Least Squares (Wikipedia) — Classical statistical method (Gauss, Legendre) 💻 PyTorch torch.nn.MSELoss


3. Huber Loss (1964) — A piecewise loss that behaves as L2 for small errors and L1 for large errors, combining MSE's smoothness with MAE's robustness to outliers. 📄 Robust Estimation of a Location Parameter — Peter J. Huber 💻 PyTorch torch.nn.HuberLoss
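
A quick illustration of how these three point losses react to an outlier, using the standard PyTorch loss classes (the numbers are made up):

```python
import torch
import torch.nn as nn

y_true = torch.tensor([10.0, 12.0, 11.0, 100.0])  # last point is an outlier
y_pred = torch.tensor([11.0, 12.5, 10.0, 15.0])

mae   = nn.L1Loss()(y_pred, y_true)              # mean |error|
mse   = nn.MSELoss()(y_pred, y_true)             # mean error^2
huber = nn.HuberLoss(delta=1.0)(y_pred, y_true)  # L2 inside |error| <= delta, L1 outside

print(f"MAE={mae:.2f}  MSE={mse:.2f}  Huber={huber:.2f}")
# The outlier dominates MSE (quadratic growth); MAE and Huber grow only linearly with it.
```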


4. Quantile Loss / Pinball Loss (1978) — Asymmetric loss that penalizes over- and under-prediction differently based on a chosen quantile, enabling prediction interval estimation and probabilistic forecasting. 📄 Regression Quantiles — Roger Koenker, Gilbert Bassett Jr. 💻 GluonTS QuantileLoss
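
The pinball loss has a two-line closed form. A minimal PyTorch sketch (the function name and shapes are illustrative, not the GluonTS API):

```python
import torch

def pinball_loss(y_pred: torch.Tensor, y_true: torch.Tensor, tau: float) -> torch.Tensor:
    """Quantile (pinball) loss at quantile level tau in (0, 1)."""
    err = y_true - y_pred
    # Under-prediction (err > 0) is weighted by tau, over-prediction by (1 - tau).
    return torch.mean(torch.maximum(tau * err, (tau - 1.0) * err))
```

Training the same model head at several values of tau (e.g., 0.1, 0.5, 0.9) yields prediction intervals directly.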


5. MAPE (Mean Absolute Percentage Error) (Classical) — Scale-independent percentage error measuring relative forecast accuracy; undefined when true values are zero and asymmetrically penalizes positive vs. negative errors. 📄 Another Look at Measures of Forecast Accuracy — Rob J. Hyndman, Anne B. Koehler (2006, critical analysis) 💻 Nixtla/neuralforecast (MAPE metric)


6. sMAPE (Symmetric MAPE) (1993) — Symmetric variant of MAPE that normalizes by the average of predicted and true values, addressing MAPE's asymmetry but still problematic near zero. 📄 Long-Range Forecasting — J. Scott Armstrong (1985); modified form in Accuracy Measures: Theoretical and Practical Concerns — Spyros Makridakis (1993) 💻 Nixtla/neuralforecast (sMAPE metric)


7. MASE (Mean Absolute Scaled Error) (2006) — Scale-free error metric that normalizes MAE by the in-sample MAE of a naive (random walk) forecast, well-defined for zero values and suitable for comparing across series. 📄 Another Look at Measures of Forecast Accuracy — Rob J. Hyndman, Anne B. Koehler 💻 Nixtla/neuralforecast (MASE metric)
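
Entries 5–7 are easiest to compare side by side. A minimal NumPy sketch following the Hyndman & Koehler (2006) definitions (exact conventions vary slightly across libraries):

```python
import numpy as np

def mape(y, y_hat):
    # Undefined where y == 0; this sketch simply assumes nonzero targets.
    return np.mean(np.abs((y - y_hat) / y)) * 100

def smape(y, y_hat):
    # Normalizes by the average magnitude of actual and forecast values.
    return np.mean(2 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat))) * 100

def mase(y, y_hat, y_train, m=1):
    # Scales test-set MAE by the in-sample MAE of the (seasonal-)naive
    # forecast with lag m; m=1 is the random-walk baseline.
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y - y_hat)) / scale
```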


8. Negative Log-Likelihood (Gaussian) (Classical) — Probabilistic forecasting loss that jointly learns the predicted mean and variance of a Gaussian distribution, penalizing both inaccurate point predictions and miscalibrated uncertainty. 📄 Pattern Recognition and Machine Learning, §1.2.4 — Christopher M. Bishop (2006) 💻 GluonTS GaussianOutput
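
PyTorch exposes this loss directly as torch.nn.GaussianNLLLoss. A minimal sketch with random stand-ins for the network's predicted mean and variance:

```python
import torch
import torch.nn as nn

y_true = torch.randn(32, 24)        # batch of 24-step targets
mu     = torch.randn(32, 24)        # predicted means
var    = torch.rand(32, 24) + 1e-3  # predicted variances (must be positive)

# Up to constants: 0.5 * (log(var) + (y - mu)^2 / var), averaged over the batch.
nll = nn.GaussianNLLLoss()(mu, y_true, var)
```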


9. CRPS (Continuous Ranked Probability Score) (2007) — A proper scoring rule for probabilistic forecasts that measures the integrated squared difference between the predicted CDF and the empirical CDF of the observation, generalizing MAE to distributions. 📄 Strictly Proper Scoring Rules, Prediction, and Estimation — Tilmann Gneiting, Adrian E. Raftery 💻 GluonTS EnergyScore / CRPS
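
When the predictive distribution is represented by Monte Carlo samples, as in most deep probabilistic forecasters, CRPS can be estimated via the identity CRPS(F, y) = E|X − y| − ½ E|X − X′|. A minimal sketch (function name and shapes are illustrative; GluonTS's evaluator instead approximates CRPS with quantile-weighted pinball losses):

```python
import torch

def crps_from_samples(samples: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Sample-based CRPS estimate.

    samples: (num_samples, batch) draws from the predictive distribution
    y:       (batch,) observations
    """
    term1 = torch.mean(torch.abs(samples - y), dim=0)  # E|X - y|
    term2 = 0.5 * torch.mean(                          # 0.5 * E|X - X'|
        torch.abs(samples.unsqueeze(0) - samples.unsqueeze(1)), dim=(0, 1)
    )
    return torch.mean(term1 - term2)
```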


10. DILATE Loss (2019) — Combines a shape-based loss (soft-DTW) with a Temporal Distortion Index (TDI) penalty, jointly optimizing for both shape accuracy and temporal alignment in time series prediction. 📄 Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models — Vincent Le Guen, Nicolas Thome 💻 vincent-leguen/DILATE


## Part II — Deep Learning Forecasting Losses


11. DeepAR Loss (2020) — Autoregressive RNN trained with the negative log-likelihood of parametric distributions (Gaussian, negative binomial, beta, etc.), producing calibrated probabilistic forecasts via ancestral sampling. 📄 DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks — David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski 💻 awslabs/gluonts (DeepAR)
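
The training objective reduces to a negative log-likelihood under whichever output distribution the network parameterizes. A minimal sketch of the count-data case with torch.distributions, using random stand-ins for the RNN's emitted parameters (the paper's (μ, α) negative-binomial parameterization maps onto total_count/logits):

```python
import torch
from torch.distributions import NegativeBinomial

y = torch.randint(0, 50, (32, 24)).float()   # count-valued targets
total_count = torch.rand(32, 24) * 10 + 1.0  # predicted dispersion parameter
logits = torch.randn(32, 24)                 # predicted logits

dist = NegativeBinomial(total_count=total_count, logits=logits)
loss = -dist.log_prob(y).mean()              # NLL averaged over batch and time
```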


12. N-BEATS Loss (2020) — Interpretable deep architecture using basis expansion with backward/forward residual stacking; trained with sMAPE, MASE, or MAPE losses matched to the evaluation metric, and the first pure deep learning model to achieve state-of-the-art accuracy on the M4 competition. 📄 N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting — Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio 💻 ServiceNow/N-BEATS


13. Informer Loss (2021) — MSE loss applied to long-sequence time series forecasting with ProbSparse self-attention and a generative-style decoder, enabling direct multi-step prediction without autoregressive accumulation of error. 📄 Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting — Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang 💻 zhouhaoyi/Informer2020


14. Autoformer Loss (2021) — MSE loss with a novel auto-correlation mechanism replacing standard self-attention, combined with progressive series decomposition (trend + seasonal) for long-term forecasting. 📄 Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting — Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long 💻 thuml/Autoformer


15. FEDformer Loss (2022) — MSE loss with frequency-enhanced attention that operates in the Fourier/wavelet domain, capturing global temporal patterns with linear complexity for long-term forecasting. 📄 FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting — Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, Rong Jin 💻 MAZiqing/FEDformer


16. PatchTST Loss (2023) — MSE loss with channel-independent patching that segments time series into subseries-level patches fed to a vanilla Transformer, reducing computation and capturing local semantic information for multivariate forecasting. 📄 A Time Series is Worth 64 Words: Long-term Forecasting with Transformers — Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam 💻 yuqinie98/PatchTST
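
A minimal sketch of the channel-independent patching step using Tensor.unfold (shapes are illustrative; the training loss itself is plain MSE on the reassembled forecasts):

```python
import torch

x = torch.randn(8, 7, 336)  # (batch, channels, length)
patch_len, stride = 16, 8

# Channel independence: fold channels into the batch so each univariate
# series is patched and modeled separately by the shared Transformer.
x = x.reshape(-1, x.shape[-1])            # (batch * channels, length)
patches = x.unfold(1, patch_len, stride)  # (batch * channels, num_patches, patch_len)
# Each patch becomes one token; num_patches = (336 - 16) // 8 + 1 = 41.
```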


17. TimesNet Loss (2023) — MSE loss with a 2D variation modeling approach that uses FFT-based period detection to reshape 1D time series into 2D tensors, capturing both intra-period and inter-period variations via 2D convolutions. 📄 TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis — Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, Mingsheng Long 💻 thuml/Time-Series-Library (TimesNet)
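
The period detection step amounts to picking the highest-amplitude frequencies of the averaged spectrum. A minimal sketch (function name and shapes are illustrative, not the Time-Series-Library API):

```python
import torch

def dominant_periods(x: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Top-k periods by FFT amplitude for a batch x of shape (batch, length, channels)."""
    amp = torch.fft.rfft(x, dim=1).abs().mean(dim=(0, 2))  # averaged amplitude spectrum
    amp[0] = 0                                             # ignore the DC component
    freqs = torch.topk(amp, k).indices                     # dominant frequency bins
    return x.shape[1] // freqs                             # convert frequencies to periods

# Reshaping a length-L series into a (period, L // period) grid then yields the 2D
# tensor on which intra- and inter-period variations are modeled by 2D convolutions.
```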


18. iTransformer Loss (2024) — MSE loss with an inverted Transformer architecture that applies attention on the variate dimension (not time), treating each time series as a token to capture multivariate correlations more effectively. 📄 iTransformer: Inverted Transformers Are Effective for Time Series Forecasting — Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, Mingsheng Long 💻 thuml/iTransformer


19. TimesFM Loss (2024) — Patched decoder-only transformer foundation model trained on a large corpus of real-world and synthetic time series, using quantile heads (quantile loss) for probabilistic forecasting with zero-shot generalization. 📄 A Decoder-Only Foundation Model for Time-Series Forecasting — Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou 💻 google-research/timesfm


## Part III — Specialized Time Series Losses


20. DTW Loss (Dynamic Time Warping) (1978) — Alignment-based distance that finds the optimal non-linear warping path between two time series, allowing temporal distortion; non-differentiable in its original form. 📄 Dynamic Programming Algorithm Optimization for Spoken Word Recognition — Hiroaki Sakoe, Seibi Chiba 💻 tslearn DTW
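
The classic dynamic program makes the non-differentiability explicit: each cell takes a hard minimum over three predecessors. A minimal NumPy sketch with a squared-difference local cost:

```python
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """O(len(a) * len(b)) DTW distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Hard min over match / insertion / deletion: piecewise, non-differentiable.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```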


21. Soft-DTW Loss (2017) — A differentiable relaxation of DTW that replaces the hard minimum with a soft-minimum (log-sum-exp), enabling gradient-based optimization of DTW-like alignment losses for time series. 📄 Soft-DTW: a Differentiable Loss Function for Time-Series — Marco Cuturi, Mathieu Blondel 💻 mblondel/soft-dtw
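
Soft-DTW changes exactly one line of that recursion: the hard min becomes a temperature-controlled log-sum-exp. A minimal NumPy forward-pass sketch (training requires an autograd implementation, e.g. the PyTorch version in the linked repository):

```python
import numpy as np

def soft_dtw(a: np.ndarray, b: np.ndarray, gamma: float = 1.0) -> float:
    """Soft-DTW value; gamma -> 0 recovers classic DTW."""
    def softmin(*vals):
        z = np.asarray(vals) / -gamma
        zmax = z.max()  # stabilized log-sum-exp
        return -gamma * (zmax + np.log(np.exp(z - zmax).sum()))

    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Soft-min keeps the recursion smooth in a and b, enabling gradients.
            D[i, j] = cost + softmin(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```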


22. TDI (Temporal Distortion Index) (2019) — Measures the temporal alignment quality between predicted and true time series by computing the area between the DTW warping path and the diagonal, quantifying how much temporal distortion exists. 📄 Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models — Vincent Le Guen, Nicolas Thome 💻 vincent-leguen/DILATE (TDI component)


23. Deep State Space Model Loss (2018) — Likelihood-based loss for deep state space models: an RNN emits the parameters of a linear-Gaussian state space model whose marginal log-likelihood is computed in closed form by Kalman filtering; related deep latent-variable forecasters instead train an ELBO combining a reconstruction term (negative log-likelihood) with a KL divergence regularizer when exact inference is intractable. 📄 Deep State Space Models for Time Series Forecasting — Syama Sundar Rangapuram, Matthias W. Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, Tim Januschowski 💻 awslabs/gluonts (DeepState)
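
For the variational variant mentioned above, the negative ELBO is a reconstruction NLL plus a KL regularizer. A minimal one-step sketch with torch.distributions (all names and the toy "decoder" are illustrative, not the GluonTS API):

```python
import torch
from torch.distributions import Normal, kl_divergence

x = torch.randn(32, 1)                                   # observations
q = Normal(torch.randn(32, 8), torch.rand(32, 8) + 0.1)  # approx. posterior q(z|x)
p = Normal(torch.zeros(32, 8), torch.ones(32, 8))        # prior p(z)

z = q.rsample()                     # reparameterized sample keeps gradients flowing
x_mu = z.sum(dim=-1, keepdim=True)  # toy stand-in for a decoder network
recon_nll = -Normal(x_mu, 1.0).log_prob(x).mean()

# Negative ELBO = reconstruction NLL + KL(q(z|x) || p(z))
loss = recon_nll + kl_divergence(q, p).sum(dim=-1).mean()
```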


## Unified Libraries

| Library | Description | Link |
| --- | --- | --- |
| GluonTS | AWS probabilistic time series modeling (DeepAR, DeepState, Transformer, etc.) | awslabs/gluonts |
| NeuralForecast | Nixtla's production-ready neural forecasting (N-BEATS, NHITS, PatchTST, etc.) | Nixtla/neuralforecast |
| pytorch-forecasting | High-level PyTorch forecasting API (TFT, DeepAR, N-BEATS, etc.) | jdb78/pytorch-forecasting |
| TSlib (Time-Series-Library) | Unified benchmark for time series (Informer, Autoformer, TimesNet, iTransformer, etc.) | thuml/Time-Series-Library |

## 📊 Summary Table

| # | Loss Function | Year | Category | Key Innovation |
| --- | --- | --- | --- | --- |
| 1 | MAE / L1 Loss | Classical | Point | Absolute error, outlier-robust |
| 2 | MSE / L2 Loss | Classical | Point | Squared error, smooth gradients |
| 3 | Huber Loss | 1964 | Point | L1/L2 hybrid, robust |
| 4 | Quantile / Pinball Loss | 1978 | Probabilistic | Asymmetric quantile regression |
| 5 | MAPE | Classical | Point | Scale-independent percentage error |
| 6 | sMAPE | 1993 | Point | Symmetric percentage error |
| 7 | MASE | 2006 | Point | Scaled by naive forecast baseline |
| 8 | Gaussian NLL | Classical | Probabilistic | Learned mean + variance |
| 9 | CRPS | 2007 | Probabilistic | Proper scoring rule for CDFs |
| 10 | DILATE Loss | 2019 | Shape+Temporal | Soft-DTW + temporal distortion |
| 11 | DeepAR Loss | 2020 | Probabilistic | Autoregressive parametric NLL |
| 12 | N-BEATS Loss | 2020 | Point | Basis expansion + residual stacking |
| 13 | Informer Loss | 2021 | Point | ProbSparse attention + MSE |
| 14 | Autoformer Loss | 2021 | Point | Auto-correlation + decomposition |
| 15 | FEDformer Loss | 2022 | Point | Fourier/wavelet attention + MSE |
| 16 | PatchTST Loss | 2023 | Point | Channel-independent patching + MSE |
| 17 | TimesNet Loss | 2023 | Point | FFT-based 2D variation + MSE |
| 18 | iTransformer Loss | 2024 | Point | Inverted attention (variate dim) |
| 19 | TimesFM Loss | 2024 | Probabilistic | Foundation model + quantile heads |
| 20 | DTW Loss | 1978 | Alignment | Non-linear temporal warping |
| 21 | Soft-DTW Loss | 2017 | Alignment | Differentiable DTW relaxation |
| 22 | TDI | 2019 | Alignment | Warping path distortion area |
| 23 | Deep SSM Loss | 2018 | Probabilistic | Marginal likelihood via Kalman filtering |