Discretization of temporal data for downstream ML tasks
This repository provides an implementation of TOTEM applied to time series forecasting. TOTEM introduces a generalist approach to time series modelling by first discretizing input sequences via self-supervised learning, then using these discrete tokens for downstream tasks such as imputation, anomaly detection and forecasting. This approach is inspired by recent advances in Natural Language Processing (NLP) and Computer Vision (CV) where discrete representations and foundation models have led to strong generalization.
We collected time series data from various domains and sources spanning weather, stock market, traffic, and electricity.
The compiled dataset is available on Kaggle.
- Clone the repository
- Install the required dependencies:
pip install -r requirements.txt
Similar to the original paper, the tokenizer is a Vector Quantized Variational Autoencoder (VQVAE). The model in this repository uses Exponential Moving Average (EMA) codebook updates for improved codebook utilization. The VQVAE converts a normalized time series into discrete tokens, which can then be used for downstream tasks or to reconstruct the original sequence.
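To illustrate the mechanism (this is a minimal sketch, not the repository's actual module; the class name EMAQuantizer and all hyperparameter values are assumptions), a nearest-neighbour codebook lookup with EMA updates can look like this in PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMAQuantizer(nn.Module):
    """Illustrative EMA codebook: maps encoder latents to their nearest code vectors."""
    def __init__(self, num_codes=128, code_dim=64, decay=0.99, eps=1e-5):
        super().__init__()
        self.decay, self.eps = decay, eps
        embed = torch.randn(num_codes, code_dim)
        self.register_buffer("codebook", embed)                 # (K, D) code vectors
        self.register_buffer("cluster_size", torch.zeros(num_codes))
        self.register_buffer("ema_embed", embed.clone())        # running sum of assigned latents

    def forward(self, z):                                       # z: (B, T, D) encoder latents
        flat = z.reshape(-1, z.shape[-1])                       # (B*T, D)
        dist = torch.cdist(flat, self.codebook)                 # distances to all codes
        tokens = dist.argmin(dim=-1)                            # discrete token ids
        quantized = self.codebook[tokens].view_as(z)

        if self.training:                                       # EMA codebook update (no gradients)
            onehot = F.one_hot(tokens, self.codebook.shape[0]).to(flat.dtype)
            self.cluster_size.mul_(self.decay).add_(onehot.sum(0), alpha=1 - self.decay)
            self.ema_embed.mul_(self.decay).add_(onehot.t() @ flat, alpha=1 - self.decay)
            n = self.cluster_size.sum()
            smoothed = (self.cluster_size + self.eps) / (n + self.codebook.shape[0] * self.eps) * n
            self.codebook.copy_(self.ema_embed / smoothed.unsqueeze(1))

        # Straight-through estimator so the encoder still receives gradients
        quantized = z + (quantized - z).detach()
        return quantized, tokens.view(z.shape[:-1])
```

The commitment loss on the encoder output, which a full VQVAE also needs, is omitted here for brevity.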
The Forecaster is a hybrid model that combines two components:
- A non-autoregressive Transformer that processes the discrete tokens (produced by the VQVAE tokenizer) to predict the future normalized sequence.
- A lightweight MLP that models distributional shifts using the raw input time series.
These two outputs are combined to produce the final forecast.
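As a rough sketch of how such a hybrid could be wired together (the class name HybridForecaster, the mean pooling, and the mean/scale parameterization of the shift branch are assumptions, not the repository's exact design):

```python
import torch
import torch.nn as nn

class HybridForecaster(nn.Module):
    """Illustrative hybrid forecaster: Transformer on tokens + shift MLP on the raw series."""
    def __init__(self, codebook_size=128, d_model=64, horizon=64, input_len=64):
        super().__init__()
        self.token_embed = nn.Embedding(codebook_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)   # non-autoregressive
        self.head = nn.Linear(d_model, horizon)                         # predicts the normalized future
        self.shift_mlp = nn.Sequential(                                 # predicts mean / log-scale of the future
            nn.Linear(input_len, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, tokens, raw_series):
        # tokens: (B, T_tok) discrete ids from the VQVAE; raw_series: (B, input_len)
        h = self.transformer(self.token_embed(tokens))           # (B, T_tok, d_model)
        normalized_forecast = self.head(h.mean(dim=1))           # (B, horizon)
        mu, log_sigma = self.shift_mlp(raw_series).chunk(2, dim=-1)
        # De-normalize: scale and shift the normalized forecast back to the data range
        return normalized_forecast * log_sigma.exp() + mu
```

This normalize-then-de-normalize split mirrors the Reversible Instance Normalization idea listed in the references.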
Before training, configure your experiment or inference in config/default.yaml, or supply a custom YAML config using the --config flag.
python3 train.py --config path/to/your_config.yaml
python3 train.py
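For reference, a minimal sketch of how such a --config override is typically handled with argparse and PyYAML (the repository's actual argument handling and config keys may differ):

```python
# Sketch of a --config override; argument names and config contents are illustrative.
import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="config/default.yaml",
                    help="path to a YAML experiment configuration")
args = parser.parse_args()

with open(args.config) as f:       # falls back to config/default.yaml
    config = yaml.safe_load(f)     # plain dict of experiment settings
print(config)
```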
The training script supports training the VQVAE tokenizer, the Forecaster, or both sequentially. By default it trains both; you can choose with the --train flag. You can also point to a pretrained model with the --vqvae flag for the tokenizer and the --forecaster flag for the Forecaster.
Example Commands
Train both VQVAE and Forecaster:
python train.py --train both
Train only VQVAE:
python train.py --train vqvae
Train only Forecaster (you must specify a pretrained VQVAE):
python train.py --train forecaster --vqvae path/to/vqvae.pth
Optionally, resume training:
python train.py --train forecaster --vqvae path/to/vqvae.pth --forecaster path/to/forecaster.pth
Each training run generates a unique directory under experiments/, named by timestamp. It contains:
- The full log
- Model checkpoints
- Plots
- Results
You can run inference on a trained VQVAE and/or Forecaster model using the test.py script.
Example Command
python test.py --config config/default.yaml --vqvae path/to/vqvae.pth --forecaster path/to/forecaster.pth
While the --config and --forecaster flags are optional, you must provide a pretrained VQVAE model, even when only evaluating the Forecaster, since the input tokens for forecasting are obtained from the VQVAE.
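The following hypothetical sketch shows why the tokenizer is always on the inference path: the raw window is normalized and tokenized by the VQVAE before the Forecaster ever sees it (method names such as encode and quantize are assumptions, not the repository's API):

```python
import torch

@torch.no_grad()
def forecast(vqvae, forecaster, raw_series):
    """Hypothetical inference path; vqvae.encode / vqvae.quantize are assumed method names."""
    # Normalize the raw input window before tokenization
    mean = raw_series.mean(dim=-1, keepdim=True)
    std = raw_series.std(dim=-1, keepdim=True) + 1e-8
    z = vqvae.encode((raw_series - mean) / std)    # continuous latents
    _, tokens = vqvae.quantize(z)                  # discrete token ids from the codebook
    return forecaster(tokens, raw_series)          # Transformer + shift MLP produce the forecast
```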
VQVAE Reconstructions on a Test Set
For experimentation, we used a VQVAE with a codebook size of 128 and a compression factor of 4 (i.e., every four input timesteps are represented by a single discrete token). The tokenizer captured key temporal patterns across different datasets and reconstructed input sequences with high fidelity. It generalizes well to unseen time series types, showing strong domain-agnostic behavior.
Forecasting Future Average Temperature on a Test Set
For the forecaster, we used an input length of 64 timesteps and a forecast horizon of 64 timesteps. The model performs well on in-domain tasks; that is, when trained and evaluated solely on weather data. However, its performance degrades in zero-shot and cross-domain scenarios, particularly on volatile time series such as stock prices.
There are several promising directions for improving the current implementation of TOTEM:
- Scaling Up: Larger VQVAE codebooks can enhance the expressiveness of the tokenizer by allowing it to capture more fine-grained and diverse temporal features. Similarly, training on a broader and more varied dataset, especially one including underrepresented domains, could significantly improve the model's ability to generalize across different time series types.
- Enhancing the Forecaster: The current hybrid Transformer-MLP forecaster could be improved by exploring more expressive architectures.
- Benchmarking and Evaluation: Benchmarking the model on well-established long-range forecasting datasets could offer a clearer picture of its robustness and generalization capabilities.
TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis by Sabera Talukder, Yisong Yue & Georgia Gkioxari
Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift by Taesung Kim, Jinhee Kim et al.
Neural Discrete Representation Learning by Aaron van den Oord, Oriol Vinyals & Koray Kavukcuoglu
Contributions are welcome! Feel free to open an issue or submit a pull request to improve this project.