A deep generative model combining Transformer-based Variational Autoencoders (VAE) and Normalizing Flows to generate and forecast financial time series data, conditioned on market volatility indicators.
Financial markets exhibit non-linear patterns, heavy-tailed distributions, and volatility clustering, which classical methods such as ARIMA and GARCH struggle to model accurately.
This project proposes a Transformer-VAE architecture with RealNVP normalizing flows, conditioned on the VIX volatility index, to generate synthetic financial data and learn latent financial dynamics.
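For readers unfamiliar with RealNVP, the sketch below shows a single affine coupling step in NumPy. It is a toy illustration, not this repository's implementation: the weights `w_s` and `w_t` are hypothetical stand-ins for the small networks a real flow would use, and the latent size of 8 is arbitrary. The point is that the transform is exactly invertible and its log-determinant is just the sum of the log-scales.

```python
import numpy as np

def coupling_forward(z, w_s, w_t):
    """One RealNVP-style affine coupling step on a batch z of shape (batch, dim).

    The first half of each vector passes through unchanged and produces a
    log-scale and shift for the second half. dim is assumed to be even.
    """
    d = z.shape[1] // 2
    z_a, z_b = z[:, :d], z[:, d:]
    log_scale = np.tanh(z_a @ w_s)        # bounded log-scale for stability
    shift = z_a @ w_t
    y_b = z_b * np.exp(log_scale) + shift
    log_det = log_scale.sum(axis=1)       # log|det J| of the triangular Jacobian
    return np.concatenate([z_a, y_b], axis=1), log_det

def coupling_inverse(y, w_s, w_t):
    """Exact inverse of coupling_forward."""
    d = y.shape[1] // 2
    y_a, y_b = y[:, :d], y[:, d:]
    log_scale = np.tanh(y_a @ w_s)
    shift = y_a @ w_t
    z_b = (y_b - shift) * np.exp(-log_scale)
    return np.concatenate([y_a, z_b], axis=1)

rng = np.random.default_rng(0)
dim = 8                                   # hypothetical latent size
w_s = rng.normal(scale=0.1, size=(dim // 2, dim // 2))
w_t = rng.normal(scale=0.1, size=(dim // 2, dim // 2))
z = rng.normal(size=(4, dim))
y, log_det = coupling_forward(z, w_s, w_t)
z_back = coupling_inverse(y, w_s, w_t)    # recovers z up to floating-point error
```

In practice several such steps are stacked, alternating which half is left unchanged, so that every latent dimension gets transformed.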
Clone this repository and set up a Python environment:
git clone https://github.com/muralikarteek7/Transformer-VAE-Stock-Predictor.git
cd Transformer-VAE-Stock-Predictor
conda create -n transvae python=3.8
conda activate transvae

Install the following dependencies via pip or conda:
- torch
- transformers
- numpy
- pandas
- matplotlib
- yfinance
- scikit-learn
- Historical stock data from Yahoo Finance via `yfinance`
- Volatility Index (VIX) data
- 3 stocks: e.g., AAPL, MSFT, GOOGL
- Features: `[Close, Volume]`
- Time range: Multiple years (daily frequency)
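The data sources listed above could be assembled roughly as follows. This is a minimal sketch, not the repository's loader: the helper names `fetch_history` and `assemble_features`, the column prefixes, and the date range in the usage comment are all assumptions made for illustration. `^VIX` is the Yahoo Finance symbol for the VIX index.

```python
import pandas as pd

TICKERS = ["AAPL", "MSFT", "GOOGL"]   # example tickers from the list above
FEATURES = ["Close", "Volume"]

def fetch_history(symbol, start, end):
    """Download daily bars for one symbol (requires network access)."""
    import yfinance as yf  # imported lazily so assemble_features works offline
    frame = yf.Ticker(symbol).history(start=start, end=end, interval="1d")
    # Drop the exchange time zone so stock and index dates align when joined.
    if frame.index.tz is not None:
        frame.index = frame.index.tz_localize(None)
    frame.index = frame.index.normalize()
    return frame

def assemble_features(frames, vix):
    """Join per-ticker [Close, Volume] columns and the VIX close on shared dates."""
    parts = [frames[t][FEATURES].add_prefix(f"{t}_") for t in frames]
    stocks = pd.concat(parts, axis=1, join="inner")
    vix_close = vix["Close"].rename("VIX")
    merged = stocks.join(vix_close, how="inner").dropna()
    return merged[stocks.columns], merged["VIX"]

# Usage (hypothetical date range):
# frames = {t: fetch_history(t, "2015-01-01", "2023-01-01") for t in TICKERS}
# vix = fetch_history("^VIX", "2015-01-01", "2023-01-01")
# features, vix_series = assemble_features(frames, vix)   # features has 6 columns
```

Joining on the intersection of dates keeps the stock features and the VIX series aligned, which matters once the VIX is used as a conditioning input.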
Each sample is a sequence of shape (1000, 6):
- 1000 time steps
- 6 dimensions (3 stocks × 2 features each)
- VIX used as a conditional input
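One way to obtain samples of this shape is a sliding window over the aligned daily data. The sketch below assumes a `(T, 6)` feature array and a per-day VIX array of length `T`; the function name `make_windows`, the `stride` parameter, and the choice to keep one VIX value per time step are illustrative assumptions rather than details taken from the repository.

```python
import numpy as np

def make_windows(features, condition, seq_len=1000, stride=1):
    """Slice aligned arrays into overlapping windows.

    features:  (T, 6) array, 3 stocks x [Close, Volume]
    condition: (T,) array, e.g. the VIX close on the same dates
    Returns arrays of shape (N, seq_len, 6) and (N, seq_len).
    """
    features = np.asarray(features, dtype=np.float64)
    condition = np.asarray(condition, dtype=np.float64)
    if len(features) != len(condition):
        raise ValueError("features and condition must cover the same dates")
    if len(features) < seq_len:
        raise ValueError("series is shorter than one window")
    starts = range(0, len(features) - seq_len + 1, stride)
    x = np.stack([features[s:s + seq_len] for s in starts])
    c = np.stack([condition[s:s + seq_len] for s in starts])
    return x, c

# Toy data standing in for the real feature matrix and VIX series.
rng = np.random.default_rng(0)
feats = rng.random((1200, 6))
vix = rng.random(1200)
x, c = make_windows(feats, vix, seq_len=1000, stride=100)
print(x.shape, c.shape)
```

With 1000-step windows, a few years of daily data yields heavily overlapping samples, so the stride controls the trade-off between sample count and redundancy.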
- Loss: ELBO (Reconstruction + KL) + Flow log-det Jacobian
- Cosine Similarity: Between real and generated log returns
- Volatility clustering check
- Time-series overlay of real vs. synthetic returns
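The loss listed above (ELBO plus the flow log-det Jacobian) can be written out as follows. This is one common formulation, assuming the flow transforms samples from the encoder's posterior and that the VIX enters as a conditioning variable; the symbols $x$ (input sequence), $c$ (VIX condition), $z_0$ (encoder sample), $f_1, \dots, f_K$ (flow steps), and the unit weighting of the terms are notation introduced here, not taken from the repository.

```latex
\mathcal{L}(x, c) =
\underbrace{-\,\mathbb{E}_{q_0(z_0 \mid x, c)}\big[\log p(x \mid z_K, c)\big]}_{\text{reconstruction}}
+ \underbrace{\mathbb{E}_{q_0(z_0 \mid x, c)}\big[\log q_0(z_0 \mid x, c) - \log p(z_K)\big]}_{\text{KL-type term}}
- \underbrace{\mathbb{E}_{q_0(z_0 \mid x, c)}\Big[\sum_{k=1}^{K} \log \Big|\det \frac{\partial f_k}{\partial z_{k-1}}\Big|\Big]}_{\text{flow log-det Jacobian}},
\qquad z_K = f_K \circ \cdots \circ f_1(z_0)
```

Minimizing $\mathcal{L}$ maximizes the ELBO; without any flow steps ($K = 0$) the expression reduces to the usual VAE objective of reconstruction error plus KL divergence.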
| Model | Wasserstein Distance | Cosine Similarity |
|---|---|---|
| Transformer-VAE-Flow | 0.0629 | 0.9072 |
| TimeGrad | 0.3246 | 0.9905 |
| QuantGAN | 0.0159 | 0.9148 |
| DiM | 0.2018 | — |
| GARCH | 0.1499 | 0.9919 |
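For reference, metrics of the kind reported in the table can be computed with a few lines of NumPy. The sketch below is not the evaluation script behind those numbers: it assumes both metrics are applied to flattened log-return series of equal length, and the function names are hypothetical. The random-walk prices at the end are toy data used only to exercise the functions.

```python
import numpy as np

def log_returns(prices):
    """Log returns r_t = log(p_t) - log(p_{t-1})."""
    prices = np.asarray(prices, dtype=np.float64)
    return np.diff(np.log(prices))

def wasserstein_1d(a, b):
    """W1 between two equal-size empirical samples: mean gap between sorted values."""
    a, b = np.sort(np.ravel(a)), np.sort(np.ravel(b))
    if a.size != b.size:
        raise ValueError("this shortcut needs equal sample sizes")
    return float(np.mean(np.abs(a - b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened series."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def abs_return_autocorr(returns, lag=1):
    """Autocorrelation of |returns| at a lag >= 1.

    Persistently positive values across lags suggest volatility clustering.
    """
    v = np.abs(np.ravel(returns))
    v = v - v.mean()
    return float((v[:-lag] @ v[lag:]) / (v @ v))

# Toy data: two independent random-walk price paths.
rng = np.random.default_rng(0)
real = log_returns(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
fake = log_returns(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
print(wasserstein_1d(real, fake), cosine_similarity(real, fake), abs_return_autocorr(real))
```

A lower Wasserstein distance means the generated return distribution is closer to the real one, while a cosine similarity nearer to 1 means the two series move more alike. For samples of unequal size, `scipy.stats.wasserstein_distance` is an alternative to the sorted-values shortcut.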
- Synthetic data generation for backtesting
- Risk modeling and stress testing
- Portfolio simulation
- Improving data diversity for trading models
- QuantGAN: Deep Generation of Financial Time Series
- Bollerslev, T. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics, 1986.
- TimeGrad: Diffusion-based Forecasting

