A comprehensive machine learning project for stock price prediction using LSTM, MC Dropout, Bayesian Neural Networks, and Transformers with probabilistic uncertainty estimation.
Author: Mohansree Vijayakumar
Email: mohansreesk14@gmail.com
- Project Overview
- Quick Start
- Project Structure
- Model Architectures
- Technical Indicators
- Documentation
- Results
- Contributing
- License
This project implements deep learning models for financial time series forecasting with a focus on uncertainty quantification. The system provides not just point predictions but also confidence intervals that widen during volatile market periods, which supports risk-aware trading decisions.
- Multiple Model Architectures: LSTM, MC Dropout LSTM, Bayesian Neural Networks (Pyro), Transformers
- Uncertainty Quantification: Probabilistic forecasting with confidence intervals
- 21 Technical Indicators: Comprehensive feature engineering
- Hyperparameter Optimization: Optuna-based automated tuning
- Backtesting System: Strategy evaluation with uncertainty-aware position sizing
- Interactive Dashboard: Streamlit web application
- Professional Documentation: Detailed reports and visualizations
ML-Intern/
├── app/  # Streamlit web application
│   └── streamlit_app.py
├── configs/  # Model and experiment configurations
│   ├── lstm_baseline.yaml
│   ├── mc_dropout.yaml
│   ├── bnn_vi.yaml
│   └── transformer_baseline.yaml
├── data/  # Data storage (gitignored)
│   ├── raw/
│   └── processed/
├── reports/  # Documentation and analysis
│   ├── figures/  # Generated visualizations
│   ├── tables/  # Reference tables
│   ├── PROJECT_ANALYSIS_REPORT.md
│   ├── QUICK_REFERENCE.md
│   ├── FIGURES_DOCUMENTATION.md
│   └── HPO_SEARCH_SPACE.md
├── results/  # Experiment results (gitignored)
├── scripts/  # Data preparation scripts
│   ├── fetch_data.py
│   └── make_dataset.py
├── src/  # Source code
│   ├── data/  # Data processing
│   │   ├── indicators.py
│   │   └── preprocess.py
│   ├── evaluation/  # Model evaluation
│   │   ├── backtesting.py
│   │   └── metrics.py
│   ├── models/  # Model implementations
│   │   ├── lstm.py
│   │   ├── mc_dropout_lstm.py
│   │   ├── bnn_vi_pyro.py
│   │   └── transformer.py
│   ├── training/  # Training pipeline
│   │   └── train_loop.py
│   └── utils/  # Utilities
│       ├── config.py
│       ├── logging_mlflow.py
│       └── visualization.py
├── tests/  # Unit tests
│   ├── test_metrics.py
│   ├── test_models.py
│   └── test_preprocess.py
├── utils/  # Utility scripts
│   ├── generate_figures.py  # Generate documentation figures
│   ├── generate_hpo_table.py  # Generate HPO table image
│   └── generate_indicators_table.py  # Generate indicators table
├── train.py  # Main training script
├── evaluate.py  # Model evaluation script
├── backtest.py  # Backtesting script
├── hparam_search.py  # Hyperparameter optimization
├── run_project.py  # Project launcher (tests + app)
├── requirements.txt  # Python dependencies
└── README.md  # This file
- Python 3.8+
- pip package manager
- Clone or download this repository
- Install dependencies:
pip install -r requirements.txt
Option 1: Run Everything (Tests + Web App)
python run_project.py
Option 2: Run Web Application Only
streamlit run app/streamlit_app.py
or
.\run_streamlit.bat # Windows batch file
.\run_streamlit.ps1 # PowerShell script
Option 3: Access the Live Streamlit App
Visit our hosted Streamlit app: https://bayesian-financial-forecasting.streamlit.app/
# LSTM Baseline
python train.py --config configs/lstm_baseline.yaml
# MC Dropout LSTM
python train.py --config configs/mc_dropout.yaml
# Bayesian Neural Network
python train_bnn.py --config configs/bnn_vi.yaml
# Transformer
python train.py --config configs/transformer_baseline.yaml
# Hyperparameter optimization
python hparam_search.py --config configs/lstm_baseline.yaml --study-name my_study
# Model evaluation
python evaluate.py --config configs/lstm_baseline.yaml --checkpoint results/experiment_XXXXX/best_model.pt
# Backtesting
python backtest.py --config configs/mc_dropout.yaml --checkpoint results/experiment_XXXXX/best_model.pt
# Generate all figures (6, 7, 8, 9, 10)
python utils/generate_figures.py
# Generate HPO table
python utils/generate_hpo_table.py
# Generate technical indicators table
python utils/generate_indicators_table.py
LSTM Baseline:
- Standard LSTM for time series forecasting
- Dropout regularization
- Configuration: configs/lstm_baseline.yaml
MC Dropout LSTM:
- Monte Carlo Dropout for uncertainty estimation
- Multiple forward passes at inference
- Provides prediction mean and variance
- Configuration: configs/mc_dropout.yaml
Bayesian Neural Network:
- Full Bayesian treatment with Pyro
- Variational inference
- Posterior distribution over weights
- Configuration: configs/bnn_vi.yaml
Transformer:
- Attention-based architecture
- Multi-head self-attention
- Positional encoding for sequences
- Configuration: configs/transformer_baseline.yaml
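The MC Dropout idea listed above (multiple forward passes at inference, yielding a prediction mean and variance) can be sketched in PyTorch as follows. This is a minimal illustration, not the code in `src/models/mc_dropout_lstm.py`: the class `TinyDropoutLSTM`, the helper `mc_dropout_predict`, and all sizes are hypothetical.

```python
import torch
import torch.nn as nn


class TinyDropoutLSTM(nn.Module):
    """Hypothetical stand-in model: LSTM -> dropout -> linear head."""

    def __init__(self, n_features: int, hidden_size: int = 32, dropout: float = 0.2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)                # (batch, seq_len, hidden)
        last = self.dropout(out[:, -1])      # keep only the final time step
        return self.head(last).squeeze(-1)   # (batch,)


def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 50):
    """Run n_samples stochastic forward passes; return per-input mean and variance."""
    model.eval()
    # Switch only the dropout layers back to training mode so they keep
    # sampling new masks at inference time.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])  # (n_samples, batch)
    return samples.mean(dim=0), samples.var(dim=0)


torch.manual_seed(0)
model = TinyDropoutLSTM(n_features=21)   # 21 matches the README's feature count
x = torch.randn(4, 30, 21)               # 4 windows, 30 time steps (window length is an assumption)
mean, var = mc_dropout_predict(model, x)
print(mean.shape, var.shape)
```

The spread of the sampled predictions serves as the uncertainty estimate: inputs on which the dropout masks disagree more receive a larger variance.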
The system uses 21 technical indicators across 6 categories:
- Price Features (5): Open, High, Low, Close, Adj Close
- Volume (1): Trading volume
- Returns (2): Daily return, Log return
- Trend (4): SMA(10), SMA(20), EMA(12), EMA(26)
- Momentum (6): RSI, MACD, MACD Signal, MACD Histogram, Stochastic %K, %D
- Volatility (3): Bollinger Bands (Upper, Middle, Lower)
See reports/tables/TECHNICAL_INDICATORS_TABLE.md for detailed formulas.
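Several of the return, trend, momentum, and volatility features listed above can be derived from a closing-price series with pandas. The sketch below is illustrative only and is not the project's `src/data/indicators.py`. The function name `add_indicators`, the column names, the 14-period RSI, the 9-period MACD signal line, and the 20-period, 2-standard-deviation Bollinger Bands are assumptions; the list above fixes only SMA(10), SMA(20), EMA(12), and EMA(26). The RSI here uses simple rolling averages rather than Wilder's smoothing, and the Stochastic oscillator is omitted because it needs high and low prices.

```python
import numpy as np
import pandas as pd


def add_indicators(close: pd.Series) -> pd.DataFrame:
    """Compute a subset of the listed features from closing prices (hypothetical helper)."""
    out = pd.DataFrame({"close": close})

    # Returns
    out["daily_return"] = close.pct_change()
    out["log_return"] = np.log(close / close.shift(1))

    # Trend
    out["sma_10"] = close.rolling(10).mean()
    out["sma_20"] = close.rolling(20).mean()
    out["ema_12"] = close.ewm(span=12, adjust=False).mean()
    out["ema_26"] = close.ewm(span=26, adjust=False).mean()

    # Momentum: MACD line, signal line (assumed 9-period EMA), histogram
    out["macd"] = out["ema_12"] - out["ema_26"]
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()
    out["macd_hist"] = out["macd"] - out["macd_signal"]

    # Momentum: RSI over an assumed 14-period window (simple-average variant)
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(14).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # Volatility: Bollinger Bands (assumed 20-period window, 2 standard deviations)
    std_20 = close.rolling(20).std()
    out["bb_middle"] = out["sma_20"]
    out["bb_upper"] = out["sma_20"] + 2 * std_20
    out["bb_lower"] = out["sma_20"] - 2 * std_20
    return out


# Synthetic, strictly positive price path used only to exercise the function.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 120))))
features = add_indicators(close)
print(features.dropna().tail(3))
```

The first rows contain NaN values until each rolling window has filled, so a real pipeline would drop or mask them before building training windows.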
Optuna-based optimization with the following search space:
| Parameter | Type | Range | Distribution |
|---|---|---|---|
| Hidden Size | Integer | 64-256 (step 64) | Uniform |
| Num Layers | Integer | 1-3 | Uniform |
| Dropout | Float | 0.0-0.4 | Uniform |
| Learning Rate | Float | 1e-4 to 5e-3 | Log-Uniform |
| Batch Size | Categorical | [32, 64, 128] | Discrete |
See reports/HPO_SEARCH_SPACE.md for complete details.
- Validation Loss: ~0.009 MSE (LSTM Baseline)
- Uncertainty Calibration: Bands widen 1.48x during COVID-19 crash
- Risk-Adjusted Returns:
  - Uncertainty-aware strategy: 22.7% lower volatility
  - 22.4% smaller maximum drawdown
  - Sharpe ratio improvement: 0.88 vs 0.78
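The risk metrics above (volatility, maximum drawdown, Sharpe ratio) and the idea of uncertainty-aware position sizing can be illustrated with NumPy. The README does not state the sizing rule used in `backtest.py`, so the inverse-uncertainty rule below is an assumption, as are all function names, the zero risk-free rate, and the 252 trading days per year. The data is synthetic and does not reproduce the reported figures.

```python
import numpy as np


def position_size(pred_std: np.ndarray, target_std: float) -> np.ndarray:
    """Assumed rule: shrink exposure as predictive std rises, capped at fully invested."""
    return np.minimum(1.0, target_std / np.maximum(pred_std, 1e-12))


def annualized_volatility(returns: np.ndarray, periods_per_year: int = 252) -> float:
    return float(np.std(returns, ddof=1) * np.sqrt(periods_per_year))


def sharpe_ratio(returns: np.ndarray, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with the risk-free rate assumed to be zero."""
    return float(np.mean(returns) / np.std(returns, ddof=1) * np.sqrt(periods_per_year))


def max_drawdown(returns: np.ndarray) -> float:
    """Largest peak-to-trough loss of the equity curve, as a fraction of the peak."""
    equity = np.concatenate([[1.0], np.cumprod(1.0 + returns)])  # start from initial capital
    peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peak))


# Synthetic daily returns: a calm regime followed by a volatile one.
rng = np.random.default_rng(0)
returns = np.concatenate([rng.normal(0.0005, 0.01, 200), rng.normal(0.0005, 0.04, 50)])
# Hypothetical model output: predictive std that tracks the true regime.
pred_std = np.concatenate([np.full(200, 0.01), np.full(50, 0.04)])

weights = position_size(pred_std, target_std=0.01)
sized_returns = weights * returns

for name, series in [("fixed size", returns), ("uncertainty-aware", sized_returns)]:
    print(name, annualized_volatility(series), max_drawdown(series), sharpe_ratio(series))
```

Under this rule the strategy stays fully invested while predicted uncertainty is at or below the target and scales down proportionally when it rises, which is what lowers realized volatility in the volatile regime.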
All figures available in reports/figures/:
- Figure 6: AAPL Price History (2015-2024)
- Figure 7: Feature Correlation Heatmap
- Figure 8: Training/Validation Loss Curves
- Figure 9: Uncertainty Bands (COVID-19 volatility demonstration)
- Figure 10: Cumulative Returns Comparison
- PROJECT_ANALYSIS_REPORT.md: Complete project analysis
- FIGURES_DOCUMENTATION.md: Detailed figure documentation
- HPO_SEARCH_SPACE.md: Hyperparameter optimization details
- TECHNICAL_INDICATORS_TABLE.md: All 21 indicators with formulas
- QUICK_REFERENCE.md: At-a-glance project guide
- HPO_QUICK_REFERENCE.md: HPO quick guide
Run unit tests:
# All tests
python -m pytest tests/
# Specific test file
python -m pytest tests/test_models.py
# With coverage
python -m pytest tests/ --cov=src
Or use the project launcher:
python run_project.py # Runs tests first, then launches app
All experiments use YAML configuration files in configs/:
# Example: configs/lstm_baseline.yaml
data:
  tickers: ["AAPL"]
  data_dir: "data/processed"
  train_ratio: 0.7
  valid_ratio: 0.15
model:
  type: "lstm"
  hidden_size: 128
  num_layers: 2
  dropout: 0.1
training:
  epochs: 50
  batch_size: 64
  learning_rate: 0.001
  early_stopping_patience: 10
seed: 42
Core libraries:
- PyTorch: Deep learning framework
- Pandas/NumPy: Data manipulation
- yfinance: Financial data fetching
- Streamlit: Web dashboard
- Optuna: Hyperparameter optimization
- Pyro: Bayesian deep learning
- Matplotlib/Seaborn: Visualization
- MLflow: Experiment tracking (optional)
See requirements.txt for complete list.
- Uncertainty Quantification: Not just predictions, but confidence intervals
- Volatility-Aware Trading: Dynamic position sizing based on uncertainty
- Comprehensive Technical Analysis: 21 engineered features
- Multiple Architectures: Compare LSTM, Bayesian, Transformer approaches
- Professional Documentation: Publication-ready reports and figures
If you use this project in your research or work, please cite:
Financial Time Series Forecasting with Uncertainty Quantification
ML Internship Project, 2025
GitHub: [Your Repository URL]
This is an educational project. Suggestions and improvements are welcome!
This project is for educational and research purposes.
- Data Source: Yahoo Finance (yfinance library)
- Frameworks: PyTorch, Pyro, Streamlit, Optuna
- Inspiration: Modern deep learning research in finance and uncertainty quantification
For questions or collaboration opportunities, please reach out via GitHub issues.
Last Updated: October 14, 2025
Version: 1.0
Status: Production-Ready