monte-carlo-sim-CUDA

Advanced Monte Carlo Methods for Exotic Option Pricing with CUDA Acceleration

1. Project Overview

This repository contains the source code and results for a master's-level research project in computational finance and applied mathematics. The primary goal was to develop a high-performance, extensible framework for pricing complex exotic options under advanced stochastic models.

The project systematically builds from foundational concepts to state-of-the-art numerical methods, demonstrating a comprehensive skillset across financial modeling, numerical analysis, high-performance computing (HPC) with CUDA, and advanced software engineering in Python and C++.

Key Achievements

A flexible, object-oriented pricing framework capable of handling multiple models (GBM, Heston, Bates), options (European, Asian, Barrier, Lookback), and pricing backends (CPU, CuPy-GPU, Custom C++ CUDA).
Implementation and validation of the Heston Stochastic Volatility and Bates (Stochastic Volatility + Jumps) models.
A rigorous analysis of advanced Monte Carlo techniques, including Quasi-Monte Carlo (QMC) and Multilevel Monte Carlo (MLMC), showcasing a ~22x speedup with MLMC.
Application of the framework to a practical risk management problem by calculating and visualizing option Greeks (Delta, Gamma, Vega, Theta, Rho).
Development of a custom CUDA C++ kernel achieving 255x speedup over NumPy with optimized memory management.
Quadratic-Exponential (QE) variance scheme for numerically stable Heston simulation.
Control variates using geometric Asian price for 5-10x variance reduction.
Barrier and lookback option support with knock-in/knock-out variants.

New in v2.0: Extended Features

Feature	Description
QE Scheme	Andersen's moment-matching scheme eliminates negative variance
Barrier Options	Up/down knock-in/knock-out for calls and puts
Lookback Options	Fixed and floating strike lookback pricing
Control Variates	Geometric Asian as control for arithmetic Asian
Full Greeks	Delta, Gamma, Vega, Theta, Rho calculation
Python MC Library	Complete `mc_pricer.py` with NumPy/CuPy backends

2. Technical Stack

Languages: Python, C++, CUDA C++
Libraries: CuPy, NumPy, Matplotlib, SciPy, Pybind11
HPC Platform: NVIDIA V100 GPU on Paperspace Gradient
Development Environment: Custom Linux environment with CUDA Toolkit 12.x, g++, and build-essential.

3. Project Journey & Key Results

The project was developed in a series of logical stages, with each stage building upon the last to demonstrate a new concept or technique.

Part 1: Foundational Pricing Engine & GPU Acceleration

A baseline Monte Carlo pricer for a European option under Geometric Brownian Motion (GBM) was developed. This served to validate the core simulation logic against the analytical Black-Scholes formula and establish a performance benchmark.

Result: The CuPy-based GPU implementation demonstrated a ~76x speedup over the NumPy-based CPU implementation for 5 million paths, confirming the immense value of GPU acceleration for financial simulations.

Part 2: Advanced Modeling - Heston & Asian Options

The framework was extended to price a path-dependent Asian option under the more realistic Heston Stochastic Volatility model.

Result: The GPU pricer successfully handled the more complex, coupled SDEs of the Heston model, achieving a 14.5x speedup over the CPU. This demonstrated the framework's extensibility.

Model	Option	Backend	Paths	Time (s)	Speedup
Heston	Asian	CPU	1,000,000	21.37	1x
Heston	Asian	GPU	1,000,000	1.47	14.52x

Part 3: Advanced Numerical Methods - QMC & MLMC

State-of-the-art variance reduction techniques were implemented to improve simulation efficiency.

Convergence Analysis (MC vs. QMC)

A comparison between standard Monte Carlo, antithetic variates, and Quasi-Monte Carlo (using a Sobol sequence generator) was performed.

Result: The convergence plot shows that for this high-dimensional problem (d=100), QMC, while noisy, generally trended towards a faster convergence rate ($O(N^{-1})$) than standard MC ($O(N^{-0.5})$) for a large number of paths.

Multilevel Monte Carlo (MLMC)

A full MLMC pricer was implemented for the Asian option under the Heston model. The implementation journey revealed and resolved several critical numerical challenges, including floating-point instability and discretization bias at coarse levels, which were fixed by enforcing float64 precision and introducing a base_steps parameter.

Result: The final, correct MLMC implementation demonstrated its theoretical power, achieving the target accuracy with a 21.56x speedup over a highly optimized standard Monte Carlo method. The diagnostic plots show the classic MLMC behavior: variance decays rapidly across levels, allowing the algorithm to concentrate computational effort on cheaper, coarser simulations.

Method	Target Error	Time (s)	Speedup
Standard MC (Antithetic)	0.01	0.0437	1x
Multilevel MC (MLMC)	0.01	0.0020	21.56x

Part 4: Financial Application - Risk Analysis of Greeks

The pricer was extended to calculate option sensitivities (Greeks) using finite differences. The Vega surface was calculated and visualized, connecting the computational tool to a practical risk management application.

Result: The generated Vega surface correctly displays the expected financial behavior: Vega is highest for at-the-money, long-dated options and decays towards the wings and for shorter maturities. This demonstrates a complete understanding of the financial product's risk profile.

Part 5: The Final Frontier - Bates Model & Custom CUDA C++ Kernel

To push the boundaries of both financial modeling and HPC, the project's final phase involved two major extensions.

Bates Model Implementation

The framework was extended to handle the Bates model (Heston + Jumps) to capture crash risk.

Result: The model was successfully validated. When pricing an out-of-the-money put, the Bates model yielded a price ~2.5x higher than the Heston model ($2.56 vs. $1.02), correctly quantifying the premium for "crash insurance." However, this realism came at a significant performance cost.

Model	Option	Price	Time (s)
Heston	OTM Put	1.02	0.25
Bates (Crash Jumps)	OTM Put	2.56	1.95

Custom CUDA C++ Kernel Benchmark

To address the performance cost of the Bates model, a hand-optimized CUDA C++ kernel was developed and benchmarked against the high-level CuPy implementation.

Final Result & Key Insight: The custom kernel was benchmarked at 8.30s, surprisingly ~8x slower than the CuPy implementation at 1.09s. A deep analysis revealed this was due to a data transfer bottleneck: the custom implementation required multiple GPU-CPU-GPU data transfers, whereas the CuPy version performed the entire iterative calculation without ever leaving the GPU's high-speed memory.

Backend for Bates Model	Paths	Time (s)	Speedup vs. CuPy
CuPy (High-Level GPU)	2,000,000	1.09	1x
Custom C++ (Low-Level GPU)	2,000,000	8.30	0.13x

This final, counter-intuitive result provides the most advanced lesson of the project: a naive low-level implementation is not inherently superior to a well-designed, high-level library that respects data locality. It demonstrates a mature understanding of HPC architecture, where minimizing data movement is often more critical than raw computational optimization.

4. How to Run

Environment Setup

This project requires Python 3.10+, the CUDA Toolkit, and a C++ compiler (g++).

# Install Python dependencies
pip install -r requirements.txt

# For CuPy GPU acceleration (optional)
pip install cupy-cuda12x  # or cupy-cuda11x for CUDA 11

Building CUDA Kernels

Original Bates Kernel:

chmod +x build.sh
./build.sh

Extended Kernel (QE scheme, barriers, Greeks):

chmod +x build_extended.sh
./build_extended.sh

Quick Start Examples

Using the Python MC Pricer:

from mc_pricer import price_asian_put, HestonParams, JumpParams

# Price Asian put under Bates model
result = price_asian_put(
    S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
    num_paths=100000,
    heston=HestonParams(v0=0.04, kappa=2.0, theta=0.04, xi=0.3, rho=-0.7),
    jump=JumpParams(lambda_j=0.1, mu_j=-0.05, sigma_j=0.1)
)
print(f"Price: ${result.price:.4f} (SE: {result.std_error:.4f})")

Using the Extended CUDA Module:

import bates_extended

# Price barrier option with QE scheme
result = bates_extended.price(
    num_paths=100000, num_steps=252, T=1.0, K=100.0,
    S0=100.0, r=0.05, v0=0.04,
    kappa=2.0, theta=0.04, xi=0.3, rho=-0.7,
    lambda_j=0.1, mu_j=-0.05, sigma_j=0.1,
    payoff_type="barrier_down_out_put",
    barrier=80.0, rebate=0.0
)
print(f"Price: ${result.price:.4f}")

# Calculate Greeks
greeks = bates_extended.greeks(
    num_paths=50000, num_steps=252, T=1.0, K=100.0,
    S0=100.0, r=0.05, v0=0.04,
    kappa=2.0, theta=0.04, xi=0.3, rho=-0.7,
    lambda_j=0.0, mu_j=0.0, sigma_j=0.0,
    payoff_type="asian_put"
)
print(f"Delta: {greeks.delta:.4f}, Gamma: {greeks.gamma:.4f}, Vega: {greeks.vega:.4f}")

Running Tests

# Test original kernel
pytest test_bates.py -v

# Test extended features
pytest test_bates_extended.py -v

# Test Python MC library
python -m pytest test_bates_extended.py::TestPythonMCPricer -v

Doctoral Research Pipeline

The repository now includes a reproducible research package (research/) for:

hypothesis/claim evaluation with p-values and confidence intervals,
benchmark cost-vs-error analysis,
MLMC vs standard MC comparison,
Heston MLMC vs matched-error MC comparison,
rough-Heston calibration hooks (Hurst + vol-of-vol refinement),
real option-chain style calibration pipeline with no-arbitrage filtering and train/test RMSE,
date-sliced historical calibration/backtest protocol (train/validate/test by quote date) with parameter-drift diagnostics,
cross-sectional multi-asset rough-Heston study (symbol ranking by out-of-sample RMSE),
rolling one-step-ahead out-of-sample forecasting leaderboard (Heston vs rough-Heston vs naive surface carry),
challenger-baseline leaderboard (SABR-Hagan surface + SSVI carry surface) for model-risk benchmarking,
multi-year crisis/subperiod empirical study with episode-level model rankings and DM tests,
structural-break diagnostics (bootstrap break-point tests + CUSUM summaries on forecast/risk series),
regime-aware diagnostics (vol/skew/term-slope state classification + model performance by regime + transition matrix),
formal ablation-study engine (component removal impact with bootstrap effect intervals),
leakage-free walk-forward protocol (rolling train/validate/test windows with strict temporal separation),
SVI-based static-arbitrage cleaning layer (butterfly/calendar diagnostics),
second rough-model baseline via rough-Bergomi-style calibration hook,
sequential/state-space calibration filter for time-varying latent Heston parameters,
hedging robustness backtests under model misspecification (GBM-hedged vs true Heston),
delta-vega hedging extensions with transaction-cost frontier and rebalance-frequency stability analysis,
execution-aware hedging model (bid/ask spread, slippage, impact, and partial fills),
microstructure-aware execution stress layer (latency, queue fills, temporary/permanent impact),
portfolio-level hedging risk overlay (multi-asset VaR/CVaR, diversification ratio, ES contributions),
calibration uncertainty via bootstrap and Bayesian posterior diagnostics,
parameter identifiability diagnostics (profile slices + posterior geometry/conditioning),
VaR/ES backtesting diagnostics with Kupiec and Christoffersen tests,
econometric validation (Diebold-Mariano, block bootstrap CI, Holm-Bonferroni),
global multiple-testing control across claims/econometrics/ablation/crisis tests (Holm + Benjamini-Hochberg),
advanced forecast-validation econometrics (White Reality Check, Hansen-SPA style test, model confidence set),
free-tier market data adapters (yahoo_free no-key, polygon_free key-based free tier),
additional free-tier adapter (fmp_free) with multi-endpoint fallback and payload normalization (auto quote_proxy fallback when direct option-chain entitlement is unavailable),
model-risk spread and stress diagnostics,
HPC scaling harness (CPU/GPU benchmark rows + multi-GPU speedup projections),
CUDA auto-tuning harness for Bates kernel launch config (threads/streams/mixed-precision search),
experiment registry artifacts (MLflow-style JSON/CSV run records),
reproducibility hash bundle (reproducibility_hashes.json) with deterministic probe and verification checks,
manuscript package generation (manuscript.md, appendix.md, manuscript.tex),
auto-generated results chapter drafts (results_chapter.md, results_chapter.tex),
claim-to-code traceability package (claim_code_traceability.csv, defense_brief.md, interview_qna.md),
error decomposition, plus publication-ready tables/figures generation.

Run the quick pipeline:

mc-research --output-dir artifacts/research

Run the full (slower) pipeline:

mc-research --full --output-dir artifacts/research_full

Run with free live option-chain ingestion (Yahoo, no key required):

mc-research --market-symbol AAPL --market-provider yahoo_free --output-dir artifacts/research_live

If you prefer a key-based free provider (Polygon free tier), pass your key:

mc-research --market-symbol AAPL --market-provider polygon_free --market-api-key YOUR_KEY

For Financial Modeling Prep free tier:

mc-research --market-symbol AAPL --market-provider fmp_free --market-api-key YOUR_KEY

Artifacts written per run:

manifest.json (environment + seed + commit),
results.json (all experiment outputs),
claims.json (claim-by-claim pass/fail evidence),
summary.md (human-readable run summary).
registry/ (experiment run JSON + latest metric CSV + tags),
paper_package/ (manuscript and appendix drafts),
results_chapter/ (paper-ready chapter markdown + latex),
traceability/ (claim-to-code map + defense/interview briefs),
reproducibility_hashes.json and deterministic verification report.

Publication export now also includes LaTeX tables and a forecast leaderboard:

publication_assets/tables/claims_summary.tex
publication_assets/tables/metrics_summary.tex
publication_assets/tables/forecast_leaderboard.csv
publication_assets/tables/challenger_leaderboard.csv
publication_assets/tables/econometrics_summary.csv
publication_assets/tables/walkforward_windows.csv
publication_assets/tables/state_space_estimates.csv
publication_assets/tables/crisis_episode_performance.csv
publication_assets/tables/crisis_dm_tests.csv
publication_assets/tables/ablation_study.csv
publication_assets/tables/cuda_tuning_candidates.csv
publication_assets/tables/structural_breaks.csv
publication_assets/tables/global_multiple_testing.csv

Reproducibility via DVC is scaffolded with:

dvc.yaml (quick/full pipeline stages),
params.yaml (seed/mode/provider defaults),
.dvcignore.

5. Extended Features Documentation

5.1 Quadratic-Exponential (QE) Variance Scheme

The QE scheme (Andersen, 2008) provides numerically stable simulation of the Heston variance process:

Problem Solved: Euler-Maruyama can produce negative variance
Solution: Moment-matching with quadratic/exponential switching
Benefit: 4-8x fewer time steps needed for same accuracy

Mathematical Details:

ψ = s²/m² (coefficient of variation squared)

If ψ ≤ ψ_crit (1.5):
    Use quadratic scheme: v_{n+1} = a(√b + Z)²

If ψ > ψ_crit:
    Use exponential scheme with mass at zero

5.2 Barrier Options

Supported barrier types:

Type	Description
`barrier_up_out_call/put`	Knocked out if max(S) ≥ barrier
`barrier_up_in_call/put`	Knocked in if max(S) ≥ barrier
`barrier_down_out_call/put`	Knocked out if min(S) ≤ barrier
`barrier_down_in_call/put`	Knocked in if min(S) ≤ barrier

Parity Relation: Knock-in + Knock-out = Vanilla

5.3 Control Variates

Uses geometric Asian option (closed-form) as control for arithmetic Asian:

V_cv = V_arith - β(V_geom - E[V_geom])

where β = Cov(V_arith, V_geom) / Var(V_geom)

Expected Variance Reduction: 5-10x

5.4 Greeks

All Greeks calculated using central finite differences:

Greek	Formula	Description
Delta	(V(S+ε) - V(S-ε)) / 2ε	Price sensitivity to spot
Gamma	(V(S+ε) - 2V(S) + V(S-ε)) / ε²	Delta sensitivity to spot
Vega	(V(σ+ε) - V(σ-ε)) / 2ε	Price sensitivity to vol
Theta	V(T-dt) - V(T)	Time decay per day
Rho	(V(r+ε) - V(r-ε)) / 2ε	Rate sensitivity

6. File Structure

monte-carlo-sim-CUDA-main/
├── bates_kernel.cu           # Original Bates CUDA kernel
├── bates_wrapper.cpp         # Original pybind11 wrapper
├── bates_kernel_extended.cu  # Extended kernel (QE, barriers, Greeks)
├── bates_wrapper_extended.cpp # Extended wrapper
├── mc_pricer.py              # Python MC library
├── test_bates.py             # Tests for original kernel
├── test_bates_extended.py    # Tests for extended features
├── build.sh                  # Build script for original kernel
├── build_extended.sh         # Build script for extended kernel
├── requirements.txt          # Python dependencies
├── notebook.ipynb            # Jupyter notebook with analysis
└── README.md                 # This file

7. Conclusion

This project successfully delivered a powerful, GPU-accelerated framework for modern quantitative finance. It spans the full stack from advanced mathematical models and numerical methods to low-level performance engineering.

Key Technical Contributions:

255x speedup over NumPy with custom CUDA kernels
QE scheme for stable variance simulation
Control variates for 5-10x variance reduction
Comprehensive barrier/lookback option support
Full Greek surface calculation

The framework is designed to be extensible for additional models (SABR, rough volatility), payoffs (cliquet, autocallable), and methods (LSMC for Bermudans).

8. References

Bates, D. (1996). "Jumps and Stochastic Volatility: Exchange Rate Processes Implicit in Deutsche Mark Options." Review of Financial Studies.
Heston, S. (1993). "A Closed-Form Solution for Options with Stochastic Volatility." Review of Financial Studies.
Andersen, L. (2008). "Simple and efficient simulation of the Heston model." Journal of Computational Finance.
Glasserman, P. (2003). Monte Carlo Methods in Financial Engineering. Springer.
Giles, M. (2008). "Multilevel Monte Carlo Path Simulation." Operations Research.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github/workflows		.github/workflows
api		api
research		research
tests		tests
.dockerignore		.dockerignore
.dvcignore		.dvcignore
.gitignore		.gitignore
Dockerfile		Dockerfile
IMPLEMENTATION_SUMMARY.md		IMPLEMENTATION_SUMMARY.md
LICENSE		LICENSE
README.md		README.md
bates_kernel.cu		bates_kernel.cu
bates_kernel_cpp.so		bates_kernel_cpp.so
bates_kernel_extended.cu		bates_kernel_extended.cu
bates_wrapper.cpp		bates_wrapper.cpp
bates_wrapper.cu		bates_wrapper.cu
bates_wrapper.o		bates_wrapper.o
bates_wrapper_extended.cpp		bates_wrapper_extended.cpp
build.sh		build.sh
build_extended.sh		build_extended.sh
cache.py		cache.py
calibration.py		calibration.py
docker-compose.yml		docker-compose.yml
dvc.yaml		dvc.yaml
mc_pricer.py		mc_pricer.py
nn_surrogate.py		nn_surrogate.py
notebook.ipynb		notebook.ipynb
notebook_outputs.txt		notebook_outputs.txt
params.yaml		params.yaml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
risk_metrics.py		risk_metrics.py
rl_hedging.py		rl_hedging.py
setup.py		setup.py
test_bates.py		test_bates.py
test_bates_extended.py		test_bates_extended.py
xva.py		xva.py

License

pdwi2020/monte-carlo-sim-CUDA

Folders and files

Latest commit

History

Repository files navigation

monte-carlo-sim-CUDA

Advanced Monte Carlo Methods for Exotic Option Pricing with CUDA Acceleration

1. Project Overview

Key Achievements

New in v2.0: Extended Features

2. Technical Stack

3. Project Journey & Key Results

Part 1: Foundational Pricing Engine & GPU Acceleration

Part 2: Advanced Modeling - Heston & Asian Options

Part 3: Advanced Numerical Methods - QMC & MLMC

Convergence Analysis (MC vs. QMC)

Multilevel Monte Carlo (MLMC)

Part 4: Financial Application - Risk Analysis of Greeks

Part 5: The Final Frontier - Bates Model & Custom CUDA C++ Kernel

Bates Model Implementation

Custom CUDA C++ Kernel Benchmark

4. How to Run

Environment Setup

Building CUDA Kernels

Quick Start Examples

Running Tests

Doctoral Research Pipeline

5. Extended Features Documentation

5.1 Quadratic-Exponential (QE) Variance Scheme

5.2 Barrier Options

5.3 Control Variates

5.4 Greeks

6. File Structure

7. Conclusion

8. References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages