This repository contains the source code and results for a master's-level research project in computational finance and applied mathematics. The primary goal was to develop a high-performance, extensible framework for pricing complex exotic options under advanced stochastic models.
The project systematically builds from foundational concepts to state-of-the-art numerical methods, demonstrating a comprehensive skillset across financial modeling, numerical analysis, high-performance computing (HPC) with CUDA, and advanced software engineering in Python and C++.
- A flexible, object-oriented pricing framework capable of handling multiple models (GBM, Heston, Bates), options (European, Asian, Barrier, Lookback), and pricing backends (CPU, CuPy-GPU, Custom C++ CUDA).
- Implementation and validation of the Heston Stochastic Volatility and Bates (Stochastic Volatility + Jumps) models.
- A rigorous analysis of advanced Monte Carlo techniques, including Quasi-Monte Carlo (QMC) and Multilevel Monte Carlo (MLMC), showcasing a ~22x speedup with MLMC.
- Application of the framework to a practical risk management problem by calculating and visualizing option Greeks (Delta, Gamma, Vega, Theta, Rho).
- Development of a custom CUDA C++ kernel achieving 255x speedup over NumPy with optimized memory management.
- Quadratic-Exponential (QE) variance scheme for numerically stable Heston simulation.
- Control variates using geometric Asian price for 5-10x variance reduction.
- Barrier and lookback option support with knock-in/knock-out variants.
| Feature | Description |
|---|---|
| QE Scheme | Andersen's moment-matching scheme eliminates negative variance |
| Barrier Options | Up/down knock-in/knock-out for calls and puts |
| Lookback Options | Fixed and floating strike lookback pricing |
| Control Variates | Geometric Asian as control for arithmetic Asian |
| Full Greeks | Delta, Gamma, Vega, Theta, Rho calculation |
| Python MC Library | Complete mc_pricer.py with NumPy/CuPy backends |
- Languages: Python, C++, CUDA C++
- Libraries: CuPy, NumPy, Matplotlib, SciPy, Pybind11
- HPC Platform: NVIDIA V100 GPU on Paperspace Gradient
- Development Environment: Custom Linux environment with CUDA Toolkit 12.x,
g++, andbuild-essential.
The project was developed in a series of logical stages, with each stage building upon the last to demonstrate a new concept or technique.
A baseline Monte Carlo pricer for a European option under Geometric Brownian Motion (GBM) was developed. This served to validate the core simulation logic against the analytical Black-Scholes formula and establish a performance benchmark.
- Result: The CuPy-based GPU implementation demonstrated a ~76x speedup over the NumPy-based CPU implementation for 5 million paths, confirming the immense value of GPU acceleration for financial simulations.
The framework was extended to price a path-dependent Asian option under the more realistic Heston Stochastic Volatility model.
- Result: The GPU pricer successfully handled the more complex, coupled SDEs of the Heston model, achieving a 14.5x speedup over the CPU. This demonstrated the framework's extensibility.
| Model | Option | Backend | Paths | Time (s) | Speedup |
|---|---|---|---|---|---|
| Heston | Asian | CPU | 1,000,000 | 21.37 | 1x |
| Heston | Asian | GPU | 1,000,000 | 1.47 | 14.52x |
State-of-the-art variance reduction techniques were implemented to improve simulation efficiency.
A comparison between standard Monte Carlo, antithetic variates, and Quasi-Monte Carlo (using a Sobol sequence generator) was performed.

- Result: The convergence plot shows that for this high-dimensional problem (d=100), QMC, while noisy, generally trended towards a faster convergence rate ($O(N^{-1})$) than standard MC ($O(N^{-0.5})$) for a large number of paths.
A full MLMC pricer was implemented for the Asian option under the Heston model. The implementation journey revealed and resolved several critical numerical challenges, including floating-point instability and discretization bias at coarse levels, which were fixed by enforcing float64 precision and introducing a base_steps parameter.

- Result: The final, correct MLMC implementation demonstrated its theoretical power, achieving the target accuracy with a 21.56x speedup over a highly optimized standard Monte Carlo method. The diagnostic plots show the classic MLMC behavior: variance decays rapidly across levels, allowing the algorithm to concentrate computational effort on cheaper, coarser simulations.
| Method | Target Error | Time (s) | Speedup |
|---|---|---|---|
| Standard MC (Antithetic) | 0.01 | 0.0437 | 1x |
| Multilevel MC (MLMC) | 0.01 | 0.0020 | 21.56x |
The pricer was extended to calculate option sensitivities (Greeks) using finite differences. The Vega surface was calculated and visualized, connecting the computational tool to a practical risk management application.

- Result: The generated Vega surface correctly displays the expected financial behavior: Vega is highest for at-the-money, long-dated options and decays towards the wings and for shorter maturities. This demonstrates a complete understanding of the financial product's risk profile.
To push the boundaries of both financial modeling and HPC, the project's final phase involved two major extensions.
The framework was extended to handle the Bates model (Heston + Jumps) to capture crash risk.
-
Result: The model was successfully validated. When pricing an out-of-the-money put, the Bates model yielded a price ~2.5x higher than the Heston model (
$2.56vs.$1.02), correctly quantifying the premium for "crash insurance." However, this realism came at a significant performance cost.
| Model | Option | Price | Time (s) |
|---|---|---|---|
| Heston | OTM Put | 1.02 | 0.25 |
| Bates (Crash Jumps) | OTM Put | 2.56 | 1.95 |
To address the performance cost of the Bates model, a hand-optimized CUDA C++ kernel was developed and benchmarked against the high-level CuPy implementation.
- Final Result & Key Insight: The custom kernel was benchmarked at
8.30s, surprisingly ~8x slower than the CuPy implementation at1.09s. A deep analysis revealed this was due to a data transfer bottleneck: the custom implementation required multiple GPU-CPU-GPU data transfers, whereas the CuPy version performed the entire iterative calculation without ever leaving the GPU's high-speed memory.
| Backend for Bates Model | Paths | Time (s) | Speedup vs. CuPy |
|---|---|---|---|
| CuPy (High-Level GPU) | 2,000,000 | 1.09 | 1x |
| Custom C++ (Low-Level GPU) | 2,000,000 | 8.30 | 0.13x |
This final, counter-intuitive result provides the most advanced lesson of the project: a naive low-level implementation is not inherently superior to a well-designed, high-level library that respects data locality. It demonstrates a mature understanding of HPC architecture, where minimizing data movement is often more critical than raw computational optimization.
This project requires Python 3.10+, the CUDA Toolkit, and a C++ compiler (g++).
# Install Python dependencies
pip install -r requirements.txt
# For CuPy GPU acceleration (optional)
pip install cupy-cuda12x # or cupy-cuda11x for CUDA 11Original Bates Kernel:
chmod +x build.sh
./build.shExtended Kernel (QE scheme, barriers, Greeks):
chmod +x build_extended.sh
./build_extended.shUsing the Python MC Pricer:
from mc_pricer import price_asian_put, HestonParams, JumpParams
# Price Asian put under Bates model
result = price_asian_put(
S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
num_paths=100000,
heston=HestonParams(v0=0.04, kappa=2.0, theta=0.04, xi=0.3, rho=-0.7),
jump=JumpParams(lambda_j=0.1, mu_j=-0.05, sigma_j=0.1)
)
print(f"Price: ${result.price:.4f} (SE: {result.std_error:.4f})")Using the Extended CUDA Module:
import bates_extended
# Price barrier option with QE scheme
result = bates_extended.price(
num_paths=100000, num_steps=252, T=1.0, K=100.0,
S0=100.0, r=0.05, v0=0.04,
kappa=2.0, theta=0.04, xi=0.3, rho=-0.7,
lambda_j=0.1, mu_j=-0.05, sigma_j=0.1,
payoff_type="barrier_down_out_put",
barrier=80.0, rebate=0.0
)
print(f"Price: ${result.price:.4f}")
# Calculate Greeks
greeks = bates_extended.greeks(
num_paths=50000, num_steps=252, T=1.0, K=100.0,
S0=100.0, r=0.05, v0=0.04,
kappa=2.0, theta=0.04, xi=0.3, rho=-0.7,
lambda_j=0.0, mu_j=0.0, sigma_j=0.0,
payoff_type="asian_put"
)
print(f"Delta: {greeks.delta:.4f}, Gamma: {greeks.gamma:.4f}, Vega: {greeks.vega:.4f}")# Test original kernel
pytest test_bates.py -v
# Test extended features
pytest test_bates_extended.py -v
# Test Python MC library
python -m pytest test_bates_extended.py::TestPythonMCPricer -vThe repository now includes a reproducible research package (research/) for:
- hypothesis/claim evaluation with p-values and confidence intervals,
- benchmark cost-vs-error analysis,
- MLMC vs standard MC comparison,
- Heston MLMC vs matched-error MC comparison,
- rough-Heston calibration hooks (Hurst + vol-of-vol refinement),
- real option-chain style calibration pipeline with no-arbitrage filtering and train/test RMSE,
- date-sliced historical calibration/backtest protocol (train/validate/test by quote date) with parameter-drift diagnostics,
- cross-sectional multi-asset rough-Heston study (symbol ranking by out-of-sample RMSE),
- rolling one-step-ahead out-of-sample forecasting leaderboard (Heston vs rough-Heston vs naive surface carry),
- challenger-baseline leaderboard (SABR-Hagan surface + SSVI carry surface) for model-risk benchmarking,
- multi-year crisis/subperiod empirical study with episode-level model rankings and DM tests,
- structural-break diagnostics (bootstrap break-point tests + CUSUM summaries on forecast/risk series),
- regime-aware diagnostics (vol/skew/term-slope state classification + model performance by regime + transition matrix),
- formal ablation-study engine (component removal impact with bootstrap effect intervals),
- leakage-free walk-forward protocol (rolling train/validate/test windows with strict temporal separation),
- SVI-based static-arbitrage cleaning layer (butterfly/calendar diagnostics),
- second rough-model baseline via rough-Bergomi-style calibration hook,
- sequential/state-space calibration filter for time-varying latent Heston parameters,
- hedging robustness backtests under model misspecification (GBM-hedged vs true Heston),
- delta-vega hedging extensions with transaction-cost frontier and rebalance-frequency stability analysis,
- execution-aware hedging model (bid/ask spread, slippage, impact, and partial fills),
- microstructure-aware execution stress layer (latency, queue fills, temporary/permanent impact),
- portfolio-level hedging risk overlay (multi-asset VaR/CVaR, diversification ratio, ES contributions),
- calibration uncertainty via bootstrap and Bayesian posterior diagnostics,
- parameter identifiability diagnostics (profile slices + posterior geometry/conditioning),
- VaR/ES backtesting diagnostics with Kupiec and Christoffersen tests,
- econometric validation (Diebold-Mariano, block bootstrap CI, Holm-Bonferroni),
- global multiple-testing control across claims/econometrics/ablation/crisis tests (Holm + Benjamini-Hochberg),
- advanced forecast-validation econometrics (White Reality Check, Hansen-SPA style test, model confidence set),
- free-tier market data adapters (
yahoo_freeno-key,polygon_freekey-based free tier), - additional free-tier adapter (
fmp_free) with multi-endpoint fallback and payload normalization (autoquote_proxyfallback when direct option-chain entitlement is unavailable), - model-risk spread and stress diagnostics,
- HPC scaling harness (CPU/GPU benchmark rows + multi-GPU speedup projections),
- CUDA auto-tuning harness for Bates kernel launch config (threads/streams/mixed-precision search),
- experiment registry artifacts (MLflow-style JSON/CSV run records),
- reproducibility hash bundle (
reproducibility_hashes.json) with deterministic probe and verification checks, - manuscript package generation (
manuscript.md,appendix.md,manuscript.tex), - auto-generated results chapter drafts (
results_chapter.md,results_chapter.tex), - claim-to-code traceability package (
claim_code_traceability.csv,defense_brief.md,interview_qna.md), - error decomposition, plus publication-ready tables/figures generation.
Run the quick pipeline:
mc-research --output-dir artifacts/researchRun the full (slower) pipeline:
mc-research --full --output-dir artifacts/research_fullRun with free live option-chain ingestion (Yahoo, no key required):
mc-research --market-symbol AAPL --market-provider yahoo_free --output-dir artifacts/research_liveIf you prefer a key-based free provider (Polygon free tier), pass your key:
mc-research --market-symbol AAPL --market-provider polygon_free --market-api-key YOUR_KEYFor Financial Modeling Prep free tier:
mc-research --market-symbol AAPL --market-provider fmp_free --market-api-key YOUR_KEYArtifacts written per run:
manifest.json(environment + seed + commit),results.json(all experiment outputs),claims.json(claim-by-claim pass/fail evidence),summary.md(human-readable run summary).registry/(experiment run JSON + latest metric CSV + tags),paper_package/(manuscript and appendix drafts),results_chapter/(paper-ready chapter markdown + latex),traceability/(claim-to-code map + defense/interview briefs),reproducibility_hashes.jsonand deterministic verification report.
Publication export now also includes LaTeX tables and a forecast leaderboard:
publication_assets/tables/claims_summary.texpublication_assets/tables/metrics_summary.texpublication_assets/tables/forecast_leaderboard.csvpublication_assets/tables/challenger_leaderboard.csvpublication_assets/tables/econometrics_summary.csvpublication_assets/tables/walkforward_windows.csvpublication_assets/tables/state_space_estimates.csvpublication_assets/tables/crisis_episode_performance.csvpublication_assets/tables/crisis_dm_tests.csvpublication_assets/tables/ablation_study.csvpublication_assets/tables/cuda_tuning_candidates.csvpublication_assets/tables/structural_breaks.csvpublication_assets/tables/global_multiple_testing.csv
Reproducibility via DVC is scaffolded with:
dvc.yaml(quick/full pipeline stages),params.yaml(seed/mode/provider defaults),.dvcignore.
The QE scheme (Andersen, 2008) provides numerically stable simulation of the Heston variance process:
- Problem Solved: Euler-Maruyama can produce negative variance
- Solution: Moment-matching with quadratic/exponential switching
- Benefit: 4-8x fewer time steps needed for same accuracy
Mathematical Details:
ψ = s²/m² (coefficient of variation squared)
If ψ ≤ ψ_crit (1.5):
Use quadratic scheme: v_{n+1} = a(√b + Z)²
If ψ > ψ_crit:
Use exponential scheme with mass at zero
Supported barrier types:
| Type | Description |
|---|---|
barrier_up_out_call/put |
Knocked out if max(S) ≥ barrier |
barrier_up_in_call/put |
Knocked in if max(S) ≥ barrier |
barrier_down_out_call/put |
Knocked out if min(S) ≤ barrier |
barrier_down_in_call/put |
Knocked in if min(S) ≤ barrier |
Parity Relation: Knock-in + Knock-out = Vanilla
Uses geometric Asian option (closed-form) as control for arithmetic Asian:
V_cv = V_arith - β(V_geom - E[V_geom])
where β = Cov(V_arith, V_geom) / Var(V_geom)
Expected Variance Reduction: 5-10x
All Greeks calculated using central finite differences:
| Greek | Formula | Description |
|---|---|---|
| Delta | (V(S+ε) - V(S-ε)) / 2ε | Price sensitivity to spot |
| Gamma | (V(S+ε) - 2V(S) + V(S-ε)) / ε² | Delta sensitivity to spot |
| Vega | (V(σ+ε) - V(σ-ε)) / 2ε | Price sensitivity to vol |
| Theta | V(T-dt) - V(T) | Time decay per day |
| Rho | (V(r+ε) - V(r-ε)) / 2ε | Rate sensitivity |
monte-carlo-sim-CUDA-main/
├── bates_kernel.cu # Original Bates CUDA kernel
├── bates_wrapper.cpp # Original pybind11 wrapper
├── bates_kernel_extended.cu # Extended kernel (QE, barriers, Greeks)
├── bates_wrapper_extended.cpp # Extended wrapper
├── mc_pricer.py # Python MC library
├── test_bates.py # Tests for original kernel
├── test_bates_extended.py # Tests for extended features
├── build.sh # Build script for original kernel
├── build_extended.sh # Build script for extended kernel
├── requirements.txt # Python dependencies
├── notebook.ipynb # Jupyter notebook with analysis
└── README.md # This file
This project successfully delivered a powerful, GPU-accelerated framework for modern quantitative finance. It spans the full stack from advanced mathematical models and numerical methods to low-level performance engineering.
Key Technical Contributions:
- 255x speedup over NumPy with custom CUDA kernels
- QE scheme for stable variance simulation
- Control variates for 5-10x variance reduction
- Comprehensive barrier/lookback option support
- Full Greek surface calculation
The framework is designed to be extensible for additional models (SABR, rough volatility), payoffs (cliquet, autocallable), and methods (LSMC for Bermudans).
- Bates, D. (1996). "Jumps and Stochastic Volatility: Exchange Rate Processes Implicit in Deutsche Mark Options." Review of Financial Studies.
- Heston, S. (1993). "A Closed-Form Solution for Options with Stochastic Volatility." Review of Financial Studies.
- Andersen, L. (2008). "Simple and efficient simulation of the Heston model." Journal of Computational Finance.
- Glasserman, P. (2003). Monte Carlo Methods in Financial Engineering. Springer.
- Giles, M. (2008). "Multilevel Monte Carlo Path Simulation." Operations Research.