This DeFi Financial Risk Management tutorial was researched and tested by Marcos.
He may be reached through Code Sport's contact us page.
- Author Credits
- Monte Carlo Simulations in Financial Risk Management
- 1. What is a Monte Carlo Simulation?
- 1.1.0 What is a Regression Analysis?
- 2. Number of Simulations: 500 vs 5000
- 3. Probability of Touching a Price
- 4. Liquidation Risk Model
- 5. Value-at-Risk (VaR)
- 6. Drift (μ) and Volatility (σ)
- 7. Confidence Intervals and Percentiles
- 8. Business Requirement: Ensuring <5% Chance of Undercollateralization
- 8c. Python Implementation Set 2
- ✅ Conclusion
- Appendix I: GitHub Actions and Workflows
- Appendix II: Scikit-learn
- Appendix III: When Principal Component Analysis (PCA) is Used in Finance & Risk Modeling
This tutorial explains Monte Carlo methods through the lens of DeFi liquidation risk with equations, Python code, and charts for intuition.
Monte Carlo simulations allow us to model uncertainty, stress‑test financial portfolios, and make informed risk decisions.
We’ll use Ethereum (ETH) as the consistent underlying asset throughout.
A Monte Carlo simulation is a repeated random sampling technique: it estimates the probability distribution of uncertain outcomes by generating a large number of random trials.
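As a toy illustration of the idea (unrelated to finance, purely for intuition), here is a Monte Carlo estimate of the probability that two dice sum to at least 10; the exact answer is 6/36 ≈ 16.7%.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 100_000
dice = rng.integers(1, 7, size=(trials, 2))    # repeated random sampling: two dice per trial
estimate = np.mean(dice.sum(axis=1) >= 10)     # fraction of trials meeting the condition
print(f"Estimated P(sum >= 10): {estimate:.3f}  (exact: {6/36:.3f})")
```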
Regression analysis attempts to establish a relationship between a dependent variable and one or more independent variables (factors). It helps predict the value of the dependent variable based on the values of the independent variables. In the case of housing, home prices would be the dependent variable, while the employment rate, the yield on the 10-year Treasury, and the fed-funds rate are factors that drive home prices.
R-squared assesses how well the independent variables predict the dependent variable.
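A hedged sketch of the housing example with synthetic (made-up) data: an ordinary least-squares fit of home prices against the three factors, reporting R². Scikit-learn's LinearRegression is used here purely for convenience.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 200
employment_rate = rng.uniform(0.90, 0.98, n)   # fraction employed
treasury_10y = rng.uniform(0.01, 0.05, n)      # 10-year Treasury yield
fed_funds = rng.uniform(0.00, 0.06, n)         # fed-funds rate

# Synthetic home prices: rise with employment, fall as rates rise (plus noise)
home_prices = (900_000 * employment_rate
               - 2_000_000 * treasury_10y
               - 1_000_000 * fed_funds
               + rng.normal(0, 20_000, n))

X = np.column_stack([employment_rate, treasury_10y, fed_funds])  # independent variables
model = LinearRegression().fit(X, home_prices)                   # dependent variable: home prices
print(f"R-squared: {model.score(X, home_prices):.3f}")
```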
In finance, asset prices are often modeled using Geometric Brownian Motion (GBM):
According to C. Browne:
GBM is the stochastic differential equation (SDE) that underlies the Black–Scholes–Merton (BSM) model. GBM is converted into the BSM partial differential equation (PDE) for option price movements using Itô's Lemma. Source: C. Browne
The discretized GBM step used throughout this tutorial is:

$$S_{t+\Delta t} = S_t \, \exp\!\left[\left(\mu - \tfrac{1}{2}\sigma^2\right)\Delta t + \sigma\sqrt{\Delta t}\, Z\right]$$

Where:

- $S_t$ : Asset price at time $t$
- $\mu$ : Drift (expected return per step) = mean of returns
- $\sigma$ : Volatility (standard deviation of returns)
- $\Delta t$ : Time increment
- $Z \sim \mathcal{N}(0,1)$ : Standard normal random variable
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
# === STEP 1: Fetch historical price data (CoinGecko API for ETH) ===
def fetch_crypto_data(coin_id="ethereum", vs_currency="usd", days=365):
url = f"https://api.coingecko.com/api/v3/coins/{coin_id}/market_chart"
params = {"vs_currency": vs_currency, "days": days}
data = requests.get(url, params=params).json()
prices = [p[1] for p in data['prices']]
return pd.Series(prices)
eth_prices = fetch_crypto_data()
# === STEP 2: Compute log returns to estimate drift (mu) and vol (sigma) ===
log_returns = np.log(eth_prices / eth_prices.shift(1)).dropna()
mu = log_returns.mean()
sigma = log_returns.std()
S0 = eth_prices.iloc[-1]  # latest ETH price (price at t=0)
print(f"Estimated Daily Drift (mu): {mu:.6f}")
print(f"Estimated Daily Volatility (sigma): {sigma:.6f}")
print(f"Latest ETH Price: {S0:.2f}")
# === STEP 3: Monte Carlo Simulation ===
# ...
Note
Full code is available at: notebooks/nb-base-mcs-stocks.ipynb
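Step 3 is elided above; the full simulation lives in the notebook. As a rough sketch (not the notebook's exact code), the Monte Carlo step that follows from the GBM recursion might look like this, assuming a 30-day horizon and 1-day time steps:

```python
# Rough sketch of STEP 3 (see the notebook for the full version):
# simulate daily GBM paths from the estimated mu and sigma.
T, sims = 30, 5000                       # 30 daily steps, 5000 paths (assumed values)
paths = np.zeros((T, sims))
paths[0] = S0                            # every path starts at the latest ETH price
for t in range(1, T):
    Z = np.random.standard_normal(sims)  # one standard normal draw per path
    paths[t] = paths[t-1] * np.exp((mu - 0.5 * sigma**2) + sigma * Z)  # dt = 1 day
```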
Monte Carlo methods are often used to price options. American options can be exercised at any time before expiration; European options can be exercised only at expiration.
Monte Carlo methods are widely used to price complex derivatives like American or path‑dependent options (e.g., Asian options, barrier options).
The key idea (sketched in code below):

- Simulate many possible stock paths under GBM.
- Compute the payoff of the option along each path.
- Discount the average payoff back to today.
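As an illustration of these three steps, here is a minimal Monte Carlo pricer for a European call. This is a sketch for intuition; the parameter values are assumptions chosen to match the Black–Scholes example further below.

```python
import numpy as np

def mc_european_call(S0, K, T, r, sigma, sims=100_000):
    """Monte Carlo price of a European call under GBM with risk-neutral drift r."""
    Z = np.random.standard_normal(sims)
    # Step 1: simulate terminal prices directly (one GBM step of length T)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    # Step 2: payoff of the call on each path
    payoff = np.maximum(ST - K, 0)
    # Step 3: discount the average payoff back to today
    return np.exp(-r * T) * payoff.mean()

print(f"MC European Call: {mc_european_call(2000, 2100, 0.5, 0.03, 0.5):.2f}")
```

As the number of simulations grows, this estimate should converge toward the closed-form Black–Scholes price computed next.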
The Black–Scholes formula for a European call option is:
$$C = S_0\, N(d_1) - K e^{-rT} N(d_2)$$

Where:

- $C$ : Call price
- $S_0$ : Current asset price
- $K$ : Strike price
- $T$ : Time to maturity (in years)
- $r$ : Risk-free interest rate
- $N(\cdot)$ : Cumulative distribution function of the standard normal

with

$$d_1 = \frac{\ln(S_0/K) + (r + 0.5\sigma^2)T}{\sigma \sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}$$
import numpy as np
from scipy.stats import norm
def european_call_price(S0, K, T, r, sigma):
d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
d2 = d1 - sigma * np.sqrt(T)
return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
# Example: Pricing ETH European Call Option
S0 = 2000 # Current ETH price
K = 2100 # Strike price
T = 0.5 # Time to maturity (0.5 years)
r = 0.03 # Risk-free rate
sigma = 0.5 # Volatility (50%)
price = european_call_price(S0, K, T, r, sigma)
print(f"European Call Option Price: {price:.2f}")
American options allow early exercise, which makes closed-form solutions much harder to obtain. A standard numerical approach is the Longstaff–Schwartz least-squares Monte Carlo (LSM) algorithm, which uses regression to estimate the continuation value at each step. The value of an American call option can be expressed as:
$$C_0 = \max_{\tau \le T}\; \mathbb{E}\!\left[ e^{-r\tau}\, \max(S_\tau - K,\, 0) \right]$$

where the maximum is taken over exercise times $\tau$ chosen by the optimal exercise policy.

Where:

- $C_0$ : value of the American call
- $S_t$ : simulated stock price at time $t$
- $K$ : strike price
- $r$ : risk-free interest rate
- $T$ : expiration time
- "optimal exercise policy" : decision rule derived from the regression-estimated continuation value
import numpy as np
# --- Step 1: Simulate Geometric Brownian Motion paths ---
def simulate_gbm(S0, r, sigma, T, steps, sims):
dt = T / steps
paths = np.zeros((steps+1, sims))
paths[0] = S0
for t in range(1, steps+1):
Z = np.random.standard_normal(sims)
paths[t] = paths[t-1] * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
return paths
# --- Step 2: Price American Call using LSM ---
def price_american_call(S0, K, r, sigma, T, steps=50, sims=5000):
dt = T / steps
paths = simulate_gbm(S0, r, sigma, T, steps, sims)
payoffs = np.maximum(paths - K, 0)
V = payoffs[-1] # option values at maturity
for t in reversed(range(1, steps)):
itm = payoffs[t] > 0 # in-the-money paths
if np.any(itm):
X = paths[t, itm]
Y = V[itm] * np.exp(-r * dt) # discounted continuation values
# Regression to estimate continuation value
coeffs = np.polyfit(X, Y, 2)
continuation = np.polyval(coeffs, X)
# Exercise if immediate payoff better than continuation
exercise = payoffs[t, itm]
V[itm] = np.where(exercise > continuation, exercise, V[itm] * np.exp(-r * dt))
V[~itm] = V[~itm] * np.exp(-r * dt)
return np.mean(V) * np.exp(-r * dt)
S0 = 3700 # ETH starting price
K = 3800 # Strike price
r = 0.02 # 2% risk-free rate
sigma = 0.5 # 50% annual volatility
T = 0.25 # 3 months to maturity
price = price_american_call(S0, K, r, sigma, T)
print(f"American Call Option Price: ${price:.2f}")
Monte Carlo simulations generate random paths for the underlying asset's price. The number of simulations chosen directly affects both the accuracy and the computational cost.
For example, this line: paths = np.zeros((rows, columns))
generates a 2D matrix with rows × columns entries and fills it with zeros.
We then initialize the entire first row of the matrix (time step 0 across all simulations) with the current asset price S0: paths[0] = S0
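A quick demonstration of that initialization (toy numbers, purely illustrative):

```python
import numpy as np

S0 = 2000.0
paths = np.zeros((4, 3))   # 4 time steps (rows) x 3 simulations (columns)
paths[0] = S0              # broadcasts S0 across the whole first row
print(paths)
# [[2000. 2000. 2000.]
#  [   0.    0.    0.]
#  [   0.    0.    0.]
#  [   0.    0.    0.]]
```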
- 500 simulations
  - Faster to run; higher variance in estimates
  - Less accurate results due to higher sampling error
  - Useful for quick estimates or testing
- 5000 simulations
  - Slower, but results converge toward the true distribution (Law of Large Numbers)
  - Smoother distributions of outcomes
  - More accurate estimates of tail risks (e.g., Value-at-Risk)
  - Higher computational cost, but often necessary for risk-sensitive decisions

Pro Tip: Use more paths until results stabilize, balancing speed vs. accuracy.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
sim_counts = [500, 5000]
colors = ["red", "blue"]
for sims, color in zip(sim_counts, colors):
paths = np.zeros((30, sims))
paths[0] = S0
for t in range(1, 30):
Z = np.random.standard_normal(sims)
paths[t] = paths[t-1] * np.exp((mu - 0.5*sigma**2) + sigma*Z)
    # Plot the first 10 paths of each run; label only one line per simulation count
    lines = plt.plot(paths[:, :10], color=color, alpha=0.4)
    lines[0].set_label(f"{sims} sims")
plt.title("ETH Monte Carlo Simulation: 500 vs 5000 Simulations")
plt.xlabel("Day")
plt.ylabel("ETH Price (USD)")
plt.legend()
plt.grid(True)
plt.show()
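To act on the pro tip, one cheap check (a sketch, not from the notebook) is to watch how the Monte Carlo standard error of an estimate shrinks roughly like $1/\sqrt{N}$ as the number of paths $N$ grows:

```python
# Convergence check: estimate the mean 30-day terminal ETH price for increasing path counts
for sims in [500, 1000, 5000, 20000]:
    Z = np.random.standard_normal((30, sims))
    paths = S0 * np.exp(np.cumsum((mu - 0.5 * sigma**2) + sigma * Z, axis=0))
    terminal = paths[-1]
    stderr = terminal.std(ddof=1) / np.sqrt(sims)   # Monte Carlo standard error
    print(f"{sims:>6} paths: mean terminal price = {terminal.mean():8.2f} ± {stderr:.2f}")
```

Once the mean and its standard error stabilize at the precision you care about, adding more paths yields diminishing returns.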
In finance, we often need to estimate the probability that an asset’s price touches (falls below or rises above) a critical level within a given time horizon.
For example, in lending protocols, liquidation may occur if the asset price drops below a certain threshold.
The probability that ETH touches a barrier $B$ within the horizon is estimated as the fraction of simulated paths that reach it:

$$P(\text{touch}) = \frac{|\{\text{paths that touch } B\}|}{|\{\text{simulations}\}|}$$
Tip
In probability/statistics, instead of # for "number of," use cardinality notation with absolute values: #{simulations} becomes |simulations|.
import numpy as np
def probability_of_touch(S0, mu, sigma, T, barrier, simulations=5000):
paths = np.zeros((T, simulations))
paths[0] = S0
for t in range(1, T):
Z = np.random.standard_normal(simulations)
paths[t] = paths[t-1] * np.exp((mu - 0.5*sigma**2) + sigma*Z)
# Check if barrier touched in each simulation
touched = np.any(paths <= barrier, axis=0)
return touched.mean()
# Example: Probability ETH touches $1800 in 30 days
barrier = 1800
T = 30
prob_touch = probability_of_touch(S0, mu, sigma, T, barrier)
print(f"Probability ETH touches ${barrier} in {T} days: {prob_touch:.2%}")
If the barrier is far below the current price or LTV, the probability of touch will be low.
If the barrier is close to or above the current price or LTV, the probability increases sharply.
This technique is widely used in barrier option pricing and in estimating the probability of liquidation (risk assessment) in DeFi lending protocols.
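Since the same machinery prices barrier options, here is a minimal sketch of a down-and-out call (knocked out if the barrier is ever touched). The risk-neutral drift r and the parameter values are assumptions for illustration only.

```python
import numpy as np

def down_and_out_call(S0, K, barrier, r, sigma, T, steps=30, sims=5000):
    """Monte Carlo price of a down-and-out call: worthless if the barrier is ever touched."""
    dt = T / steps
    paths = np.zeros((steps + 1, sims))
    paths[0] = S0
    for t in range(1, steps + 1):
        Z = np.random.standard_normal(sims)
        paths[t] = paths[t-1] * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
    knocked_out = np.any(paths <= barrier, axis=0)            # paths that touched the barrier
    payoff = np.where(knocked_out, 0.0, np.maximum(paths[-1] - K, 0))
    return np.exp(-r * T) * payoff.mean()

print(f"Down-and-out call: {down_and_out_call(2000, 2100, 1800, 0.03, 0.5, 30/365):.2f}")
```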
In DeFi lending, borrowers provide collateral (e.g., ETH) to take a loan in stablecoins or another asset.
The Loan-to-Value (LTV) ratio measures the risk of liquidation:
$$\text{LTV}_t = \frac{L}{C \cdot S_t}$$

Where:

- $L$ : Loan value (USD)
- $C$ : Collateral amount (ETH)
- $S_t$ : ETH price at time $t$
A liquidation occurs if:

$$\text{LTV}_t \geq \text{LTV}_{crit}$$
import numpy as np
def probability_of_liquidation(S0, mu, sigma, T, loan_usd, collateral_eth, ltv_crit, simulations=5000):
paths = np.zeros((T, simulations))
paths[0] = S0
for t in range(1, T):
Z = np.random.standard_normal(simulations)
paths[t] = paths[t-1] * np.exp((mu - 0.5*sigma**2) + sigma*Z)
# Compute LTV paths
collateral_values = paths * collateral_eth
ltv_paths = loan_usd / collateral_values
# Check if liquidation threshold breached
liquidations = np.any(ltv_paths >= ltv_crit, axis=0)
return liquidations.mean()
# Example: Liquidation risk
loan_usd = 25000
collateral_eth = 10
ltv_crit = 0.8
T = 30
prob_liq = probability_of_liquidation(S0, mu, sigma, T, loan_usd, collateral_eth, ltv_crit)
print(f"Probability of liquidation in {T} days: {prob_liq:.2%}")
- Higher collateral or lower loan amount reduces liquidation risk.
- Lower ETH price or higher volatility increases risk.
- DeFi protocols set liquidation thresholds ($\text{LTV}_{crit}$) to protect lenders (the implied liquidation price is worked out below).
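From the LTV definition, the ETH price at which a position hits the liquidation threshold can be solved directly: $S_{liq} = \frac{L}{C \cdot \text{LTV}_{crit}}$. A quick sketch using the same example numbers as above:

```python
# Price at which LTV_t reaches LTV_crit: solve L / (C * S) = LTV_crit for S
loan_usd = 25000
collateral_eth = 10
ltv_crit = 0.8

liq_price = loan_usd / (collateral_eth * ltv_crit)
print(f"Liquidation price: ${liq_price:,.2f} per ETH")   # $3,125.00
```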
VaR answers: “What’s the worst I can lose with 95% confidence in T days?”
Value-at-Risk (VaR) is a widely used risk measure that estimates the maximum potential loss of a portfolio within a given time horizon at a specified confidence level.
For example, a 95% VaR represents the maximum loss you would expect 95% of the time over the period considered.
For a Monte Carlo simulation:
$$VaR_{95} = \text{Percentile}_{5\%}\left(\text{simulated returns or portfolio values}\right)$$

Where:

- $VaR_{95}$ : Value-at-Risk at 95% confidence
- $\text{Percentile}_{5\%}$ : 5th percentile of simulated returns or portfolio values
import numpy as np
def value_at_risk(S0, mu, sigma, T, loan_usd, collateral_eth, simulations=5000, percentile=5):
paths = np.zeros((T, simulations))
paths[0] = S0
for t in range(1, T):
Z = np.random.standard_normal(simulations)
paths[t] = paths[t-1] * np.exp((mu - 0.5*sigma**2) + sigma*Z)
final_prices = paths[-1]
portfolio_values = final_prices * collateral_eth - loan_usd
var_value = np.percentile(portfolio_values, percentile)
return var_value
loan_usd = 25000
collateral_eth = 10
T = 30
var_95 = value_at_risk(S0, mu, sigma, T, loan_usd, collateral_eth)
print(f"95% Value-at-Risk over {T} days: ${var_95:,.0f}")
We examine the 5th percentile of outcomes because:
- It represents a worst-case scenario within the 95% confidence band.
- It helps quantify tail risk (rare but severe losses).
- It provides a benchmark for determining capital requirements and collateral safety margins in DeFi lending.
Monte Carlo simulations of asset prices using Geometric Brownian Motion (GBM) require two critical parameters: drift (μ) and volatility (σ).
log_returns = np.log(prices / prices.shift(1)).dropna()
mu = log_returns.mean()
sigma = log_returns.std()
If you want to model n-day drift and vol, you scale depending on how many periods your horizon spans:
- Drift (μ): the average expected log return per time step (e.g., per day if your data feed is daily closing prices). Over $n$ days: $\mu_n = \mu_{daily} \cdot n$
- Volatility (σ): measures the uncertainty or dispersion of returns (per day). Over $n$ days: $\sigma_n = \sigma_{daily} \cdot \sqrt{n}$
Where:

- $\mu_{daily}$ : mean daily log return
- $\sigma_{daily}$ : standard deviation of daily log returns
- $n$ : number of days in the forecast horizon
In other words:
- Drift gives the directional tendency of price.
- Volatility gives the scale of randomness (how wide the distribution of possible outcomes is).
mu_daily, sigma_daily = mu, sigma   # daily estimates computed above
trading_days = 252
mu_annual = mu_daily * trading_days
sigma_annual = sigma_daily * np.sqrt(trading_days)
print(f"Annualized Drift (mu): {mu_annual:.4f}")
print(f"Annualized Volatility (sigma): {sigma_annual:.4f}")
Volatility is typically estimated from historical log returns:
$$\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(r_i - \bar{r}\right)^2}$$

Where:

- $r_i$ : log return on day $i$
- $\bar{r}$ : mean of the log returns
- $N$ : number of historical observations
Python Implementation
import numpy as np
import pandas as pd
# dummy data set
eth_prices = pd.Series([3000, 3020, 3050, 3010, 3100])
# Compute drift (mu) and volatility (sigma) from ETH daily log returns
log_returns = np.log(eth_prices / eth_prices.shift(1)).dropna()
mu = log_returns.mean()
sigma = log_returns.std()
print(f"Estimated daily drift (μ): {mu:.6f}")
print(f"Estimated daily volatility (σ): {sigma:.6f}")
- Short windows (e.g., last 30 days): capture recent market volatility behavior. Useful for near-term predictions.
- Longer windows (e.g., 365 days): provide stability in estimates but may lag in capturing regime changes.
Guideline 1: Use shorter lookback for predictions, longer lookback for risk limits.
- Use shorter windows for trading models.
- Use longer windows for risk management models.
Guideline 2:
For modeling ETH liquidation risk over 30 days, using 365 days of volatility data is common, as it balances accuracy and stability.
However, during highly volatile periods, shorter windows may better reflect the current environment.
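A quick way to see the difference (a sketch, assuming eth_prices still holds the year of daily closes from the earlier fetch):

```python
# Compare daily volatility estimated over a short vs. long lookback window
log_returns = np.log(eth_prices / eth_prices.shift(1)).dropna()
sigma_30d = log_returns.tail(30).std()     # short window: reflects recent behavior
sigma_365d = log_returns.tail(365).std()   # long window: more stable, slower to adapt
print(f"30-day sigma:  {sigma_30d:.6f}")
print(f"365-day sigma: {sigma_365d:.6f}")
```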
Note
For our risk modeling we calculate volatility over an n-day window, where n = the intended holding period of the asset.
Additionally, to account for seasonality, we may also use pricing data over the same period one year prior. For example, for an n = 7-day holding period:
If we intend to hold the asset from August 7, 2025 to August 14, 2025, we'll compute σ using both August 7, 2024 to August 14, 2024 and July 31, 2025 to August 7, 2025.
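A rough sketch of that idea (not the production code): pull daily closes for the two 7-day windows and combine the estimates. The closing prices below are hypothetical, and the simple average is an assumed combination rule, not one stated above.

```python
import numpy as np
import pandas as pd

def window_sigma(prices: pd.Series) -> float:
    """Daily log-return volatility over a small window of closing prices."""
    log_returns = np.log(prices / prices.shift(1)).dropna()
    return log_returns.std()

# Hypothetical daily closes for the two windows (replace with real data)
recent_window = pd.Series([3650, 3700, 3680, 3720, 3750, 3710, 3730, 3760])      # Jul 31 - Aug 7, 2025
prior_year_window = pd.Series([2950, 2980, 2960, 3010, 3000, 3040, 3020, 3050])  # Aug 7 - Aug 14, 2024

# Assumed combination rule: simple average of the two estimates
sigma_est = 0.5 * (window_sigma(recent_window) + window_sigma(prior_year_window))
print(f"Blended 7-day volatility estimate (daily): {sigma_est:.6f}")
```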
Monte Carlo simulations produce a distribution of possible asset prices.
To interpret this distribution, we often use confidence intervals (percentiles).
- 5th Percentile (P5): Represents a bearish / worst-case scenario. In liquidation modeling, it helps quantify severe downside risk.
- 50th Percentile (P50 / Median): Represents the most likely outcome in the middle of the distribution. This is often plotted as the central “expected path.”
- 95th Percentile (P95): Represents a bullish / best-case scenario, showing optimistic outcomes.
import numpy as np
import matplotlib.pyplot as plt
# Calculate percentiles
p5 = np.percentile(paths, 5, axis=1)
p50 = np.percentile(paths, 50, axis=1)
p95 = np.percentile(paths, 95, axis=1)
plt.figure(figsize=(10,6))
plt.plot(p50, label="Median (50th Percentile)", color="blue")
plt.fill_between(range(T), p5, p95, color="lightblue", alpha=0.4,
label="5th–95th Percentile Range")
plt.title(f"ETH Monte Carlo Simulation ({T} days)")
plt.xlabel("Day")
plt.ylabel("ETH Price (USD)")
plt.legend()
plt.grid(True)
plt.show()
- Liquidation Risk: If the 5th percentile path crosses your liquidation threshold, there’s at least a 5% chance you’ll be liquidated in the given time horizon (a quick check is sketched below).
- Protocol Risk Management: DeFi lending protocols may use 95% confidence bands to set safe collateral ratios.
- Investor View: Traders can evaluate the upside vs. downside balance by comparing the 95th and 5th percentile paths.
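A quick way to run that first check against the simulated paths, assuming the p5 array from the code above and the loan example from earlier sections:

```python
# Does the bearish (5th percentile) path ever cross the liquidation price?
loan_usd = 25000
collateral_eth = 10
ltv_crit = 0.8
liq_price = loan_usd / (collateral_eth * ltv_crit)

breached = np.any(p5 <= liq_price)
print(f"Liquidation price: ${liq_price:,.2f}")
print(f"5th-percentile path breaches it: {breached}")
```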
A common business requirement in DeFi lending is to ensure that there is less than a 5% probability of undercollateralization (i.e., loan value exceeding collateral value) across all loans within a specified horizon.
For example, you run a DeFi protocol that wants to manage the risk of bad debt:
“We want the probability of undercollateralization (liquidation) to stay below 5% in the next 30 days.”
Undercollateralization occurs when:

$$L > C \cdot S_t$$

To maintain safety, protocols set a critical liquidation threshold ($\text{LTV}_{crit}$) and monitor the probability of breaching it.
import numpy as np
def probability_of_liquidation(S0, mu, sigma, T, loan_usd, collateral_eth, ltv_crit, simulations=5000):
paths = np.zeros((T, simulations))
paths[0] = S0
for t in range(1, T):
Z = np.random.standard_normal(simulations)
paths[t] = paths[t-1] * np.exp((mu - 0.5*sigma**2) + sigma*Z)
# Compute LTV paths
collateral_values = paths * collateral_eth
ltv_paths = loan_usd / collateral_values
# Check if liquidation threshold breached
liquidations = np.any(ltv_paths >= ltv_crit, axis=0)
return liquidations.mean()
def safe_liquidation_threshold(S0, mu, sigma, T, loan_usd, collateral_eth, simulations=5000, target_prob=0.05):
thresholds = np.linspace(0.5, 0.95, 20)
for ltv_crit in thresholds:
prob = probability_of_liquidation(S0, mu, sigma, T, loan_usd, collateral_eth, ltv_crit, simulations)
if prob < target_prob:
return ltv_crit, prob
return None, None
loan_usd = 25000
collateral_eth = 10
T = 30
ltv_safe, prob_safe = safe_liquidation_threshold(S0, mu, sigma, T, loan_usd, collateral_eth)
if ltv_safe:
print(f"Set LTV threshold at {ltv_safe:.0%} → Probability of liquidation: {prob_safe:.2%}")
else:
print("No safe threshold found within tested range.")
- safe_liquidation_threshold iterates over candidate LTV thresholds and returns the first threshold in the tested range that keeps liquidation risk below the 5% target.
- If no such threshold is found, the protocol must require more collateral or smaller loans.
- Borrowers: Can assess how close they are to unsafe territory.
- Lenders/Protocols: Can set thresholds to minimize systemic risk.
- Risk Managers: Can justify risk frameworks with quantitative evidence.
for ltv in np.linspace(0.6, 0.9, 7):
liq_price = loan_usd / (ltv * collateral_eth)
liqs = np.sum(np.any(paths <= liq_price, axis=0))
prob_liq = liqs / sims
print(f"LTV {ltv:.0%}: Liquidation Probability = {prob_liq:.2%}")
if prob_liq < 0.05:
print(f"✅ Safe threshold: {ltv:.0%}")
- At LTV = 80%, probability of liquidation = 12% → too risky.
- At LTV = 65%, probability of liquidation = 3% → acceptable.
📈 Visualization:
ltvs = np.linspace(0.6, 0.9, 7)
probs = []
for ltv in ltvs:
liq_price = loan_usd / (ltv * collateral_eth)
liqs = np.sum(np.any(paths <= liq_price, axis=0))
probs.append(liqs / sims)
plt.plot(ltvs, probs, marker="o")
plt.axhline(0.05, color="red", linestyle="--", label="5% Risk Target")
plt.title("Probability of Liquidation vs. LTV Threshold")
plt.xlabel("LTV Threshold")
plt.ylabel("Probability of Liquidation")
plt.legend()
plt.show()
- The curve shows how risk increases with higher LTV thresholds.
- The red dashed line = 5% risk policy.
- The intersection = maximum safe LTV.
By combining math, Monte Carlo simulations, and visuals, we can:
- Understand ETH price uncertainty via GBM
- Quantify liquidation risk under different thresholds
- Set safe LTV levels that align with business risk appetite
This approach works in DeFi, US equities, or derivatives pricing. It turns abstract math into actionable risk management.
I created a Python script that converts static markdown files to interactive Jupyter notebook (.ipynb) files. This script is called by a .yml file that does the following:
- Whenever I push a new version of the README.md, the GitHub Actions bot runs my "Markdown to Jupyter" notebook conversion script.
- It then commits and pushes the newly generated Jupyter notebook to a folder called mynotebooks.
- GITHUB_TOKEN is automatically created by GitHub Actions for each workflow run, but it must be given write permissions via the repo's settings: Settings → Actions → General → Workflow permissions → "Read and write permissions".
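This is not the repo's actual conversion script, but a minimal sketch of the idea using nbformat: walk the markdown file line by line, turn fenced python blocks into code cells and everything else into markdown cells.

```python
import nbformat
from nbformat import v4 as nbf

FENCE = "`" * 3  # markdown code-fence marker

def markdown_to_notebook(md_path: str, ipynb_path: str) -> None:
    """Convert a markdown file into a notebook: fenced python blocks become code cells."""
    cells, buffer, in_code = [], [], False
    for line in open(md_path, encoding="utf-8"):
        if line.strip().startswith(FENCE):
            # Flush what we have collected so far as one cell, then flip modes
            if buffer:
                source = "".join(buffer)
                cells.append(nbf.new_code_cell(source) if in_code else nbf.new_markdown_cell(source))
                buffer = []
            in_code = not in_code
        else:
            buffer.append(line)
    if buffer:
        cells.append(nbf.new_markdown_cell("".join(buffer)))
    nb = nbf.new_notebook(cells=cells)
    with open(ipynb_path, "w", encoding="utf-8") as f:
        nbformat.write(nb, f)

# markdown_to_notebook("README.md", "mynotebooks/README.ipynb")
```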
- Tools for model selection, validation, and preprocessing
- Suited for medium-sized datasets (unlike TensorFlow/PyTorch, which are better for very large-scale deep learning)
- Ideal for prototyping, teaching, and applied machine learning projects
Scikit-learn has libraries for:

- Volatility forecasting (via regression models)
- Classification (predicting the likelihood of liquidation events)
- Dimensionality reduction (e.g., analyzing factors affecting ETH prices)
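For instance, a hedged sketch of the classification use case: a logistic regression on synthetic features (loan-to-value ratio and 30-day volatility) predicting whether a position was liquidated. The data is randomly generated purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
# Synthetic features: loan-to-value ratio and 30-day volatility
ltv = rng.uniform(0.3, 0.9, n)
vol = rng.uniform(0.2, 1.2, n)
# Synthetic label: liquidation more likely with high LTV and high volatility
p_liq = 1 / (1 + np.exp(-(6 * ltv + 2 * vol - 5)))
liquidated = rng.random(n) < p_liq

X = np.column_stack([ltv, vol])
X_train, X_test, y_train, y_test = train_test_split(X, liquidated, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2%}")
print(f"P(liquidation) for LTV=0.8, vol=0.6: {clf.predict_proba([[0.8, 0.6]])[0, 1]:.2%}")
```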
PCA is most valuable when you’re modeling many correlated risk factors or assets. For example, a Monte Carlo simulation for a portfolio of 50 stocks (or multiple DeFi assets) can be driven by a handful of principal components instead of 50 fully correlated price series.
PCA is also common in bond pricing and risk models, where it is applied to yield curve movements.
PCA often finds:

- PC1: Level of the yield curve
- PC2: Slope of the curve
- PC3: Curvature of the curve

These three factors explain ~95% of yield curve movements.
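As a hedged illustration of that decomposition, here is scikit-learn's PCA applied to synthetic yield-curve moves built from random level, slope, and curvature shocks (not real market data); the explained-variance split will differ on real curves.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
maturities = np.array([1, 2, 3, 5, 7, 10, 20, 30])   # years
n_days = 500

# Synthetic daily yield-curve changes from level, slope, and curvature shocks (in basis points)
level = rng.normal(0, 5, (n_days, 1)) * np.ones((1, len(maturities)))
slope = rng.normal(0, 2, (n_days, 1)) * (maturities / 30)
curvature = rng.normal(0, 1, (n_days, 1)) * np.exp(-((maturities - 5) ** 2) / 20)
noise = rng.normal(0, 0.3, (n_days, len(maturities)))
curve_moves = level + slope + curvature + noise

pca = PCA(n_components=3).fit(curve_moves)
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i} explains {ratio:.1%} of the variance")
print(f"Top 3 PCs together: {pca.explained_variance_ratio_.sum():.1%}")
```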