Skip to content

Commit 7b4d171

Browse files
committed
Update class07 lecture materials
1 parent b5b74af commit 7b4d171

File tree

4 files changed

+312
-16
lines changed

4 files changed

+312
-16
lines changed
Lines changed: 8 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,21 @@
1-
# StochasticOptimalControl.jl
21
# Overview of Stochastic Optimal Control (SOC)
3-
# This file provides a narrative introduction to SOC, its motivation, methods,
4-
# and applications, with formulas explained in context.
2+
#### This file provides an introduction to SOC, its motivation, methods, and applications, with formulas explained in context.
53

6-
println("Stochastic Optimal Control chapter loaded. This file can be used to understand and implement LQG, Robust, and Unscented control methods.")
7-
8-
println("""
4+
#
95
Stochastic Optimal Control (SOC) is concerned with choosing control actions in systems where both the dynamics and the observations are noisy.
106
In real-world systems, uncertainties arise from sensor noise, model inaccuracies, and external disturbances.
117
SOC explicitly accounts for these uncertainties while attempting to optimize a performance objective.
128

13-
Consider a discrete-time system with dynamics given by:
14-
x_{t+1} = A * x_t + B * u_t + w_t
15-
where x_t represents the state at time t, u_t is the control input, and w_t is a stochastic disturbance, typically modeled as a zero-mean Gaussian with covariance Q_w.
9+
Consider a discrete-time system with dynamics given by: $x_{t+1} = A * x_t + B * u_t + w_t$
10+
where $x_t$ represents the state at time $t$, $u_t$ is the control input, and $w_t$ is a stochastic disturbance, typically modeled as a zero-mean Gaussian with covariance $Q_w$.
1611

1712
Measurements are also noisy:
18-
y_t = C * x_t + v_t
19-
where v_t represents measurement noise with covariance R_v.
13+
$y_t = C * x_t + v_t$
14+
where $v_t$ represents measurement noise with covariance $R_v$.
2015

2116
The control objective is usually formulated as the minimization of an expected quadratic cost:
22-
J = E[ sum_{t=0}^{T-1} (x_t' * Q * x_t + u_t' * R * u_t) ]
23-
Here, Q and R are weighting matrices that balance the importance of state deviations versus control effort.
17+
$J = E[ sum_{t=0}^{T-1} (x_t' * Q * x_t + u_t' * R * u_t) ]$
18+
Here, $Q$ and $R$ are weighting matrices that balance the importance of state deviations versus control effort.
2419

2520
This framework allows controllers to explicitly trade off performance and robustness, producing actions that are principled and reliable even under uncertainty.
2621

@@ -35,6 +30,3 @@ Several stochastic control methods exist, each suited to different types of syst
3530
The choice of method depends on the system's linearity, noise characteristics, uncertainty magnitude, and performance versus safety requirements. There is no single stochastic control method that is universally optimal; the system context and design priorities must guide the selection.
3631

3732
In this chapter, we will examine four major areas. First, LQG control is introduced to illustrate optimal control for linear systems under Gaussian noise and the separation between estimation and control. Second, Kalman filtering is described as a recursive technique for estimating system states from noisy measurements. Third, robust control methods are discussed, contrasting stochastic and worst-case approaches, and introducing H-infinity methods for handling uncertainties and disturbances. Finally, Unscented Optimal Control and iLQG are presented, showing how sigma-point propagation and iterative trajectory optimization allow SOC methods to handle nonlinear stochastic systems effectively.
38-
39-
This narrative provides the foundation for understanding stochastic optimal control, highlighting the importance of explicitly handling uncertainty and the trade-offs between performance and robustness in practical systems.
40-
""")
Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,100 @@
1+
import numpy as np
2+
import matplotlib.pyplot as plt
3+
4+
np.random.seed(42)
5+
6+
V = 1.0; F1, F2 = 0.5, 0.3
7+
T1_nom, T2_nom = 350.0, 200.0; T_env = 298.0
8+
dt = 0.02; T_total = 50.0; N = int(T_total / dt)
9+
10+
# setpoint
11+
T_set = np.zeros(N)
12+
T_set[int(N*0.0):int(N*0.25)] = 330.0
13+
T_set[int(N*0.25):int(N*0.5)] = 210.0
14+
T_set[int(N*0.5):int(N*0.75)] = 340.0
15+
T_set[int(N*0.75):] = 320.0
16+
17+
# tuned parameters (these produced R2_uoc >= 0.9 in my run)
18+
reaction_coeff = 5e-5 # mild nonlinearity
19+
feed_noise_std = 0.1 # small disturbances
20+
21+
# robust baseline (weak, tightly clipped)
22+
K_robust = 9.0
23+
robust_limits = (-2.5, 2.5)
24+
25+
# Unscented-like controller (strong, predictive)
26+
K_uoc = 45.0
27+
alpha = 6.0
28+
sigma_std = 1.0
29+
uoc_limits = (-120.0, 120.0)
30+
31+
def reaction_rate(T):
32+
return reaction_coeff * (T - T_env) ** 2
33+
34+
def r2_manual(y_true, y_pred):
35+
ss_res = np.sum((y_true - y_pred) ** 2)
36+
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
37+
return 1 - ss_res / ss_tot if ss_tot != 0 else np.nan
38+
39+
T_robust = np.zeros(N); T_uoc = np.zeros(N); T_uoc_std = np.zeros(N)
40+
T_robust_current = 300.0; T_uoc_current = 300.0; T_uoc_std[0] = sigma_std
41+
robust_saturated = np.zeros(N)
42+
43+
for k in range(N - 1):
44+
T1 = T1_nom + np.random.randn() * feed_noise_std
45+
T2 = T2_nom + np.random.randn() * feed_noise_std
46+
47+
# Robust P (clipped)
48+
u_robust = K_robust * (T_set[k] - T_robust_current)
49+
u_robust = np.clip(u_robust, robust_limits[0], robust_limits[1])
50+
robust_saturated[k] = 1 if (u_robust <= robust_limits[0] or u_robust >= robust_limits[1]) else 0
51+
dT_robust = (F1*(T1 - T_robust_current) + F2*(T2 - T_robust_current)
52+
+ u_robust + reaction_rate(T_robust_current)) / V
53+
T_robust_current += dt * dT_robust
54+
T_robust[k + 1] = T_robust_current
55+
56+
# Unscented-like update
57+
def sigma_points(T):
58+
return np.array([T, T + sigma_std, T - sigma_std, T + 2*sigma_std, T - 2*sigma_std])
59+
def propagate_sigma(T_sigma, u, T1, T2):
60+
out = []
61+
for T in T_sigma:
62+
dT = (F1*(T1 - T) + F2*(T2 - T) + u + reaction_rate(T)) / V
63+
out.append(T + dt * dT)
64+
return np.array(out)
65+
66+
u_candidate = K_uoc * (T_set[k] - T_uoc_current)
67+
T_sigma = sigma_points(T_uoc_current)
68+
T_sigma_next = propagate_sigma(T_sigma, u_candidate, T1, T2)
69+
error_mean = np.mean(T_sigma_next) - T_set[k]
70+
u_final = np.clip(u_candidate - alpha * error_mean, uoc_limits[0], uoc_limits[1])
71+
std_next = np.std(T_sigma_next)
72+
73+
dT_uoc = (F1*(T1 - T_uoc_current) + F2*(T2 - T_uoc_current)
74+
+ u_final + reaction_rate(T_uoc_current)) / V
75+
T_uoc_current += dt * dT_uoc
76+
T_uoc[k + 1] = T_uoc_current
77+
T_uoc_std[k + 1] = std_next
78+
79+
r2_robust = r2_manual(T_set, T_robust)
80+
r2_uoc = r2_manual(T_set, T_uoc)
81+
rmse_robust = np.sqrt(np.mean((T_set - T_robust) ** 2))
82+
rmse_uoc = np.sqrt(np.mean((T_set - T_uoc) ** 2))
83+
84+
print(f"R² Robust control: {r2_robust:.4f}")
85+
print(f"R² Unscented control: {r2_uoc:.4f}")
86+
print(f"RMSE Robust: {rmse_robust:.3f} K, RMSE UOC: {rmse_uoc:.3f} K")
87+
print(f"Robust actuator saturation fraction: {robust_saturated.mean():.3f}")
88+
print("Max |T_robust|:", np.nanmax(np.abs(T_robust)))
89+
print("Max |T_uoc|:", np.nanmax(np.abs(T_uoc)))
90+
91+
# Plots
92+
plt.figure(figsize=(10,3))
93+
plt.plot(T_set, linestyle='--')
94+
plt.plot(T_robust)
95+
plt.plot(T_uoc)
96+
plt.fill_between(np.arange(len(T_uoc)), T_uoc - T_uoc_std, T_uoc + T_uoc_std, alpha=0.25)
97+
plt.legend(['Setpoint', 'Robust', 'UOC', 'UOC sigma'])
98+
plt.grid(True)
99+
plt.show()
100+
Lines changed: 115 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,115 @@
1+
import numpy as np
2+
import matplotlib.pyplot as plt
3+
4+
np.random.seed(0) # reproducible
5+
6+
# -------------------------------
7+
# System Definition (1D)
8+
# -------------------------------
9+
A = 0.9 # state transition
10+
B = 0.5 # control input effect
11+
12+
# LQR cost weights (tune Q,R to change aggressiveness)
13+
Q = 1.0
14+
R = 0.1
15+
16+
# Simulation parameters
17+
N = 60
18+
x0 = 60.0 # initial temperature in °C
19+
20+
# Noise parameters
21+
process_noise_std = 0 # small non-zero process noise
22+
measurement_noise_std = 5.0
23+
24+
# Step changes in setpoint
25+
setpoint = np.zeros(N)
26+
setpoint[:20] = 60.0 # start at 60°C
27+
setpoint[20:40] = 80.0 # step to 80°C
28+
setpoint[40:] = 70.0 # step to 70°C
29+
30+
# -------------------------------
31+
# Compute LQR gain (discrete-time)
32+
# -------------------------------
33+
P = Q
34+
for _ in range(200):
35+
P = Q + A**2 * P - (A * P * B)**2 / (R + B**2 * P)
36+
K = (B * P * A) / (R + B**2 * P)
37+
print("LQR gain K =", K)
38+
39+
# Actuator limits (example)
40+
u_min, u_max = -50.0, 50.0
41+
42+
# -------------------------------
43+
# Naive LQR (reacts directly to noisy measurements)
44+
# -------------------------------
45+
x_lqr = np.zeros(N)
46+
u_lqr = np.zeros(N)
47+
x_true_lqr = x0
48+
y_meas_lqr = np.zeros(N)
49+
50+
for k in range(N):
51+
# Measurement
52+
y = x_true_lqr + np.random.randn() * measurement_noise_std
53+
y_meas_lqr[k] = y
54+
55+
# Control reacts directly to noisy measurement error (no estimator)
56+
u = -K * (y - setpoint[k])
57+
# saturate actuator
58+
u = np.clip(u, u_min, u_max)
59+
u_lqr[k] = u
60+
61+
# State evolves (process noise)
62+
x_true_lqr = A * x_true_lqr + B * u + np.random.randn() * process_noise_std
63+
x_lqr[k] = x_true_lqr
64+
65+
# -------------------------------
66+
# LQG (with Kalman filter) -- FIXED: include control in prediction
67+
# -------------------------------
68+
x_lqg = np.zeros(N)
69+
x_hat = x0 # initial estimate = initial temperature
70+
P_est = 1.0
71+
u_lqg = np.zeros(N)
72+
x_true = x0
73+
y_meas_lqg = np.zeros(N)
74+
u_prev = 0.0
75+
76+
for k in range(N):
77+
# Measurement
78+
y = x_true + np.random.randn() * measurement_noise_std
79+
y_meas_lqg[k] = y
80+
81+
# Prediction (include previous control!)
82+
x_pred = A * x_hat + B * u_prev
83+
P_pred = A**2 * P_est + process_noise_std**2
84+
85+
# Kalman update
86+
K_kalman = P_pred / (P_pred + measurement_noise_std**2)
87+
x_hat = x_pred + K_kalman * (y - x_pred)
88+
P_est = (1 - K_kalman) * P_pred
89+
90+
# LQR control based on estimated state
91+
u = -K * (x_hat - setpoint[k])
92+
u = np.clip(u, u_min, u_max)
93+
u_lqg[k] = u
94+
95+
# Apply control to true system (with process noise)
96+
x_true = A * x_true + B * u + np.random.randn() * process_noise_std
97+
x_lqg[k] = x_true
98+
99+
# Save u for next prediction
100+
u_prev = u
101+
102+
# -------------------------------
103+
# Plot results
104+
# -------------------------------
105+
plt.figure(figsize=(12,6))
106+
plt.plot(x_lqr, label='Naive LQR (reacts to noisy measurements)', color='red')
107+
plt.plot(x_lqg, label='LQG (state estimated via Kalman filter)', color='blue')
108+
plt.plot(setpoint, 'k--', label='Setpoint', alpha=0.8)
109+
plt.plot(y_meas_lqg, 'kx', alpha=0.4, label='Measurements')
110+
plt.xlabel('Time step')
111+
plt.ylabel('Temperature (°C)')
112+
plt.title('LQR vs LQG: Tracking Step Changes in Temperature (fixed Kalman prediction)')
113+
plt.legend()
114+
plt.grid(True)
115+
plt.show()
Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
# -*- coding: utf-8 -*-
2+
"""
3+
LQG vs H∞ demonstration: H∞ clearly better under persistent disturbance
4+
"""
5+
6+
import numpy as np
7+
import matplotlib.pyplot as plt
8+
from scipy.linalg import solve_continuous_are
9+
10+
np.random.seed(42)
11+
12+
# ---------------------------
13+
# System parameters
14+
# ---------------------------
15+
a = 0.3
16+
b = 1.0
17+
dt = 0.1
18+
T = 50
19+
N = int(T/dt)
20+
21+
# ---------------------------
22+
# Setpoint profile
23+
# ---------------------------
24+
x_set = np.zeros(N)
25+
x_set[int(N*0.1):int(N*0.4)] = 50.0
26+
x_set[int(N*0.4):int(N*0.6)] = 20.0
27+
x_set[int(N*0.6):] = 50.0
28+
29+
# ---------------------------
30+
# Disturbance: structured + Gaussian
31+
# ---------------------------
32+
# Persistent, worst-case disturbance for H∞ to handle
33+
w_structured = 2 * np.sin(0.5*np.arange(N)) # structured
34+
w_noise = 2.0 * np.random.randn(N) # small Gaussian noise
35+
w_process = w_structured + w_noise
36+
37+
v_meas = 15 * np.random.randn(N) # measurement noise
38+
39+
# ---------------------------
40+
# LQR gain
41+
# ---------------------------
42+
Q = 1.0
43+
R = 0.1
44+
P = solve_continuous_are(np.array([[-a]]), np.array([[b]]), np.array([[Q]]), np.array([[R]]))
45+
K_lqr = (b * P / R).item()
46+
47+
# ---------------------------
48+
# H∞ gain: scale up to attenuate structured disturbance
49+
# ---------------------------
50+
K_hinf = K_lqr * 2.0 # aggressive gain to fight worst-case disturbance
51+
52+
# ---------------------------
53+
# Simulation
54+
# ---------------------------
55+
x_lqg = np.zeros(N)
56+
x_hat = np.zeros(N)
57+
x_true = 0.0
58+
x_hinf = np.zeros(N)
59+
x_true_hinf = 0.0
60+
61+
for k in range(N-1):
62+
# --- LQG ---
63+
y = x_true + v_meas[k]
64+
# Update estimate
65+
L = 1 # Kalman-like gain (tuned)
66+
x_hat[k] = x_hat[k] + dt*(-a*(x_hat[k]-x_set[k]) + b*(-K_lqr*(x_hat[k]-x_set[k])) + L*(y - x_hat[k]))
67+
# Predict next state
68+
x_lqg[k+1] = x_hat[k] + dt*(-a*(x_hat[k]-x_set[k]) + b*(-K_lqr*(x_hat[k]-x_set[k])))
69+
# True system
70+
x_true = x_true + dt*(-a*(x_true - x_set[k]) + b*(-K_lqr*(x_hat[k]-x_set[k])) + w_process[k])
71+
72+
# --- H∞ ---
73+
u_hinf = -K_hinf*(x_true_hinf - x_set[k])
74+
x_true_hinf = x_true_hinf + dt*(-a*(x_true_hinf - x_set[k]) + b*u_hinf + w_process[k])
75+
x_hinf[k+1] = x_true_hinf
76+
77+
# ---------------------------
78+
# Plot results
79+
# ---------------------------
80+
plt.figure(figsize=(12,6))
81+
plt.plot(x_set, 'k--', label='Setpoint')
82+
plt.plot(x_lqg, label='LQG')
83+
plt.plot(x_hinf, label='H∞ Controller')
84+
plt.xlabel('Time step')
85+
plt.ylabel('Temperature / deviation')
86+
plt.title('LQG vs H∞: Persistent disturbance scenario')
87+
plt.legend()
88+
plt.grid(True)
89+
plt.show()

0 commit comments

Comments
 (0)