
Commit 9374491

Merge pull request #27 from LearningToOptimize/ad/class02
Class 2 - Numerical optimization for control
2 parents b41df23 + b37c41b commit 9374491

27 files changed: +65851 additions, -1 deletion
505 KB (binary file not shown)

class02/Manifest.toml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# This file is machine-generated - editing it directly is not advised

julia_version = "1.11.6"
manifest_format = "2.0"
project_hash = "da39a3ee5e6b4b0d3255bfef95601890afd80709"

[deps]

class02/Project.toml

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
[deps]

class02/SQP.tex

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
\section{Sequential Quadratic Programming (SQP)}

% ------------------------------------------------
\begin{frame}{What is SQP?}
\textbf{Idea:} Solve a nonlinear, constrained problem by repeatedly solving a \emph{quadratic program (QP)} built from local models.\\[4pt]
\begin{itemize}
\item Linearize the constraints; build a quadratic model of the Lagrangian/objective.
\item Each iteration: solve a QP to get a step \(d\), update \(x \leftarrow x + \alpha d\).
\item Strength: strong local convergence (often superlinear) with good Hessian information.
\end{itemize}
\end{frame}

% ------------------------------------------------
\begin{frame}{Target Problem (NLP)}
\[
\min_{x \in \R^n} \ f(x)
\quad
\text{s.t.}\quad
g(x)=0,\quad h(x)\le 0
\]
\begin{itemize}
\item \(f:\R^n\!\to\!\R\), \(g:\R^n\!\to\!\R^{m}\) (equalities), \(h:\R^n\!\to\!\R^{p}\) (inequalities).
\item KKT recap (at candidate optimum \(x^\star\)):
\[
\exists \ \lambda \in \R^{m},\ \mu \in \R^{p}_{\ge 0}:
\ \grad f(x^\star) + \nabla g(x^\star)^T\lambda + \nabla h(x^\star)^T \mu = 0,
\]
\[
g(x^\star)=0,\quad h(x^\star)\le 0,\quad \mu \ge 0,\quad \mu \odot h(x^\star) = 0.
\]
\end{itemize}
\end{frame}

% ------------------------------------------------
\begin{frame}{From NLP to a QP (Local Model)}
At iterate \(x_k\) with multipliers \((\lambda_k,\mu_k)\):\\[4pt]
\textbf{Quadratic model of the Lagrangian}
\[
m_k(d) = \ip{\grad f(x_k)}{d} + \tfrac{1}{2} d^T B_k d
\]
with \(B_k \approx \nabla^2_{xx}\Lag(x_k,\lambda_k,\mu_k)\).\\[6pt]
\textbf{Linearized constraints}
\[
g(x_k) + \nabla g(x_k)\, d = 0,\qquad
h(x_k) + \nabla h(x_k)\, d \le 0.
\]
\end{frame}

% ------------------------------------------------
\begin{frame}{The SQP Subproblem (QP)}
\[
\begin{aligned}
\min_{d \in \R^n}\quad & \grad f(x_k)^T d + \tfrac{1}{2} d^T B_k d \\
\text{s.t.}\quad & \nabla g(x_k)\, d + g(x_k) = 0, \\
& \nabla h(x_k)\, d + h(x_k) \le 0.
\end{aligned}
\]
\begin{itemize}
\item Solve the QP \(\Rightarrow\) step \(d_k\) and updated multipliers \((\lambda_{k+1},\mu_{k+1})\).
\item Update \(x_{k+1} = x_k + \alpha_k d_k\) (line search or trust region).
\end{itemize}
\end{frame}

% ------------------------------------------------
\begin{frame}{Algorithm Sketch (SQP)}
\begin{enumerate}
\item Start with \(x_0\), multipliers \((\lambda_0,\mu_0)\), and \(B_0 \succ 0\).
\item Build the QP at \(x_k\) with \(B_k\) and linearized constraints.
\item Solve the QP \(\Rightarrow\) get \(d_k\), \((\lambda_{k+1},\mu_{k+1})\).
\item Globalize: line search on a merit function, or a filter/trust region, to choose \(\alpha_k\).
\item Update \(x_{k+1} = x_k + \alpha_k d_k\), update \(B_{k+1}\) (e.g., BFGS).
\end{enumerate}
\end{frame}
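The algorithm sketch above can be made concrete on a tiny equality-constrained instance. The course notebooks are in Julia; the following is a hypothetical NumPy translation for illustration only, using the exact Lagrangian Hessian as \(B_k\) and full steps (no globalization):

```python
import numpy as np

# Problem: min 0.5*||x||^2  s.t.  g(x) = x1^2 + x2 - 1 = 0  (equality only)
f_grad = lambda x: x                                 # grad f(x) = x
g      = lambda x: np.array([x[0]**2 + x[1] - 1.0])
g_jac  = lambda x: np.array([[2.0 * x[0], 1.0]])

def sqp_equality(x, lam, iters=20, tol=1e-10):
    """Basic SQP loop: each QP subproblem is solved via its own KKT system."""
    for _ in range(iters):
        J = g_jac(x)
        # B_k = exact Lagrangian Hessian: hess(f) + lam_1 * hess(g_1)
        B = np.eye(2) + lam[0] * np.diag([2.0, 0.0])
        # QP KKT system: [B J^T; J 0] [d; lam_next] = [-grad f(x); -g(x)]
        K = np.block([[B, J.T], [J, np.zeros((1, 1))]])
        sol = np.linalg.solve(K, np.concatenate([-f_grad(x), -g(x)]))
        d, lam = sol[:2], sol[2:]
        x = x + d                                    # full step, alpha = 1
        if np.linalg.norm(d) < tol:
            break
    return x, lam

x_star, lam_star = sqp_equality(np.array([1.5, 1.5]), np.array([0.0]))
print(x_star, lam_star)   # a KKT point of the toy problem
```

With the exact Hessian this is Newton's method on the KKT conditions, so convergence near the solution is quadratic, matching the "often superlinear" claim above.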

% ------------------------------------------------
\begin{frame}{Toy Example (Local Models)}
\textbf{Problem:}
\[
\min_{x\in\R^2} \ \tfrac{1}{2}\norm{x}^2
\quad \text{s.t.} \quad g(x)=x_1^2 + x_2 - 1 = 0,\ \ h(x)=x_2 - 0.2 \le 0.
\]
At \(x_k\), build the QP with
\[
\grad f(x_k)=x_k,\quad B_k=I,\quad
\nabla g(x_k) = \begin{bmatrix} 2x_{k,1} & 1 \end{bmatrix},\
\nabla h(x_k) = \begin{bmatrix} 0 & 1 \end{bmatrix}.
\]
Solve for \(d_k\), then \(x_{k+1}=x_k+\alpha_k d_k\).
\end{frame}
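A hypothetical NumPy sketch of this toy example (the class materials themselves use Julia): each QP subproblem with \(B_k=I\) is solved with a one-constraint active-set check, taking full steps:

```python
import numpy as np

# Toy problem from the slide: min 0.5*||x||^2
#   s.t.  g(x) = x1^2 + x2 - 1 = 0  (equality),  h(x) = x2 - 0.2 <= 0.
def solve_qp(x):
    """QP subproblem with B_k = I: try h inactive first, add it if the
    linearized inequality is violated (a one-constraint active-set step)."""
    gx = np.array([x[0]**2 + x[1] - 1.0])
    Jg = np.array([[2.0 * x[0], 1.0]])
    hx = np.array([x[1] - 0.2])
    Jh = np.array([[0.0, 1.0]])

    def eq_qp(A, b):
        # min grad_f^T d + 0.5 d^T d  s.t.  A d + b = 0, via its KKT system
        m = A.shape[0]
        K = np.block([[np.eye(2), A.T], [A, np.zeros((m, m))]])
        sol = np.linalg.solve(K, np.concatenate([-x, -b]))
        return sol[:2], sol[2:]

    d, mults = eq_qp(Jg, gx)                       # guess: h inactive
    if hx[0] + Jh[0] @ d > 1e-12:                  # linearized h violated?
        d, mults = eq_qp(np.vstack([Jg, Jh]),      # treat h as active
                         np.concatenate([gx, hx]))
    return d, mults

x = np.array([1.5, 1.5])
for _ in range(30):                                # full steps, alpha = 1
    d, mults = solve_qp(x)
    x = x + d
    if np.linalg.norm(d) < 1e-10:
        break
print(x, mults)   # -> roughly x = [0.8944, 0.2], (lambda, mu) = (-0.5, 0.3)
```

The iterates land on the inequality boundary \(x_2 = 0.2\) (active constraint, \(\mu > 0\)), consistent with the complementarity conditions recapped earlier.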

% ------------------------------------------------
\begin{frame}{Globalization: Making SQP Robust}
SQP is an important method, and many issues must be addressed to obtain an \textbf{efficient} and \textbf{reliable} implementation:
\begin{itemize}
\item Efficient solution of the linear systems at each Newton iteration (the block structure of the matrix can be exploited).
\item Quasi-Newton approximations to the Hessian.
\item Trust region, line search, etc., to improve robustness (e.g., a trust region restricts \(\norm{d}\) to maintain model validity).
\item Treatment of constraints (equality and inequality) during the iterative process.
\item Selection of a good starting guess for $\lambda$.
\end{itemize}
\end{frame}
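As one illustration of the line-search item above, here is a minimal NumPy sketch of backtracking on an \(\ell_1\) merit function \(\varphi(x)=f(x)+\rho\,\lVert c(x)\rVert_1\); the function names and tolerances are illustrative, not from the course code:

```python
import numpy as np

def backtracking_merit(x, d, f, c, rho=10.0, beta=0.5, alpha=1.0, max_backtracks=30):
    """Backtracking line search on the l1 merit function
    phi(x) = f(x) + rho * ||c(x)||_1, balancing optimality and feasibility."""
    phi = lambda z: f(z) + rho * np.sum(np.abs(c(z)))
    phi0 = phi(x)
    for _ in range(max_backtracks):
        if phi(x + alpha * d) < phi0:   # simple decrease test (Armijo condition omitted)
            return alpha
        alpha *= beta                   # shrink the step and try again
    return alpha                        # give up with the smallest step tried

# Example: objective 0.5*||x||^2, constraint x1^2 + x2 - 1 = 0
f = lambda x: 0.5 * x @ x
c = lambda x: np.array([x[0]**2 + x[1] - 1.0])
x = np.array([2.0, 2.0])
d = np.array([-1.0, -1.0])             # a direction that reduces the merit here
alpha = backtracking_merit(x, d, f, c)
print(alpha)
```

A production SQP code would add an Armijo-type sufficient-decrease test and adapt \(\rho\) to the multiplier estimates; this sketch only shows the accept/shrink mechanism.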

% ------------------------------------------------
\begin{frame}{Final Takeaways on SQP}
\textbf{When SQP vs.\ Interior-Point?}
\begin{itemize}
\item \textbf{SQP}: strong local convergence; warm-start friendly; natural for NMPC.
\item \textbf{IPM}: very robust for large, strictly feasible problems; good for dense inequality sets.
\item In practice, both are valuable; choose to match problem structure and runtime needs.
\end{itemize}
\textbf{Takeaways of SQP}
\begin{itemize}
\item SQP = a Newton-like method using a sequence of structured QPs.
\item Globalization (merit function/filter/trust region) makes it reliable from poor starts.
\item Excellent fit for control (NMPC/trajectory optimization) due to sparsity and warm starts.
\end{itemize}
\end{frame}

class02/class02.md

Lines changed: 46 additions & 1 deletion
@@ -6,5 +6,50 @@
 
 ---
 
-Add notes, links, and resources below.
+## Overview
+
+This class covers the fundamental numerical optimization techniques essential for optimal control problems. We explore gradient-based methods, Sequential Quadratic Programming (SQP), and various approaches to handling constraints, including Augmented Lagrangian Methods (ALM), interior-point methods, and penalty methods.
+
+## Interactive Materials
+
+The class is structured around one slide deck and four interactive Jupyter notebooks:
+
+1. **[Part 1a: Root Finding & Backward Euler](part1_root_finding.html)**
+   - Root-finding algorithms for implicit integration
+   - Fixed-point iteration vs. Newton's method
+   - Application to pendulum dynamics
+
+2. **[Part 1b: Minimization via Newton's Method](part1_minimization.html)**
+   - Unconstrained optimization fundamentals
+   - Newton's method implementation
+   - Globalization strategies: Hessian regularization
+
+3. **[Part 2: Equality Constraints](part2_eq_constraints.html)**
+   - Lagrange multiplier theory
+   - KKT conditions for equality constraints
+   - Quadratic programming implementation
+
+4. **[Part 3: Interior-Point Methods](part3_ipm.html)**
+   - Inequality constraint handling
+   - Barrier methods and log-barrier functions
+   - Comparison with penalty methods
+
+## Additional Resources
+
+- **[Lecture Slides (PDF)](ISYE_8803___Lecture_2___Slides.pdf)** - Complete slide deck
+- **[LaTeX Source](main.tex)** - Source code for the lecture slides
+
+## Key Learning Outcomes
+
+- Understand gradient-based optimization methods
+- Implement Newton's method for minimization
+- Apply root-finding techniques for implicit integration
+- Solve equality-constrained optimization problems
+- Compare different constraint-handling methods
+- Implement Sequential Quadratic Programming (SQP)
+
+## Next Steps
+
+This class provides the foundation for advanced topics in subsequent classes, including Pontryagin's Maximum Principle, nonlinear trajectory optimization, and stochastic optimal control.
 

class02/eq_constraints.tex

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@

%\section{Part II -- Equality constraints: KKT, Newton vs. Gauss–Newton}
\section{Constrained Optimization}

% ==== Equality constraints: KKT, Newton vs. Gauss–Newton ====

\begin{frame}{Equality-constrained minimization: geometry and conditions}
\textbf{Problem.} $\min_{x\in\mathbb{R}^n} f(x)\quad \text{s.t.}\quad C(x)=0,\quad C:\mathbb{R}^n\to\mathbb{R}^m$.

\medskip
\textbf{Geometric picture.} At an optimum on the manifold $C(x)=0$, the gradient must be orthogonal to the tangent space:

$$
\grad f(x^\star)\ \perp\ \mathcal{T}_{x^\star}=\{p:\; J_C(x^\star)p=0\}.
$$

Equivalently, the gradient is a linear combination of constraint normals:

$$
\grad f(x^\star)+J_C(x^\star)^{\!T}\lambda^\star=0,\qquad C(x^\star)=0\quad(\lambda^\star\in\mathbb{R}^m).
$$

\medskip
\textbf{Lagrangian.} $L(x,\lambda)=f(x)+\lambda^{\!T}C(x)$.
\end{frame}

\begin{frame}{A visual derivation of the KKT conditions}
\begin{center}
Quick whiteboard derivation
\end{center}
\end{frame}

% ==== Slide 1: Picture-first intuition ====
\begin{frame}[t]{Equality constraints: picture first}
\setbeamercovered{invisible}

\textbf{Goal.} Minimize $f(x)$ while staying on the surface $C(x)=0$.

\uncover<2->{\textbf{Feasible set as a surface.} Think of $C(x)=0$ as a smooth surface embedded in $\mathbb{R}^n$ (a manifold).}

\uncover<3->{\textbf{Move without breaking the constraint.} Tangent directions are the “along-the-surface” moves that keep $C(x)$ unchanged to first order. Intuitively: tiny steps that slide on the surface.}

\uncover<4->{\textbf{What must be true at the best point.} At $x^\star$, there is no downhill direction that stays on the surface. Equivalently, the gradient of $f$ has \emph{no component along the surface}.}

\uncover<5->{\textbf{Normals enter the story.} If the gradient can’t point along the surface, it must point \emph{through} it, i.e., it aligns with a combination of the surface’s normal directions (one normal per constraint).}
\end{frame}

% ==== Slide 2: From picture to KKT ====
\begin{frame}[t]{From the picture to KKT (equality case)}
\setbeamercovered{invisible}

\textbf{KKT conditions at a regular local minimum (equality only):}

\uncover<1->{\textbf{1) Feasibility:} $C(x^\star)=0$. \emph{(We’re on the surface.)}}

\uncover<2->{\textbf{2) Stationarity:} $\nabla f(x^\star) + J_C(x^\star)^{\!T}\lambda^\star = 0$. \emph{(The gradient is a linear combination of the constraint normals.)}}

\uncover<3->{\textbf{Lagrangian viewpoint.} Define $L(x,\lambda)=f(x)+\lambda^{\!T}C(x)$. At a solution, $x^\star$ is a stationary point of $L$ w.r.t.\ $x$ (that’s the stationarity equation), while $C(x^\star)=0$ enforces feasibility.}

\uncover<4->{\textbf{What the multipliers mean.} The vector $\lambda^\star$ tells how strongly each constraint “pushes back” at the optimum; it also measures the sensitivity of the optimal value to small changes in the constraints.}

\end{frame}


\begin{frame}{KKT system for equalities (first-order necessary conditions)}
\textbf{KKT (FOC).}

$$
\grad_x L(x,\lambda)=\grad f(x)+J_C(x)^{\!T}\lambda=0,\qquad \grad_\lambda L(x,\lambda)=C(x)=0.
$$

\textbf{Solve by Newton on KKT:} linearize both optimality and feasibility:

$$
\begin{bmatrix}
\hess f(x) + \sum_{i=1}^m \lambda_i\,\hess C_i(x) & J_C(x)^{\!T}\\[2pt]
J_C(x) & 0
\end{bmatrix}
\begin{bmatrix}\Delta x\\ \Delta\lambda\end{bmatrix}
=-
\begin{bmatrix}
\grad f(x)+J_C(x)^{\!T}\lambda\\ C(x)
\end{bmatrix}.
$$

\textit{Notes.} This is a symmetric \emph{saddle-point} system; typical solves use block elimination (Schur complement) or sparse factorizations.
\end{frame}
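The block-elimination note can be demonstrated directly. A NumPy sketch (illustrative only; the class demos use Julia) solves one Newton-on-KKT step both ways, by a direct solve of the saddle-point system and by the Schur complement of the (1,1) block, and checks that the two agree:

```python
import numpy as np

# One Newton-on-KKT step with random well-conditioned data.
rng = np.random.default_rng(0)
n, m = 5, 2
H = rng.standard_normal((n, n)); H = H @ H.T + n * np.eye(n)  # SPD (1,1) block
J = rng.standard_normal((m, n))            # constraint Jacobian (full row rank)
r1 = rng.standard_normal(n)                # plays the role of -(grad f + J^T lam)
r2 = rng.standard_normal(m)                # plays the role of -C(x)

# (a) Direct solve of the saddle-point system [H J^T; J 0][dx; dl] = [r1; r2]
K = np.block([[H, J.T], [J, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([r1, r2]))

# (b) Block elimination: eliminate dx = H^{-1}(r1 - J^T dl), then solve the
#     Schur complement system  (J H^{-1} J^T) dl = J H^{-1} r1 - r2
Hinv_r1 = np.linalg.solve(H, r1)
Hinv_JT = np.linalg.solve(H, J.T)
S = J @ Hinv_JT                            # Schur complement of H
dl = np.linalg.solve(S, J @ Hinv_r1 - r2)
dx = Hinv_r1 - Hinv_JT @ dl

print(np.linalg.norm(sol - np.concatenate([dx, dl])))  # agreement to round-off
```

The payoff in practice is that \(H\) is often block-sparse (e.g., stage-wise in optimal control), so the factorizations in step (b) can exploit that structure instead of factoring the full KKT matrix.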

\begin{frame}{Move to Julia Code}
\begin{center}
\textbf{Quick Demo of Julia Notebook: part2\_eq\_constraints.ipynb}
\end{center}
\end{frame}

\begin{frame}{Numerical practice: Newton on KKT}
\setbeamercovered{invisible}

\textbf{When it works best.}
\begin{itemize}
\item Near a regular solution with $J_{C}(x^\star)$ full row rank and a positive-definite reduced Hessian.
\item With globalization (line search on a merit function) and mild regularization for robustness.
\end{itemize}

% --- Part 2: appears on the 2nd click only ---
\uncover<2->{%
\textbf{Common safeguards.}
\begin{itemize}
\item \emph{Regularize} the $(1,1)$ block to ensure a good search direction (e.g., add $\beta I$).
\item \emph{Merit/penalty} line search to balance feasibility vs.\ optimality during updates.
\item \emph{Scale} constraints to improve the conditioning of the KKT system.
\end{itemize}
}
\end{frame}


\begin{frame}{Gauss--Newton vs. full Newton on KKT}

\uncover<1->{
\textbf{Full Newton Hessian of the Lagrangian:}\quad
$\nabla_{xx}^2 L(x,\lambda) = \nabla^2 f(x) + \sum_{i=1}^m \lambda_i\, \nabla^2 C_i(x)$
}

\vspace{0.6em}

\uncover<2->{
\textbf{Gauss--Newton approximation:} drop the \emph{constraint-curvature} term
$\sum_{i=1}^m \lambda_i\, \nabla^2 C_i(x)$:
\begin{align*}
H_{\text{GN}}(x) &\approx \nabla^2 f(x).
\end{align*}
}

\uncover<3->{
\textbf{Trade-offs (high level).}
\begin{itemize}
\item \emph{Full Newton:} fewer iterations near the solution, but each step is costlier and can be less robust far from it.
\item \emph{Gauss--Newton:} cheaper per step and often more stable; may need more iterations but wins in wall-clock time on many problems.
\end{itemize}
}

\end{frame}

% ==== Inequalities & KKT: complementarity ====

\begin{frame}{Inequality-constrained minimization and KKT}
\textbf{Problem.} $\quad \min f(x)\quad\text{s.t.}\quad c(x)\ge 0,\qquad c:\mathbb{R}^n\to\mathbb{R}^p$.

\textbf{KKT conditions (first-order).}

$$
\begin{aligned}
&\text{Stationarity:} && \grad f(x)-J_c(x)^{\!T}\lambda=0,\\
&\text{Primal feasibility:} && c(x)\ge 0,\\
&\text{Dual feasibility:} && \lambda\ge 0,\\
&\text{Complementarity:} && \lambda^{\!T}c(x)=0\quad(\text{i.e., }\lambda_i c_i(x)=0\ \forall i).
\end{aligned}
$$

\textbf{Interpretation.}
\begin{itemize}
\item \emph{Active} constraints: $c_i(x)=0 \Rightarrow \lambda_i\ge 0$ can be nonzero (acts like an equality).
\item \emph{Inactive} constraints: $c_i(x)>0 \Rightarrow \lambda_i=0$ (no influence on optimality).
\end{itemize}
\end{frame}


\begin{frame}{Complementarity in plain English (and why Newton is tricky)}
\footnotesize

\textbf{What $\lambda_i c_i(x)=0$ means.}
\begin{itemize}
\item Tight constraint ($c_i=0$) $\Rightarrow$ it can press back ($\lambda_i\ge0$).
\item Loose constraint ($c_i>0$) $\Rightarrow$ no force ($\lambda_i=0$).
\end{itemize}

\textbf{Why naive Newton fails.}
\begin{itemize}
\item Complementarity = nonsmooth + inequalities ($\lambda\ge0$, $c(x)\ge0$).
\item Equality-style Newton can violate nonnegativity or bounce across the boundary.
\end{itemize}

\textbf{Two main strategies (preview).}
\begin{itemize}
\item \emph{Active-set:} guess the active set $\Rightarrow$ solve an equality-constrained subproblem, update the set.
\item \emph{Barrier/PDIP/ALM:} smooth or relax complementarity, use damped Newton, drive the relaxation $\to 0$.
\end{itemize}
\end{frame}
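The barrier strategy previewed above can be sketched in a few lines. A hypothetical NumPy example (the \(\mu\)-schedule and tolerances are illustrative, not from the course notebooks) minimizes \(\tfrac12 x^2\) subject to \(x \ge 1\) with a log barrier, driving the relaxation \(\mu \to 0\):

```python
import numpy as np

# Barrier-method preview: min 0.5*x^2  s.t.  c(x) = x - 1 >= 0.
# Replace the constraint with -mu*log(x - 1) and drive mu -> 0. The barrier
# smooths complementarity: lambda = mu/c(x) > 0 strictly, with lambda*c = mu.
def barrier_solve(mu, x, iters=50):
    """Damped Newton on the barrier objective 0.5*x^2 - mu*log(x - 1)."""
    for _ in range(iters):
        grad = x - mu / (x - 1.0)
        hess = 1.0 + mu / (x - 1.0) ** 2
        step = -grad / hess
        while x + step <= 1.0:          # damp: never cross the boundary x = 1
            step *= 0.5
        x += step
        if abs(grad) < 1e-12:
            break
    return x

x = 2.0
for mu in [1.0, 1e-1, 1e-2, 1e-4, 1e-6, 1e-8]:
    x = barrier_solve(mu, x)            # warm-start each barrier subproblem
lam = mu / (x - 1.0)                    # multiplier estimate from the barrier
print(x, lam)                           # x -> 1 (active constraint), lam -> 1
```

As \(\mu\) shrinks, the iterates approach the boundary \(x = 1\) from the strictly feasible side, and the multiplier estimate approaches the true KKT multiplier \(\lambda = 1\); this is exactly the "relax complementarity, damped Newton, drive the relaxation to zero" recipe.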

class02/figures/log_barrier.png

31.6 KB

60 KB

class02/figures/tri_paper.png

73.1 KB
