
%\section{Part II -- Equality constraints: KKT, Newton vs. Gauss–Newton}
\section{Constrained Optimization}

% ==== Equality constraints: KKT, Newton vs. Gauss–Newton ====

\begin{frame}{Equality-constrained minimization: geometry and conditions}
\textbf{Problem.} $\min_{x\in\mathbb{R}^n} f(x)\quad \text{s.t.}\quad C(x)=0,\qquad C:\mathbb{R}^n\to\mathbb{R}^m$.

\medskip
\textbf{Geometric picture.} At an optimum on the manifold $C(x)=0$, the gradient must be orthogonal to the tangent space (no descent direction stays on the surface):

$$
\grad f(x^\star)\ \perp\ \mathcal{T}_{x^\star}=\{p:\; J_C(x^\star)p=0\}.
$$

Equivalently, the gradient is a linear combination of constraint normals:

$$
\grad f(x^\star)+J_C(x^\star)^{\!T}\lambda^\star=0,\qquad C(x^\star)=0\quad(\lambda^\star\in\mathbb{R}^m).
$$

\medskip
\textbf{Lagrangian.} $L(x,\lambda)=f(x)+\lambda^{\!T}C(x)$.
\end{frame}
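
\begin{frame}{Sanity check: KKT on a tiny example}
For concreteness, a minimal worked example of the conditions above: minimize $f(x)=x_1+x_2$ on the unit circle $C(x)=x_1^2+x_2^2-1=0$. Stationarity gives

$$
\grad f+\lambda\,\grad C=\begin{bmatrix}1\\1\end{bmatrix}+\lambda\begin{bmatrix}2x_1\\2x_2\end{bmatrix}=0
\quad\Rightarrow\quad x_1=x_2=-\tfrac{1}{2\lambda},
$$

and feasibility $x_1^2+x_2^2=1$ then yields $x^\star=\bigl(-\tfrac{1}{\sqrt{2}},-\tfrac{1}{\sqrt{2}}\bigr)$ with $\lambda^\star=\tfrac{1}{\sqrt{2}}$: the gradient is exactly a multiple of the constraint normal.
\end{frame}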

\begin{frame}{A nicer visual explanation/derivation of KKT conditions}
\begin{center}
    Quick little whiteboard derivation
\end{center}
\end{frame}


% ==== Slide 1: Picture-first intuition ====
\begin{frame}[t]{Equality constraints: picture first}
\setbeamercovered{invisible}

\textbf{Goal.} Minimize $f(x)$ while staying on the surface $C(x)=0$.

\uncover<2->{\textbf{Feasible set as a surface.} Think of $C(x)=0$ as a smooth surface embedded in $\mathbb{R}^n$ (a manifold).}

\uncover<3->{\textbf{Move without breaking the constraint.} Tangent directions are the “along-the-surface” moves that keep $C(x)$ unchanged to first order. Intuitively: tiny steps that slide on the surface.}

\uncover<4->{\textbf{What must be true at the best point.} At $x^\star$, there is no downhill direction that stays on the surface. Equivalently, the gradient of $f$ has \emph{no component along the surface}.}

\uncover<5->{\textbf{Normals enter the story.} If the gradient cannot point along the surface, it must point \emph{through} it; that is, it aligns with a combination of the surface’s normal directions (one normal per constraint).}
\end{frame}

% ==== Slide 2: From picture to KKT ====
\begin{frame}[t]{From the picture to KKT (equality case)}
\setbeamercovered{invisible}

\textbf{KKT conditions at a regular local minimum (equality only):}

\uncover<1->{\textbf{1) Feasibility:} $C(x^\star)=0$. \emph{(We’re on the surface.)}}

\uncover<2->{\textbf{2) Stationarity:} $\nabla f(x^\star) + J_C(x^\star)^{\!T}\lambda^\star = 0$. \emph{(The gradient is a linear combination of the constraint normals.)}}

\uncover<3->{\textbf{Lagrangian viewpoint.} Define $L(x,\lambda)=f(x)+\lambda^{\!T}C(x)$. At a solution, $x^\star$ is a stationary point of $L$ w.r.t.\ $x$ (that’s the stationarity equation), while $C(x^\star)=0$ enforces feasibility.}

\uncover<4->{\textbf{What the multipliers mean.} The vector $\lambda^\star$ tells how strongly each constraint “pushes back” at the optimum; it also measures the sensitivity of the optimal value to small changes in the constraints.}

\end{frame}


\begin{frame}{KKT system for equalities (first-order necessary conditions)}
\textbf{KKT (FOC).}

$$
\grad_x L(x,\lambda)=\grad f(x)+J_C(x)^{\!T}\lambda=0,\qquad \grad_\lambda L(x,\lambda)=C(x)=0.
$$

\textbf{Solve by Newton on KKT:} linearize both optimality and feasibility:

$$
\begin{bmatrix}
\hess f(x) + \sum_{i=1}^m \lambda_i\,\hess C_i(x) & J_C(x)^{\!T}\\[2pt]
J_C(x) & 0
\end{bmatrix}
\begin{bmatrix}\Delta x\\ \Delta\lambda\end{bmatrix}
=-
\begin{bmatrix}
\grad f(x)+J_C(x)^{\!T}\lambda\\ C(x)
\end{bmatrix}.
$$

\textit{Notes.} This is a symmetric \emph{saddle-point} system; typical solves use block elimination (Schur complement) or sparse factorizations.
\end{frame}
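
% --- Added sketch: one Newton step on the KKT system (hedged example) ---
\begin{frame}[fragile]{Sketch: one Newton-KKT step in Julia}
A minimal sketch of the linear solve above, assuming ForwardDiff.jl for derivatives (names and structure are illustrative, not the notebook's code):
\begin{verbatim}
using LinearAlgebra, ForwardDiff

# One Newton step for: min f(x)  s.t.  C(x) = 0
function kkt_newton_step(f, C, x, lam)
    g  = ForwardDiff.gradient(f, x)      # grad f(x)
    Jc = ForwardDiff.jacobian(C, x)      # J_C(x)
    Lag(z) = f(z) + lam' * C(z)          # Lagrangian
    H  = ForwardDiff.hessian(Lag, x)     # hess f + sum_i lam_i hess C_i
    m  = length(lam)
    K  = [H Jc'; Jc zeros(m, m)]         # saddle-point KKT matrix
    d  = K \ (-[g + Jc' * lam; C(x)])    # Newton direction
    return x + d[1:length(x)], lam + d[length(x)+1:end]
end
\end{verbatim}
\end{frame}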


\begin{frame}{Move to Julia Code}
\begin{center}
    \textbf{Quick Demo of Julia Notebook: part2\_eq\_constraints.ipynb}
\end{center}
\end{frame}

\begin{frame}{Numerical practice: Newton on KKT}
\setbeamercovered{invisible}

\textbf{When it works best.}
\begin{itemize}
    \item Near a regular solution with $J_{C}(x^\star)$ full row rank and a positive-definite reduced Hessian.
    \item With a globalization (line search on a merit function) and mild regularization for robustness.
\end{itemize}

% --- Part 2: appears on the 2nd click only ---
\uncover<2->{%
\textbf{Common safeguards} (sketched in code on the next slide).
\begin{itemize}
    \item \emph{Regularize} the $(1,1)$ block to ensure a good search direction (e.g., add $\beta I$).
    \item \emph{Merit/penalty} line search to balance feasibility vs.\ optimality during updates.
    \item \emph{Scaling} constraints to improve conditioning of the KKT system.
\end{itemize}
}
\end{frame}
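
% --- Added sketch: common safeguards in code (hedged example) ---
\begin{frame}[fragile]{Sketch: regularization and a merit function}
A minimal sketch of the first two safeguards, reusing the pieces from the Newton-KKT step (\texttt{H}, \texttt{Jc}, \texttt{m}, \texttt{f}, \texttt{C}, \texttt{beta}, \texttt{mu} are assumed from context):
\begin{verbatim}
using LinearAlgebra

# Regularize the (1,1) block so the KKT matrix yields a
# usable search direction even when H is indefinite.
Hreg = H + beta * I
K    = [Hreg Jc'; Jc zeros(m, m)]

# l1 merit function for the line search: trades off
# optimality (f) against feasibility (norm of C).
merit(z) = f(z) + mu * norm(C(z), 1)
\end{verbatim}
\end{frame}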

\begin{frame}{Gauss--Newton vs.\ full Newton on KKT}

\uncover<1->{
\textbf{Full Newton Hessian of the Lagrangian:}\quad
$\nabla_{xx}^2 L(x,\lambda) = \nabla^2 f(x) + \sum_{i=1}^m \lambda_i\, \nabla^2 C_i(x)$
}

\vspace{0.6em}

\uncover<2->{
\textbf{Gauss--Newton approximation:} drop the \emph{constraint-curvature} term
$\sum_{i=1}^m \lambda_i\, \nabla^2 C_i(x)$:
\begin{align*}
H_{\text{GN}}(x) &\approx \nabla^2 f(x).
\end{align*}
}

\uncover<3->{
\textbf{Trade-offs (high level).}
\begin{itemize}
    \item \emph{Full Newton:} fewer iterations near the solution, but each step is costlier and can be less robust far from it.
    \item \emph{Gauss--Newton:} cheaper per step and often more stable; may need more iterations but wins in wall-clock time on many problems.
\end{itemize}
}

\end{frame}
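
% --- Added sketch: the two Hessian choices side by side (hedged example) ---
\begin{frame}[fragile]{Sketch: swapping the (1,1) block}
The only difference between the two methods is the $(1,1)$ block of the KKT matrix; a sketch assuming ForwardDiff.jl as before:
\begin{verbatim}
# Full Newton: keep the constraint-curvature term.
H_newton = ForwardDiff.hessian(z -> f(z) + lam' * C(z), x)

# Gauss-Newton: drop sum_i lam_i * hess(C_i); keep hess(f) only.
H_gn = ForwardDiff.hessian(f, x)
\end{verbatim}
Everything else in \texttt{kkt\_newton\_step} stays the same; only \texttt{H} changes.
\end{frame}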

% ==== Inequalities & KKT: complementarity ====

\begin{frame}{Inequality-constrained minimization and KKT}
\textbf{Problem.} $\min_{x\in\mathbb{R}^n} f(x)\quad\text{s.t.}\quad c(x)\ge 0,\qquad c:\mathbb{R}^n\to\mathbb{R}^p$.

\textbf{KKT conditions (first-order).}

$$
\begin{aligned}
&\text{Stationarity:} && \grad f(x)-J_c(x)^{\!T}\lambda=0,\\
&\text{Primal feasibility:} && c(x)\ge 0,\\
&\text{Dual feasibility:} && \lambda\ge 0,\\
&\text{Complementarity:} && \lambda^{\!T}c(x)=0\quad(\text{i.e., }\lambda_i c_i(x)=0\ \forall i).
\end{aligned}
$$

\textbf{Interpretation.}
\begin{itemize}
\item \emph{Active} constraints: $c_i(x)=0 \Rightarrow \lambda_i\ge 0$ can be nonzero (the constraint acts like an equality).
\item \emph{Inactive} constraints: $c_i(x)>0 \Rightarrow \lambda_i=0$ (no influence on optimality).
\end{itemize}
\end{frame}


\begin{frame}{Complementarity in plain English (and why Newton is tricky)}
\footnotesize

\textbf{What $\lambda_i c_i(x)=0$ means.}
\begin{itemize}
\item Tight constraint ($c_i=0$) $\Rightarrow$ it can press back ($\lambda_i\ge0$).
\item Loose constraint ($c_i>0$) $\Rightarrow$ no force ($\lambda_i=0$).
\end{itemize}

\textbf{Why naive Newton fails.}
\begin{itemize}
\item Complementarity = nonsmooth + inequalities ($\lambda\ge0$, $c(x)\ge0$).
\item Equality-style Newton can violate nonnegativity or bounce across the boundary.
\end{itemize}

\textbf{Two main strategies (preview; a code sketch follows).}
\begin{itemize}
\item \emph{Active-set:} guess the active set $\Rightarrow$ solve the equality-constrained subproblem, update the set.
\item \emph{Barrier/PDIP/ALM:} smooth or relax complementarity, take damped Newton steps, drive the relaxation $\to 0$.
\end{itemize}
\end{frame}
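
% --- Added sketch: relaxed complementarity residual (hedged example) ---
\begin{frame}[fragile]{Sketch: relaxing complementarity}
A minimal sketch of the barrier/interior-point idea from the previous slide: replace $\lambda_i c_i(x)=0$ by $\lambda_i c_i(x)=\rho$ and drive $\rho\to 0$. Assumes ForwardDiff.jl; the function name is illustrative.
\begin{verbatim}
using LinearAlgebra, ForwardDiff

# Residual of the relaxed KKT system for min f s.t. c(x) >= 0.
# Newton is run on this smooth system for a fixed rho > 0,
# then rho is decreased toward 0.
function relaxed_kkt_residual(f, c, x, lam, rho)
    g  = ForwardDiff.gradient(f, x)
    Jc = ForwardDiff.jacobian(c, x)
    return [g - Jc' * lam;         # stationarity (sign as on slide)
            lam .* c(x) .- rho]    # relaxed complementarity
end
\end{verbatim}
\end{frame}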