
Commit 700c4ae

Refine Chapter 5
1 parent 843c127 commit 700c4ae

File tree: 1 file changed (+28 -21 lines)


class05/class05.jl

Lines changed: 28 additions & 21 deletions
@@ -78,10 +78,8 @@ end
 
 # ╔═╡ bfc7cced-3ce7-4f2b-8ee9-424d6d5ba682
 md"
-Trajectory optimization problems of systems with linear dynamics can likely be modeled as LQR (refer to Lecture 3), since quadratic functions are often good enough to be used as the cost.
-Many nice properties then ensue.
-
-However, the reality is often harsh.
+For systems with linear dynamics, if quadratic functions are good enough to represent the cost (which is often the case), then the trajectory optimization problem can be modeled as LQR (refer to Lecture 3).
+However, the real world is hardly linear, and a linear approximation of the dynamics can prove insufficient depending on the specific setting of the problem.
 
 # Nonlinear Trajectory Optimization
 
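For context, the LQR solution referred to above comes from a backward Riccati recursion (Lecture 3). A minimal Julia sketch, assuming discrete-time dynamics $x[k+1] = A x[k] + B u[k]$ and stage cost $x^{\top} Q x + u^{\top} R u$; all names here are illustrative, not taken from the notebook:

```julia
using LinearAlgebra

# Finite-horizon discrete LQR via the backward Riccati recursion.
# A, B, Q, R are matrices; the optimal control is u[k] = -Ks[k] * x[k].
function lqr_gains(A, B, Q, R, N)
    P = copy(Q)                                   # terminal cost-to-go
    Ks = Vector{Matrix{Float64}}(undef, N)
    for k in N:-1:1
        Ks[k] = (R + B' * P * B) \ (B' * P * A)   # optimal gain at step k
        P = Q + A' * P * (A - B * Ks[k])          # Riccati update
    end
    return Ks
end
```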
@@ -96,7 +94,7 @@ However, the reality is often harsh.
 ```
 
 This trajectory optimization problem is often nonlinear in practice.
-Nonlinear dynamics causes the problem to be nonconvex. Nonlinearity could also arise from additional constraints.
+Nonlinear dynamics makes the problem nonconvex. Nonlinearity can also arise from additional constraints, not just the dynamics.
 "
 
 # ╔═╡ 5f190a4e-d4b6-4279-8757-b0ec89df987f
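To make this concrete, a transcribed nonlinear trajectory optimization problem of this shape can be handed to an NLP solver. A minimal sketch with JuMP and Ipopt, assuming a hypothetical forward-Euler discretization of pendulum swing-up (the dynamics, bounds, and horizon below are illustrative, not from the notebook; recent JuMP versions accept nonlinear expressions such as `sin` directly in `@constraint`, while older versions need `@NLconstraint`):

```julia
using JuMP, Ipopt

N, h, g = 50, 0.05, 9.81
model = Model(Ipopt.Optimizer)
@variable(model, x[1:N+1, 1:2])               # state: angle and angular velocity
@variable(model, -2.0 <= u[1:N] <= 2.0)       # torque bound: an additional constraint
@constraint(model, x[1, :] .== [0.0, 0.0])    # start hanging down, at rest
@constraint(model, x[N+1, :] .== [pi, 0.0])   # end upright, at rest
# Dynamics constraints; sin() is what makes the feasible set nonconvex.
@constraint(model, [k = 1:N], x[k+1, 1] == x[k, 1] + h * x[k, 2])
@constraint(model, [k = 1:N], x[k+1, 2] == x[k, 2] + h * (-g * sin(x[k, 1]) + u[k]))
@objective(model, Min, h * sum(u[k]^2 for k in 1:N))
optimize!(model)
```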
@@ -223,7 +221,7 @@ may be enforced to prevent slipping, where $f$ is the applied force, $\mu$ is th
 
 # ╔═╡ 055ac28a-2fbd-4777-b683-688ae6b10a89
 Foldable(md"Model choice: when does a linear controller suffice in robotics?", md"
-> If you're not pushing the performance limit (e.g. of the actuators), then you can probably use a linear model. [^cmu11]
+> If you're not pushing the performance limit, then you can probably use a linear model. [^cmu11]
 
 In a recent paper [^li2024], legged robots are made to walk with linear controllers obtained through data-driven Koopman linearization.
 ![unitree](https://arxiv.org/html/2411.14321v3/x3.png)
@@ -291,6 +289,8 @@ The optimization problem is
 & ...
 \end{align*}
 ```
+
+How do we express these variables at the collocation points? If we know the splines, we can express them; we will see how in the following part.
 "
 
 # ╔═╡ 165297a6-854f-475c-a16a-637de6dc9b69
@@ -345,7 +345,7 @@ C_{0} \\ C_{1} \\ C_{2} \\ C_{3}
 \end{pmatrix}
 ```
 
-Now we can represent the values at the collocation point:
+Now we can represent the variables at the collocation point by plugging in the expressions for the $C$ coefficients we just derived:
 ```math
 \begin{align*}
 x(t_{k + \frac{1}{2}}) &= x(t_{k} + \frac{h}{2}) \\
@@ -362,13 +362,10 @@ u(t_{k + \frac{1}{2}}) = u(t_{k} + \frac{h}{2}) = \frac{1}{2} (u[k] + u[k+1])
 ```
 since its trajectory is approximated with a linear spline.
 
-And we can replace the expressions into the dynamics constraint at the collocation point
+Now we can plug these expressions into the dynamics constraint at the collocation point
 ```math
 \dot{x}(t_{k + \frac{1}{2}}) = f(x(t_{k + \frac{1}{2}}), u(t_{k + \frac{1}{2}}))
 ```
-
-(Note that all these are specific to one particular interval $(t_{k}, t_{k+1})$, even though the time indices are omitted for most of the notations.)
-
 "
 
 # ╔═╡ d75262d5-24b0-47f3-9010-264c43fa72e5
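Carrying out this derivation yields the standard closed-form Hermite-Simpson expressions for the midpoint quantities. A minimal Julia sketch, assuming `f(x, u)` returns $\dot{x}$; function and variable names are illustrative:

```julia
# Collocation defect on one interval (t_k, t_{k+1}) of width h.
# x_k, x_k1 are knot states; u_k, u_k1 are knot controls; f is the dynamics.
function collocation_defect(f, x_k, u_k, x_k1, u_k1, h)
    f_k, f_k1 = f(x_k, u_k), f(x_k1, u_k1)
    x_mid    = (x_k + x_k1) / 2 + h / 8 * (f_k - f_k1)     # cubic spline at the midpoint
    xdot_mid = 3 / (2h) * (x_k1 - x_k) - (f_k + f_k1) / 4  # its derivative there
    u_mid    = (u_k + u_k1) / 2                            # linear spline for the control
    return xdot_mid - f(x_mid, u_mid)  # dynamics constraint: this should be zero
end
```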
@@ -386,15 +383,15 @@ One can approximate integrals with Simpson's rule for integration:
 \int_{t_{0}}^{t_{f}} w(\tau) d\tau \approx \sum_{k=0}^{N-1} \frac{h_{k}}{6} (w_{k} + 4w_{k+\frac{1}{2}} + w_{k+1})
 ```
 
-This approximation can be applied both to to the **dynamics**:
+When applied to the following expression implied by the dynamics,
 ```math
 \int_{t_{k}}^{t_{k+1}} \dot{x}(\tau) d\tau = \int_{t_{k}}^{t_{k+1}} f(x(\tau), u(\tau)) d\tau
 ```
-can be approximated with
-(notations have been abbreviated)
+the approximation is
 ```math
 x[k+1] - x[k] = \frac{1}{6} h_{k} (f_{k} + 4f_{k + \frac{1}{2}} + f_{k+1})
 ```
+where shorthand such as $f_{k}$ denotes $f(x(t_{k}), u(t_{k}))$, etc.
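In code, this integrated form becomes a per-interval defect that the NLP is constrained to drive to zero. A small sketch with illustrative names, assuming `f_k`, `f_mid`, and `f_k1` have already been evaluated as in the previous sketch:

```julia
# Simpson (integrated) form of the dynamics constraint on one interval:
# x[k+1] - x[k] should equal h/6 * (f_k + 4 f_mid + f_k1).
simpson_defect(x_k, x_k1, f_k, f_mid, f_k1, h) =
    x_k1 - x_k - h / 6 * (f_k + 4f_mid + f_k1)
```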
 
 Rearranging the terms, we can get
 ```math
@@ -636,6 +633,8 @@ md"
 ## Differential Dynamic Programming
 
 ##### Approximate dynamic programming
+This is an approach to solving the often-intractable dynamic programming problem (we covered dynamic programming in Class 4).
+
 Instead of computing the value function at each time step exactly in its entirety,
 1. Simulate one particular trajectory
 2. (Backward pass) Update value function approximations to match the simulated data as well as possible:
@@ -655,7 +654,7 @@ md"
 ---
 The general idea of differential dynamic programming is approximating the value function with the second-order Taylor approximation around a nominal trajectory and updating the trajectory little-by-little in every iteration.
 
-In the following, time step subscripts and indices will be omitted for conciseness, unless the context involves terms corresponding to different time steps, in which case we will use $t$ to denote the time step as $k$ will be used to denote something else later.
+In the following, time step subscripts and indices will be omitted for conciseness, unless the context involves terms corresponding to different time steps, in which case we will use $t$ to denote the time step.
 
 Let us write the second-order Taylor expansion of the value function near $x$ at a particular time step as
 ```math
@@ -668,7 +667,10 @@ V_{x} = \nabla_{x} \ell_{f}(x), \qquad V_{xx} = \nabla^{2}_{xx} \ell_{f}(x)
 
 In our case, the definition of the action-value function (Q-function) is:
 ```math
-Q[t](x[t] + \Delta x[t], u[t] + \Delta u[t]) = \ell_{t}(x[t] + \Delta x[t], u[t] + \Delta u[t]) + V[t+1](f(x[t] + \Delta x[t], u[t] + \Delta u[t]))
+Q[t](x[t] + \Delta x[t], u[t] + \Delta u[t]) =
+```
+```math
+\ell_{t}(x[t] + \Delta x[t], u[t] + \Delta u[t]) + V[t+1](f(x[t] + \Delta x[t], u[t] + \Delta u[t]))
 ```
 
 The second-order Taylor expansion of the action-value function (Q-function, the cost of the current action in the current state plus the value function of the new state) near $x$ and $u$ is
@@ -681,15 +683,14 @@ Q(x + \Delta x, u + \Delta u) \approx Q(x, u) + \begin{pmatrix} Q_{x} \\ Q_{u} \
 \end{pmatrix}
 \begin{pmatrix} \Delta x \\ \Delta u \end{pmatrix}
 ```
-
-Note that these gradient and Hessian terms of $Q[t]$ are expressed in terms of $V_{x}[t+1]$ and $V_{xx}[t+1]$ (as well as gradient and Hessian of $f$).
+where the gradient and Hessian terms of $Q[t]$ are expressed in terms of $V_{x}[t+1]$ and $V_{xx}[t+1]$ (as well as the gradient and Hessian of $f$; recall the definition of the $Q$-function).
 
 By definition,
 ```math
 V(x + \Delta x) = \min_{\Delta u} Q(x + \Delta x, u + \Delta u)
 ```
-The gradient of $Q(x + \Delta x, u + \Delta u)$ with respect to $\Delta u$ is
+The gradient of the approximation of $Q(x + \Delta x, u + \Delta u)$ with respect to $\Delta u$ is
 ```math
 Q_{u} + Q_{uu} \Delta u + Q_{ux} \Delta x
 ```
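Setting this gradient to zero gives the update direction. A sketch under one common sign convention, $\Delta u^{*} = k + K \Delta x$, chosen to be consistent with the $V_{x}$, $V_{xx}$ update formulas below; variable names are illustrative:

```julia
# Minimizer of the quadratic model in Δu (assumes Q_uu is positive definite;
# in practice Q_uu is often regularized before factorization).
k = -(Q_uu \ Q_u)     # feedforward term
K = -(Q_uu \ Q_ux)    # feedback gain
Δu_star = k + K * Δx
```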
@@ -702,7 +703,7 @@ As mentioned earlier, $k[t]$ and $K[t]$ depend on $V[t+1]$.
 This implies that $V_{x}$ and $V_{xx}$ of each time step should be iteratively updated, starting from the last time step backward to the first time step.
 So let us assume that we have updated $V[t+1]$, and would like to now use the updated $\Delta u^{*}$ to update $V_{x}[t]$ and $V_{xx}[t]$.
 
-Plugging $\Delta u^{*}$ into $Q(x + \Delta x, u + \Delta u)$ produces an expression of $V(x + \Delta x)$ (since $\Delta u^{*}$ is the minimizer).
+Plugging $\Delta u^{*}$ into $Q(x + \Delta x, u + \Delta u)$, we get $V(x + \Delta x)$ (since $\Delta u^{*}$ is a minimizer).
 With some computation, we get the updated values
 ```math
 V_{x} = Q_{x} + Q_{xu}^{\top} k \qquad V_{xx} = Q_{xx} + Q_{xu}^{\top} K
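Putting the pieces together, one backward-pass step can be sketched as follows. This is the Gauss-Newton (iLQR-style) variant that drops the second-order dynamics terms from the $Q$ expansions; the function name and signature are illustrative:

```julia
# One backward-pass step at time t, given stage-cost derivatives (ℓ...),
# dynamics Jacobians fx, fu at the nominal (x, u), and Vx, Vxx from step t+1.
function backward_step(ℓx, ℓu, ℓxx, ℓuu, ℓux, fx, fu, Vx, Vxx)
    Qx  = ℓx  + fx' * Vx
    Qu  = ℓu  + fu' * Vx
    Qxx = ℓxx + fx' * Vxx * fx
    Quu = ℓuu + fu' * Vxx * fu
    Qux = ℓux + fu' * Vxx * fx
    k = -(Quu \ Qu)            # feedforward
    K = -(Quu \ Qux)           # feedback gain
    Vx_new  = Qx  + Qux' * k   # value-gradient update
    Vxx_new = Qxx + Qux' * K   # value-Hessian update
    return k, K, Vx_new, Vxx_new
end
```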
@@ -718,7 +719,12 @@ Overall, the algorithm can be summarized as:
 "
 
 # ╔═╡ 71322a24-2eb6-48ef-b652-bd7105ccdea8
-question_box(md"Can you think of one advantage collocation has over the vanilla differential DP? (Hint: think about what is easy to be added to an optimization problem but not to the backward pass of differential DP)")
+question_box(md"Can you think of one advantage collocation has over differential DP? (Hint: think about what can easily be added to an optimization problem but not to the backward pass of differential DP.)")
+
+# ╔═╡ 98a56727-c565-4359-8c9d-73f2566e3413
+Foldable(md"Answer...", md"
+Additional constraints (on $x$ or $u$) can easily be imposed in collocation, but not so in differential DP. Techniques for imposing constraints in differential DP have been developed in recent years, and you may read about them if interested.
+")
 
 # ╔═╡ Cell order:
 # ╟─2fe513ba-9310-11f0-2266-9730fc13e5da
@@ -767,3 +773,4 @@ question_box(md"Can you think of one advantage collocation has over the vanilla
 # ╟─9932b4dc-4b6e-4a81-8f14-dc71c4c597fc
 # ╟─65269bed-858b-4aa6-b8fc-c631a5b5b429
 # ╟─71322a24-2eb6-48ef-b652-bd7105ccdea8
+# ╟─98a56727-c565-4359-8c9d-73f2566e3413
