class05/class05.jl
Lines changed: 28 additions & 21 deletions
@@ -78,10 +78,8 @@ end
# ╔═╡ bfc7cced-3ce7-4f2b-8ee9-424d6d5ba682
md"
- Trajectory optimization problems of systems with linear dynamics can likely be modeled as LQR (refer to Lecture 3), since quadratic functions are often good enough to be used as the cost.
- Many nice properties then ensue.
-
- However, the reality is often harsh.
+ For systems with linear dynamics, if quadratic functions are good enough to represent the cost (which often is the case), then the trajectory optimization problem can be modeled as LQR (refer to Lecture 3).
+ However, the real world is hardly linear, and a linear approximation of the dynamics could prove insufficient given the specific setting of the problem.
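To make the linear-quadratic case concrete, here is a minimal sketch of the finite-horizon Riccati recursion behind LQR. This is our own illustration, not the lecture code: it is written in Python rather than the notebook's Julia, and the `lqr_gains` helper and example system are hypothetical.

```python
import numpy as np

# Hypothetical helper (not from the lecture code): finite-horizon discrete LQR
# gains via the backward Riccati recursion, for dynamics x' = A x + B u and
# stage cost x' Q x + u' R u, with terminal cost x' QN x.
def lqr_gains(A, B, Q, R, QN, N):
    P = QN
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u = -K x
        P = Q + A.T @ P @ (A - B @ K)                      # Riccati update
        gains.append(K)
    return gains[::-1]  # K_0, ..., K_{N-1}

# Example: a double integrator discretized with dt = 0.1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Ks = lqr_gains(A, B, np.eye(2), np.array([[1.0]]), 10 * np.eye(2), 50)

x = np.array([[1.0], [0.0]])
for K in Ks:
    x = A @ x - B @ (K @ x)  # closed loop drives the state toward the origin
```

The quadratic structure is what makes the backward recursion closed-form; this is the "nice property" that is lost once the dynamics or cost become nonlinear.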
# Nonlinear Trajectory Optimization
@@ -96,7 +94,7 @@ However, the reality is often harsh.
```
This trajectory optimization problem is often nonlinear in practice.
- Nonlinear dynamics causes the problem to be nonconvex. Nonlinearity could also arise from additional constraints.
+ Nonlinear dynamics causes the problem to be nonconvex. Nonlinearity could also arise from additional constraints, not just the dynamics.
"
# ╔═╡ 5f190a4e-d4b6-4279-8757-b0ec89df987f
@@ -223,7 +221,7 @@ may be enforced to prevent slipping, where $f$ is the applied force, $\mu$ is th
# ╔═╡ 055ac28a-2fbd-4777-b683-688ae6b10a89
Foldable(md"Model choice: when does a linear controller suffice in robotics?", md"
- > If you're not pushing the performance limit (e.g. of the actuators), then you can probably use a linear model. [^cmu11]
+ > If you're not pushing the performance limit, then you can probably use a linear model. [^cmu11]
In a recent paper [^li2024], legged robots are controlled with linear controllers using data-driven Koopman linearization to walk.
How do we express these variables at the collocation points? If we know the splines, then we can express them. We will explore how to do this in the following part.
where the short-hand notations like $f_{k}$ represent $f(x(t_{k}), u(t_{k}))$, etc.
Rearranging the terms, we can get
```math
@@ -636,6 +633,8 @@ md"
## Differential Dynamic Programming
##### Approximate dynamic programming
+ This is an approach to solving the often intractable dynamic programming problem. We have covered dynamic programming in Class 4.
+
Instead of computing the value function at each time step exactly in its entirety,
1. Simulate one particular trajectory
2. (Backward pass) Update value function approximations to match the simulated data as well as possible:
@@ -655,7 +654,7 @@ md"
---
The general idea of differential dynamic programming is approximating the value function with the second-order Taylor approximation around a nominal trajectory and updating the trajectory little-by-little in every iteration.
- In the following, time step subscripts and indices will be omitted for conciseness, unless the context involves terms corresponding to different time steps, in which case we will use $t$ to denote the time step as $k$ will be used to denote something else later.
+ In the following, time step subscripts and indices will be omitted for conciseness, unless the context involves terms corresponding to different time steps, in which case we will use $t$ to denote the time step.
Let us write the second-order Taylor expansion of the value function near $x$ at a particular time step as
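In this notation, with $V_{x}$ the gradient and $V_{xx}$ the Hessian of $V$, the standard form of this expansion is

```math
V(x + \Delta x) \approx V(x) + V_{x}^{\top} \Delta x + \frac{1}{2} \Delta x^{\top} V_{xx} \Delta x
```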
The second-order Taylor expansion of the action-value function (Q-function, the cost of the current action in the current state plus the value function of the new state) near $x$ and $u$ is
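Consistent with the gradient expression used afterwards, the standard form of this expansion is

```math
Q(x + \Delta x, u + \Delta u) \approx Q(x, u) + Q_{x}^{\top} \Delta x + Q_{u}^{\top} \Delta u + \frac{1}{2} \Delta x^{\top} Q_{xx} \Delta x + \Delta u^{\top} Q_{ux} \Delta x + \frac{1}{2} \Delta u^{\top} Q_{uu} \Delta u
```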
- Note that these gradient and Hessian terms of $Q[t]$ are expressed in terms of $V_{x}[t+1]$ and $V_{xx}[t+1]$ (as well as gradient and Hessian of $f$).
+ where the gradient and Hessian terms of $Q[t]$ are expressed in terms of $V_{x}[t+1]$ and $V_{xx}[t+1]$ (as well as gradient and Hessian of $f$; recall the definition of the $Q$ function).
- The gradient of $Q(x + \Delta x, u + \Delta u)$ with respect to $\Delta u$ is
+ The gradient of the approximation of $Q(x + \Delta x, u + \Delta u)$ with respect to $\Delta u$ is
```math
Q_{u} + Q_{uu} \Delta u + Q_{ux} \Delta x
```
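Setting this gradient to zero yields the minimizer, and hence the feedforward and feedback gains:

```math
\Delta u^{*} = -Q_{uu}^{-1} \left( Q_{u} + Q_{ux} \Delta x \right) = k + K \Delta x, \qquad k = -Q_{uu}^{-1} Q_{u}, \quad K = -Q_{uu}^{-1} Q_{ux}
```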
@@ -702,7 +703,7 @@ As mentioned earlier, $k[t]$ and $K[t]$ depend on $V[t+1]$.
This implies that $V_{x}$ and $V_{xx}$ of each time step should be iteratively updated, starting from the last time step backward to the first time step.
So let us assume that we have updated $V[t+1]$, and would like to now use the updated $\Delta u^{*}$ to update $V_{x}[t]$ and $V_{xx}[t]$.
- Plugging $\Delta u^{*}$ into $Q(x + \Delta x, u + \Delta u)$ produces an expression of $V(x + \Delta x)$ (since $\Delta u^{*}$ is the minimizer).
+ Plugging $\Delta u^{*}$ into $Q(x + \Delta x, u + \Delta u)$, we get $V(x + \Delta x)$ (since $\Delta u^{*}$ is a minimizer).
With some computation, we get the updated values
```math
V_{x} = Q_{x} + Q_{xu}^{\top} k \qquad V_{xx} = Q_{xx} + Q_{xu}^{\top} K
@@ -718,7 +719,12 @@ Overall, the algorithm can be summarized as:
"
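The backward pass just described can be sketched in a few lines of code. The following is our own illustration, not the notebook's code: it is Python rather than Julia, the names are hypothetical, and it is specialized to linear dynamics and quadratic cost, where the second-order expansion of $Q$ is exact and the recursion reduces to LQR.

```python
import numpy as np

# Hedged sketch of the DDP backward pass, specialized to f(x, u) = A x + B u
# and quadratic cost, around the zero nominal trajectory (gradients vanish, so
# only the Hessian recursion remains).
def ddp_backward_lq(A, B, Q, R, QN, N):
    Vxx = QN                            # Hessian of the terminal value
    Ks = []
    for _ in range(N):
        Qxx = Q + A.T @ Vxx @ A         # Hessian blocks of the Q-function
        Quu = R + B.T @ Vxx @ B
        Qux = B.T @ Vxx @ A
        K = -np.linalg.solve(Quu, Qux)  # feedback gain in du* = k + K dx
        Vxx = Qxx + Qux.T @ K           # value Hessian update
        Ks.append(K)
    return Ks[::-1]                     # K_0, ..., K_{N-1}

# Rolling out u = K x on a double integrator steers the state to the origin.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Ks = ddp_backward_lq(A, B, np.eye(2), np.array([[1.0]]), 10 * np.eye(2), 50)
x = np.array([[1.0], [0.0]])
for K in Ks:
    x = (A + B @ K) @ x
```

In the linear-quadratic setting the gains coincide with those of the LQR Riccati recursion; the point of DDP is that the same backward pass applies locally around a nominal trajectory of a nonlinear system.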
# ╔═╡ 71322a24-2eb6-48ef-b652-bd7105ccdea8
- question_box(md"Can you think of one advantage collocation has over the vanilla differential DP? (Hint: think about what is easy to be added to an optimization problem but not to the backward pass of differential DP)")
+ question_box(md"Can you think of one advantage collocation has over differential DP? (Hint: think about what is easy to be added to an optimization problem but not to the backward pass of differential DP).")
+
+ # ╔═╡ 98a56727-c565-4359-8c9d-73f2566e3413
+ Foldable(md"Answer...", md"
+ Additional constraints (on $x$ or $u$) can be easily imposed in collocation, but not so in differential DP. Techniques for imposing constraints in differential DP have been developed in recent years, and you may read about them if interested.
+ ")
# ╔═╡ Cell order:
# ╟─2fe513ba-9310-11f0-2266-9730fc13e5da
@@ -767,3 +773,4 @@ question_box(md"Can you think of one advantage collocation has over the vanilla