This repo extends a whole-body humanoid MPC stack with a cost-matching pipeline: learn a parameter vector
Purpose
- Offline compute
$Q^{\mathrm{MPC}}_{\boldsymbol{\theta}}$ over a rollout horizon$N$ . - Provide
$\nabla_{\boldsymbol{\theta}} Q$ (and later optionally$\nabla_{\boldsymbol{x}} Q$ ,$\nabla_{\boldsymbol{u}} Q$ ). - Keep a clean split between AD computation and runtime gating/parameter wiring.
Status
- All cost + penalty terms are implemented.
- AD penalties are re-implemented with
CondExpfor numerical consistency with OCS2 scalar penalties. - Key integration pitfalls are fixed and standardized (const-correctness, CG value extraction, frame index includes, redefinition).
- Loads rollout
.npzlogs (state/input/mode + measured stage cost + refpack fields if present). - Builds
$Q^{\mathrm{meas}}$ using discounted returns. - Calls
humanoid_cost_matching_offline_eval_pyto get$Q^{\mathrm{MPC}}_{\boldsymbol{\theta}}$ and$\frac{\partial Q}{\partial \boldsymbol{\theta}}$ . - Optimizes
$\boldsymbol{\theta}$ (Adam) with positivity constraint:$\boldsymbol{\theta}=\mathrm{softplus}(\boldsymbol{r})+\varepsilon$ . - Exports
theta_latest.jsonand deploys a.infofile for the online MPC stack.
-
humanoid_cost_matching_cpp/-
include/.../offline/cost/tapes/— CppAD models:$\mathrm{value}(\mathbf{xu},\mathbf{p})$ ,$\nabla_{\mathbf{xu}}\mathrm{value}(\mathbf{xu},\mathbf{p})$ . -
include/.../offline/cost/terms/— runtime gating + params + mapping tovalue/grad. -
include/.../offline/cost/penalties/barrier_penalties_ad.h— AD barrier penalties (shared). -
src/.../offline/...— implementations +OfflineQEvaluator+CostBundle.
-
-
humanoid_cost_matching/-
data/—.npzloading + batch building. -
train/— training scripts + logs/ckpt. -
utils/export_theta_info.py— deploy.infofor MPC. -
humanoid_cost_matching_offline_eval_py— Python bindings to C++ evaluator.
-
New offline cost components must follow:
include/.../offline/cost/tapes/penalty_xxx_tapes.hsrc/.../offline/cost/tapes/penalty_xxx_tapes.cppinclude/.../offline/cost/terms/penalty_xxx_terms.h
Tape
- Only builds AD model:
$\mathrm{value}(\mathbf{xu},\mathbf{p})$ ,$\nabla_{\mathbf{xu}}\mathrm{value}(\mathbf{xu},\mathbf{p})$ (optionally$\nabla_{\boldsymbol{\theta}}\mathrm{value}$ ).
Term (ICostTerm)
- Runtime gating via
refMgr_->getContactFlags(t). - Assemble
$(\mathbf{x},\mathbf{u})\mapsto \mathbf{xu}$ and params$\mathbf{p}$ . - Map tape outputs to
$\mathrm{value}$ ,$\nabla_{\mathbf{x}}$ , and$\nabla_{\boldsymbol{\theta}}$ .
- No
TerminalSumTerm. - Terminal is accumulated the same way as stage inside
CostBundle.
Offline evaluator stores Pinocchio/robot model AD as const pointers.
If a tape needs non-const APIs, do not const_cast in the evaluator.
Required pattern (inside tape)
- copy Pinocchio AD by value.
- clone robot model into a writable
unique_ptr.
The current trainer minimizes a simple value-matching objective:
- For sampled indices
$k$ from logged rollouts:-
$Q^{\mathrm{meas}}_k$ = discounted return from measured stage costs. -
$Q^{\mathrm{MPC}}_{\boldsymbol{\theta},k}$ and$\frac{\partial Q}{\partial \boldsymbol{\theta}}$ fromOfflineQEvaluatorusing rollout horizon$N$ .
-
Loss $ \mathcal{L}(\boldsymbol{\theta}) = \mathbb{E}\left[\left(Q^{\mathrm{MPC}}_{\boldsymbol{\theta}}-Q^{\mathrm{meas}}\right)^2\right] $
Update
- Adam on raw parameters
$\boldsymbol{r}$ , with: $ \boldsymbol{\theta}=\mathrm{softplus}(\boldsymbol{r})+\varepsilon $
Theta layout
Must match C++ ThetaLayout. Current python layout:
$\theta = [\mathrm{dyn}(6),, Q(n_x),, R(n_u),, Q_f(n_x),, \mathrm{base}(12),, \mathrm{com}(2),, \mathrm{swing}(24),, \mathrm{torque}(12)]$
Cost-rate note
If measured_stage_cost is a cost-rate, the script multiplies by
Prepare .npz logs (typical keys):
-
time$(T,)$ -
state$(T,n_x)$ -
input$(T,n_u)$ -
mode$(T,)$ - a measured stage-cost key expected by
load_measured_stage_cost().
Run:
python3 humanoid_cost_matching/train/train_cost_matching.py \
--task /wb_humanoid_mpc_ws/src/wb_humanoid_mpc/robot_models/unitree_g1/g1_centroidal_mpc/config/mpc/task_cost_matching.info \
--urdf /wb_humanoid_mpc_ws/src/wb_humanoid_mpc/robot_models/unitree_g1/g1_description/urdf/g1_29dof.urdf \
--ref /wb_humanoid_mpc_ws/src/wb_humanoid_mpc/robot_models/unitree_g1/g1_centroidal_mpc/config/command/reference.info \
--npz /wb_humanoid_mpc_ws/src/wb_humanoid_mpc/humanoid_cost_matching/logs/obs_data \
--gamma 0.985 --dt 0.02 --N 60 \
--iters 2000 --batch 64 --lr 1e-3Outputs:
-
humanoid_cost_matching/logs/cm_runs/<run_id>/train_log*.csv -
humanoid_cost_matching/logs/cm_runs/<run_id>/ckpt_step_*.json -
humanoid_cost_matching/logs/cm_runs/<run_id>/theta_latest.json -
deployed theta artifacts:
humanoid_cost_matching/logs/deployed_theta/(+ history)
This project is built upon the foundational work of several open-source libraries and repositories. I would especially like to acknowledge:
- wb_humanoid_mpc: the core centroidal dynamics MPC framework used in this project. The Cost-Matching implementation and training pipeline described here are the original contributions of this repository, built on top of this excellent base.
- ocs2
- pinocchio
- hpipm
