
Commit 2a3bb4c

feat(deploy): added g1 23dof deploy support
1 parent 0485c05 commit 2a3bb4c

21 files changed (+554, −124 lines)

README.md

Lines changed: 3 additions & 2 deletions

@@ -8,6 +8,7 @@
 [![Python](https://img.shields.io/badge/Python3.8-3776AB?logo=python&logoColor=fff)](#)
 [![Ubuntu](https://img.shields.io/badge/Ubuntu22.04-E95420?logo=ubuntu&logoColor=white)](#)
 [![License](https://img.shields.io/badge/License-Apache_2.0-green?logo=apache&logoColor=white)](./LICENSE)
+[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/HorizonRobotics/HoloMotion)
 
 <!-- [![arXiv](https://img.shields.io/badge/arXiv-2025.00000-red?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2025.00000) -->
 <!-- [![arXiv](https://img.shields.io/badge/arXiv-2025.00000-red?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2025.00000) -->
@@ -121,8 +122,8 @@ Deploy the exported ONNX model using our ROS2 package to run on real-world robot
   author = {Maiyue Chen, Kaihui Wang, Bo Zhang, Yi Ren, Zihao Zhu, Yucheng Wang, Zhizhong Su},
   title = {HoloMotion: A Foundation Model for Whole-Body Humanoid Motion Tracking},
   year = {2025},
-  month = july,
-  version = {0.4.0},
+  month = september,
+  version = {0.4.1},
   url = {https://github.com/HorizonRobotics/HoloMotion},
   license = {Apache-2.0}
 }

docs/train_motion_tracking.md

Lines changed: 56 additions & 11 deletions

@@ -49,9 +49,25 @@ The LMDB database will be created at the specified `dump_dir`.
 
 ### 2. Train the Motion Tracking Model
 
-The training entry point is `holomotion/src/training/train_motion_tracking.py`, which uses the training config to start distributed training across multiple GPUs.
+```mermaid
+flowchart LR
+
+A[Teacher Stage 1] --> B[Teacher Stage 2]
+B --> C[Student Distillation]
+C --> D[ONNX Model]
+
+classDef dashed stroke-dasharray: 5 5, rx:10, ry:10, fill:#c9d9f5
+classDef normal fill:#c9d9f5, rx:10, ry:10
+class D dashed
+class A,B,C normal
+```
+
+The training entry point is `holomotion/src/training/train_motion_tracking.py`, which uses the training config to start distributed training across multiple GPUs. We recommend a three-stage training procedure:
+1. Teacher stage 1: train the teacher policy without domain randomization;
+2. Teacher stage 2: load the stage 1 checkpoint and train with domain randomization;
+3. Student stage: load the stage 2 teacher checkpoint and run distillation with domain randomization.
 
-#### 2.1 Prepare the Training Config
+#### 2.1 Explaining the Training Config
 
 Use the demo config at `holomotion/config/training/motion_tracking/exp_unitree_g1_21dof_teacher.yaml` as a template. Key configuration groups to modify (configs are located in the `holomotion/config/` directory):
 
@@ -80,18 +96,47 @@ defaults:
 project_name: HoloMotion
 ```
 
-#### 2.2 Prepare the Training Script for Teacher
+#### 2.2 Prepare the Training Scripts for Teacher (Stages 1 and 2)
 
 Review and modify the training script at `holomotion/scripts/training/train_motion_tracking_teacher.sh`. Ensure `config_name` and `motion_file` match your training config and LMDB database directory.
 
+Start training your teacher policy from stage 1, where domain randomization is turned off:
+
+```shell
+source train.env
+export CUDA_VISIBLE_DEVICES="0"
+
+config_name="train_your_robot_teacher_stage1"
+motion_file="data/lmdb_datasets/your_lmdb_path"
+num_envs=2048
+
+${Train_CONDA_PREFIX}/bin/accelerate launch \
+  --multi_gpu \
+  --mixed_precision=bf16 \
+  holomotion/src/training/train_motion_tracking.py \
+  --config-name=training/motion_tracking/${config_name} \
+  use_accelerate=True \
+  num_envs=${num_envs} \
+  headless=True \
+  experiment_name=${config_name} \
+  motion_lmdb_path=${motion_file}
+```
+
+```shell
+bash holomotion/scripts/training/train_motion_tracking_teacher_stage1.sh
+```
+
+After stage 1 teacher training finishes, start stage 2 teacher training by loading the stage 1 checkpoint:
 ```shell
 source train.env
 export CUDA_VISIBLE_DEVICES="0"
 
-config_name="exp_unitree_g1_21dof_teacher"
-motion_file="data/lmdb_datasets/lmdb_g1_21dof_test"
+config_name="train_your_robot_teacher_stage2"
+motion_file="data/lmdb_datasets/your_lmdb_path"
 num_envs=2048
 
+checkpoint="your_stage1_teacher_ckpt_path.pt"
+
 ${Train_CONDA_PREFIX}/bin/accelerate launch \
   --multi_gpu \
   --mixed_precision=bf16 \
@@ -107,7 +152,7 @@ ${Train_CONDA_PREFIX}/bin/accelerate launch \
 Start training by running:
 
 ```shell
-bash holomotion/scripts/training/train_motion_tracking_teacher.sh
+bash holomotion/scripts/training/train_motion_tracking_teacher_stage2.sh
 ```
 
 #### 2.3 Prepare the Training Script for Student
@@ -120,10 +165,10 @@ To start the student training, you should specify the student training config na
 source train.env
 export CUDA_VISIBLE_DEVICES="0"
 
-config_name="train_unitree_g1_21dof_student"
-teacher_ckpt_path="logs/HoloMotion/xxxxxxxx_xxxxxx-train_unitree_g1_21dof_teacher/model_x.pt"
-motion_file="data/lmdb_datasets/lmdb_g1_21dof_test"
-num_envs=16
+config_name="train_your_robot_student"
+teacher_ckpt_path="your_teacher_stage2_ckpt_path.pt"
+motion_file="data/lmdb_datasets/your_lmdb_path"
+num_envs=2048
 
 ${Train_CONDA_PREFIX}/bin/accelerate launch \
   --multi_gpu \
@@ -171,6 +216,6 @@ You may want to have more or less frequent logging and model dumping intervals.
 
 By default, the model checkpoint will be dumped into a folder named `logs/HoloMotion`. You can change this path by explicitly setting `project_name=X`, which results in dumping the checkpoints into the `logs/X` directory.
 
-#### How to resume training from a checkpoint?
+#### How to resume training or load a pretrained model from a checkpoint?
 
 To resume training from a pretrained checkpoint, you can find the checkpoint in the log directory, and then add the option like this: `checkpoint=logs/HoloMotion/20250728_214414-train_unitree_g1_21dof_teacher/model_X.pt`
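The student stage above regresses the student policy onto actions from the frozen stage 2 teacher. As a hedged, minimal sketch of that distillation objective (illustrative names and shapes, not the repository's actual code):

```python
# Minimal sketch of a DAgger-style distillation loss: the student is
# regressed onto the frozen teacher's actions on the same observations.
# All names here are illustrative, not HoloMotion's API.
import numpy as np

def distillation_loss(student_actions, teacher_actions):
    """Mean-squared error between student and frozen-teacher actions."""
    student_actions = np.asarray(student_actions, dtype=float)
    teacher_actions = np.asarray(teacher_actions, dtype=float)
    return float(np.mean((student_actions - teacher_actions) ** 2))

# Toy rollout: 4 envs, 21 action dims (matching a 21-DoF robot).
rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 21))
student = teacher + 0.1  # student is off by 0.1 on every action dim
loss = distillation_loss(student, teacher)  # -> 0.01
```

In the real pipeline the teacher additionally relabels states visited by the student, which is what distinguishes DAgger-style distillation from plain behavior cloning.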

holomotion/config/algo/dagger.yaml

Lines changed: 24 additions & 16 deletions

@@ -89,40 +89,48 @@ algo:
   action_scale: ${robot.control.action_scale}
   default_dof_pos_dict: ${robot.init_state.default_joint_angles}
   dof_order: ${robot.dof_names}
-  num_fine_experts: 3
+  num_fine_experts: 5
   num_shared_experts: 1
   top_k: 2
   load_balancing_loss_alpha: 0.01
   bound_loss_alpha: 10.0
-  projection_dim: 512
-  hidden_dim: 512
+  projection_dim: 2048
+  hidden_dim: 1024
 
 actor:
-  type: MLP
+  type: MoEMLP
+  predict_local_body_pos: ${algo.algo.config.predict_local_body_pos}
+  pred_local_body_pos_dim:
+    ${eval:'(${algo.algo.config.num_rigid_bodies} + ${algo.algo.config.num_extended_bodies})
+    * 3'}
+  predict_local_body_vel: ${algo.algo.config.predict_local_body_vel}
+  pred_local_body_vel_dim:
+    ${eval:'(${algo.algo.config.num_rigid_bodies} + ${algo.algo.config.num_extended_bodies})
+    * 3'}
+  predict_root_lin_vel: ${algo.algo.config.predict_root_lin_vel}
   fix_sigma: true
+  use_layernorm: false
   input_dim:
     - actor_obs
   output_dim:
     - robot_action_dim
-  predict_local_body_pos: false
-  predict_local_body_vel: false
-  predict_root_lin_vel: false
-  layer_config:
-    hidden_dims:
-      - 2048
-      - 1024
-      - 512
-      - 256
-    activation: SiLU
-    use_layernorm: false
-
+  max_sigma: 1.2
+  min_sigma: 0.2
   clamp_output:
     enabled: true
    raw_lower_bound: ${robot.dof_pos_lower_limit_list}
    raw_upper_bound: ${robot.dof_pos_upper_limit_list}
    action_scale: ${robot.control.action_scale}
    default_dof_pos_dict: ${robot.init_state.default_joint_angles}
    dof_order: ${robot.dof_names}
+  num_fine_experts: 5
+  num_shared_experts: 1
+  top_k: 2
+  load_balancing_loss_alpha: 0.01
+  bound_loss_alpha: 10.0
+  projection_dim: 2048
+  hidden_dim: 1024
+
 
 critic: {}
 
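The `MoEMLP` keys in this diff (`num_fine_experts: 5`, `num_shared_experts: 1`, `top_k: 2`) suggest a mixture-of-experts MLP with top-2 gating over five fine experts plus one always-active shared expert. A hedged NumPy sketch of that routing scheme, as an illustration of the config semantics rather than HoloMotion's implementation:

```python
# Sketch of top-k MoE routing: a gate scores the fine experts, the top-2
# are mixed by softmax weight, and a shared expert always contributes.
# Illustrative only; the real MoEMLP may differ in detail.
import numpy as np

def moe_forward(x, fine_experts, shared_expert, gate_w, top_k=2):
    logits = x @ gate_w                    # one score per fine expert
    top = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()               # softmax over the selected experts
    out = shared_expert(x)                 # shared expert is always on
    for w, i in zip(weights, top):
        out = out + w * fine_experts[i](x)
    return out

rng = np.random.default_rng(0)
dim = 8
# 5 fine experts + 1 shared expert, each a toy linear map.
experts = [lambda x, W=rng.standard_normal((dim, dim)): x @ W for _ in range(5)]
shared = lambda x, W=rng.standard_normal((dim, dim)): x @ W
gate_w = rng.standard_normal((dim, 5))
y = moe_forward(rng.standard_normal(dim), experts, shared, gate_w)
```

The `load_balancing_loss_alpha: 0.01` key would then weight an auxiliary loss that discourages the gate from collapsing onto a single expert.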

holomotion/config/algo/ppo.yaml

Lines changed: 7 additions & 6 deletions

@@ -30,6 +30,7 @@ algo:
   schedule: adaptive
   desired_kl: 0.01
   init_noise_std: 0.8
+  donot_load_critic: false
   # ---
 
   # --- Dagger Related Settings ---
@@ -94,26 +95,26 @@ algo:
   action_scale: ${robot.control.action_scale}
   default_dof_pos_dict: ${robot.init_state.default_joint_angles}
   dof_order: ${robot.dof_names}
-  num_fine_experts: 3
+  num_fine_experts: 5
   num_shared_experts: 1
   top_k: 2
   load_balancing_loss_alpha: 0.01
   bound_loss_alpha: 10.0
-  projection_dim: 512
-  hidden_dim: 512
+  projection_dim: 2048
+  hidden_dim: 1024
 
 critic:
   type: MoEMLP
   input_dim:
     - critic_obs
   output_dim:
     - 1
-  num_fine_experts: 3
+  num_fine_experts: 5
   num_shared_experts: 1
   top_k: 2
   load_balancing_loss_alpha: 0.01
-  projection_dim: 512
-  hidden_dim: 512
+  projection_dim: 2048
+  hidden_dim: 1024
 
 disc: {}
 
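The `schedule: adaptive` / `desired_kl: 0.01` pair in this config typically denotes a KL-driven learning-rate rule, as in rsl_rl-style PPO trainers; whether HoloMotion uses exactly this rule is an assumption. A minimal sketch of the common variant:

```python
# Common "adaptive" PPO schedule: shrink the learning rate when the
# per-update KL overshoots the target, grow it when KL undershoots.
# The exact rule and bounds here are assumptions, not HoloMotion's code.
def adapt_lr(lr, kl, desired_kl=0.01, factor=1.5, lo=1e-5, hi=1e-2):
    if kl > 2.0 * desired_kl:          # policy moved too fast: shrink lr
        lr = max(lo, lr / factor)
    elif 0.0 < kl < 0.5 * desired_kl:  # policy barely moved: grow lr
        lr = min(hi, lr * factor)
    return lr
```

With `desired_kl: 0.01`, a measured KL of 0.03 would shrink the learning rate and a KL of 0.004 would grow it, keeping updates near the target trust region.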

holomotion/config/env/domain_randomization/domain_rand_base.yaml

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ domain_rand:
   randomize_torque_rfi: True
   rfi_lim: 0.1
   randomize_rfi_lim: True
-  rfi_lim_range: [0.8, 1.2]
+  rfi_lim_range: [0.9, 1.1]
 
   # control delay
   randomize_ctrl_delay: True
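The tightened `rfi_lim_range` narrows how much the random-force-injection limit is scaled per environment. A hedged sketch of the sampling this plausibly implies (key names come from the config; the code itself is illustrative):

```python
# Sketch: each environment's torque-noise limit is the base `rfi_lim`
# scaled by a factor drawn uniformly from `rfi_lim_range`.
# Illustrative only; the simulator's actual sampling may differ.
import random

def sample_rfi_lim(rng, base_lim=0.1, rfi_lim_range=(0.9, 1.1)):
    """Per-environment RFI limit: base limit times a uniform scale."""
    scale = rng.uniform(*rfi_lim_range)
    return base_lim * scale

rng = random.Random(0)
lims = [sample_rfi_lim(rng) for _ in range(1000)]  # all within [0.09, 0.11]
```

Narrowing the range from `[0.8, 1.2]` to `[0.9, 1.1]` thus reduces the spread of injected torque noise across environments, trading randomization coverage for training stability.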
