
Commit 64a97f2

Changes Pink IK solver and null space computation to reduce runtime (#3904)
# Description

This PR optimizes the Pink IK controller solver by changing the QP solver to **daqp** and optimizing the matrix pseudoinverse in the Null Space Posture Task. This achieves a **2x performance improvement**, reducing runtime from 1.23 ms to 0.52 ms. Perf experiments are documented on the third page here: https://docs.google.com/document/d/1E9UiYUU-oCOIetUkqAIva8oK0NvdNkMeqP2gxmeqxNA/edit?tab=t.0#heading=h.snu74q2v857w

## Key Changes

1. **Optimized Pseudoinverse Computation**: Replaced `np.linalg.pinv()` with a custom implementation using direct LAPACK/BLAS calls in the null space projector calculation. The new approach uses Cholesky factorization (`dpotrf`) and a triangular solver (`dpotrs`) for faster computation.
2. **QP Solver Update**: Changed the Pink IK solver from `osqp` to `daqp` for improved performance.
3. **New Dependency**: Added `daqp==0.7.2` to `setup.py` for Linux platforms.

## Type of change

- New feature (non-breaking change which adds functionality)

## Checklist

- [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Co-authored-by: Kelly Guo <[email protected]>
1 parent 76f5b29 commit 64a97f2
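
The core of the speed-up is linear algebra, not the QP: for a wide task Jacobian (m rows ≪ n joints), the damped pseudoinverse J⁺ = Jᵀ(JJᵀ + λ²I)⁻¹ only needs a Cholesky factorization of the small m×m Gram matrix, whereas `np.linalg.pinv` runs a full SVD of the m×n matrix. A rough micro-benchmark sketch of the two projector computations is below; the Jacobian shape, damping value, and iteration count are illustrative assumptions (not the setup from the linked perf doc), and it uses SciPy's high-level Cholesky helpers rather than the raw LAPACK calls in the commit:

```python
import timeit

import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(42)
J = rng.standard_normal((12, 36))  # illustrative wide task Jacobian: m=12 rows, n=36 joints
damping = 1e-9

def projector_pinv() -> np.ndarray:
    # Baseline: SVD-based pseudoinverse of the full m x n matrix
    return np.eye(J.shape[1]) - np.linalg.pinv(J) @ J

def projector_cholesky() -> np.ndarray:
    # Optimized: factor only the small m x m Gram matrix J @ J^T
    JJT = J @ J.T + damping**2 * np.eye(J.shape[0])
    X = cho_solve(cho_factor(JJT, lower=True), J)  # X = (J J^T + lambda^2 I)^-1 J
    return np.eye(J.shape[1]) - J.T @ X

print("pinv-based:    ", timeit.timeit(projector_pinv, number=2000))
print("cholesky-based:", timeit.timeit(projector_cholesky, number=2000))
```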

File tree

6 files changed: +75 −6 lines changed

- docs/source/overview/imitation-learning/teleop_imitation.rst
- source/isaaclab/config/extension.toml
- source/isaaclab/docs/CHANGELOG.rst
- source/isaaclab/isaaclab/controllers/pink_ik/null_space_posture_task.py
- source/isaaclab/isaaclab/controllers/pink_ik/pink_ik.py
- source/isaaclab/setup.py

docs/source/overview/imitation-learning/teleop_imitation.rst

Lines changed: 33 additions & 3 deletions

```diff
@@ -292,6 +292,15 @@ Using the Mimic generated data we can now train a state-based BC agent for ``Isa
 Visualizing results
 ^^^^^^^^^^^^^^^^^^^
 
+.. tip::
+
+   **Important: Testing Multiple Checkpoint Epochs**
+
+   When evaluating policy performance, it is common for different training epochs to yield significantly different results.
+   If you don't see the expected performance, **always test policies from various epochs** (not just the final checkpoint)
+   to find the best-performing model. Model performance can vary substantially across training, and the final epoch
+   is not always optimal.
+
 By inferencing using the generated model, we can visualize the results of the policy:
 
 .. tab-set::
@@ -315,6 +324,11 @@ By inferencing using the generated model, we can visualize the results of the po
          --device cpu --enable_cameras --task Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0 --num_rollouts 50 \
          --checkpoint /PATH/TO/desired_model_checkpoint.pth
 
+.. tip::
+
+   **If you don't see expected performance results:** Test policies from multiple checkpoint epochs, not just the final one.
+   Policy performance can vary significantly across training epochs, and intermediate checkpoints often outperform the final model.
+
 .. note::
 
    **Expected Success Rates and Timings for Franka Cube Stack Task**
@@ -323,6 +337,7 @@ By inferencing using the generated model, we can visualize the results of the po
    * Data generation time: ~30 mins for state, ~4 hours for visuomotor (varies based on num envs the user runs)
    * BC RNN training time: 1000 epochs + ~30 mins (for state), 600 epochs + ~6 hours (for visuomotor)
    * BC RNN policy success rate: ~40-60% (for both state + visuomotor)
+   * **Recommendation:** Evaluate checkpoints from various epochs throughout training to identify the best-performing model
 
 
 Demo 1: Data Generation and Policy Training for a Humanoid Robot
@@ -513,6 +528,11 @@ Visualize the results of the trained policy by running the following command, us
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.
 
+.. tip::
+
+   **If you don't see expected performance results:** It is critical to test policies from various checkpoint epochs.
+   Performance can vary significantly between epochs, and the best-performing checkpoint is often not the final one.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/gr-1_steering_wheel_pick_place_policy.gif
    :width: 100%
    :align: center
@@ -528,7 +548,7 @@ Visualize the results of the trained policy by running the following command, us
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation success for this task is typically 65-80% over 1000 demonstrations, taking 18-40 minutes depending on GPU hardware and success rate (19 minutes on a RTX ADA 6000 @ 80% success rate).
    * Behavior Cloning (BC) policy success is typically 75-86% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 2000 epochs (default), depending on demonstration quality. Training takes approximately 29 minutes on a RTX ADA 6000.
-   * Recommendation: Train for 2000 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 1500th and 2000th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 2000 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 1000th and 2000th epochs** to select the best-performing policy. Testing various epochs is essential for finding optimal performance.
 
 
 Demo 2: Data Generation and Policy Training for Humanoid Robot Locomanipulation with Unitree G1
@@ -642,6 +662,11 @@ Visualize the trained policy performance:
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.
 
+.. tip::
+
+   **If you don't see expected performance results:** Always test policies from various checkpoint epochs.
+   Different epochs can produce significantly different results, so evaluate multiple checkpoints to find the optimal model.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/locomanipulation-g-1_steering_wheel_pick_place.gif
    :width: 100%
    :align: center
@@ -657,7 +682,7 @@ Visualize the trained policy performance:
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation success for this task is typically 65-82% over 1000 demonstrations, taking 18-40 minutes depending on GPU hardware and success rate (18 minutes on a RTX ADA 6000 @ 82% success rate).
    * Behavior Cloning (BC) policy success is typically 75-85% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 2000 epochs (default), depending on demonstration quality. Training takes approximately 40 minutes on a RTX ADA 6000.
-   * Recommendation: Train for 2000 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 1500th and 2000th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 2000 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 1000th and 2000th epochs** to select the best-performing policy. Testing various epochs is essential for finding optimal performance.
 
 Generate the dataset with manipulation and point-to-point navigation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -851,6 +876,11 @@ Visualize the results of the trained policy by running the following command, us
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.
 
+.. tip::
+
+   **If you don't see expected performance results:** Test policies from various checkpoint epochs, not just the final one.
+   Policy performance can vary substantially across training, and intermediate checkpoints often yield better results.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/gr-1_nut_pouring_policy.gif
    :width: 100%
    :align: center
@@ -866,7 +896,7 @@ Visualize the results of the trained policy by running the following command, us
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation for 1000 demonstrations takes approximately 10 hours on a RTX ADA 6000.
    * Behavior Cloning (BC) policy success is typically 50-60% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 600 epochs (default). Training takes approximately 15 hours on a RTX ADA 6000.
-   * Recommendation: Train for 600 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 300th and 600th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 600 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 300th and 600th epochs** to select the best-performing policy. Testing various epochs is critical for achieving optimal performance.
 
 .. _common-pitfalls-generating-data:
```
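
The tips added above amount to sweeping the saved checkpoints rather than trusting the last one. A minimal sketch of such a sweep is below; the checkpoint directory and rollout script path are hypothetical placeholders standing in for the full command shown in the docs:

```python
import glob
import subprocess

# Hypothetical locations for illustration only.
CKPT_DIR = "/PATH/TO/checkpoints"
BASE_CMD = [
    "./isaaclab.sh", "-p", "/PATH/TO/rollout_script.py",  # placeholder script path
    "--task", "Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0",
    "--num_rollouts", "50",
]

# Evaluate every saved epoch, then compare the reported success rates.
for ckpt in sorted(glob.glob(f"{CKPT_DIR}/*.pth")):
    print(f"--- evaluating {ckpt} ---")
    subprocess.run(BASE_CMD + ["--checkpoint", ckpt], check=True)
```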

source/isaaclab/config/extension.toml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 [package]
 
 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.47.6"
+version = "0.47.7"
 
 # Description
 title = "Isaac Lab framework for Robot Learning"
```

source/isaaclab/docs/CHANGELOG.rst

Lines changed: 9 additions & 0 deletions

```diff
@@ -1,6 +1,15 @@
 Changelog
 ---------
 
+0.47.7 (2025-10-31)
+~~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Changed the Pink IK controller QP solver from osqp to daqp.
+* Changed the null space matrix computation in Pink IK's Null Space Posture Task to a faster matrix pseudoinverse computation.
+
 
 0.47.6 (2025-11-01)
 ~~~~~~~~~~~~~~~~~~~~
```

source/isaaclab/isaaclab/controllers/pink_ik/null_space_posture_task.py

Lines changed: 30 additions & 1 deletion

```diff
@@ -4,6 +4,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 
 import numpy as np
+import scipy.linalg.blas as blas
+import scipy.linalg.lapack as lapack
 
 import pinocchio as pin
 from pink.configuration import Configuration
@@ -75,6 +77,9 @@ class NullSpacePostureTask(Task):
 
     """
 
+    # Regularization factor for pseudoinverse computation to ensure numerical stability
+    PSEUDOINVERSE_DAMPING_FACTOR: float = 1e-9
+
     def __init__(
         self,
         cost: float,
@@ -237,6 +242,30 @@ def compute_jacobian(self, configuration: Configuration) -> np.ndarray:
         J_combined = np.concatenate(J_frame_tasks, axis=0)
 
         # Compute null space projector: N = I - J^+ * J
-        N_combined = np.eye(J_combined.shape[1]) - np.linalg.pinv(J_combined) @ J_combined
+        # Use fast pseudoinverse computation with direct LAPACK/BLAS calls
+        m, n = J_combined.shape
+
+        # Wide matrix (typical for robotics): use the right pseudoinverse
+        # J^+ = J^T @ inv(J @ J^T + λ²I)
+        # This is faster because we invert an m×m matrix instead of n×n
+
+        # Compute J @ J^T using BLAS (faster than numpy)
+        JJT = blas.dgemm(1.0, J_combined, J_combined.T)
+        np.fill_diagonal(JJT, JJT.diagonal() + self.PSEUDOINVERSE_DAMPING_FACTOR**2)
+
+        # Use LAPACK's Cholesky factorization (dpotrf = Positive definite TRiangular Factorization)
+        L, info = lapack.dpotrf(JJT, lower=1, clean=False, overwrite_a=True)
+
+        if info != 0:
+            # Fallback if not positive definite: use numpy's pseudoinverse
+            J_pinv = np.linalg.pinv(J_combined)
+            return np.eye(n) - J_pinv @ J_combined
+
+        # Solve (J @ J^T + λ²I) @ X = J using LAPACK's triangular solver (dpotrs)
+        # This directly solves the system without computing the full inverse
+        X, _ = lapack.dpotrs(L, J_combined, lower=1)
+
+        # Compute null space projector: N = I - J^T @ X
+        N_combined = np.eye(n) - J_combined.T @ X
 
         return N_combined
```
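
A quick sanity check on the math (not part of the commit): with a damping factor this small and a full-rank Jacobian, the Cholesky-based projector should agree with the SVD-based one from `np.linalg.pinv` to within numerical tolerance. A minimal sketch using SciPy's high-level Cholesky helpers and a made-up Jacobian shape:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
J = rng.standard_normal((12, 36))  # illustrative wide Jacobian (m task rows x n joints)
damping = 1e-9

# Damped right-pseudoinverse route: solve (J J^T + lambda^2 I) X = J, then N = I - J^T X
JJT = J @ J.T + damping**2 * np.eye(J.shape[0])
X = cho_solve(cho_factor(JJT, lower=True), J)
N_fast = np.eye(J.shape[1]) - J.T @ X

# Reference route from the old code path
N_ref = np.eye(J.shape[1]) - np.linalg.pinv(J) @ J

assert np.allclose(N_fast, N_ref, atol=1e-6)
print("projectors agree")
```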

source/isaaclab/isaaclab/controllers/pink_ik/pink_ik.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -221,7 +221,7 @@ def compute(
             self.pink_configuration,
             self.cfg.variable_input_tasks + self.cfg.fixed_input_tasks,
             dt,
-            solver="osqp",
+            solver="daqp",
             safety_break=self.cfg.fail_on_joint_limit_violation,
         )
         joint_angle_changes = velocity * dt
```
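
Pink hands the QP off to the qpsolvers package, so `solver="daqp"` is only valid when the daqp backend is installed and registered. A pre-flight sanity check (a sketch, not part of the commit):

```python
import qpsolvers

# "daqp" must appear here for solver="daqp" to be accepted by solve_ik
print(qpsolvers.available_solvers)
assert "daqp" in qpsolvers.available_solvers, "daqp backend not installed"
```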

source/isaaclab/setup.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -52,6 +52,7 @@
 INSTALL_REQUIRES += [
     # required by isaaclab.isaaclab.controllers.pink_ik
     f"pin-pink==3.1.0 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS_ARM})",
+    f"daqp==0.7.2 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS_ARM})",
     # required by isaaclab.devices.openxr.retargeters.humanoid.fourier.gr1_t2_dex_retargeting_utils
     f"dex-retargeting==0.4.6 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS})",
 ]
```
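
A minimal way to confirm the new dependency resolved in a given environment, assuming only that the wheel installs an importable top-level `daqp` module:

```python
import importlib.util

# find_spec returns None when the module is not importable
print("daqp installed:", importlib.util.find_spec("daqp") is not None)
```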
