[Feature] Add TensorDict support to log_metrics#3455

Merged: vmoens merged 3 commits into main from feat/log-metrics-tensordict on Feb 6, 2026

vmoens (Collaborator) commented Feb 6, 2026

Summary

Extend log_metrics() to accept TensorDict in addition to dict inputs. This is a follow-up to #3452.

Changes

  • Add TensorDictBase to the metrics type signature (dict[str, Any] | TensorDictBase)
  • Add _make_metrics_safe_tensordict() helper for TensorDict-specific handling
  • Add keys_sep parameter to control how nested TensorDict keys are flattened (defaults to "/" for hierarchical metric names like "train/loss")
  • Update WandbLogger and MLFlowLogger implementations accordingly

Benefits

This leverages TensorDict's batched .to() method for CUDA→CPU transfers: all leaf tensors are moved in a single call, which is more efficient than transferring them individually.

Example Usage

from tensordict import TensorDict
import torch

# Create metrics as a TensorDict (possibly on CUDA)
metrics = TensorDict({
    "train": TensorDict({
        "loss": torch.tensor(0.5),
        "reward": torch.tensor(100.0),
    }),
    "eval": TensorDict({
        "reward": torch.tensor(150.0),
    }),
}, device="cuda")

# `logger` is an existing logger instance (e.g. a WandbLogger or MLFlowLogger)
# Log with automatic flattening and efficient transfer
logger.log_metrics(metrics, step=1000)
# Logs: {"train/loss": 0.5, "train/reward": 100.0, "eval/reward": 150.0}

# Custom separator
logger.log_metrics(metrics, step=1000, keys_sep="_")
# Logs: {"train_loss": 0.5, "train_reward": 100.0, "eval_reward": 150.0}
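
The key flattening shown in the comments above can be mimicked on plain nested dicts. The sketch below is only an illustration of how a separator such as keys_sep turns nested keys into hierarchical metric names; `flatten_metrics` is a hypothetical helper written for this example and is not the code added by this PR.

```python
from typing import Any, Mapping


def flatten_metrics(
    metrics: Mapping[str, Any], keys_sep: str = "/", prefix: str = ""
) -> dict:
    """Flatten nested mappings into one level, joining keys with keys_sep."""
    flat = {}
    for key, value in metrics.items():
        # Build the full name of this entry from its parents' names.
        name = f"{prefix}{keys_sep}{key}" if prefix else key
        if isinstance(value, Mapping):
            # Recurse into nested groups such as "train" or "eval".
            flat.update(flatten_metrics(value, keys_sep, name))
        else:
            flat[name] = value
    return flat


# Plain-dict stand-in for the nested TensorDict in the example above.
nested = {"train": {"loss": 0.5, "reward": 100.0}, "eval": {"reward": 150.0}}
print(flatten_metrics(nested))
print(flatten_metrics(nested, keys_sep="_"))
```

With the default separator the names group naturally in dashboards that treat "/" as a hierarchy; a different separator only changes how the names are joined, not the values.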

Test plan

  • Verify existing tests pass
  • Manual testing with TensorDict inputs

Made with Cursor

Extend log_metrics() to accept TensorDict in addition to dict inputs.
This leverages TensorDict's efficient batch .to() method for CUDA->CPU
transfers, which is more efficient than transferring tensors individually.

Changes:
- Add TensorDictBase to the metrics type signature (dict | TensorDict)
- Add _make_metrics_safe_tensordict() helper for TensorDict-specific handling
- Add keys_sep parameter to control how nested TensorDict keys are flattened
  (defaults to "/" for hierarchical metric names like "train/loss")
- Update WandbLogger and MLFlowLogger implementations accordingly

Co-authored-by: Cursor <[email protected]>

pytorch-bot bot commented Feb 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3455

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 35 Pending

As of commit fe9360a with merge base 1415062:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Feb 6, 2026
github-actions bot added the Feature and Record labels Feb 6, 2026
vmoens and others added 2 commits February 6, 2026 09:10
Check for CUDA tensors explicitly rather than relying on TensorDict.device,
which can be None even when individual tensors are on CUDA (e.g., mixed
devices or lazy structures).

Always call .to("cpu") but only sync if there were actually CUDA tensors.

Co-authored-by: Cursor <[email protected]>

Use torch.cuda.is_initialized() instead of iterating over all values.
The event sync is cheap if there's no pending CUDA work, so we can
just always sync when CUDA is in use rather than checking each tensor.

Co-authored-by: Cursor <[email protected]>
vmoens merged commit daa87db into main Feb 6, 2026
104 of 106 checks passed

github-actions bot (Contributor) commented Feb 6, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}27$. Worsened: $\large\color{#d91a1a}8$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_tensor_to_bytestream_speed[pickle] 80.2823μs 79.1601μs 12.6326 KOps/s 12.3425 KOps/s $\color{#35bf28}+2.35\%$
test_tensor_to_bytestream_speed[torch.save] 0.1402ms 0.1396ms 7.1637 KOps/s 7.0944 KOps/s $\color{#35bf28}+0.98\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1120s 0.1118s 8.9481 Ops/s 9.2452 Ops/s $\color{#d91a1a}-3.21\%$
test_tensor_to_bytestream_speed[numpy] 2.7211μs 2.7025μs 370.0300 KOps/s 380.1627 KOps/s $\color{#d91a1a}-2.67\%$
test_tensor_to_bytestream_speed[safetensors] 36.6988μs 36.5152μs 27.3859 KOps/s 27.4227 KOps/s $\color{#d91a1a}-0.13\%$
test_simple 0.5467s 0.5441s 1.8380 Ops/s 1.7542 Ops/s $\color{#35bf28}+4.77\%$
test_transformed 1.1222s 1.1179s 0.8946 Ops/s 0.8751 Ops/s $\color{#35bf28}+2.22\%$
test_serial 1.6971s 1.6806s 0.5950 Ops/s 0.5918 Ops/s $\color{#35bf28}+0.55\%$
test_parallel 1.3675s 1.1879s 0.8419 Ops/s 0.8766 Ops/s $\color{#d91a1a}-3.96\%$
test_step_mdp_speed[True-True-True-True-True] 0.4646ms 43.0654μs 23.2205 KOps/s 22.1463 KOps/s $\color{#35bf28}+4.85\%$
test_step_mdp_speed[True-True-True-True-False] 55.4930μs 24.9496μs 40.0808 KOps/s 39.8033 KOps/s $\color{#35bf28}+0.70\%$
test_step_mdp_speed[True-True-True-False-True] 0.4461ms 24.8966μs 40.1661 KOps/s 38.6930 KOps/s $\color{#35bf28}+3.81\%$
test_step_mdp_speed[True-True-True-False-False] 61.6330μs 13.6371μs 73.3296 KOps/s 70.3181 KOps/s $\color{#35bf28}+4.28\%$
test_step_mdp_speed[True-True-False-True-True] 81.0940μs 46.8374μs 21.3504 KOps/s 20.2852 KOps/s $\textbf{\color{#35bf28}+5.25\%}$
test_step_mdp_speed[True-True-False-True-False] 0.4634ms 27.7892μs 35.9852 KOps/s 35.5803 KOps/s $\color{#35bf28}+1.14\%$
test_step_mdp_speed[True-True-False-False-True] 0.4568ms 27.1725μs 36.8019 KOps/s 34.5327 KOps/s $\textbf{\color{#35bf28}+6.57\%}$
test_step_mdp_speed[True-True-False-False-False] 63.6230μs 16.2055μs 61.7075 KOps/s 58.3208 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_step_mdp_speed[True-False-True-True-True] 78.1440μs 49.1769μs 20.3347 KOps/s 19.0158 KOps/s $\textbf{\color{#35bf28}+6.94\%}$
test_step_mdp_speed[True-False-True-True-False] 58.1830μs 30.2398μs 33.0690 KOps/s 32.3266 KOps/s $\color{#35bf28}+2.30\%$
test_step_mdp_speed[True-False-True-False-True] 55.8230μs 27.2571μs 36.6877 KOps/s 34.6682 KOps/s $\textbf{\color{#35bf28}+5.82\%}$
test_step_mdp_speed[True-False-True-False-False] 45.0520μs 16.3743μs 61.0712 KOps/s 59.0159 KOps/s $\color{#35bf28}+3.48\%$
test_step_mdp_speed[True-False-False-True-True] 90.4350μs 51.9491μs 19.2496 KOps/s 18.5484 KOps/s $\color{#35bf28}+3.78\%$
test_step_mdp_speed[True-False-False-True-False] 68.9740μs 32.9997μs 30.3033 KOps/s 29.9858 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[True-False-False-False-True] 65.2630μs 30.0152μs 33.3164 KOps/s 32.3620 KOps/s $\color{#35bf28}+2.95\%$
test_step_mdp_speed[True-False-False-False-False] 45.1320μs 19.3221μs 51.7543 KOps/s 51.5031 KOps/s $\color{#35bf28}+0.49\%$
test_step_mdp_speed[False-True-True-True-True] 79.9440μs 49.6230μs 20.1519 KOps/s 19.6749 KOps/s $\color{#35bf28}+2.42\%$
test_step_mdp_speed[False-True-True-True-False] 61.9630μs 30.4408μs 32.8507 KOps/s 31.9921 KOps/s $\color{#35bf28}+2.68\%$
test_step_mdp_speed[False-True-True-False-True] 2.4137ms 31.1158μs 32.1381 KOps/s 30.3009 KOps/s $\textbf{\color{#35bf28}+6.06\%}$
test_step_mdp_speed[False-True-True-False-False] 87.3440μs 17.5458μs 56.9937 KOps/s 53.9487 KOps/s $\textbf{\color{#35bf28}+5.64\%}$
test_step_mdp_speed[False-True-False-True-True] 0.4708ms 52.2313μs 19.1456 KOps/s 18.7462 KOps/s $\color{#35bf28}+2.13\%$
test_step_mdp_speed[False-True-False-True-False] 0.4505ms 32.6007μs 30.6742 KOps/s 29.4991 KOps/s $\color{#35bf28}+3.98\%$
test_step_mdp_speed[False-True-False-False-True] 0.4502ms 33.5533μs 29.8033 KOps/s 28.4287 KOps/s $\color{#35bf28}+4.84\%$
test_step_mdp_speed[False-True-False-False-False] 49.5430μs 20.8026μs 48.0709 KOps/s 46.6019 KOps/s $\color{#35bf28}+3.15\%$
test_step_mdp_speed[False-False-True-True-True] 0.4691ms 55.1385μs 18.1361 KOps/s 17.6221 KOps/s $\color{#35bf28}+2.92\%$
test_step_mdp_speed[False-False-True-True-False] 0.4635ms 35.9818μs 27.7918 KOps/s 27.4975 KOps/s $\color{#35bf28}+1.07\%$
test_step_mdp_speed[False-False-True-False-True] 0.4599ms 34.2762μs 29.1748 KOps/s 28.6636 KOps/s $\color{#35bf28}+1.78\%$
test_step_mdp_speed[False-False-True-False-False] 55.5730μs 21.1793μs 47.2160 KOps/s 47.3274 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[False-False-False-True-True] 0.4815ms 57.3716μs 17.4302 KOps/s 17.0482 KOps/s $\color{#35bf28}+2.24\%$
test_step_mdp_speed[False-False-False-True-False] 0.4625ms 38.4336μs 26.0189 KOps/s 25.5228 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[False-False-False-False-True] 0.4553ms 36.0198μs 27.7625 KOps/s 27.0694 KOps/s $\color{#35bf28}+2.56\%$
test_step_mdp_speed[False-False-False-False-False] 60.6630μs 23.3446μs 42.8365 KOps/s 41.5141 KOps/s $\color{#35bf28}+3.19\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7311s 0.7309s 1.3682 Ops/s 1.2985 Ops/s $\textbf{\color{#35bf28}+5.37\%}$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7206s 0.6224s 1.6067 Ops/s 1.5727 Ops/s $\color{#35bf28}+2.16\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7104s 1.6335s 0.6122 Ops/s 0.5969 Ops/s $\color{#35bf28}+2.56\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4967s 1.4183s 0.7051 Ops/s 0.6875 Ops/s $\color{#35bf28}+2.55\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9631s 1.8823s 0.5313 Ops/s 0.5186 Ops/s $\color{#35bf28}+2.43\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7445s 1.6658s 0.6003 Ops/s 0.5851 Ops/s $\color{#35bf28}+2.61\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7112s 4.4952s 0.2225 Ops/s 0.2207 Ops/s $\color{#35bf28}+0.79\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4470s 4.3377s 0.2305 Ops/s 0.2268 Ops/s $\color{#35bf28}+1.66\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9969s 1.9210s 0.5206 Ops/s 0.5182 Ops/s $\color{#35bf28}+0.45\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7568s 1.6757s 0.5968 Ops/s 0.6031 Ops/s $\color{#d91a1a}-1.05\%$
test_values[generalized_advantage_estimate-True-True] 10.4031ms 9.9883ms 100.1170 Ops/s 101.6514 Ops/s $\color{#d91a1a}-1.51\%$
test_values[vec_generalized_advantage_estimate-True-True] 13.7530ms 11.1090ms 90.0168 Ops/s 56.6003 Ops/s $\textbf{\color{#35bf28}+59.04\%}$
test_values[td0_return_estimate-False-False] 0.2432ms 0.1217ms 8.2197 KOps/s 7.8371 KOps/s $\color{#35bf28}+4.88\%$
test_values[td1_return_estimate-False-False] 27.1880ms 26.5040ms 37.7302 Ops/s 38.5704 Ops/s $\color{#d91a1a}-2.18\%$
test_values[vec_td1_return_estimate-False-False] 11.6854ms 11.0565ms 90.4444 Ops/s 56.7078 Ops/s $\textbf{\color{#35bf28}+59.49\%}$
test_values[td_lambda_return_estimate-True-False] 39.4788ms 38.9598ms 25.6675 Ops/s 26.0539 Ops/s $\color{#d91a1a}-1.48\%$
test_values[vec_td_lambda_return_estimate-True-False] 11.9185ms 11.0530ms 90.4735 Ops/s 56.7154 Ops/s $\textbf{\color{#35bf28}+59.52\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.8633ms 8.7922ms 113.7370 Ops/s 113.8950 Ops/s $\color{#d91a1a}-0.14\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.8450ms 1.4297ms 699.4319 Ops/s 672.2800 Ops/s $\color{#35bf28}+4.04\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4733ms 0.4035ms 2.4785 KOps/s 2.4792 KOps/s $\color{#d91a1a}-0.03\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 34.6154ms 30.1035ms 33.2187 Ops/s 28.6372 Ops/s $\textbf{\color{#35bf28}+16.00\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.1959ms 1.7105ms 584.6232 Ops/s 590.6720 Ops/s $\color{#d91a1a}-1.02\%$
test_dqn_speed[False-None] 1.7292ms 1.3942ms 717.2600 Ops/s 725.2195 Ops/s $\color{#d91a1a}-1.10\%$
test_dqn_speed[False-backward] 1.9836ms 1.8811ms 531.5953 Ops/s 534.1496 Ops/s $\color{#d91a1a}-0.48\%$
test_dqn_speed[True-None] 0.9537ms 0.5428ms 1.8422 KOps/s 1.7401 KOps/s $\textbf{\color{#35bf28}+5.87\%}$
test_dqn_speed[True-backward] 1.0420ms 1.0030ms 997.0087 Ops/s 842.0833 Ops/s $\textbf{\color{#35bf28}+18.40\%}$
test_dqn_speed[reduce-overhead-None] 0.9323ms 0.5406ms 1.8499 KOps/s 1.7690 KOps/s $\color{#35bf28}+4.57\%$
test_ddpg_speed[False-None] 3.0915ms 2.7970ms 357.5204 Ops/s 349.6309 Ops/s $\color{#35bf28}+2.26\%$
test_ddpg_speed[False-backward] 4.1360ms 3.9972ms 250.1775 Ops/s 246.9182 Ops/s $\color{#35bf28}+1.32\%$
test_ddpg_speed[True-None] 1.8135ms 1.4049ms 711.8158 Ops/s 713.7799 Ops/s $\color{#d91a1a}-0.28\%$
test_ddpg_speed[True-backward] 2.4367ms 2.3780ms 420.5249 Ops/s 415.3331 Ops/s $\color{#35bf28}+1.25\%$
test_ddpg_speed[reduce-overhead-None] 1.7348ms 1.3936ms 717.5409 Ops/s 763.0544 Ops/s $\textbf{\color{#d91a1a}-5.96\%}$
test_sac_speed[False-None] 8.5613ms 7.8844ms 126.8334 Ops/s 128.3913 Ops/s $\color{#d91a1a}-1.21\%$
test_sac_speed[False-backward] 11.4015ms 10.9735ms 91.1289 Ops/s 91.4194 Ops/s $\color{#d91a1a}-0.32\%$
test_sac_speed[True-None] 2.2768ms 2.1533ms 464.4043 Ops/s 482.3572 Ops/s $\color{#d91a1a}-3.72\%$
test_sac_speed[True-backward] 4.2683ms 4.0321ms 248.0067 Ops/s 253.6614 Ops/s $\color{#d91a1a}-2.23\%$
test_sac_speed[reduce-overhead-None] 2.5559ms 2.1344ms 468.5251 Ops/s 479.5658 Ops/s $\color{#d91a1a}-2.30\%$
test_redq_speed[False-None] 10.6264ms 10.3014ms 97.0742 Ops/s 97.9324 Ops/s $\color{#d91a1a}-0.88\%$
test_redq_speed[False-backward] 18.2642ms 17.7079ms 56.4720 Ops/s 59.0843 Ops/s $\color{#d91a1a}-4.42\%$
test_redq_speed[True-None] 4.7189ms 4.4309ms 225.6885 Ops/s 223.1018 Ops/s $\color{#35bf28}+1.16\%$
test_redq_speed[True-backward] 10.0568ms 9.7102ms 102.9849 Ops/s 84.3866 Ops/s $\textbf{\color{#35bf28}+22.04\%}$
test_redq_speed[reduce-overhead-None] 4.7751ms 4.4031ms 227.1109 Ops/s 221.2234 Ops/s $\color{#35bf28}+2.66\%$
test_redq_deprec_speed[False-None] 11.3915ms 10.8821ms 91.8942 Ops/s 90.6403 Ops/s $\color{#35bf28}+1.38\%$
test_redq_deprec_speed[False-backward] 16.1681ms 15.6266ms 63.9934 Ops/s 62.4784 Ops/s $\color{#35bf28}+2.42\%$
test_redq_deprec_speed[True-None] 4.1183ms 3.6953ms 270.6149 Ops/s 272.6279 Ops/s $\color{#d91a1a}-0.74\%$
test_redq_deprec_speed[True-backward] 7.8799ms 7.6283ms 131.0914 Ops/s 132.9356 Ops/s $\color{#d91a1a}-1.39\%$
test_redq_deprec_speed[reduce-overhead-None] 3.9254ms 3.6420ms 274.5768 Ops/s 279.1174 Ops/s $\color{#d91a1a}-1.63\%$
test_td3_speed[False-None] 8.1707ms 7.9185ms 126.2873 Ops/s 126.5155 Ops/s $\color{#d91a1a}-0.18\%$
test_td3_speed[False-backward] 11.1690ms 10.6985ms 93.4713 Ops/s 93.7247 Ops/s $\color{#d91a1a}-0.27\%$
test_td3_speed[True-None] 1.9169ms 1.8445ms 542.1402 Ops/s 544.3172 Ops/s $\color{#d91a1a}-0.40\%$
test_td3_speed[True-backward] 3.6648ms 3.5718ms 279.9700 Ops/s 258.2936 Ops/s $\textbf{\color{#35bf28}+8.39\%}$
test_td3_speed[reduce-overhead-None] 1.8880ms 1.8085ms 552.9490 Ops/s 547.7567 Ops/s $\color{#35bf28}+0.95\%$
test_cql_speed[False-None] 28.8565ms 26.0107ms 38.4458 Ops/s 37.3846 Ops/s $\color{#35bf28}+2.84\%$
test_cql_speed[False-backward] 40.5767ms 35.4496ms 28.2091 Ops/s 28.5090 Ops/s $\color{#d91a1a}-1.05\%$
test_cql_speed[True-None] 15.5452ms 12.5128ms 79.9182 Ops/s 80.6385 Ops/s $\color{#d91a1a}-0.89\%$
test_cql_speed[True-backward] 21.5317ms 18.5202ms 53.9952 Ops/s 55.4534 Ops/s $\color{#d91a1a}-2.63\%$
test_cql_speed[reduce-overhead-None] 13.4276ms 12.5311ms 79.8015 Ops/s 78.8970 Ops/s $\color{#35bf28}+1.15\%$
test_a2c_speed[False-None] 5.8864ms 5.4258ms 184.3042 Ops/s 184.5790 Ops/s $\color{#d91a1a}-0.15\%$
test_a2c_speed[False-backward] 12.2853ms 11.8635ms 84.2922 Ops/s 84.5338 Ops/s $\color{#d91a1a}-0.29\%$
test_a2c_speed[True-None] 3.9067ms 3.7630ms 265.7484 Ops/s 267.7160 Ops/s $\color{#d91a1a}-0.73\%$
test_a2c_speed[True-backward] 8.8427ms 8.6310ms 115.8612 Ops/s 113.1858 Ops/s $\color{#35bf28}+2.36\%$
test_a2c_speed[reduce-overhead-None] 4.0086ms 3.7583ms 266.0807 Ops/s 268.1597 Ops/s $\color{#d91a1a}-0.78\%$
test_ppo_speed[False-None] 6.0741ms 5.8088ms 172.1532 Ops/s 169.4951 Ops/s $\color{#35bf28}+1.57\%$
test_ppo_speed[False-backward] 12.9933ms 12.4123ms 80.5650 Ops/s 80.9993 Ops/s $\color{#d91a1a}-0.54\%$
test_ppo_speed[True-None] 3.9090ms 3.7032ms 270.0355 Ops/s 266.3533 Ops/s $\color{#35bf28}+1.38\%$
test_ppo_speed[True-backward] 8.9334ms 8.4689ms 118.0790 Ops/s 103.9969 Ops/s $\textbf{\color{#35bf28}+13.54\%}$
test_ppo_speed[reduce-overhead-None] 3.9079ms 3.6936ms 270.7378 Ops/s 272.3397 Ops/s $\color{#d91a1a}-0.59\%$
test_reinforce_speed[False-None] 5.0916ms 4.5821ms 218.2393 Ops/s 211.4258 Ops/s $\color{#35bf28}+3.22\%$
test_reinforce_speed[False-backward] 7.5444ms 7.3595ms 135.8795 Ops/s 132.5123 Ops/s $\color{#35bf28}+2.54\%$
test_reinforce_speed[True-None] 3.1730ms 2.9233ms 342.0817 Ops/s 330.6307 Ops/s $\color{#35bf28}+3.46\%$
test_reinforce_speed[True-backward] 8.1174ms 7.8130ms 127.9921 Ops/s 121.9059 Ops/s $\color{#35bf28}+4.99\%$
test_reinforce_speed[reduce-overhead-None] 3.3606ms 2.8991ms 344.9362 Ops/s 343.9232 Ops/s $\color{#35bf28}+0.29\%$
test_iql_speed[False-None] 24.8616ms 20.1599ms 49.6035 Ops/s 49.2995 Ops/s $\color{#35bf28}+0.62\%$
test_iql_speed[False-backward] 30.8986ms 30.3949ms 32.9002 Ops/s 32.8970 Ops/s $+0.01\%$
test_iql_speed[True-None] 8.9276ms 8.5874ms 116.4498 Ops/s 115.1988 Ops/s $\color{#35bf28}+1.09\%$
test_iql_speed[True-backward] 17.3806ms 16.8186ms 59.4580 Ops/s 59.3614 Ops/s $\color{#35bf28}+0.16\%$
test_iql_speed[reduce-overhead-None] 9.0435ms 8.6330ms 115.8343 Ops/s 109.7635 Ops/s $\textbf{\color{#35bf28}+5.53\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1347ms 6.0466ms 165.3825 Ops/s 166.5681 Ops/s $\color{#d91a1a}-0.71\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0275ms 0.2847ms 3.5129 KOps/s 3.4067 KOps/s $\color{#35bf28}+3.12\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5974ms 0.2665ms 3.7528 KOps/s 3.8534 KOps/s $\color{#d91a1a}-2.61\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1580ms 5.8857ms 169.9039 Ops/s 170.6108 Ops/s $\color{#d91a1a}-0.41\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9989ms 0.2788ms 3.5869 KOps/s 3.3108 KOps/s $\textbf{\color{#35bf28}+8.34\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4861ms 0.2616ms 3.8233 KOps/s 3.2564 KOps/s $\textbf{\color{#35bf28}+17.41\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4904ms 1.2392ms 806.9531 Ops/s 773.9040 Ops/s $\color{#35bf28}+4.27\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4257ms 1.1549ms 865.8807 Ops/s 814.6201 Ops/s $\textbf{\color{#35bf28}+6.29\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.1143ms 6.2234ms 160.6845 Ops/s 164.4839 Ops/s $\color{#d91a1a}-2.31\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1095ms 0.4929ms 2.0287 KOps/s 2.1203 KOps/s $\color{#d91a1a}-4.32\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6841ms 0.4665ms 2.1438 KOps/s 2.3375 KOps/s $\textbf{\color{#d91a1a}-8.29\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0532ms 5.9185ms 168.9625 Ops/s 168.5861 Ops/s $\color{#35bf28}+0.22\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6411ms 0.3143ms 3.1812 KOps/s 2.9507 KOps/s $\textbf{\color{#35bf28}+7.81\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5506ms 0.3309ms 3.0223 KOps/s 3.4307 KOps/s $\textbf{\color{#d91a1a}-11.90\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1010ms 5.8702ms 170.3518 Ops/s 169.4660 Ops/s $\color{#35bf28}+0.52\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.1479ms 0.3638ms 2.7488 KOps/s 3.0044 KOps/s $\textbf{\color{#d91a1a}-8.51\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5746ms 0.3408ms 2.9341 KOps/s 3.7177 KOps/s $\textbf{\color{#d91a1a}-21.08\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1743ms 6.0531ms 165.2047 Ops/s 165.5249 Ops/s $\color{#d91a1a}-0.19\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9126ms 0.4886ms 2.0469 KOps/s 2.1560 KOps/s $\textbf{\color{#d91a1a}-5.06\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7831ms 0.4542ms 2.2018 KOps/s 2.1011 KOps/s $\color{#35bf28}+4.79\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4841ms 5.0389ms 198.4559 Ops/s 191.2443 Ops/s $\color{#35bf28}+3.77\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.6820ms 1.9056ms 524.7758 Ops/s 521.4243 Ops/s $\color{#35bf28}+0.64\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.2060ms 0.8616ms 1.1606 KOps/s 1.0609 KOps/s $\textbf{\color{#35bf28}+9.39\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.5515ms 5.0651ms 197.4303 Ops/s 195.1539 Ops/s $\color{#35bf28}+1.17\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 11.1691ms 1.9067ms 524.4531 Ops/s 559.3732 Ops/s $\textbf{\color{#d91a1a}-6.24\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.6057ms 1.2960ms 771.6261 Ops/s 827.7042 Ops/s $\textbf{\color{#d91a1a}-6.78\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5576s 16.3150ms 61.2932 Ops/s 55.9730 Ops/s $\textbf{\color{#35bf28}+9.51\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 3.9571ms 1.9097ms 523.6528 Ops/s 440.3234 Ops/s $\textbf{\color{#35bf28}+18.92\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.3260ms 1.0417ms 959.9371 Ops/s 780.4266 Ops/s $\textbf{\color{#35bf28}+23.00\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.4221ms 35.5797ms 28.1059 Ops/s 27.5839 Ops/s $\color{#35bf28}+1.89\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.5840ms 17.7082ms 56.4712 Ops/s 56.0355 Ops/s $\color{#35bf28}+0.78\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.4315ms 36.5705ms 27.3444 Ops/s 26.3723 Ops/s $\color{#35bf28}+3.69\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.3448ms 17.9013ms 55.8617 Ops/s 53.9859 Ops/s $\color{#35bf28}+3.47\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.3764ms 38.3612ms 26.0680 Ops/s 25.3074 Ops/s $\color{#35bf28}+3.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.9202ms 19.4425ms 51.4337 Ops/s 51.5505 Ops/s $\color{#d91a1a}-0.23\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8452ms 0.2155ms 4.6411 KOps/s 4.7195 KOps/s $\color{#d91a1a}-1.66\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7368ms 1.3861ms 721.4274 Ops/s 724.6623 Ops/s $\color{#d91a1a}-0.45\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7994ms 2.3448ms 426.4792 Ops/s 413.8647 Ops/s $\color{#35bf28}+3.05\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0687ms 2.8952ms 345.3967 Ops/s 343.6065 Ops/s $\color{#35bf28}+0.52\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2137ms 0.1316ms 7.6000 KOps/s 7.4259 KOps/s $\color{#35bf28}+2.34\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3561ms 0.1830ms 5.4645 KOps/s 4.9029 KOps/s $\textbf{\color{#35bf28}+11.45\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8780ms 1.7664ms 566.1367 Ops/s 563.7529 Ops/s $\color{#35bf28}+0.42\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4954ms 1.3285ms 752.7144 Ops/s 768.6247 Ops/s $\color{#d91a1a}-2.07\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2184ms 1.1215ms 891.6640 Ops/s 885.9015 Ops/s $\color{#35bf28}+0.65\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7026ms 3.5710ms 280.0312 Ops/s 270.0251 Ops/s $\color{#35bf28}+3.71\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.0601ms 5.7149ms 174.9810 Ops/s 176.7055 Ops/s $\color{#d91a1a}-0.98\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.3833ms 7.2376ms 138.1680 Ops/s 143.9503 Ops/s $\color{#d91a1a}-4.02\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4155ms 0.2706ms 3.6958 KOps/s 3.7283 KOps/s $\color{#d91a1a}-0.87\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6807ms 1.5062ms 663.9028 Ops/s 667.1550 Ops/s $\color{#d91a1a}-0.49\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6710ms 2.4722ms 404.4921 Ops/s 393.0365 Ops/s $\color{#35bf28}+2.91\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3968ms 3.0967ms 322.9256 Ops/s 319.3564 Ops/s $\color{#35bf28}+1.12\%$
test_collector_without_rb[100-img_shape0-atari] 33.9046ms 33.1796ms 30.1390 Ops/s 29.4891 Ops/s $\color{#35bf28}+2.20\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.5772ms 65.7784ms 15.2026 Ops/s 15.0862 Ops/s $\color{#35bf28}+0.77\%$
test_collector_with_rb[100-img_shape0-atari] 38.7823ms 37.9987ms 26.3167 Ops/s 25.9691 Ops/s $\color{#35bf28}+1.34\%$
test_collector_with_rb[200-img_shape1-large_batch] 76.1987ms 74.6546ms 13.3950 Ops/s 13.2774 Ops/s $\color{#35bf28}+0.89\%$
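
The Change column appears to be the relative difference between Ops and Ops on Repo HEAD. The sketch below recomputes a few rows under that assumption; the formula is inferred from the numbers in the table rather than taken from the benchmark tooling, and `percent_change` is a hypothetical helper.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of this PR's throughput versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
# Both values in a pair share the same unit, so the unit cancels out.
rows = {
    "test_tensor_to_bytestream_speed[pickle]": (12.6326, 12.3425),
    "test_parallel": (0.8419, 0.8766),
    "test_values[vec_generalized_advantage_estimate-True-True]": (90.0168, 56.6003),
}

for name, (ops, ops_head) in rows.items():
    print(f"{name}: {percent_change(ops, ops_head):+.2f}%")
```

Because the table values are rounded to four decimals, a recomputed figure could differ from the reported one in the last digit for other rows.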

github-actions bot (Contributor) commented Feb 6, 2026

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}11$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_tensor_to_bytestream_speed[pickle] 80.2374μs 79.5256μs 12.5746 KOps/s 12.3726 KOps/s $\color{#35bf28}+1.63\%$
test_tensor_to_bytestream_speed[torch.save] 0.1381ms 0.1378ms 7.2555 KOps/s 7.1564 KOps/s $\color{#35bf28}+1.38\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1073s 0.1071s 9.3344 Ops/s 8.9186 Ops/s $\color{#35bf28}+4.66\%$
test_tensor_to_bytestream_speed[numpy] 2.7730μs 2.7662μs 361.5103 KOps/s 375.1809 KOps/s $\color{#d91a1a}-3.64\%$
test_tensor_to_bytestream_speed[safetensors] 39.0381μs 38.4667μs 25.9965 KOps/s 26.3444 KOps/s $\color{#d91a1a}-1.32\%$
test_simple 0.8012s 0.7982s 1.2528 Ops/s 1.2276 Ops/s $\color{#35bf28}+2.05\%$
test_transformed 1.5526s 1.4567s 0.6865 Ops/s 0.6929 Ops/s $\color{#d91a1a}-0.93\%$
test_serial 2.4243s 2.3278s 0.4296 Ops/s 0.4331 Ops/s $\color{#d91a1a}-0.81\%$
test_parallel 2.1436s 2.0114s 0.4972 Ops/s 0.5121 Ops/s $\color{#d91a1a}-2.92\%$
test_step_mdp_speed[True-True-True-True-True] 0.3209ms 45.2667μs 22.0913 KOps/s 22.4869 KOps/s $\color{#d91a1a}-1.76\%$
test_step_mdp_speed[True-True-True-True-False] 48.7300μs 25.3539μs 39.4416 KOps/s 39.8812 KOps/s $\color{#d91a1a}-1.10\%$
test_step_mdp_speed[True-True-True-False-True] 60.1400μs 25.3168μs 39.4994 KOps/s 39.8672 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[True-True-True-False-False] 38.5510μs 13.7666μs 72.6395 KOps/s 72.3160 KOps/s $\color{#35bf28}+0.45\%$
test_step_mdp_speed[True-True-False-True-True] 74.1710μs 47.8027μs 20.9193 KOps/s 21.2604 KOps/s $\color{#d91a1a}-1.60\%$
test_step_mdp_speed[True-True-False-True-False] 56.9110μs 27.3404μs 36.5759 KOps/s 36.0930 KOps/s $\color{#35bf28}+1.34\%$
test_step_mdp_speed[True-True-False-False-True] 61.6010μs 28.4338μs 35.1694 KOps/s 35.9290 KOps/s $\color{#d91a1a}-2.11\%$
test_step_mdp_speed[True-True-False-False-False] 47.2910μs 16.5645μs 60.3700 KOps/s 61.0538 KOps/s $\color{#d91a1a}-1.12\%$
test_step_mdp_speed[True-False-True-True-True] 79.5710μs 50.0866μs 19.9654 KOps/s 19.8118 KOps/s $\color{#35bf28}+0.78\%$
test_step_mdp_speed[True-False-True-True-False] 59.3910μs 30.6428μs 32.6340 KOps/s 32.8596 KOps/s $\color{#d91a1a}-0.69\%$
test_step_mdp_speed[True-False-True-False-True] 59.5110μs 28.0185μs 35.6907 KOps/s 36.5379 KOps/s $\color{#d91a1a}-2.32\%$
test_step_mdp_speed[True-False-True-False-False] 42.9110μs 16.6621μs 60.0164 KOps/s 60.4730 KOps/s $\color{#d91a1a}-0.76\%$
test_step_mdp_speed[True-False-False-True-True] 80.9920μs 52.5022μs 19.0468 KOps/s 18.9927 KOps/s $\color{#35bf28}+0.28\%$
test_step_mdp_speed[True-False-False-True-False] 66.0610μs 32.9190μs 30.3776 KOps/s 30.3005 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[True-False-False-False-True] 60.1810μs 30.2928μs 33.0112 KOps/s 33.5646 KOps/s $\color{#d91a1a}-1.65\%$
test_step_mdp_speed[True-False-False-False-False] 44.4300μs 19.1690μs 52.1676 KOps/s 52.1793 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[False-True-True-True-True] 96.6920μs 49.9974μs 20.0010 KOps/s 19.8828 KOps/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[False-True-True-True-False] 56.3910μs 30.8202μs 32.4462 KOps/s 32.9688 KOps/s $\color{#d91a1a}-1.58\%$
test_step_mdp_speed[False-True-True-False-True] 2.3215ms 31.8616μs 31.3857 KOps/s 31.3184 KOps/s $\color{#35bf28}+0.21\%$
test_step_mdp_speed[False-True-True-False-False] 48.0910μs 18.4443μs 54.2172 KOps/s 55.5583 KOps/s $\color{#d91a1a}-2.41\%$
test_step_mdp_speed[False-True-False-True-True] 0.1024ms 53.4188μs 18.7200 KOps/s 18.7740 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[False-True-False-True-False] 66.9710μs 33.2932μs 30.0362 KOps/s 29.9688 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[False-True-False-False-True] 64.1310μs 33.9504μs 29.4547 KOps/s 29.9196 KOps/s $\color{#d91a1a}-1.55\%$
test_step_mdp_speed[False-True-False-False-False] 45.6910μs 21.0222μs 47.5688 KOps/s 47.4508 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[False-False-True-True-True] 89.6110μs 55.7553μs 17.9355 KOps/s 17.9913 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[False-False-True-True-False] 62.7310μs 35.8876μs 27.8648 KOps/s 28.0301 KOps/s $\color{#d91a1a}-0.59\%$
test_step_mdp_speed[False-False-True-False-True] 66.2510μs 33.7373μs 29.6408 KOps/s 29.5478 KOps/s $\color{#35bf28}+0.31\%$
test_step_mdp_speed[False-False-True-False-False] 59.9410μs 20.8974μs 47.8529 KOps/s 47.6163 KOps/s $\color{#35bf28}+0.50\%$
test_step_mdp_speed[False-False-False-True-True] 91.9210μs 57.9138μs 17.2670 KOps/s 17.5582 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[False-False-False-True-False] 68.6010μs 38.3666μs 26.0643 KOps/s 26.3860 KOps/s $\color{#d91a1a}-1.22\%$
test_step_mdp_speed[False-False-False-False-True] 63.7410μs 36.9698μs 27.0491 KOps/s 28.2064 KOps/s $\color{#d91a1a}-4.10\%$
test_step_mdp_speed[False-False-False-False-False] 51.2910μs 23.6853μs 42.2203 KOps/s 43.1127 KOps/s $\color{#d91a1a}-2.07\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8709s 0.7722s 1.2949 Ops/s 1.3020 Ops/s $\color{#d91a1a}-0.54\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7320s 0.6379s 1.5678 Ops/s 1.5819 Ops/s $\color{#d91a1a}-0.89\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7590s 1.6854s 0.5933 Ops/s 0.5993 Ops/s $\color{#d91a1a}-1.00\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5406s 1.4626s 0.6837 Ops/s 0.6900 Ops/s $\color{#d91a1a}-0.91\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0085s 1.9305s 0.5180 Ops/s 0.5204 Ops/s $\color{#d91a1a}-0.45\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7912s 1.7096s 0.5849 Ops/s 0.5889 Ops/s $\color{#d91a1a}-0.68\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.8361s 4.6781s 0.2138 Ops/s 0.2142 Ops/s $\color{#d91a1a}-0.20\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5607s 4.4457s 0.2249 Ops/s 0.2233 Ops/s $\color{#35bf28}+0.71\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0495s 1.9590s 0.5105 Ops/s 0.5112 Ops/s $\color{#d91a1a}-0.14\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7617s 1.6632s 0.6013 Ops/s 0.5881 Ops/s $\color{#35bf28}+2.24\%$
test_values[generalized_advantage_estimate-True-True] 21.7109ms 21.2575ms 47.0421 Ops/s 48.1157 Ops/s $\color{#d91a1a}-2.23\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1462s 3.8456ms 260.0391 Ops/s 287.6283 Ops/s $\textbf{\color{#d91a1a}-9.59\%}$
test_values[td0_return_estimate-False-False] 0.1121ms 83.0845μs 12.0359 KOps/s 11.9595 KOps/s $\color{#35bf28}+0.64\%$
test_values[td1_return_estimate-False-False] 50.0913ms 49.5331ms 20.1885 Ops/s 20.3139 Ops/s $\color{#d91a1a}-0.62\%$
test_values[vec_td1_return_estimate-False-False] 1.4083ms 1.0929ms 914.9808 Ops/s 916.3271 Ops/s $\color{#d91a1a}-0.15\%$
test_values[td_lambda_return_estimate-True-False] 80.8827ms 80.5066ms 12.4213 Ops/s 12.4058 Ops/s $\color{#35bf28}+0.13\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2928ms 1.0875ms 919.5748 Ops/s 916.8909 Ops/s $\color{#35bf28}+0.29\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.5863ms 22.1304ms 45.1867 Ops/s 47.8005 Ops/s $\textbf{\color{#d91a1a}-5.47\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0308ms 0.7617ms 1.3128 KOps/s 1.3229 KOps/s $\color{#d91a1a}-0.76\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 1.1518ms 0.6983ms 1.4321 KOps/s 1.4726 KOps/s $\color{#d91a1a}-2.75\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5821ms 1.4967ms 668.1216 Ops/s 670.3762 Ops/s $\color{#d91a1a}-0.34\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7833ms 0.7041ms 1.4203 KOps/s 1.4350 KOps/s $\color{#d91a1a}-1.02\%$
test_dqn_speed[False-None] 1.9574ms 1.5326ms 652.4957 Ops/s 656.6249 Ops/s $\color{#d91a1a}-0.63\%$
test_dqn_speed[False-backward] 2.4280ms 2.1675ms 461.3600 Ops/s 461.6170 Ops/s $\color{#d91a1a}-0.06\%$
test_dqn_speed[True-None] 1.0904ms 0.5822ms 1.7176 KOps/s 1.7444 KOps/s $\color{#d91a1a}-1.54\%$
test_dqn_speed[True-backward] 1.1332ms 1.0973ms 911.3120 Ops/s 832.6060 Ops/s $\textbf{\color{#35bf28}+9.45\%}$
test_dqn_speed[reduce-overhead-None] 0.7865ms 0.5995ms 1.6681 KOps/s 1.6465 KOps/s $\color{#35bf28}+1.31\%$
test_ddpg_speed[False-None] 3.2669ms 2.8850ms 346.6255 Ops/s 342.7233 Ops/s $\color{#35bf28}+1.14\%$
test_ddpg_speed[False-backward] 4.5445ms 4.1235ms 242.5132 Ops/s 236.0376 Ops/s $\color{#35bf28}+2.74\%$
test_ddpg_speed[True-None] 1.4620ms 1.3249ms 754.7531 Ops/s 755.0080 Ops/s $\color{#d91a1a}-0.03\%$
test_ddpg_speed[True-backward] 2.4245ms 2.3653ms 422.7875 Ops/s 420.0320 Ops/s $\color{#35bf28}+0.66\%$
test_ddpg_speed[reduce-overhead-None] 1.4602ms 1.3524ms 739.4432 Ops/s 741.9298 Ops/s $\color{#d91a1a}-0.34\%$
test_sac_speed[False-None] 8.9031ms 8.3219ms 120.1642 Ops/s 119.8099 Ops/s $\color{#35bf28}+0.30\%$
test_sac_speed[False-backward] 11.6789ms 11.2252ms 89.0852 Ops/s 88.6893 Ops/s $\color{#35bf28}+0.45\%$
test_sac_speed[True-None] 1.9961ms 1.8177ms 550.1532 Ops/s 548.2552 Ops/s $\color{#35bf28}+0.35\%$
test_sac_speed[True-backward] 3.5792ms 3.4657ms 288.5389 Ops/s 284.2179 Ops/s $\color{#35bf28}+1.52\%$
test_sac_speed[reduce-overhead-None] 19.5455ms 11.0547ms 90.4596 Ops/s 90.1581 Ops/s $\color{#35bf28}+0.33\%$
test_redq_deprec_speed[False-None] 10.2845ms 9.4396ms 105.9372 Ops/s 104.5606 Ops/s $\color{#35bf28}+1.32\%$
test_redq_deprec_speed[False-backward] 12.9130ms 12.3588ms 80.9139 Ops/s 79.4591 Ops/s $\color{#35bf28}+1.83\%$
test_redq_deprec_speed[True-None] 2.6488ms 2.5084ms 398.6578 Ops/s 391.1179 Ops/s $\color{#35bf28}+1.93\%$
test_redq_deprec_speed[True-backward] 4.5462ms 4.1075ms 243.4592 Ops/s 230.8535 Ops/s $\textbf{\color{#35bf28}+5.46\%}$
test_redq_deprec_speed[reduce-overhead-None] 16.2006ms 9.9405ms 100.5990 Ops/s 100.3024 Ops/s $\color{#35bf28}+0.30\%$
test_td3_speed[False-None] 8.5410ms 8.2215ms 121.6318 Ops/s 113.7885 Ops/s $\textbf{\color{#35bf28}+6.89\%}$
test_td3_speed[False-backward] 10.8822ms 10.5038ms 95.2033 Ops/s 91.2152 Ops/s $\color{#35bf28}+4.37\%$
test_td3_speed[True-None] 1.6699ms 1.6442ms 608.2035 Ops/s 614.9039 Ops/s $\color{#d91a1a}-1.09\%$
test_td3_speed[True-backward] 3.1288ms 3.0866ms 323.9841 Ops/s 307.3118 Ops/s $\textbf{\color{#35bf28}+5.43\%}$
test_td3_speed[reduce-overhead-None] 73.0274ms 25.0199ms 39.9681 Ops/s 39.9102 Ops/s $\color{#35bf28}+0.15\%$
test_cql_speed[False-None] 17.6884ms 17.2358ms 58.0187 Ops/s 57.8161 Ops/s $\color{#35bf28}+0.35\%$
test_cql_speed[False-backward] 22.8236ms 22.3510ms 44.7407 Ops/s 43.8490 Ops/s $\color{#35bf28}+2.03\%$
test_cql_speed[True-None] 3.5129ms 3.2826ms 304.6333 Ops/s 304.6697 Ops/s $\color{#d91a1a}-0.01\%$
test_cql_speed[True-backward] 5.6400ms 5.3379ms 187.3391 Ops/s 181.2282 Ops/s $\color{#35bf28}+3.37\%$
test_cql_speed[reduce-overhead-None] 18.8992ms 11.9233ms 83.8697 Ops/s 84.4405 Ops/s $\color{#d91a1a}-0.68\%$
test_a2c_speed[False-None] 3.9368ms 3.2377ms 308.8598 Ops/s 309.5894 Ops/s $\color{#d91a1a}-0.24\%$
test_a2c_speed[False-backward] 6.5496ms 6.0653ms 164.8710 Ops/s 158.1680 Ops/s $\color{#35bf28}+4.24\%$
test_a2c_speed[True-None] 1.3994ms 1.3266ms 753.8320 Ops/s 744.2268 Ops/s $\color{#35bf28}+1.29\%$
test_a2c_speed[True-backward] 3.1189ms 2.9898ms 334.4692 Ops/s 320.2259 Ops/s $\color{#35bf28}+4.45\%$
test_a2c_speed[reduce-overhead-None] 1.1388ms 0.9924ms 1.0077 KOps/s 1.0166 KOps/s $\color{#d91a1a}-0.88\%$
test_ppo_speed[False-None] 3.9879ms 3.8460ms 260.0135 Ops/s 260.9289 Ops/s $\color{#d91a1a}-0.35\%$
test_ppo_speed[False-backward] 7.3210ms 6.8882ms 145.1766 Ops/s 141.2807 Ops/s $\color{#35bf28}+2.76\%$
test_ppo_speed[True-None] 1.7743ms 1.4439ms 692.5665 Ops/s 697.8306 Ops/s $\color{#d91a1a}-0.75\%$
test_ppo_speed[True-backward] 3.4697ms 3.0986ms 322.7229 Ops/s 301.6606 Ops/s $\textbf{\color{#35bf28}+6.98\%}$
test_ppo_speed[reduce-overhead-None] 1.1364ms 1.0464ms 955.6418 Ops/s 922.5488 Ops/s $\color{#35bf28}+3.59\%$
test_reinforce_speed[False-None] 2.4616ms 2.2787ms 438.8492 Ops/s 437.3071 Ops/s $\color{#35bf28}+0.35\%$
test_reinforce_speed[False-backward] 3.7665ms 3.3004ms 302.9959 Ops/s 306.6611 Ops/s $\color{#d91a1a}-1.20\%$
test_reinforce_speed[True-None] 1.3661ms 1.3021ms 768.0049 Ops/s 779.0995 Ops/s $\color{#d91a1a}-1.42\%$
test_reinforce_speed[True-backward] 3.0165ms 2.9189ms 342.5951 Ops/s 334.0483 Ops/s $\color{#35bf28}+2.56\%$
test_reinforce_speed[reduce-overhead-None] 0.4495s 10.4038ms 96.1190 Ops/s 105.1991 Ops/s $\textbf{\color{#d91a1a}-8.63\%}$
test_iql_speed[False-None] 9.9153ms 9.4222ms 106.1324 Ops/s 106.4492 Ops/s $\color{#d91a1a}-0.30\%$
test_iql_speed[False-backward] 13.4522ms 13.0263ms 76.7678 Ops/s 76.7861 Ops/s $\color{#d91a1a}-0.02\%$
test_iql_speed[True-None] 2.3439ms 2.1932ms 455.9600 Ops/s 454.2154 Ops/s $\color{#35bf28}+0.38\%$
test_iql_speed[True-backward] 5.2153ms 4.7256ms 211.6154 Ops/s 202.9520 Ops/s $\color{#35bf28}+4.27\%$
test_iql_speed[reduce-overhead-None] 18.0493ms 10.6551ms 93.8517 Ops/s 95.8484 Ops/s $\color{#d91a1a}-2.08\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1678ms 5.9640ms 167.6733 Ops/s 164.3908 Ops/s $\color{#35bf28}+2.00\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6706ms 0.3580ms 2.7932 KOps/s 2.9255 KOps/s $\color{#d91a1a}-4.52\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6144ms 0.3394ms 2.9468 KOps/s 3.4675 KOps/s $\textbf{\color{#d91a1a}-15.02\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1035ms 5.8489ms 170.9728 Ops/s 170.9212 Ops/s $\color{#35bf28}+0.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9611ms 0.3392ms 2.9482 KOps/s 3.1487 KOps/s $\textbf{\color{#d91a1a}-6.37\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5325ms 0.3149ms 3.1757 KOps/s 3.1970 KOps/s $\color{#d91a1a}-0.66\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6123ms 1.3929ms 717.9040 Ops/s 701.5540 Ops/s $\color{#35bf28}+2.33\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6182ms 1.3580ms 736.3623 Ops/s 740.4402 Ops/s $\color{#d91a1a}-0.55\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2620ms 6.0443ms 165.4454 Ops/s 165.1610 Ops/s $\color{#35bf28}+0.17\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1614ms 0.4339ms 2.3049 KOps/s 2.1569 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6376ms 0.4259ms 2.3480 KOps/s 2.4098 KOps/s $\color{#d91a1a}-2.56\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.5349ms 5.9593ms 167.8062 Ops/s 170.6044 Ops/s $\color{#d91a1a}-1.64\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8285ms 0.2958ms 3.3811 KOps/s 2.6766 KOps/s $\textbf{\color{#35bf28}+26.32\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6003ms 0.3686ms 2.7132 KOps/s 3.7825 KOps/s $\textbf{\color{#d91a1a}-28.27\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0955ms 5.8280ms 171.5866 Ops/s 169.9176 Ops/s $\color{#35bf28}+0.98\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.6322ms 0.3691ms 2.7094 KOps/s 2.7931 KOps/s $\color{#d91a1a}-3.00\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5564ms 0.3087ms 3.2392 KOps/s 3.4543 KOps/s $\textbf{\color{#d91a1a}-6.23\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1896ms 6.0604ms 165.0053 Ops/s 164.2097 Ops/s $\color{#35bf28}+0.48\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.7859ms 0.4850ms 2.0619 KOps/s 2.2363 KOps/s $\textbf{\color{#d91a1a}-7.80\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7483ms 0.4543ms 2.2014 KOps/s 2.2924 KOps/s $\color{#d91a1a}-3.97\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.6471s 17.9532ms 55.7004 Ops/s 45.5248 Ops/s $\textbf{\color{#35bf28}+22.35\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.4086ms 2.0106ms 497.3714 Ops/s 480.0761 Ops/s $\color{#35bf28}+3.60\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 9.5092ms 1.2887ms 775.9905 Ops/s 881.1802 Ops/s $\textbf{\color{#d91a1a}-11.94\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.9031ms 5.0651ms 197.4277 Ops/s 194.0328 Ops/s $\color{#35bf28}+1.75\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.0521ms 1.8014ms 555.1236 Ops/s 547.6284 Ops/s $\color{#35bf28}+1.37\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.0741ms 0.9427ms 1.0607 KOps/s 754.6157 Ops/s $\textbf{\color{#35bf28}+40.57\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5830s 16.8482ms 59.3536 Ops/s 183.7769 Ops/s $\textbf{\color{#d91a1a}-67.70\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.0434ms 1.9535ms 511.9091 Ops/s 471.6666 Ops/s $\textbf{\color{#35bf28}+8.53\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.0980ms 1.1179ms 894.5477 Ops/s 936.1466 Ops/s $\color{#d91a1a}-4.44\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.0850ms 36.5570ms 27.3546 Ops/s 27.2726 Ops/s $\color{#35bf28}+0.30\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.2790ms 18.5952ms 53.7774 Ops/s 56.1309 Ops/s $\color{#d91a1a}-4.19\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.9518ms 38.2458ms 26.1467 Ops/s 26.6802 Ops/s $\color{#d91a1a}-2.00\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.6261ms 18.9494ms 52.7722 Ops/s 54.9574 Ops/s $\color{#d91a1a}-3.98\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.8262ms 39.5032ms 25.3144 Ops/s 25.7414 Ops/s $\color{#d91a1a}-1.66\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 22.2125ms 20.6756ms 48.3661 Ops/s 50.5554 Ops/s $\color{#d91a1a}-4.33\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8988ms 0.2275ms 4.3964 KOps/s 4.4413 KOps/s $\color{#d91a1a}-1.01\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6051ms 1.4241ms 702.1931 Ops/s 695.4716 Ops/s $\color{#35bf28}+0.97\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5232ms 2.3238ms 430.3246 Ops/s 443.7768 Ops/s $\color{#d91a1a}-3.03\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0332ms 2.8730ms 348.0686 Ops/s 343.9342 Ops/s $\color{#35bf28}+1.20\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2369ms 0.1515ms 6.6019 KOps/s 6.7387 KOps/s $\color{#d91a1a}-2.03\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3834ms 0.2289ms 4.3695 KOps/s 4.8105 KOps/s $\textbf{\color{#d91a1a}-9.17\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8739ms 1.7112ms 584.3739 Ops/s 553.8458 Ops/s $\textbf{\color{#35bf28}+5.51\%}$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4372ms 1.2979ms 770.5036 Ops/s 742.7778 Ops/s $\color{#35bf28}+3.73\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2366ms 1.1293ms 885.4820 Ops/s 876.9184 Ops/s $\color{#35bf28}+0.98\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8202ms 3.6273ms 275.6876 Ops/s 265.5702 Ops/s $\color{#35bf28}+3.81\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.7534ms 5.6138ms 178.1329 Ops/s 175.8469 Ops/s $\color{#35bf28}+1.30\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4796ms 7.3433ms 136.1785 Ops/s 139.2154 Ops/s $\color{#d91a1a}-2.18\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4804ms 0.2853ms 3.5050 KOps/s 3.6836 KOps/s $\color{#d91a1a}-4.85\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.5966ms 1.4582ms 685.7819 Ops/s 652.7806 Ops/s $\textbf{\color{#35bf28}+5.06\%}$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.5538ms 2.4259ms 412.2105 Ops/s 411.1285 Ops/s $\color{#35bf28}+0.26\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4802ms 3.0963ms 322.9669 Ops/s 320.3387 Ops/s $\color{#35bf28}+0.82\%$
test_collector_without_rb[100-img_shape0-atari] 35.4567ms 34.8759ms 28.6731 Ops/s 29.7289 Ops/s $\color{#d91a1a}-3.55\%$
test_collector_without_rb[200-img_shape1-large_batch] 68.7976ms 67.1358ms 14.8952 Ops/s 15.0731 Ops/s $\color{#d91a1a}-1.18\%$
test_collector_with_rb[100-img_shape0-atari] 40.3648ms 38.8413ms 25.7458 Ops/s 26.1545 Ops/s $\color{#d91a1a}-1.56\%$
test_collector_with_rb[200-img_shape1-large_batch] 77.5613ms 75.7553ms 13.2004 Ops/s 13.3847 Ops/s $\color{#d91a1a}-1.38\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 58.4168ms 57.1454ms 17.4992 Ops/s 17.4924 Ops/s $\color{#35bf28}+0.04\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1166s 0.1152s 8.6777 Ops/s 8.7698 Ops/s $\color{#d91a1a}-1.05\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 60.8497ms 59.9721ms 16.6744 Ops/s 16.9707 Ops/s $\color{#d91a1a}-1.75\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1204s 0.1188s 8.4147 Ops/s 8.4284 Ops/s $\color{#d91a1a}-0.16\%$


Labels: CLA Signed, Feature, Record
