[Performance] Use log_metrics in sota-implementations#3454

Merged
vmoens merged 1 commit into main from feat/use-log-metrics-sota on Feb 6, 2026

vmoens (Collaborator) commented Feb 6, 2026

Summary

Replace loops calling log_scalar with single log_metrics calls across all sota-implementations. This is a follow-up to #3452.

Benefits

  1. Efficiency: For loggers with batch APIs (wandb, mlflow), this uses a single API call instead of N calls for N metrics.

  2. CUDA sync optimization: The new log_metrics method batches CUDA→CPU tensor transfers with non_blocking=True and syncs once, avoiding the overhead of multiple implicit synchronizations from calling .item() on each tensor individually.
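
The CUDA sync point can be illustrated with a minimal sketch. This is not the TorchRL implementation of `log_metrics`; the helper name `metrics_to_floats` is hypothetical and only shows the general pattern the description refers to: queue every device-to-host copy with `non_blocking=True`, synchronize once, then read the values on the CPU.

```python
import torch


def metrics_to_floats(metrics: dict) -> dict:
    """Hypothetical helper: convert tensor metrics to floats with at most one sync."""
    staged = {}
    needs_sync = False
    for key, value in metrics.items():
        if isinstance(value, torch.Tensor):
            if value.is_cuda:
                # Queue the device-to-host copy without blocking the host thread.
                value = value.detach().to("cpu", non_blocking=True)
                needs_sync = True
            else:
                value = value.detach()
        staged[key] = value
    if needs_sync:
        # A single explicit synchronization covers all queued copies.
        torch.cuda.synchronize()
    # The tensors now live on the CPU, so .item() no longer triggers a device sync.
    return {
        key: value.item() if isinstance(value, torch.Tensor) else float(value)
        for key, value in staged.items()
    }


# Falls back to CPU tensors when no GPU is present, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
metrics = {
    "loss": torch.tensor(0.5, device=device),
    "reward": torch.tensor(2.0, device=device),
    "lr": 3e-4,
}
scalars = metrics_to_floats(metrics)
```

By contrast, calling `.item()` directly on each CUDA tensor forces the host to wait for the device once per metric.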

Updated implementations

  • On-policy: PPO (mujoco, atari), A2C (mujoco, atari), IMPALA (single_node, multi_node_submitit, multi_node_ray)
  • Off-policy: DQN (cartpole, atari), SAC, TD3, TD3-BC, DDPG, CrossQ, CQL, IQL, Discrete SAC
  • Offline RL: Decision Transformer
  • Model-based: Dreamer
  • Imitation: GAIL
  • LLM: Expert-Iteration (sync, async), GRPO (sync, async)
  • Multi-agent: Multiagent logging utility

Changes

27 files updated with simple refactoring:

  • Replace `for key, value in metrics.items(): logger.log_scalar(key, value, step)` with `logger.log_metrics(metrics, step)`
  • For utility files that defined their own `log_metrics(logger, metrics, step)` helper, simplify the implementation to a single call to `logger.log_metrics(metrics, step)`
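
The refactoring pattern above can be sketched as follows. `CountingLogger` is a hypothetical stand-in for a logger backend, not a TorchRL class; it only counts backend calls, and the two helper names are illustrative. The `log_scalar` and `log_metrics` call shapes follow the ones quoted in this description.

```python
class CountingLogger:
    """Hypothetical stand-in for a logger backend; counts calls made to it."""

    def __init__(self):
        self.backend_calls = 0
        self.records = []

    def log_scalar(self, name, value, step=None):
        # One backend call per metric.
        self.backend_calls += 1
        self.records.append((name, value, step))

    def log_metrics(self, metrics, step=None):
        # One backend call for the whole dictionary.
        self.backend_calls += 1
        self.records.extend((name, value, step) for name, value in metrics.items())


def log_metrics_before(logger, metrics, step):
    """Old helper shape: loop over the metrics and log each one separately."""
    for key, value in metrics.items():
        logger.log_scalar(key, value, step)


def log_metrics_after(logger, metrics, step):
    """New helper shape: delegate to the logger's batched method."""
    logger.log_metrics(metrics, step)


metrics = {"train/loss": 0.25, "train/reward": 1.5, "train/lr": 3e-4}
old, new = CountingLogger(), CountingLogger()
log_metrics_before(old, metrics, step=10)
log_metrics_after(new, metrics, step=10)
print(old.backend_calls, new.backend_calls)  # 3 1
```

Both helpers record the same values; only the number of backend calls differs.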

Test plan

  • Verify existing tests pass
  • Manual testing with sample implementations

Made with Cursor

Replace loops calling log_scalar with single log_metrics calls across all
sota-implementations. This provides two benefits:

1. Efficiency: For loggers with batch APIs (wandb, mlflow), this uses a
   single API call instead of N calls for N metrics.

2. CUDA sync optimization: The new log_metrics method batches CUDA->CPU
   tensor transfers with non_blocking=True and syncs once, avoiding the
   overhead of multiple implicit synchronizations from calling .item()
   on each tensor individually.

Updated implementations:
- PPO (mujoco, atari)
- A2C (mujoco, atari)
- IMPALA (single_node, multi_node_submitit, multi_node_ray)
- DQN (cartpole, atari)
- SAC, TD3, TD3-BC, DDPG, CrossQ, CQL, IQL, Discrete SAC
- Decision Transformer, Dreamer, GAIL
- Expert-Iteration (sync, async)
- GRPO (sync, async)
- Multiagent logging utility

Co-authored-by: Cursor <[email protected]>

pytorch-bot bot commented Feb 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3454

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6592140 with merge base 190a43d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Feb 6, 2026
github-actions bot added the Performance and sota-implementations/ labels and removed the Performance label Feb 6, 2026

github-actions bot (Contributor) commented Feb 6, 2026

$\color{#D29922}\textsf{\Large&#x26A0;\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}20$. Worsened: $\large\color{#d91a1a}8$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_tensor_to_bytestream_speed[pickle] 85.0430μs 82.5966μs 12.1070 KOps/s 12.1626 KOps/s $\color{#d91a1a}-0.46\%$
test_tensor_to_bytestream_speed[torch.save] 0.1402ms 0.1392ms 7.1836 KOps/s 7.1155 KOps/s $\color{#35bf28}+0.96\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1141s 0.1137s 8.7951 Ops/s 9.6974 Ops/s $\textbf{\color{#d91a1a}-9.30\%}$
test_tensor_to_bytestream_speed[numpy] 2.7125μs 2.6990μs 370.5059 KOps/s 374.8855 KOps/s $\color{#d91a1a}-1.17\%$
test_tensor_to_bytestream_speed[safetensors] 38.9159μs 37.2543μs 26.8425 KOps/s 25.6223 KOps/s $\color{#35bf28}+4.76\%$
test_simple 0.5477s 0.5471s 1.8278 Ops/s 1.7467 Ops/s $\color{#35bf28}+4.64\%$
test_transformed 1.1343s 1.1324s 0.8830 Ops/s 0.8622 Ops/s $\color{#35bf28}+2.41\%$
test_serial 1.6749s 1.6718s 0.5981 Ops/s 0.5812 Ops/s $\color{#35bf28}+2.91\%$
test_parallel 1.2043s 1.1096s 0.9012 Ops/s 0.8449 Ops/s $\textbf{\color{#35bf28}+6.67\%}$
test_step_mdp_speed[True-True-True-True-True] 0.1695ms 45.7086μs 21.8777 KOps/s 21.8114 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-True-True-True-False] 59.4340μs 26.1897μs 38.1829 KOps/s 39.4094 KOps/s $\color{#d91a1a}-3.11\%$
test_step_mdp_speed[True-True-True-False-True] 49.8830μs 25.7900μs 38.7747 KOps/s 37.9571 KOps/s $\color{#35bf28}+2.15\%$
test_step_mdp_speed[True-True-True-False-False] 48.2820μs 14.5480μs 68.7378 KOps/s 69.9203 KOps/s $\color{#d91a1a}-1.69\%$
test_step_mdp_speed[True-True-False-True-True] 88.2940μs 49.3703μs 20.2551 KOps/s 20.6893 KOps/s $\color{#d91a1a}-2.10\%$
test_step_mdp_speed[True-True-False-True-False] 56.6930μs 28.4163μs 35.1911 KOps/s 34.8666 KOps/s $\color{#35bf28}+0.93\%$
test_step_mdp_speed[True-True-False-False-True] 73.6240μs 28.5782μs 34.9917 KOps/s 35.2180 KOps/s $\color{#d91a1a}-0.64\%$
test_step_mdp_speed[True-True-False-False-False] 48.8730μs 17.4823μs 57.2008 KOps/s 57.4273 KOps/s $\color{#d91a1a}-0.39\%$
test_step_mdp_speed[True-False-True-True-True] 84.4940μs 50.7998μs 19.6851 KOps/s 19.3276 KOps/s $\color{#35bf28}+1.85\%$
test_step_mdp_speed[True-False-True-True-False] 71.6840μs 31.0957μs 32.1588 KOps/s 32.9968 KOps/s $\color{#d91a1a}-2.54\%$
test_step_mdp_speed[True-False-True-False-True] 53.1520μs 29.4706μs 33.9322 KOps/s 34.9494 KOps/s $\color{#d91a1a}-2.91\%$
test_step_mdp_speed[True-False-True-False-False] 46.2220μs 17.4363μs 57.3515 KOps/s 58.7648 KOps/s $\color{#d91a1a}-2.40\%$
test_step_mdp_speed[True-False-False-True-True] 82.8240μs 54.8045μs 18.2467 KOps/s 18.8138 KOps/s $\color{#d91a1a}-3.01\%$
test_step_mdp_speed[True-False-False-True-False] 80.7840μs 33.6319μs 29.7336 KOps/s 29.9242 KOps/s $\color{#d91a1a}-0.64\%$
test_step_mdp_speed[True-False-False-False-True] 0.1019ms 30.9694μs 32.2900 KOps/s 32.7260 KOps/s $\color{#d91a1a}-1.33\%$
test_step_mdp_speed[True-False-False-False-False] 43.7720μs 20.0871μs 49.7833 KOps/s 51.2012 KOps/s $\color{#d91a1a}-2.77\%$
test_step_mdp_speed[False-True-True-True-True] 80.0740μs 51.9699μs 19.2419 KOps/s 19.4163 KOps/s $\color{#d91a1a}-0.90\%$
test_step_mdp_speed[False-True-True-True-False] 61.0430μs 31.1173μs 32.1365 KOps/s 32.2376 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[False-True-True-False-True] 2.2866ms 33.2377μs 30.0863 KOps/s 30.8905 KOps/s $\color{#d91a1a}-2.60\%$
test_step_mdp_speed[False-True-True-False-False] 45.1420μs 19.3471μs 51.6873 KOps/s 54.3669 KOps/s $\color{#d91a1a}-4.93\%$
test_step_mdp_speed[False-True-False-True-True] 97.1750μs 53.7714μs 18.5972 KOps/s 18.9215 KOps/s $\color{#d91a1a}-1.71\%$
test_step_mdp_speed[False-True-False-True-False] 74.2040μs 33.6997μs 29.6739 KOps/s 29.7798 KOps/s $\color{#d91a1a}-0.36\%$
test_step_mdp_speed[False-True-False-False-True] 62.7930μs 34.9401μs 28.6204 KOps/s 28.7776 KOps/s $\color{#d91a1a}-0.55\%$
test_step_mdp_speed[False-True-False-False-False] 50.6330μs 22.0631μs 45.3245 KOps/s 47.0442 KOps/s $\color{#d91a1a}-3.66\%$
test_step_mdp_speed[False-False-True-True-True] 0.1029ms 57.4482μs 17.4070 KOps/s 17.8708 KOps/s $\color{#d91a1a}-2.60\%$
test_step_mdp_speed[False-False-True-True-False] 63.8430μs 36.5890μs 27.3306 KOps/s 27.5240 KOps/s $\color{#d91a1a}-0.70\%$
test_step_mdp_speed[False-False-True-False-True] 64.6140μs 34.9227μs 28.6347 KOps/s 28.9269 KOps/s $\color{#d91a1a}-1.01\%$
test_step_mdp_speed[False-False-True-False-False] 66.1140μs 21.8451μs 45.7768 KOps/s 46.8797 KOps/s $\color{#d91a1a}-2.35\%$
test_step_mdp_speed[False-False-False-True-True] 98.8050μs 58.8740μs 16.9854 KOps/s 16.9976 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[False-False-False-True-False] 68.5630μs 39.0650μs 25.5984 KOps/s 25.6242 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[False-False-False-False-True] 64.3130μs 36.7796μs 27.1890 KOps/s 27.1437 KOps/s $\color{#35bf28}+0.17\%$
test_step_mdp_speed[False-False-False-False-False] 47.9320μs 23.9540μs 41.7467 KOps/s 41.5233 KOps/s $\color{#35bf28}+0.54\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7560s 0.7486s 1.3359 Ops/s 1.2882 Ops/s $\color{#35bf28}+3.70\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7348s 0.6394s 1.5639 Ops/s 1.5747 Ops/s $\color{#d91a1a}-0.68\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7795s 1.6990s 0.5886 Ops/s 0.5930 Ops/s $\color{#d91a1a}-0.75\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5493s 1.4668s 0.6818 Ops/s 0.6845 Ops/s $\color{#d91a1a}-0.40\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0250s 1.9461s 0.5139 Ops/s 0.5152 Ops/s $\color{#d91a1a}-0.27\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7967s 1.7163s 0.5826 Ops/s 0.5845 Ops/s $\color{#d91a1a}-0.33\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6945s 4.6094s 0.2169 Ops/s 0.2150 Ops/s $\color{#35bf28}+0.90\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4669s 4.3892s 0.2278 Ops/s 0.2259 Ops/s $\color{#35bf28}+0.86\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.1628s 2.0105s 0.4974 Ops/s 0.5065 Ops/s $\color{#d91a1a}-1.80\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7792s 1.6843s 0.5937 Ops/s 0.6006 Ops/s $\color{#d91a1a}-1.15\%$
test_values[generalized_advantage_estimate-True-True] 10.1337ms 10.0173ms 99.8270 Ops/s 101.3051 Ops/s $\color{#d91a1a}-1.46\%$
test_values[vec_generalized_advantage_estimate-True-True] 20.0555ms 17.4772ms 57.2175 Ops/s 56.8928 Ops/s $\color{#35bf28}+0.57\%$
test_values[td0_return_estimate-False-False] 0.2312ms 0.1250ms 8.0023 KOps/s 8.0208 KOps/s $\color{#d91a1a}-0.23\%$
test_values[td1_return_estimate-False-False] 26.9427ms 26.5917ms 37.6058 Ops/s 37.8137 Ops/s $\color{#d91a1a}-0.55\%$
test_values[vec_td1_return_estimate-False-False] 17.9280ms 17.5094ms 57.1123 Ops/s 56.1712 Ops/s $\color{#35bf28}+1.68\%$
test_values[td_lambda_return_estimate-True-False] 40.8512ms 39.3363ms 25.4218 Ops/s 25.4613 Ops/s $\color{#d91a1a}-0.16\%$
test_values[vec_td_lambda_return_estimate-True-False] 17.8355ms 17.5211ms 57.0742 Ops/s 55.8646 Ops/s $\color{#35bf28}+2.17\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 10.5703ms 8.8658ms 112.7929 Ops/s 114.7183 Ops/s $\color{#d91a1a}-1.68\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.6809ms 1.5182ms 658.6784 Ops/s 672.6666 Ops/s $\color{#d91a1a}-2.08\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4600ms 0.4067ms 2.4591 KOps/s 2.3868 KOps/s $\color{#35bf28}+3.03\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.0793ms 30.2490ms 33.0590 Ops/s 28.6700 Ops/s $\textbf{\color{#35bf28}+15.31\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.0884ms 1.6944ms 590.1882 Ops/s 583.6917 Ops/s $\color{#35bf28}+1.11\%$
test_dqn_speed[False-None] 1.7872ms 1.3796ms 724.8277 Ops/s 723.8121 Ops/s $\color{#35bf28}+0.14\%$
test_dqn_speed[False-backward] 1.9518ms 1.8933ms 528.1908 Ops/s 532.7579 Ops/s $\color{#d91a1a}-0.86\%$
test_dqn_speed[True-None] 0.9726ms 0.5518ms 1.8123 KOps/s 1.8336 KOps/s $\color{#d91a1a}-1.16\%$
test_dqn_speed[True-backward] 1.0289ms 0.9970ms 1.0031 KOps/s 838.3931 Ops/s $\textbf{\color{#35bf28}+19.64\%}$
test_dqn_speed[reduce-overhead-None] 0.8152ms 0.5393ms 1.8541 KOps/s 1.7810 KOps/s $\color{#35bf28}+4.10\%$
test_ddpg_speed[False-None] 3.1599ms 2.8208ms 354.5121 Ops/s 352.4031 Ops/s $\color{#35bf28}+0.60\%$
test_ddpg_speed[False-backward] 4.0533ms 3.9864ms 250.8532 Ops/s 249.0320 Ops/s $\color{#35bf28}+0.73\%$
test_ddpg_speed[True-None] 1.7957ms 1.4162ms 706.1066 Ops/s 689.7158 Ops/s $\color{#35bf28}+2.38\%$
test_ddpg_speed[True-backward] 2.4422ms 2.3951ms 417.5206 Ops/s 406.1231 Ops/s $\color{#35bf28}+2.81\%$
test_ddpg_speed[reduce-overhead-None] 2.5099ms 1.4503ms 689.5354 Ops/s 704.7657 Ops/s $\color{#d91a1a}-2.16\%$
test_sac_speed[False-None] 8.5491ms 7.9026ms 126.5403 Ops/s 126.9746 Ops/s $\color{#d91a1a}-0.34\%$
test_sac_speed[False-backward] 11.6219ms 11.0645ms 90.3790 Ops/s 90.6775 Ops/s $\color{#d91a1a}-0.33\%$
test_sac_speed[True-None] 2.3174ms 2.1552ms 463.9984 Ops/s 456.7417 Ops/s $\color{#35bf28}+1.59\%$
test_sac_speed[True-backward] 4.1502ms 4.0325ms 247.9859 Ops/s 217.5160 Ops/s $\textbf{\color{#35bf28}+14.01\%}$
test_sac_speed[reduce-overhead-None] 2.4984ms 2.1387ms 467.5736 Ops/s 456.0948 Ops/s $\color{#35bf28}+2.52\%$
test_redq_speed[False-None] 15.7678ms 10.6537ms 93.8641 Ops/s 96.2757 Ops/s $\color{#d91a1a}-2.50\%$
test_redq_speed[False-backward] 19.1545ms 17.8029ms 56.1706 Ops/s 55.4614 Ops/s $\color{#35bf28}+1.28\%$
test_redq_speed[True-None] 4.8960ms 4.4868ms 222.8767 Ops/s 213.1646 Ops/s $\color{#35bf28}+4.56\%$
test_redq_speed[True-backward] 10.1267ms 9.8137ms 101.8984 Ops/s 100.6980 Ops/s $\color{#35bf28}+1.19\%$
test_redq_speed[reduce-overhead-None] 4.7420ms 4.4939ms 222.5243 Ops/s 217.7400 Ops/s $\color{#35bf28}+2.20\%$
test_redq_deprec_speed[False-None] 11.4016ms 10.9698ms 91.1591 Ops/s 93.1968 Ops/s $\color{#d91a1a}-2.19\%$
test_redq_deprec_speed[False-backward] 16.0237ms 15.7491ms 63.4955 Ops/s 63.7081 Ops/s $\color{#d91a1a}-0.33\%$
test_redq_deprec_speed[True-None] 4.4024ms 3.6987ms 270.3637 Ops/s 272.3596 Ops/s $\color{#d91a1a}-0.73\%$
test_redq_deprec_speed[True-backward] 7.8586ms 7.6550ms 130.6342 Ops/s 135.7647 Ops/s $\color{#d91a1a}-3.78\%$
test_redq_deprec_speed[reduce-overhead-None] 4.0859ms 3.6450ms 274.3486 Ops/s 282.7344 Ops/s $\color{#d91a1a}-2.97\%$
test_td3_speed[False-None] 8.2078ms 8.0180ms 124.7199 Ops/s 126.8121 Ops/s $\color{#d91a1a}-1.65\%$
test_td3_speed[False-backward] 11.3144ms 10.8319ms 92.3203 Ops/s 93.2823 Ops/s $\color{#d91a1a}-1.03\%$
test_td3_speed[True-None] 1.9169ms 1.8761ms 533.0302 Ops/s 536.0122 Ops/s $\color{#d91a1a}-0.56\%$
test_td3_speed[True-backward] 4.0322ms 3.6477ms 274.1477 Ops/s 223.3206 Ops/s $\textbf{\color{#35bf28}+22.76\%}$
test_td3_speed[reduce-overhead-None] 1.8861ms 1.8189ms 549.7749 Ops/s 543.7373 Ops/s $\color{#35bf28}+1.11\%$
test_cql_speed[False-None] 28.9880ms 25.9938ms 38.4707 Ops/s 38.9990 Ops/s $\color{#d91a1a}-1.35\%$
test_cql_speed[False-backward] 35.7227ms 35.1475ms 28.4515 Ops/s 28.7727 Ops/s $\color{#d91a1a}-1.12\%$
test_cql_speed[True-None] 12.8378ms 12.3898ms 80.7118 Ops/s 82.0499 Ops/s $\color{#d91a1a}-1.63\%$
test_cql_speed[True-backward] 19.6570ms 18.5451ms 53.9226 Ops/s 56.4633 Ops/s $\color{#d91a1a}-4.50\%$
test_cql_speed[reduce-overhead-None] 15.4813ms 12.5364ms 79.7675 Ops/s 81.0537 Ops/s $\color{#d91a1a}-1.59\%$
test_a2c_speed[False-None] 5.8913ms 5.4298ms 184.1685 Ops/s 187.5218 Ops/s $\color{#d91a1a}-1.79\%$
test_a2c_speed[False-backward] 12.2260ms 11.7828ms 84.8692 Ops/s 85.4270 Ops/s $\color{#d91a1a}-0.65\%$
test_a2c_speed[True-None] 3.9516ms 3.7272ms 268.3005 Ops/s 265.8301 Ops/s $\color{#35bf28}+0.93\%$
test_a2c_speed[True-backward] 8.8816ms 8.5896ms 116.4201 Ops/s 115.8520 Ops/s $\color{#35bf28}+0.49\%$
test_a2c_speed[reduce-overhead-None] 4.1699ms 3.7457ms 266.9748 Ops/s 268.8357 Ops/s $\color{#d91a1a}-0.69\%$
test_ppo_speed[False-None] 6.3735ms 5.8533ms 170.8426 Ops/s 170.0460 Ops/s $\color{#35bf28}+0.47\%$
test_ppo_speed[False-backward] 12.7946ms 12.3976ms 80.6607 Ops/s 80.7476 Ops/s $\color{#d91a1a}-0.11\%$
test_ppo_speed[True-None] 3.7746ms 3.6573ms 273.4275 Ops/s 275.0724 Ops/s $\color{#d91a1a}-0.60\%$
test_ppo_speed[True-backward] 8.7169ms 8.5046ms 117.5833 Ops/s 117.7128 Ops/s $\color{#d91a1a}-0.11\%$
test_ppo_speed[reduce-overhead-None] 4.0064ms 3.6283ms 275.6079 Ops/s 274.6750 Ops/s $\color{#35bf28}+0.34\%$
test_reinforce_speed[False-None] 4.9082ms 4.5357ms 220.4731 Ops/s 219.5673 Ops/s $\color{#35bf28}+0.41\%$
test_reinforce_speed[False-backward] 7.6155ms 7.3700ms 135.6850 Ops/s 136.8679 Ops/s $\color{#d91a1a}-0.86\%$
test_reinforce_speed[True-None] 3.3404ms 2.9422ms 339.8769 Ops/s 334.8271 Ops/s $\color{#35bf28}+1.51\%$
test_reinforce_speed[True-backward] 8.2779ms 7.7507ms 129.0207 Ops/s 121.4286 Ops/s $\textbf{\color{#35bf28}+6.25\%}$
test_reinforce_speed[reduce-overhead-None] 3.3118ms 2.9005ms 344.7701 Ops/s 336.4737 Ops/s $\color{#35bf28}+2.47\%$
test_iql_speed[False-None] 26.1106ms 20.7903ms 48.0995 Ops/s 50.1294 Ops/s $\color{#d91a1a}-4.05\%$
test_iql_speed[False-backward] 35.6334ms 30.4081ms 32.8860 Ops/s 32.8840 Ops/s $+0.01\%$
test_iql_speed[True-None] 8.8503ms 8.5591ms 116.8346 Ops/s 116.2994 Ops/s $\color{#35bf28}+0.46\%$
test_iql_speed[True-backward] 17.2872ms 16.8181ms 59.4597 Ops/s 59.5545 Ops/s $\color{#d91a1a}-0.16\%$
test_iql_speed[reduce-overhead-None] 9.1433ms 8.6201ms 116.0085 Ops/s 111.8361 Ops/s $\color{#35bf28}+3.73\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1708ms 6.0666ms 164.8360 Ops/s 162.5528 Ops/s $\color{#35bf28}+1.40\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.2246ms 0.3180ms 3.1444 KOps/s 2.8371 KOps/s $\textbf{\color{#35bf28}+10.83\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7214ms 0.2708ms 3.6929 KOps/s 2.9488 KOps/s $\textbf{\color{#35bf28}+25.23\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0158ms 5.7585ms 173.6564 Ops/s 168.7517 Ops/s $\color{#35bf28}+2.91\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9494ms 0.3232ms 3.0939 KOps/s 2.7530 KOps/s $\textbf{\color{#35bf28}+12.38\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6610ms 0.3014ms 3.3173 KOps/s 2.8674 KOps/s $\textbf{\color{#35bf28}+15.69\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5583ms 1.2471ms 801.8866 Ops/s 740.3827 Ops/s $\textbf{\color{#35bf28}+8.31\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4156ms 1.1653ms 858.1298 Ops/s 786.3162 Ops/s $\textbf{\color{#35bf28}+9.13\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.0129ms 6.0691ms 164.7683 Ops/s 164.0964 Ops/s $\color{#35bf28}+0.41\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0609ms 0.4917ms 2.0339 KOps/s 1.8962 KOps/s $\textbf{\color{#35bf28}+7.26\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7739ms 0.4173ms 2.3964 KOps/s 1.9773 KOps/s $\textbf{\color{#35bf28}+21.20\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.8747ms 5.7798ms 173.0177 Ops/s 166.9078 Ops/s $\color{#35bf28}+3.66\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8703ms 0.3702ms 2.7015 KOps/s 2.7702 KOps/s $\color{#d91a1a}-2.48\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6245ms 0.3506ms 2.8526 KOps/s 2.8994 KOps/s $\color{#d91a1a}-1.61\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0053ms 5.7792ms 173.0341 Ops/s 169.7299 Ops/s $\color{#35bf28}+1.95\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.5676ms 0.2817ms 3.5500 KOps/s 2.8664 KOps/s $\textbf{\color{#35bf28}+23.85\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4527ms 0.2651ms 3.7723 KOps/s 3.1984 KOps/s $\textbf{\color{#35bf28}+17.95\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1310ms 6.0056ms 166.5112 Ops/s 163.5487 Ops/s $\color{#35bf28}+1.81\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0973ms 0.4932ms 2.0278 KOps/s 2.0896 KOps/s $\color{#d91a1a}-2.96\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6890ms 0.4623ms 2.1629 KOps/s 2.1655 KOps/s $\color{#d91a1a}-0.12\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.6221ms 5.1712ms 193.3779 Ops/s 198.4815 Ops/s $\color{#d91a1a}-2.57\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.5886ms 2.1554ms 463.9435 Ops/s 546.1681 Ops/s $\textbf{\color{#d91a1a}-15.05\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 10.8587ms 1.2699ms 787.4343 Ops/s 1.1161 KOps/s $\textbf{\color{#d91a1a}-29.45\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5690s 16.4601ms 60.7528 Ops/s 197.6535 Ops/s $\textbf{\color{#d91a1a}-69.26\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9787ms 1.7293ms 578.2623 Ops/s 503.7559 Ops/s $\textbf{\color{#35bf28}+14.79\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 10.0326ms 1.3286ms 752.6773 Ops/s 863.4206 Ops/s $\textbf{\color{#d91a1a}-12.83\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.3455ms 5.4535ms 183.3671 Ops/s 56.5598 Ops/s $\textbf{\color{#35bf28}+224.20\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.1764ms 1.9577ms 510.7974 Ops/s 498.7510 Ops/s $\color{#35bf28}+2.42\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.4540ms 1.1009ms 908.3444 Ops/s 682.8482 Ops/s $\textbf{\color{#35bf28}+33.02\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 40.0165ms 37.5821ms 26.6084 Ops/s 26.9737 Ops/s $\color{#d91a1a}-1.35\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.5994ms 18.0012ms 55.5519 Ops/s 54.7787 Ops/s $\color{#35bf28}+1.41\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.0527ms 37.1311ms 26.9316 Ops/s 26.7272 Ops/s $\color{#35bf28}+0.76\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.0900ms 18.5316ms 53.9620 Ops/s 54.8305 Ops/s $\color{#d91a1a}-1.58\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 42.2963ms 39.9061ms 25.0588 Ops/s 25.7349 Ops/s $\color{#d91a1a}-2.63\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.4428ms 19.9074ms 50.2325 Ops/s 50.9713 Ops/s $\color{#d91a1a}-1.45\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8979ms 0.2193ms 4.5600 KOps/s 4.5983 KOps/s $\color{#d91a1a}-0.83\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7526ms 1.4260ms 701.2672 Ops/s 715.0794 Ops/s $\color{#d91a1a}-1.93\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7232ms 2.3529ms 425.0031 Ops/s 427.8909 Ops/s $\color{#d91a1a}-0.67\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1329ms 2.9209ms 342.3582 Ops/s 343.1238 Ops/s $\color{#d91a1a}-0.22\%$
test_storage_write_contiguous[50-img_shape0-small] 0.4688ms 0.1411ms 7.0876 KOps/s 7.5339 KOps/s $\textbf{\color{#d91a1a}-5.92\%}$
test_storage_write_contiguous[100-img_shape1-atari] 0.3678ms 0.2040ms 4.9014 KOps/s 5.1754 KOps/s $\textbf{\color{#d91a1a}-5.29\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9903ms 1.7711ms 564.6076 Ops/s 573.7236 Ops/s $\color{#d91a1a}-1.59\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4660ms 1.3238ms 755.4121 Ops/s 771.8771 Ops/s $\color{#d91a1a}-2.13\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2025ms 1.1207ms 892.2715 Ops/s 895.9150 Ops/s $\color{#d91a1a}-0.41\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7717ms 3.5816ms 279.2053 Ops/s 273.8470 Ops/s $\color{#35bf28}+1.96\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.0308ms 5.6890ms 175.7775 Ops/s 178.0604 Ops/s $\color{#d91a1a}-1.28\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.1794ms 6.9301ms 144.2977 Ops/s 136.9742 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4525ms 0.2790ms 3.5841 KOps/s 3.5054 KOps/s $\color{#35bf28}+2.25\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7842ms 1.5419ms 648.5397 Ops/s 651.3517 Ops/s $\color{#d91a1a}-0.43\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8929ms 2.4289ms 411.7170 Ops/s 405.1632 Ops/s $\color{#35bf28}+1.62\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3933ms 3.0961ms 322.9855 Ops/s 314.9180 Ops/s $\color{#35bf28}+2.56\%$
test_collector_without_rb[100-img_shape0-atari] 34.2847ms 33.6894ms 29.6829 Ops/s 29.7729 Ops/s $\color{#d91a1a}-0.30\%$
test_collector_without_rb[200-img_shape1-large_batch] 0.5761s 0.1012s 9.8856 Ops/s 15.0986 Ops/s $\textbf{\color{#d91a1a}-34.53\%}$
test_collector_with_rb[100-img_shape0-atari] 39.4897ms 38.9160ms 25.6963 Ops/s 26.1327 Ops/s $\color{#d91a1a}-1.67\%$
test_collector_with_rb[200-img_shape1-large_batch] 76.8399ms 76.0352ms 13.1518 Ops/s 13.3420 Ops/s $\color{#d91a1a}-1.43\%$
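
The Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. This formula is inferred from the rows above, not stated by the benchmark bot, and the function name `percent_change` is illustrative. A small sketch that checks it against three rows:

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of this PR versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) copied from three rows of the CPU table. Both values
# in each pair use the same unit (KOps/s or Ops/s), so the ratio is unit-free.
rows = {
    "test_tensor_to_bytestream_speed[pickle]": (12.1070, 12.1626),
    "test_tensor_to_bytestream_speed[untyped_storage]": (8.7951, 9.6974),
    "test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]": (183.3671, 56.5598),
}
changes = {name: percent_change(ops, head) for name, (ops, head) in rows.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

These reproduce the reported -0.46%, -9.30% and +224.20%. Rows that mix units (KOps/s against Ops/s) need converting to a common unit first, and because the table shows rounded throughputs, a recomputed value can differ from the reported one in the last digit.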

github-actions bot (Contributor) commented Feb 6, 2026

$\color{#D29922}\textsf{\Large&#x26A0;\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}8$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_tensor_to_bytestream_speed[pickle] 85.4610μs 84.4386μs 11.8429 KOps/s 12.4221 KOps/s $\color{#d91a1a}-4.66\%$
test_tensor_to_bytestream_speed[torch.save] 0.1452ms 0.1420ms 7.0398 KOps/s 7.1426 KOps/s $\color{#d91a1a}-1.44\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1138s 0.1136s 8.8051 Ops/s 9.3829 Ops/s $\textbf{\color{#d91a1a}-6.16\%}$
test_tensor_to_bytestream_speed[numpy] 2.6539μs 2.6477μs 377.6842 KOps/s 375.1945 KOps/s $\color{#35bf28}+0.66\%$
test_tensor_to_bytestream_speed[safetensors] 37.3998μs 37.1105μs 26.9466 KOps/s 26.8916 KOps/s $\color{#35bf28}+0.20\%$
test_simple 0.8078s 0.8053s 1.2417 Ops/s 1.1872 Ops/s $\color{#35bf28}+4.59\%$
test_transformed 1.5568s 1.4645s 0.6828 Ops/s 0.6778 Ops/s $\color{#35bf28}+0.75\%$
test_serial 2.4491s 2.3555s 0.4245 Ops/s 0.4216 Ops/s $\color{#35bf28}+0.69\%$
test_parallel 2.0364s 1.9874s 0.5032 Ops/s 0.5103 Ops/s $\color{#d91a1a}-1.40\%$
test_step_mdp_speed[True-True-True-True-True] 0.3319ms 44.7768μs 22.3330 KOps/s 22.8173 KOps/s $\color{#d91a1a}-2.12\%$
test_step_mdp_speed[True-True-True-True-False] 56.9710μs 25.6488μs 38.9882 KOps/s 40.2397 KOps/s $\color{#d91a1a}-3.11\%$
test_step_mdp_speed[True-True-True-False-True] 79.8420μs 25.1051μs 39.8325 KOps/s 39.8204 KOps/s $\color{#35bf28}+0.03\%$
test_step_mdp_speed[True-True-True-False-False] 82.6820μs 13.9584μs 71.6417 KOps/s 71.8653 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[True-True-False-True-True] 0.1044ms 47.0279μs 21.2640 KOps/s 21.0782 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[True-True-False-True-False] 76.6710μs 27.7667μs 36.0143 KOps/s 36.5411 KOps/s $\color{#d91a1a}-1.44\%$
test_step_mdp_speed[True-True-False-False-True] 66.8010μs 27.6932μs 36.1100 KOps/s 36.2730 KOps/s $\color{#d91a1a}-0.45\%$
test_step_mdp_speed[True-True-False-False-False] 51.2100μs 16.6285μs 60.1375 KOps/s 61.3029 KOps/s $\color{#d91a1a}-1.90\%$
test_step_mdp_speed[True-False-True-True-True] 0.1008ms 50.7087μs 19.7205 KOps/s 19.6842 KOps/s $\color{#35bf28}+0.18\%$
test_step_mdp_speed[True-False-True-True-False] 94.0210μs 30.9777μs 32.2813 KOps/s 32.8417 KOps/s $\color{#d91a1a}-1.71\%$
test_step_mdp_speed[True-False-True-False-True] 77.5810μs 28.3792μs 35.2370 KOps/s 36.5981 KOps/s $\color{#d91a1a}-3.72\%$
test_step_mdp_speed[True-False-True-False-False] 46.1510μs 17.0671μs 58.5922 KOps/s 60.5754 KOps/s $\color{#d91a1a}-3.27\%$
test_step_mdp_speed[True-False-False-True-True] 94.9020μs 53.7547μs 18.6030 KOps/s 18.6603 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[True-False-False-True-False] 72.8410μs 33.6901μs 29.6823 KOps/s 30.1182 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[True-False-False-False-True] 99.4720μs 30.7539μs 32.5162 KOps/s 32.8472 KOps/s $\color{#d91a1a}-1.01\%$
test_step_mdp_speed[True-False-False-False-False] 50.0310μs 19.6282μs 50.9472 KOps/s 52.7525 KOps/s $\color{#d91a1a}-3.42\%$
test_step_mdp_speed[False-True-True-True-True] 0.1112ms 52.3025μs 19.1196 KOps/s 19.7965 KOps/s $\color{#d91a1a}-3.42\%$
test_step_mdp_speed[False-True-True-True-False] 0.1247ms 30.6397μs 32.6374 KOps/s 32.1158 KOps/s $\color{#35bf28}+1.62\%$
test_step_mdp_speed[False-True-True-False-True] 2.4103ms 31.7765μs 31.4698 KOps/s 30.6545 KOps/s $\color{#35bf28}+2.66\%$
test_step_mdp_speed[False-True-True-False-False] 47.5310μs 18.3011μs 54.6415 KOps/s 54.8725 KOps/s $\color{#d91a1a}-0.42\%$
test_step_mdp_speed[False-True-False-True-True] 0.1011ms 53.7843μs 18.5928 KOps/s 18.9274 KOps/s $\color{#d91a1a}-1.77\%$
test_step_mdp_speed[False-True-False-True-False] 71.4210μs 32.8530μs 30.4386 KOps/s 30.0422 KOps/s $\color{#35bf28}+1.32\%$
test_step_mdp_speed[False-True-False-False-True] 69.2220μs 33.9966μs 29.4147 KOps/s 29.2858 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[False-True-False-False-False] 52.5210μs 21.3244μs 46.8945 KOps/s 47.3296 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[False-False-True-True-True] 97.8620μs 56.2053μs 17.7919 KOps/s 17.5627 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[False-False-True-True-False] 71.3510μs 37.0344μs 27.0020 KOps/s 27.7586 KOps/s $\color{#d91a1a}-2.73\%$
test_step_mdp_speed[False-False-True-False-True] 76.7720μs 34.6056μs 28.8970 KOps/s 28.2265 KOps/s $\color{#35bf28}+2.38\%$
test_step_mdp_speed[False-False-True-False-False] 51.4810μs 21.1988μs 47.1725 KOps/s 46.8393 KOps/s $\color{#35bf28}+0.71\%$
test_step_mdp_speed[False-False-False-True-True] 92.6110μs 58.2528μs 17.1665 KOps/s 17.0782 KOps/s $\color{#35bf28}+0.52\%$
test_step_mdp_speed[False-False-False-True-False] 71.0220μs 38.6147μs 25.8969 KOps/s 25.9231 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[False-False-False-False-True] 81.4710μs 36.5097μs 27.3900 KOps/s 26.9986 KOps/s $\color{#35bf28}+1.45\%$
test_step_mdp_speed[False-False-False-False-False] 69.3720μs 23.7819μs 42.0487 KOps/s 43.7083 KOps/s $\color{#d91a1a}-3.80\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8763s 0.7766s 1.2877 Ops/s 1.2879 Ops/s $\color{#d91a1a}-0.02\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7340s 0.6397s 1.5632 Ops/s 1.5670 Ops/s $\color{#d91a1a}-0.24\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7690s 1.6896s 0.5919 Ops/s 0.5906 Ops/s $\color{#35bf28}+0.21\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5453s 1.4665s 0.6819 Ops/s 0.6799 Ops/s $\color{#35bf28}+0.30\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0254s 1.9361s 0.5165 Ops/s 0.5076 Ops/s $\color{#35bf28}+1.75\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.8112s 1.7277s 0.5788 Ops/s 0.5822 Ops/s $\color{#d91a1a}-0.59\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.8881s 4.7149s 0.2121 Ops/s 0.2139 Ops/s $\color{#d91a1a}-0.83\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6797s 4.5332s 0.2206 Ops/s 0.2238 Ops/s $\color{#d91a1a}-1.42\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0407s 1.9670s 0.5084 Ops/s 0.5022 Ops/s $\color{#35bf28}+1.23\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7725s 1.6753s 0.5969 Ops/s 0.5972 Ops/s $\color{#d91a1a}-0.05\%$
test_values[generalized_advantage_estimate-True-True] 22.1914ms 21.2384ms 47.0846 Ops/s 45.6630 Ops/s $\color{#35bf28}+3.11\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1283s 3.5013ms 285.6120 Ops/s 270.2041 Ops/s $\textbf{\color{#35bf28}+5.70\%}$
test_values[td0_return_estimate-False-False] 0.1105ms 85.5443μs 11.6899 KOps/s 11.1964 KOps/s $\color{#35bf28}+4.41\%$
test_values[td1_return_estimate-False-False] 51.6589ms 50.0778ms 19.9689 Ops/s 19.3251 Ops/s $\color{#35bf28}+3.33\%$
test_values[vec_td1_return_estimate-False-False] 1.3543ms 1.1129ms 898.5295 Ops/s 890.2228 Ops/s $\color{#35bf28}+0.93\%$
test_values[td_lambda_return_estimate-True-False] 85.6707ms 83.5948ms 11.9625 Ops/s 11.7899 Ops/s $\color{#35bf28}+1.46\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3521ms 1.1081ms 902.4116 Ops/s 894.9477 Ops/s $\color{#35bf28}+0.83\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.7317ms 21.5209ms 46.4665 Ops/s 45.4149 Ops/s $\color{#35bf28}+2.32\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0538ms 0.7806ms 1.2810 KOps/s 1.2644 KOps/s $\color{#35bf28}+1.31\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8114ms 0.7240ms 1.3813 KOps/s 1.4158 KOps/s $\color{#d91a1a}-2.43\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5924ms 1.5142ms 660.4085 Ops/s 658.2046 Ops/s $\color{#35bf28}+0.33\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.8039ms 0.7411ms 1.3494 KOps/s 1.3755 KOps/s $\color{#d91a1a}-1.90\%$
test_dqn_speed[False-None] 1.6420ms 1.5415ms 648.7202 Ops/s 636.4202 Ops/s $\color{#35bf28}+1.93\%$
test_dqn_speed[False-backward] 2.2691ms 2.2020ms 454.1311 Ops/s 449.4755 Ops/s $\color{#35bf28}+1.04\%$
test_dqn_speed[True-None] 0.6485ms 0.5654ms 1.7688 KOps/s 1.7422 KOps/s $\color{#35bf28}+1.53\%$
test_dqn_speed[True-backward] 1.2638ms 1.2127ms 824.5810 Ops/s 898.7284 Ops/s $\textbf{\color{#d91a1a}-8.25\%}$
test_dqn_speed[reduce-overhead-None] 0.6370ms 0.5868ms 1.7042 KOps/s 1.5773 KOps/s $\textbf{\color{#35bf28}+8.04\%}$
test_ddpg_speed[False-None] 3.3053ms 2.9190ms 342.5867 Ops/s 339.0740 Ops/s $\color{#35bf28}+1.04\%$
test_ddpg_speed[False-backward] 4.8252ms 4.3364ms 230.6044 Ops/s 233.9040 Ops/s $\color{#d91a1a}-1.41\%$
test_ddpg_speed[True-None] 1.3772ms 1.3136ms 761.2903 Ops/s 745.3632 Ops/s $\color{#35bf28}+2.14\%$
test_ddpg_speed[True-backward] 2.4586ms 2.3754ms 420.9816 Ops/s 414.7822 Ops/s $\color{#35bf28}+1.49\%$
test_ddpg_speed[reduce-overhead-None] 1.4786ms 1.3642ms 733.0259 Ops/s 728.3992 Ops/s $\color{#35bf28}+0.64\%$
test_sac_speed[False-None] 9.2342ms 8.5486ms 116.9783 Ops/s 116.6252 Ops/s $\color{#35bf28}+0.30\%$
test_sac_speed[False-backward] 12.0005ms 11.5636ms 86.4786 Ops/s 85.2801 Ops/s $\color{#35bf28}+1.41\%$
test_sac_speed[True-None] 1.9990ms 1.8166ms 550.4659 Ops/s 542.6964 Ops/s $\color{#35bf28}+1.43\%$
test_sac_speed[True-backward] 3.8994ms 3.4929ms 286.2931 Ops/s 284.6208 Ops/s $\color{#35bf28}+0.59\%$
test_sac_speed[reduce-overhead-None] 20.5804ms 11.0437ms 90.5491 Ops/s 91.8493 Ops/s $\color{#d91a1a}-1.42\%$
test_redq_deprec_speed[False-None] 10.0237ms 9.4716ms 105.5783 Ops/s 104.0812 Ops/s $\color{#35bf28}+1.44\%$
test_redq_deprec_speed[False-backward] 13.2463ms 12.7013ms 78.7318 Ops/s 77.8087 Ops/s $\color{#35bf28}+1.19\%$
test_redq_deprec_speed[True-None] 2.6892ms 2.5602ms 390.5962 Ops/s 382.8610 Ops/s $\color{#35bf28}+2.02\%$
test_redq_deprec_speed[True-backward] 4.2819ms 4.1712ms 239.7402 Ops/s 227.6198 Ops/s $\textbf{\color{#35bf28}+5.32\%}$
test_redq_deprec_speed[reduce-overhead-None] 16.0802ms 9.9734ms 100.2668 Ops/s 101.3794 Ops/s $\color{#d91a1a}-1.10\%$
test_td3_speed[False-None] 8.5714ms 8.3786ms 119.3519 Ops/s 111.8803 Ops/s $\textbf{\color{#35bf28}+6.68\%}$
test_td3_speed[False-backward] 11.6677ms 11.0452ms 90.5368 Ops/s 88.8425 Ops/s $\color{#35bf28}+1.91\%$
test_td3_speed[True-None] 1.6756ms 1.6485ms 606.6283 Ops/s 571.3746 Ops/s $\textbf{\color{#35bf28}+6.17\%}$
test_td3_speed[True-backward] 3.4785ms 3.2891ms 304.0322 Ops/s 298.0984 Ops/s $\color{#35bf28}+1.99\%$
test_td3_speed[reduce-overhead-None] 73.5148ms 25.1513ms 39.7594 Ops/s 40.0975 Ops/s $\color{#d91a1a}-0.84\%$
test_cql_speed[False-None] 17.7769ms 17.5388ms 57.0165 Ops/s 56.2290 Ops/s $\color{#35bf28}+1.40\%$
test_cql_speed[False-backward] 23.7602ms 23.3008ms 42.9170 Ops/s 42.9228 Ops/s $\color{#d91a1a}-0.01\%$
test_cql_speed[True-None] 3.6497ms 3.2927ms 303.6989 Ops/s 301.6390 Ops/s $\color{#35bf28}+0.68\%$
test_cql_speed[True-backward] 5.8584ms 5.4293ms 184.1865 Ops/s 180.9975 Ops/s $\color{#35bf28}+1.76\%$
test_cql_speed[reduce-overhead-None] 18.9862ms 11.8801ms 84.1746 Ops/s 84.2162 Ops/s $\color{#d91a1a}-0.05\%$
test_a2c_speed[False-None] 3.9885ms 3.2987ms 303.1537 Ops/s 300.2236 Ops/s $\color{#35bf28}+0.98\%$
test_a2c_speed[False-backward] 6.7217ms 6.2681ms 159.5382 Ops/s 156.4950 Ops/s $\color{#35bf28}+1.94\%$
test_a2c_speed[True-None] 1.4652ms 1.3136ms 761.2415 Ops/s 735.8464 Ops/s $\color{#35bf28}+3.45\%$
test_a2c_speed[True-backward] 3.0568ms 2.9714ms 336.5443 Ops/s 317.2283 Ops/s $\textbf{\color{#35bf28}+6.09\%}$
test_a2c_speed[reduce-overhead-None] 1.0945ms 0.9891ms 1.0110 KOps/s 1.0003 KOps/s $\color{#35bf28}+1.07\%$
test_ppo_speed[False-None] 4.3294ms 3.9200ms 255.1014 Ops/s 244.5701 Ops/s $\color{#35bf28}+4.31\%$
test_ppo_speed[False-backward] 7.5267ms 7.0918ms 141.0087 Ops/s 133.3757 Ops/s $\textbf{\color{#35bf28}+5.72\%}$
test_ppo_speed[True-None] 1.6885ms 1.4375ms 695.6628 Ops/s 690.6939 Ops/s $\color{#35bf28}+0.72\%$
test_ppo_speed[True-backward] 3.1949ms 3.0977ms 322.8225 Ops/s 301.1901 Ops/s $\textbf{\color{#35bf28}+7.18\%}$
test_ppo_speed[reduce-overhead-None] 1.1312ms 1.0471ms 955.0085 Ops/s 917.8942 Ops/s $\color{#35bf28}+4.04\%$
test_reinforce_speed[False-None] 2.5264ms 2.3114ms 432.6355 Ops/s 426.1655 Ops/s $\color{#35bf28}+1.52\%$
test_reinforce_speed[False-backward] 3.5510ms 3.4508ms 289.7920 Ops/s 283.2611 Ops/s $\color{#35bf28}+2.31\%$
test_reinforce_speed[True-None] 1.5508ms 1.2971ms 770.9449 Ops/s 775.9069 Ops/s $\color{#d91a1a}-0.64\%$
test_reinforce_speed[True-backward] 3.0801ms 2.9867ms 334.8182 Ops/s 318.4747 Ops/s $\textbf{\color{#35bf28}+5.13\%}$
test_reinforce_speed[reduce-overhead-None] 0.4502s 10.4194ms 95.9751 Ops/s 104.3718 Ops/s $\textbf{\color{#d91a1a}-8.05\%}$
test_iql_speed[False-None] 10.0385ms 9.5650ms 104.5473 Ops/s 103.0395 Ops/s $\color{#35bf28}+1.46\%$
test_iql_speed[False-backward] 13.9201ms 13.4059ms 74.5942 Ops/s 73.8393 Ops/s $\color{#35bf28}+1.02\%$
test_iql_speed[True-None] 2.2680ms 2.1776ms 459.2167 Ops/s 448.9127 Ops/s $\color{#35bf28}+2.30\%$
test_iql_speed[True-backward] 4.9515ms 4.7453ms 210.7339 Ops/s 204.1825 Ops/s $\color{#35bf28}+3.21\%$
test_iql_speed[reduce-overhead-None] 18.0570ms 10.7202ms 93.2815 Ops/s 95.9188 Ops/s $\color{#d91a1a}-2.75\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1129ms 6.0060ms 166.4991 Ops/s 165.8804 Ops/s $\color{#35bf28}+0.37\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8051ms 0.2848ms 3.5107 KOps/s 2.6315 KOps/s $\textbf{\color{#35bf28}+33.41\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5688ms 0.2666ms 3.7503 KOps/s 2.7705 KOps/s $\textbf{\color{#35bf28}+35.36\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1364ms 5.9121ms 169.1445 Ops/s 169.3323 Ops/s $\color{#d91a1a}-0.11\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8175ms 0.3014ms 3.3176 KOps/s 2.7068 KOps/s $\textbf{\color{#35bf28}+22.56\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6176ms 0.3046ms 3.2826 KOps/s 2.8157 KOps/s $\textbf{\color{#35bf28}+16.58\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7235ms 1.2733ms 785.3732 Ops/s 687.7150 Ops/s $\textbf{\color{#35bf28}+14.20\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4428ms 1.1973ms 835.2430 Ops/s 724.9821 Ops/s $\textbf{\color{#35bf28}+15.21\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.3213ms 6.0827ms 164.4000 Ops/s 167.1202 Ops/s $\color{#d91a1a}-1.63\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.5148ms 0.4868ms 2.0541 KOps/s 2.2899 KOps/s $\textbf{\color{#d91a1a}-10.30\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7113ms 0.4227ms 2.3659 KOps/s 2.3909 KOps/s $\color{#d91a1a}-1.05\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0714ms 5.9590ms 167.8127 Ops/s 170.9185 Ops/s $\color{#d91a1a}-1.82\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.7943ms 0.3819ms 2.6186 KOps/s 3.5444 KOps/s $\textbf{\color{#d91a1a}-26.12\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6344ms 0.3629ms 2.7559 KOps/s 3.7460 KOps/s $\textbf{\color{#d91a1a}-26.43\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1203ms 5.8518ms 170.8874 Ops/s 171.9424 Ops/s $\color{#d91a1a}-0.61\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9496ms 0.3734ms 2.6779 KOps/s 2.8049 KOps/s $\color{#d91a1a}-4.53\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4989ms 0.3205ms 3.1200 KOps/s 2.9113 KOps/s $\textbf{\color{#35bf28}+7.17\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2326ms 6.1052ms 163.7952 Ops/s 165.4104 Ops/s $\color{#d91a1a}-0.98\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2664ms 0.5172ms 1.9336 KOps/s 627.3422 Ops/s $\textbf{\color{#35bf28}+208.23\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7473ms 0.5045ms 1.9821 KOps/s 2.3817 KOps/s $\textbf{\color{#d91a1a}-16.78\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.6656s 18.3585ms 54.4705 Ops/s 192.7664 Ops/s $\textbf{\color{#d91a1a}-71.74\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.0196ms 1.8219ms 548.8892 Ops/s 497.4540 Ops/s $\textbf{\color{#35bf28}+10.34\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.0469ms 0.9140ms 1.0941 KOps/s 1.0754 KOps/s $\color{#35bf28}+1.73\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 9.2167ms 5.1826ms 192.9527 Ops/s 192.8816 Ops/s $\color{#35bf28}+0.04\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.7963ms 1.9917ms 502.0780 Ops/s 513.9235 Ops/s $\color{#d91a1a}-2.30\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.2657ms 0.9115ms 1.0971 KOps/s 741.7503 Ops/s $\textbf{\color{#35bf28}+47.91\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5792s 16.7899ms 59.5597 Ops/s 50.8846 Ops/s $\textbf{\color{#35bf28}+17.05\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.2832ms 2.0095ms 497.6243 Ops/s 504.0210 Ops/s $\color{#d91a1a}-1.27\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.9274ms 1.1244ms 889.3426 Ops/s 892.9563 Ops/s $\color{#d91a1a}-0.40\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.9394ms 36.0610ms 27.7308 Ops/s 27.3604 Ops/s $\color{#35bf28}+1.35\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.1208ms 18.3623ms 54.4593 Ops/s 54.0588 Ops/s $\color{#35bf28}+0.74\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.2862ms 37.3324ms 26.7864 Ops/s 26.3511 Ops/s $\color{#35bf28}+1.65\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2669ms 18.6919ms 53.4992 Ops/s 51.9671 Ops/s $\color{#35bf28}+2.95\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.7337ms 39.4212ms 25.3671 Ops/s 25.0070 Ops/s $\color{#35bf28}+1.44\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.8546ms 20.4227ms 48.9651 Ops/s 49.5362 Ops/s $\color{#d91a1a}-1.15\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8586ms 0.2227ms 4.4896 KOps/s 4.5810 KOps/s $\color{#d91a1a}-1.99\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.8139ms 1.4385ms 695.1558 Ops/s 714.8760 Ops/s $\color{#d91a1a}-2.76\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5263ms 2.3119ms 432.5431 Ops/s 437.5922 Ops/s $\color{#d91a1a}-1.15\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0938ms 2.9177ms 342.7402 Ops/s 342.7706 Ops/s $-0.01\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2482ms 0.1541ms 6.4878 KOps/s 6.6900 KOps/s $\color{#d91a1a}-3.02\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3569ms 0.2059ms 4.8574 KOps/s 4.6955 KOps/s $\color{#35bf28}+3.45\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.0537ms 1.8174ms 550.2234 Ops/s 561.7846 Ops/s $\color{#d91a1a}-2.06\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5143ms 1.3622ms 734.0836 Ops/s 763.3030 Ops/s $\color{#d91a1a}-3.83\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2487ms 1.1515ms 868.4555 Ops/s 877.0541 Ops/s $\color{#d91a1a}-0.98\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8873ms 3.6153ms 276.5990 Ops/s 268.4215 Ops/s $\color{#35bf28}+3.05\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.2984ms 5.7873ms 172.7924 Ops/s 171.3414 Ops/s $\color{#35bf28}+0.85\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.7509ms 7.0350ms 142.1469 Ops/s 134.4644 Ops/s $\textbf{\color{#35bf28}+5.71\%}$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4525ms 0.2815ms 3.5528 KOps/s 3.6312 KOps/s $\color{#d91a1a}-2.16\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7354ms 1.5576ms 642.0316 Ops/s 666.2540 Ops/s $\color{#d91a1a}-3.64\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6365ms 2.4470ms 408.6719 Ops/s 417.6131 Ops/s $\color{#d91a1a}-2.14\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4625ms 3.1226ms 320.2463 Ops/s 317.3738 Ops/s $\color{#35bf28}+0.91\%$
test_collector_without_rb[100-img_shape0-atari] 35.1010ms 34.5319ms 28.9588 Ops/s 28.6659 Ops/s $\color{#35bf28}+1.02\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.8287ms 67.5833ms 14.7966 Ops/s 14.6854 Ops/s $\color{#35bf28}+0.76\%$
test_collector_with_rb[100-img_shape0-atari] 39.5291ms 38.9741ms 25.6580 Ops/s 25.5277 Ops/s $\color{#35bf28}+0.51\%$
test_collector_with_rb[200-img_shape1-large_batch] 77.4650ms 75.8487ms 13.1841 Ops/s 12.8671 Ops/s $\color{#35bf28}+2.46\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 57.3388ms 57.2381ms 17.4709 Ops/s 16.9012 Ops/s $\color{#35bf28}+3.37\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1145s 0.1141s 8.7608 Ops/s 8.6620 Ops/s $\color{#35bf28}+1.14\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 60.1961ms 59.3391ms 16.8523 Ops/s 16.6945 Ops/s $\color{#35bf28}+0.95\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1190s 0.1181s 8.4686 Ops/s 8.3958 Ops/s $\color{#35bf28}+0.87\%$
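
The percentage in the last column appears to be the relative difference between the two throughput columns, once both are in the same unit. The sketch below assumes the first throughput value is this PR's run and the second is the baseline; `to_ops` and `relative_change` are hypothetical helper names, not part of the benchmark tooling.

```python
def to_ops(value: float, unit: str) -> float:
    """Convert a throughput reading to Ops/s (assumes only 'Ops/s' or 'KOps/s')."""
    scale = {"Ops/s": 1.0, "KOps/s": 1_000.0}
    return value * scale[unit]


def relative_change(new_ops: float, base_ops: float) -> float:
    """Percentage change of new_ops relative to base_ops."""
    return (new_ops / base_ops - 1.0) * 100.0


# test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]
same_unit = relative_change(to_ops(3.5107, "KOps/s"), to_ops(2.6315, "KOps/s"))
print(f"{same_unit:+.2f}%")

# test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]
# mixes KOps/s and Ops/s, so the unit conversion matters here.
mixed_unit = relative_change(to_ops(1.0971, "KOps/s"), to_ops(741.7503, "Ops/s"))
print(f"{mixed_unit:+.2f}%")
```

Under that assumption, both results agree with the table entries for those rows (+33.41% and +47.91%) to within rounding of the displayed values.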

@vmoens vmoens merged commit 1415062 into main Feb 6, 2026
137 of 140 checks passed

Labels: CLA Signed, sota-implementations/