
[Feature] AsyncBatchedCollector: backend params and performance optimizations #3511

Merged
vmoens merged 4 commits into gh/vmoens/242/base from gh/vmoens/242/head
Feb 21, 2026


Collaborator

@vmoens vmoens commented Feb 16, 2026

Stack from ghstack (oldest at bottom):


  • Three-tier backend system: backend (global default), env_backend
    (env pool override), policy_backend (transport override), mirroring
    the device parameter pattern.
  • Lock-free SlotTransport: per-env slots with no shared lock, replacing
    ThreadingTransport as the default for in-process threading.
  • min_batch_size parameter for InferenceServer to accumulate requests.
  • Batch drain from result queue (get_nowait after first blocking get).
  • Remove redundant .copy() in ProcessorAsyncEnvPool._env_exec.
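
The "batch drain" item above (one blocking `get`, then `get_nowait` until the queue is empty) can be sketched with the standard-library `queue` module. This is only an illustration of the general pattern, not the code from this PR: the function name `drain_results` and the use of `queue.Queue` are assumptions made for the example.

```python
import queue


def drain_results(result_queue, timeout=None):
    """Return every item currently available, waiting only for the first one.

    Hypothetical helper for illustration; not part of the PR's API.
    """
    # Block until at least one result is ready (raises queue.Empty on timeout).
    items = [result_queue.get(timeout=timeout)]
    # Then collect whatever else is already queued, without blocking again.
    while True:
        try:
            items.append(result_queue.get_nowait())
        except queue.Empty:
            break
    return items


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(3):
        q.put(i)
    print(drain_results(q))  # [0, 1, 2]
```

Draining this way means the consumer wakes up once per burst of results rather than once per result, which is presumably the motivation for the change.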

Co-authored-by: Cursor <cursoragent@cursor.com>

[ghstack-poisoned]

pytorch-bot bot commented Feb 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3511

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 2d201bf with merge base 266e4aa:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Feb 16, 2026
…izations

- Three-tier backend system: `backend` (global default), `env_backend`
  (env pool override), `policy_backend` (transport override), mirroring
  the device parameter pattern.
- Lock-free SlotTransport: per-env slots with no shared lock, replacing
  ThreadingTransport as the default for in-process threading.
- min_batch_size parameter for InferenceServer to accumulate requests.
- Batch drain from result queue (get_nowait after first blocking get).
- Remove redundant .copy() in ProcessorAsyncEnvPool._env_exec.

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 58cc17b
Pull-Request: #3511
@meta-cla meta-cla bot added the CLA Signed label Feb 16, 2026
@github-actions github-actions bot added the Feature label Feb 16, 2026
Contributor

github-actions bot commented Feb 16, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.4887μs 80.0912μs 12.4858 KOps/s 12.4521 KOps/s $\color{#35bf28}+0.27\%$
test_tensor_to_bytestream_speed[torch.save] 0.1391ms 0.1384ms 7.2254 KOps/s 7.1783 KOps/s $\color{#35bf28}+0.66\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1116s 0.1108s 9.0217 Ops/s 9.1212 Ops/s $\color{#d91a1a}-1.09\%$
test_tensor_to_bytestream_speed[numpy] 2.5963μs 2.5892μs 386.2164 KOps/s 391.3210 KOps/s $\color{#d91a1a}-1.30\%$
test_tensor_to_bytestream_speed[safetensors] 36.7086μs 36.4363μs 27.4452 KOps/s 27.5417 KOps/s $\color{#d91a1a}-0.35\%$
test_simple 0.5425s 0.5398s 1.8525 Ops/s 1.7736 Ops/s $\color{#35bf28}+4.44\%$
test_transformed 1.0735s 1.0722s 0.9327 Ops/s 0.9085 Ops/s $\color{#35bf28}+2.66\%$
test_serial 1.6558s 1.6533s 0.6049 Ops/s 0.5920 Ops/s $\color{#35bf28}+2.17\%$
test_parallel 1.1346s 1.0360s 0.9653 Ops/s 0.9256 Ops/s $\color{#35bf28}+4.29\%$
test_step_mdp_speed[True-True-True-True-True] 0.2040ms 41.8115μs 23.9169 KOps/s 23.7501 KOps/s $\color{#35bf28}+0.70\%$
test_step_mdp_speed[True-True-True-True-False] 46.7910μs 23.7276μs 42.1450 KOps/s 42.1206 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[True-True-True-False-True] 97.2120μs 23.8172μs 41.9865 KOps/s 42.3279 KOps/s $\color{#d91a1a}-0.81\%$
test_step_mdp_speed[True-True-True-False-False] 46.0900μs 12.7766μs 78.2681 KOps/s 76.6442 KOps/s $\color{#35bf28}+2.12\%$
test_step_mdp_speed[True-True-False-True-True] 72.9810μs 44.8064μs 22.3183 KOps/s 21.9285 KOps/s $\color{#35bf28}+1.78\%$
test_step_mdp_speed[True-True-False-True-False] 66.0310μs 26.2109μs 38.1520 KOps/s 38.1355 KOps/s $\color{#35bf28}+0.04\%$
test_step_mdp_speed[True-True-False-False-True] 55.0310μs 26.3126μs 38.0046 KOps/s 38.0718 KOps/s $\color{#d91a1a}-0.18\%$
test_step_mdp_speed[True-True-False-False-False] 52.8510μs 15.7355μs 63.5504 KOps/s 63.3848 KOps/s $\color{#35bf28}+0.26\%$
test_step_mdp_speed[True-False-True-True-True] 97.8720μs 47.4159μs 21.0900 KOps/s 20.6241 KOps/s $\color{#35bf28}+2.26\%$
test_step_mdp_speed[True-False-True-True-False] 55.3410μs 28.8751μs 34.6319 KOps/s 34.5649 KOps/s $\color{#35bf28}+0.19\%$
test_step_mdp_speed[True-False-True-False-True] 57.0510μs 25.9497μs 38.5361 KOps/s 38.0268 KOps/s $\color{#35bf28}+1.34\%$
test_step_mdp_speed[True-False-True-False-False] 55.6610μs 15.5646μs 64.2485 KOps/s 63.0624 KOps/s $\color{#35bf28}+1.88\%$
test_step_mdp_speed[True-False-False-True-True] 0.1037ms 50.3166μs 19.8742 KOps/s 19.9435 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[True-False-False-True-False] 69.2910μs 30.8664μs 32.3977 KOps/s 31.7892 KOps/s $\color{#35bf28}+1.91\%$
test_step_mdp_speed[True-False-False-False-True] 64.6510μs 28.4550μs 35.1433 KOps/s 35.0585 KOps/s $\color{#35bf28}+0.24\%$
test_step_mdp_speed[True-False-False-False-False] 48.5200μs 18.3010μs 54.6419 KOps/s 54.3177 KOps/s $\color{#35bf28}+0.60\%$
test_step_mdp_speed[False-True-True-True-True] 81.9510μs 47.5646μs 21.0241 KOps/s 20.8614 KOps/s $\color{#35bf28}+0.78\%$
test_step_mdp_speed[False-True-True-True-False] 76.8410μs 28.4597μs 35.1375 KOps/s 34.5828 KOps/s $\color{#35bf28}+1.60\%$
test_step_mdp_speed[False-True-True-False-True] 2.5880ms 30.0594μs 33.2674 KOps/s 33.1331 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[False-True-True-False-False] 62.4410μs 17.3667μs 57.5814 KOps/s 57.6287 KOps/s $\color{#d91a1a}-0.08\%$
test_step_mdp_speed[False-True-False-True-True] 79.1610μs 49.7734μs 20.0910 KOps/s 19.8530 KOps/s $\color{#35bf28}+1.20\%$
test_step_mdp_speed[False-True-False-True-False] 62.1410μs 31.3626μs 31.8851 KOps/s 31.6547 KOps/s $\color{#35bf28}+0.73\%$
test_step_mdp_speed[False-True-False-False-True] 63.8510μs 32.1552μs 31.0992 KOps/s 30.3718 KOps/s $\color{#35bf28}+2.39\%$
test_step_mdp_speed[False-True-False-False-False] 45.2210μs 19.6924μs 50.7811 KOps/s 49.9521 KOps/s $\color{#35bf28}+1.66\%$
test_step_mdp_speed[False-False-True-True-True] 87.7220μs 52.2781μs 19.1285 KOps/s 18.5772 KOps/s $\color{#35bf28}+2.97\%$
test_step_mdp_speed[False-False-True-True-False] 91.9220μs 33.7636μs 29.6177 KOps/s 28.9903 KOps/s $\color{#35bf28}+2.16\%$
test_step_mdp_speed[False-False-True-False-True] 0.1161ms 31.1497μs 32.1030 KOps/s 30.7233 KOps/s $\color{#35bf28}+4.49\%$
test_step_mdp_speed[False-False-True-False-False] 83.2210μs 19.6691μs 50.8411 KOps/s 49.9596 KOps/s $\color{#35bf28}+1.76\%$
test_step_mdp_speed[False-False-False-True-True] 92.5310μs 54.1941μs 18.4522 KOps/s 17.7258 KOps/s $\color{#35bf28}+4.10\%$
test_step_mdp_speed[False-False-False-True-False] 78.5410μs 36.1425μs 27.6682 KOps/s 27.6091 KOps/s $\color{#35bf28}+0.21\%$
test_step_mdp_speed[False-False-False-False-True] 64.9910μs 33.7715μs 29.6108 KOps/s 28.6883 KOps/s $\color{#35bf28}+3.22\%$
test_step_mdp_speed[False-False-False-False-False] 49.1310μs 22.1240μs 45.1997 KOps/s 44.4110 KOps/s $\color{#35bf28}+1.78\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7127s 0.7092s 1.4100 Ops/s 1.3536 Ops/s $\color{#35bf28}+4.17\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6994s 0.6032s 1.6580 Ops/s 1.6521 Ops/s $\color{#35bf28}+0.36\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6993s 1.6233s 0.6160 Ops/s 0.6128 Ops/s $\color{#35bf28}+0.53\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4780s 1.3990s 0.7148 Ops/s 0.7097 Ops/s $\color{#35bf28}+0.72\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9452s 1.8659s 0.5359 Ops/s 0.5323 Ops/s $\color{#35bf28}+0.67\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7220s 1.6432s 0.6086 Ops/s 0.6044 Ops/s $\color{#35bf28}+0.70\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6730s 4.5879s 0.2180 Ops/s 0.2166 Ops/s $\color{#35bf28}+0.64\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4721s 4.3976s 0.2274 Ops/s 0.2251 Ops/s $\color{#35bf28}+1.00\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9270s 1.8478s 0.5412 Ops/s 0.5340 Ops/s $\color{#35bf28}+1.35\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6702s 1.5673s 0.6380 Ops/s 0.6164 Ops/s $\color{#35bf28}+3.51\%$
test_values[generalized_advantage_estimate-True-True] 10.7229ms 10.5249ms 95.0124 Ops/s 96.2354 Ops/s $\color{#d91a1a}-1.27\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.5083ms 17.6351ms 56.7050 Ops/s 78.2648 Ops/s $\textbf{\color{#d91a1a}-27.55\%}$
test_values[td0_return_estimate-False-False] 0.2308ms 0.1329ms 7.5234 KOps/s 7.8271 KOps/s $\color{#d91a1a}-3.88\%$
test_values[td1_return_estimate-False-False] 29.9733ms 28.5899ms 34.9774 Ops/s 35.1031 Ops/s $\color{#d91a1a}-0.36\%$
test_values[vec_td1_return_estimate-False-False] 18.2925ms 17.7238ms 56.4212 Ops/s 74.8881 Ops/s $\textbf{\color{#d91a1a}-24.66\%}$
test_values[td_lambda_return_estimate-True-False] 42.6604ms 42.2242ms 23.6831 Ops/s 23.7610 Ops/s $\color{#d91a1a}-0.33\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.5412ms 17.7248ms 56.4180 Ops/s 64.4234 Ops/s $\textbf{\color{#d91a1a}-12.43\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.3434ms 9.2873ms 107.6735 Ops/s 108.6851 Ops/s $\color{#d91a1a}-0.93\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7433ms 1.5231ms 656.5398 Ops/s 650.0677 Ops/s $\color{#35bf28}+1.00\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5094ms 0.4198ms 2.3820 KOps/s 2.3520 KOps/s $\color{#35bf28}+1.28\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.2686ms 34.9812ms 28.5868 Ops/s 33.7075 Ops/s $\textbf{\color{#d91a1a}-15.19\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.9514ms 1.7240ms 580.0514 Ops/s 575.3822 Ops/s $\color{#35bf28}+0.81\%$
test_dqn_speed[False-None] 1.4871ms 1.3941ms 717.2830 Ops/s 716.9126 Ops/s $\color{#35bf28}+0.05\%$
test_dqn_speed[False-backward] 1.9467ms 1.9055ms 524.7834 Ops/s 524.5762 Ops/s $\color{#35bf28}+0.04\%$
test_dqn_speed[True-None] 0.6464ms 0.5513ms 1.8138 KOps/s 1.8292 KOps/s $\color{#d91a1a}-0.84\%$
test_dqn_speed[True-backward] 1.0600ms 1.0082ms 991.8372 Ops/s 987.5591 Ops/s $\color{#35bf28}+0.43\%$
test_dqn_speed[reduce-overhead-None] 0.9270ms 0.5418ms 1.8458 KOps/s 1.7838 KOps/s $\color{#35bf28}+3.48\%$
test_ddpg_speed[False-None] 3.2041ms 2.8480ms 351.1178 Ops/s 352.5018 Ops/s $\color{#d91a1a}-0.39\%$
test_ddpg_speed[False-backward] 4.2128ms 4.0660ms 245.9409 Ops/s 245.9924 Ops/s $\color{#d91a1a}-0.02\%$
test_ddpg_speed[True-None] 1.4806ms 1.4045ms 711.9819 Ops/s 703.6782 Ops/s $\color{#35bf28}+1.18\%$
test_ddpg_speed[True-backward] 2.4628ms 2.3916ms 418.1219 Ops/s 355.6053 Ops/s $\textbf{\color{#35bf28}+17.58\%}$
test_ddpg_speed[reduce-overhead-None] 1.5432ms 1.3952ms 716.7507 Ops/s 720.1714 Ops/s $\color{#d91a1a}-0.47\%$
test_sac_speed[False-None] 8.5198ms 7.9458ms 125.8519 Ops/s 126.6381 Ops/s $\color{#d91a1a}-0.62\%$
test_sac_speed[False-backward] 11.6746ms 11.2071ms 89.2295 Ops/s 89.8875 Ops/s $\color{#d91a1a}-0.73\%$
test_sac_speed[True-None] 2.3060ms 2.1376ms 467.8149 Ops/s 464.6589 Ops/s $\color{#35bf28}+0.68\%$
test_sac_speed[True-backward] 4.1849ms 3.9643ms 252.2499 Ops/s 245.1878 Ops/s $\color{#35bf28}+2.88\%$
test_sac_speed[reduce-overhead-None] 2.4458ms 2.1143ms 472.9626 Ops/s 456.6087 Ops/s $\color{#35bf28}+3.58\%$
test_redq_speed[False-None] 15.0428ms 10.4577ms 95.6229 Ops/s 95.3684 Ops/s $\color{#35bf28}+0.27\%$
test_redq_speed[False-backward] 18.3060ms 17.7250ms 56.4176 Ops/s 56.1217 Ops/s $\color{#35bf28}+0.53\%$
test_redq_speed[True-None] 4.5483ms 4.4056ms 226.9857 Ops/s 237.9239 Ops/s $\color{#d91a1a}-4.60\%$
test_redq_speed[True-backward] 9.9195ms 9.6345ms 103.7937 Ops/s 107.2757 Ops/s $\color{#d91a1a}-3.25\%$
test_redq_speed[reduce-overhead-None] 4.4808ms 4.3306ms 230.9174 Ops/s 243.7059 Ops/s $\textbf{\color{#d91a1a}-5.25\%}$
test_redq_deprec_speed[False-None] 11.6098ms 11.0787ms 90.2635 Ops/s 89.8760 Ops/s $\color{#35bf28}+0.43\%$
test_redq_deprec_speed[False-backward] 16.6046ms 16.1301ms 61.9959 Ops/s 60.8019 Ops/s $\color{#35bf28}+1.96\%$
test_redq_deprec_speed[True-None] 4.0568ms 3.6360ms 275.0251 Ops/s 274.0763 Ops/s $\color{#35bf28}+0.35\%$
test_redq_deprec_speed[True-backward] 7.9247ms 7.7248ms 129.4524 Ops/s 115.8814 Ops/s $\textbf{\color{#35bf28}+11.71\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.0282ms 3.5914ms 278.4460 Ops/s 278.5059 Ops/s $\color{#d91a1a}-0.02\%$
test_td3_speed[False-None] 8.1196ms 7.9944ms 125.0875 Ops/s 124.8329 Ops/s $\color{#35bf28}+0.20\%$
test_td3_speed[False-backward] 11.3241ms 10.8788ms 91.9223 Ops/s 92.2925 Ops/s $\color{#d91a1a}-0.40\%$
test_td3_speed[True-None] 1.8566ms 1.8093ms 552.7065 Ops/s 551.3814 Ops/s $\color{#35bf28}+0.24\%$
test_td3_speed[True-backward] 3.9934ms 3.6222ms 276.0784 Ops/s 254.8513 Ops/s $\textbf{\color{#35bf28}+8.33\%}$
test_td3_speed[reduce-overhead-None] 1.8287ms 1.7848ms 560.2875 Ops/s 561.6521 Ops/s $\color{#d91a1a}-0.24\%$
test_cql_speed[False-None] 29.4902ms 26.0911ms 38.3272 Ops/s 38.7013 Ops/s $\color{#d91a1a}-0.97\%$
test_cql_speed[False-backward] 39.4139ms 35.9790ms 27.7940 Ops/s 28.4227 Ops/s $\color{#d91a1a}-2.21\%$
test_cql_speed[True-None] 12.7015ms 12.1362ms 82.3982 Ops/s 80.4669 Ops/s $\color{#35bf28}+2.40\%$
test_cql_speed[True-backward] 18.5491ms 17.8288ms 56.0891 Ops/s 56.9261 Ops/s $\color{#d91a1a}-1.47\%$
test_cql_speed[reduce-overhead-None] 15.1833ms 12.3517ms 80.9604 Ops/s 80.8646 Ops/s $\color{#35bf28}+0.12\%$
test_a2c_speed[False-None] 5.5082ms 5.3575ms 186.6549 Ops/s 183.8003 Ops/s $\color{#35bf28}+1.55\%$
test_a2c_speed[False-backward] 12.3257ms 11.8938ms 84.0773 Ops/s 83.8273 Ops/s $\color{#35bf28}+0.30\%$
test_a2c_speed[True-None] 4.0843ms 3.7084ms 269.6569 Ops/s 264.4769 Ops/s $\color{#35bf28}+1.96\%$
test_a2c_speed[True-backward] 8.8186ms 8.4721ms 118.0344 Ops/s 116.0167 Ops/s $\color{#35bf28}+1.74\%$
test_a2c_speed[reduce-overhead-None] 3.9794ms 3.7011ms 270.1902 Ops/s 270.7680 Ops/s $\color{#d91a1a}-0.21\%$
test_ppo_speed[False-None] 6.2667ms 5.9166ms 169.0147 Ops/s 169.1419 Ops/s $\color{#d91a1a}-0.08\%$
test_ppo_speed[False-backward] 12.6894ms 12.3532ms 80.9509 Ops/s 79.9254 Ops/s $\color{#35bf28}+1.28\%$
test_ppo_speed[True-None] 3.7661ms 3.6118ms 276.8725 Ops/s 274.1521 Ops/s $\color{#35bf28}+0.99\%$
test_ppo_speed[True-backward] 9.0317ms 8.3461ms 119.8161 Ops/s 117.4307 Ops/s $\color{#35bf28}+2.03\%$
test_ppo_speed[reduce-overhead-None] 4.0335ms 3.6041ms 277.4598 Ops/s 270.6630 Ops/s $\color{#35bf28}+2.51\%$
test_reinforce_speed[False-None] 5.0344ms 4.5904ms 217.8440 Ops/s 218.1519 Ops/s $\color{#d91a1a}-0.14\%$
test_reinforce_speed[False-backward] 7.6499ms 7.4038ms 135.0660 Ops/s 136.0527 Ops/s $\color{#d91a1a}-0.73\%$
test_reinforce_speed[True-None] 3.0582ms 2.8620ms 349.4083 Ops/s 330.1753 Ops/s $\textbf{\color{#35bf28}+5.83\%}$
test_reinforce_speed[True-backward] 7.9805ms 7.7048ms 129.7898 Ops/s 123.9434 Ops/s $\color{#35bf28}+4.72\%$
test_reinforce_speed[reduce-overhead-None] 3.3683ms 2.8564ms 350.0879 Ops/s 349.0029 Ops/s $\color{#35bf28}+0.31\%$
test_iql_speed[False-None] 25.4731ms 20.3115ms 49.2331 Ops/s 49.0965 Ops/s $\color{#35bf28}+0.28\%$
test_iql_speed[False-backward] 35.2636ms 30.7617ms 32.5079 Ops/s 32.4706 Ops/s $\color{#35bf28}+0.11\%$
test_iql_speed[True-None] 8.5855ms 8.3472ms 119.8003 Ops/s 117.4860 Ops/s $\color{#35bf28}+1.97\%$
test_iql_speed[True-backward] 17.0646ms 16.5746ms 60.3333 Ops/s 59.6273 Ops/s $\color{#35bf28}+1.18\%$
test_iql_speed[reduce-overhead-None] 8.6893ms 8.4309ms 118.6113 Ops/s 116.8362 Ops/s $\color{#35bf28}+1.52\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1739ms 6.0545ms 165.1674 Ops/s 165.4229 Ops/s $\color{#d91a1a}-0.15\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.7387ms 0.3442ms 2.9052 KOps/s 3.4956 KOps/s $\textbf{\color{#d91a1a}-16.89\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6075ms 0.3151ms 3.1739 KOps/s 3.7525 KOps/s $\textbf{\color{#d91a1a}-15.42\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0149ms 5.7621ms 173.5488 Ops/s 173.4713 Ops/s $\color{#35bf28}+0.04\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8726ms 0.3714ms 2.6924 KOps/s 3.5745 KOps/s $\textbf{\color{#d91a1a}-24.68\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6057ms 0.3599ms 2.7785 KOps/s 3.7979 KOps/s $\textbf{\color{#d91a1a}-26.84\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7399ms 1.4415ms 693.7309 Ops/s 796.7032 Ops/s $\textbf{\color{#d91a1a}-12.92\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6326ms 1.3655ms 732.3529 Ops/s 844.7166 Ops/s $\textbf{\color{#d91a1a}-13.30\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.9630ms 6.0246ms 165.9848 Ops/s 169.5761 Ops/s $\color{#d91a1a}-2.12\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.6367ms 0.4504ms 2.2205 KOps/s 2.0186 KOps/s $\textbf{\color{#35bf28}+10.00\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8998ms 0.4517ms 2.2136 KOps/s 2.0330 KOps/s $\textbf{\color{#35bf28}+8.88\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.8469ms 5.7423ms 174.1455 Ops/s 171.2942 Ops/s $\color{#35bf28}+1.66\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0195ms 0.3158ms 3.1663 KOps/s 3.4760 KOps/s $\textbf{\color{#d91a1a}-8.91\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4570ms 0.2670ms 3.7453 KOps/s 3.1936 KOps/s $\textbf{\color{#35bf28}+17.27\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9886ms 5.7536ms 173.8034 Ops/s 173.0401 Ops/s $\color{#35bf28}+0.44\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8195ms 0.2832ms 3.5316 KOps/s 2.8860 KOps/s $\textbf{\color{#35bf28}+22.37\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4690ms 0.2623ms 3.8118 KOps/s 2.8850 KOps/s $\textbf{\color{#35bf28}+32.12\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1497ms 5.9229ms 168.8364 Ops/s 168.9743 Ops/s $\color{#d91a1a}-0.08\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1619ms 0.4745ms 2.1074 KOps/s 2.0773 KOps/s $\color{#35bf28}+1.45\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6931ms 0.4555ms 2.1955 KOps/s 1.9783 KOps/s $\textbf{\color{#35bf28}+10.98\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3848ms 4.9537ms 201.8694 Ops/s 199.6974 Ops/s $\color{#35bf28}+1.09\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.3110ms 2.1168ms 472.4045 Ops/s 489.9363 Ops/s $\color{#d91a1a}-3.58\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 0.9939ms 0.8701ms 1.1493 KOps/s 1.0723 KOps/s $\textbf{\color{#35bf28}+7.18\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5303s 15.5511ms 64.3039 Ops/s 58.9567 Ops/s $\textbf{\color{#35bf28}+9.07\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 12.6084ms 1.9513ms 512.4693 Ops/s 511.6013 Ops/s $\color{#35bf28}+0.17\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.1174ms 1.1775ms 849.2515 Ops/s 843.8040 Ops/s $\color{#35bf28}+0.65\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.5893ms 5.1133ms 195.5669 Ops/s 193.3979 Ops/s $\color{#35bf28}+1.12\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 6.0784ms 1.9324ms 517.4976 Ops/s 517.6228 Ops/s $\color{#d91a1a}-0.02\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.3830ms 1.0455ms 956.4756 Ops/s 960.4800 Ops/s $\color{#d91a1a}-0.42\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.7015ms 35.9430ms 27.8219 Ops/s 27.7796 Ops/s $\color{#35bf28}+0.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.5400ms 17.9946ms 55.5724 Ops/s 55.1316 Ops/s $\color{#35bf28}+0.80\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.4308ms 37.1551ms 26.9142 Ops/s 26.9956 Ops/s $\color{#d91a1a}-0.30\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.8651ms 18.4100ms 54.3184 Ops/s 54.6840 Ops/s $\color{#d91a1a}-0.67\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.6812ms 38.7405ms 25.8128 Ops/s 25.8112 Ops/s $+0.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.4355ms 19.9220ms 50.1957 Ops/s 50.7159 Ops/s $\color{#d91a1a}-1.03\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8202ms 0.2173ms 4.6014 KOps/s 4.4014 KOps/s $\color{#35bf28}+4.54\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6632ms 1.3821ms 723.5252 Ops/s 709.6916 Ops/s $\color{#35bf28}+1.95\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.6954ms 2.3118ms 432.5649 Ops/s 430.2753 Ops/s $\color{#35bf28}+0.53\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0764ms 2.9265ms 341.7023 Ops/s 340.7706 Ops/s $\color{#35bf28}+0.27\%$
test_storage_write_contiguous[50-img_shape0-small] 0.5347ms 0.1326ms 7.5434 KOps/s 7.5768 KOps/s $\color{#d91a1a}-0.44\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3532ms 0.1864ms 5.3648 KOps/s 5.0553 KOps/s $\textbf{\color{#35bf28}+6.12\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9319ms 1.7809ms 561.5289 Ops/s 564.7873 Ops/s $\color{#d91a1a}-0.58\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6945ms 1.3396ms 746.4942 Ops/s 766.0448 Ops/s $\color{#d91a1a}-2.55\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2454ms 1.1187ms 893.9209 Ops/s 892.7533 Ops/s $\color{#35bf28}+0.13\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.5002ms 3.6329ms 275.2619 Ops/s 281.9876 Ops/s $\color{#d91a1a}-2.39\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.5480ms 5.7321ms 174.4562 Ops/s 174.0969 Ops/s $\color{#35bf28}+0.21\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 15.0441ms 7.1312ms 140.2296 Ops/s 143.5403 Ops/s $\color{#d91a1a}-2.31\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4417ms 0.2737ms 3.6536 KOps/s 3.5840 KOps/s $\color{#35bf28}+1.94\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6689ms 1.5184ms 658.5666 Ops/s 647.9313 Ops/s $\color{#35bf28}+1.64\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6432ms 2.4195ms 413.3059 Ops/s 405.9709 Ops/s $\color{#35bf28}+1.81\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3469ms 3.1272ms 319.7777 Ops/s 316.9655 Ops/s $\color{#35bf28}+0.89\%$
test_collector_without_rb[100-img_shape0-atari] 33.8054ms 32.7763ms 30.5099 Ops/s 30.7263 Ops/s $\color{#d91a1a}-0.70\%$
test_collector_without_rb[200-img_shape1-large_batch] 65.9918ms 64.4532ms 15.5151 Ops/s 15.5397 Ops/s $\color{#d91a1a}-0.16\%$
test_collector_with_rb[100-img_shape0-atari] 38.6054ms 37.3138ms 26.7997 Ops/s 27.0101 Ops/s $\color{#d91a1a}-0.78\%$
test_collector_with_rb[200-img_shape1-large_batch] 96.1414ms 74.3630ms 13.4475 Ops/s 13.6749 Ops/s $\color{#d91a1a}-1.66\%$
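
The Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. That reading is inferred from the rows above rather than taken from the benchmark tooling, so treat the helper below (the name `relative_change` is made up for this example) as a sketch for re-checking individual rows.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percentage change of this PR's throughput relative to repo HEAD.

    Assumed formula, inferred from the table rows: (ops - ops_head) / ops_head.
    """
    return (ops - ops_head) / ops_head * 100.0


if __name__ == "__main__":
    # test_tensor_to_bytestream_speed[pickle]: 12.4858 vs 12.4521 KOps/s
    print(f"{relative_change(12.4858, 12.4521):+.2f}%")  # +0.27%
    # test_values[vec_generalized_advantage_estimate-True-True]: 56.7050 vs 78.2648 Ops/s
    print(f"{relative_change(56.7050, 78.2648):+.2f}%")  # -27.55%
```

Because the table shows rounded throughput values, recomputed percentages can differ from the reported ones in the last digit.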

Contributor

github-actions bot commented Feb 16, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}11$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 86.4853μs 84.3182μs 11.8598 KOps/s 12.6680 KOps/s $\textbf{\color{#d91a1a}-6.38\%}$
test_tensor_to_bytestream_speed[torch.save] 0.1442ms 0.1432ms 6.9857 KOps/s 7.2688 KOps/s $\color{#d91a1a}-3.90\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1048s 0.1039s 9.6235 Ops/s 9.5688 Ops/s $\color{#35bf28}+0.57\%$
test_tensor_to_bytestream_speed[numpy] 2.5105μs 2.5043μs 399.3063 KOps/s 400.1154 KOps/s $\color{#d91a1a}-0.20\%$
test_tensor_to_bytestream_speed[safetensors] 37.0755μs 36.8040μs 27.1710 KOps/s 26.1142 KOps/s $\color{#35bf28}+4.05\%$
test_simple 0.7758s 0.7745s 1.2912 Ops/s 1.2565 Ops/s $\color{#35bf28}+2.76\%$
test_transformed 1.3604s 1.3575s 0.7366 Ops/s 0.7246 Ops/s $\color{#35bf28}+1.66\%$
test_serial 2.2470s 2.2449s 0.4454 Ops/s 0.4418 Ops/s $\color{#35bf28}+0.82\%$
test_parallel 1.8852s 1.8047s 0.5541 Ops/s 0.5528 Ops/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-True-True-True-True] 0.2605ms 40.4083μs 24.7474 KOps/s 25.0295 KOps/s $\color{#d91a1a}-1.13\%$
test_step_mdp_speed[True-True-True-True-False] 50.6910μs 22.5102μs 44.4243 KOps/s 43.6220 KOps/s $\color{#35bf28}+1.84\%$
test_step_mdp_speed[True-True-True-False-True] 89.8620μs 22.6556μs 44.1391 KOps/s 43.8794 KOps/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[True-True-True-False-False] 77.3820μs 12.3371μs 81.0562 KOps/s 80.7492 KOps/s $\color{#35bf28}+0.38\%$
test_step_mdp_speed[True-True-False-True-True] 79.7520μs 43.9212μs 22.7680 KOps/s 23.2910 KOps/s $\color{#d91a1a}-2.25\%$
test_step_mdp_speed[True-True-False-True-False] 56.2110μs 25.2505μs 39.6032 KOps/s 40.7054 KOps/s $\color{#d91a1a}-2.71\%$
test_step_mdp_speed[True-True-False-False-True] 57.4310μs 25.3983μs 39.3728 KOps/s 39.6041 KOps/s $\color{#d91a1a}-0.58\%$
test_step_mdp_speed[True-True-False-False-False] 52.7910μs 15.2476μs 65.5841 KOps/s 67.6866 KOps/s $\color{#d91a1a}-3.11\%$
test_step_mdp_speed[True-False-True-True-True] 87.1120μs 46.2981μs 21.5992 KOps/s 21.7425 KOps/s $\color{#d91a1a}-0.66\%$
test_step_mdp_speed[True-False-True-True-False] 90.6320μs 27.0205μs 37.0090 KOps/s 36.3296 KOps/s $\color{#35bf28}+1.87\%$
test_step_mdp_speed[True-False-True-False-True] 59.1710μs 25.7621μs 38.8167 KOps/s 39.6337 KOps/s $\color{#d91a1a}-2.06\%$
test_step_mdp_speed[True-False-True-False-False] 43.2610μs 15.0169μs 66.5916 KOps/s 66.8580 KOps/s $\color{#d91a1a}-0.40\%$
test_step_mdp_speed[True-False-False-True-True] 85.2020μs 48.3388μs 20.6873 KOps/s 20.9307 KOps/s $\color{#d91a1a}-1.16\%$
test_step_mdp_speed[True-False-False-True-False] 64.4320μs 30.1730μs 33.1423 KOps/s 33.5222 KOps/s $\color{#d91a1a}-1.13\%$
test_step_mdp_speed[True-False-False-False-True] 58.7610μs 28.0874μs 35.6031 KOps/s 35.8817 KOps/s $\color{#d91a1a}-0.78\%$
test_step_mdp_speed[True-False-False-False-False] 46.6110μs 17.4723μs 57.2333 KOps/s 57.7316 KOps/s $\color{#d91a1a}-0.86\%$
test_step_mdp_speed[False-True-True-True-True] 85.2320μs 45.1286μs 22.1589 KOps/s 21.9837 KOps/s $\color{#35bf28}+0.80\%$
test_step_mdp_speed[False-True-True-True-False] 86.0320μs 27.1314μs 36.8577 KOps/s 35.8211 KOps/s $\color{#35bf28}+2.89\%$
test_step_mdp_speed[False-True-True-False-True] 2.6334ms 29.5213μs 33.8738 KOps/s 34.2017 KOps/s $\color{#d91a1a}-0.96\%$
test_step_mdp_speed[False-True-True-False-False] 46.5310μs 16.8213μs 59.4484 KOps/s 59.8743 KOps/s $\color{#d91a1a}-0.71\%$
test_step_mdp_speed[False-True-False-True-True] 88.4420μs 48.4539μs 20.6382 KOps/s 20.8224 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[False-True-False-True-False] 68.5610μs 29.9594μs 33.3785 KOps/s 33.1982 KOps/s $\color{#35bf28}+0.54\%$
test_step_mdp_speed[False-True-False-False-True] 72.3220μs 30.9669μs 32.2925 KOps/s 32.1602 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[False-True-False-False-False] 57.2520μs 19.0269μs 52.5572 KOps/s 52.6404 KOps/s $\color{#d91a1a}-0.16\%$
test_step_mdp_speed[False-False-True-True-True] 0.1090ms 51.6961μs 19.3438 KOps/s 19.9986 KOps/s $\color{#d91a1a}-3.27\%$
test_step_mdp_speed[False-False-True-True-False] 74.3520μs 33.5504μs 29.8059 KOps/s 30.7357 KOps/s $\color{#d91a1a}-3.03\%$
test_step_mdp_speed[False-False-True-False-True] 72.2220μs 31.9441μs 31.3047 KOps/s 32.2747 KOps/s $\color{#d91a1a}-3.01\%$
test_step_mdp_speed[False-False-True-False-False] 51.6510μs 19.3975μs 51.5531 KOps/s 52.9922 KOps/s $\color{#d91a1a}-2.72\%$
test_step_mdp_speed[False-False-False-True-True] 94.1420μs 52.9639μs 18.8808 KOps/s 18.9715 KOps/s $\color{#d91a1a}-0.48\%$
test_step_mdp_speed[False-False-False-True-False] 70.1520μs 35.0606μs 28.5220 KOps/s 28.9376 KOps/s $\color{#d91a1a}-1.44\%$
test_step_mdp_speed[False-False-False-False-True] 69.4510μs 34.1145μs 29.3131 KOps/s 30.3000 KOps/s $\color{#d91a1a}-3.26\%$
test_step_mdp_speed[False-False-False-False-False] 50.3010μs 21.2565μs 47.0444 KOps/s 47.3314 KOps/s $\color{#d91a1a}-0.61\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8282s 0.7219s 1.3853 Ops/s 1.3852 Ops/s $+0.01\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6899s 0.5926s 1.6875 Ops/s 1.6830 Ops/s $\color{#35bf28}+0.27\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6700s 1.5915s 0.6283 Ops/s 0.6298 Ops/s $\color{#d91a1a}-0.23\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4596s 1.3698s 0.7301 Ops/s 0.7296 Ops/s $\color{#35bf28}+0.06\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9109s 1.8269s 0.5474 Ops/s 0.5472 Ops/s $\color{#35bf28}+0.04\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.6820s 1.6011s 0.6246 Ops/s 0.6187 Ops/s $\color{#35bf28}+0.95\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6869s 4.5693s 0.2189 Ops/s 0.2186 Ops/s $\color{#35bf28}+0.12\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5208s 4.3508s 0.2298 Ops/s 0.2274 Ops/s $\color{#35bf28}+1.09\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0947s 1.8791s 0.5322 Ops/s 0.5463 Ops/s $\color{#d91a1a}-2.59\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6257s 1.5381s 0.6502 Ops/s 0.6460 Ops/s $\color{#35bf28}+0.64\%$
test_values[generalized_advantage_estimate-True-True] 20.5906ms 20.2726ms 49.3278 Ops/s 49.0950 Ops/s $\color{#35bf28}+0.47\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1336s 3.5897ms 278.5765 Ops/s 270.1173 Ops/s $\color{#35bf28}+3.13\%$
test_values[td0_return_estimate-False-False] 0.1055ms 82.6874μs 12.0937 KOps/s 12.0063 KOps/s $\color{#35bf28}+0.73\%$
test_values[td1_return_estimate-False-False] 48.1128ms 47.5291ms 21.0398 Ops/s 20.9031 Ops/s $\color{#35bf28}+0.65\%$
test_values[vec_td1_return_estimate-False-False] 1.3020ms 1.0833ms 923.0709 Ops/s 921.2984 Ops/s $\color{#35bf28}+0.19\%$
test_values[td_lambda_return_estimate-True-False] 78.4021ms 77.8382ms 12.8472 Ops/s 12.7829 Ops/s $\color{#35bf28}+0.50\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3276ms 1.0802ms 925.7386 Ops/s 923.4219 Ops/s $\color{#35bf28}+0.25\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.5993ms 20.3312ms 49.1855 Ops/s 49.2807 Ops/s $\color{#d91a1a}-0.19\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0308ms 0.7684ms 1.3014 KOps/s 1.3194 KOps/s $\color{#d91a1a}-1.37\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7292ms 0.6748ms 1.4819 KOps/s 1.4875 KOps/s $\color{#d91a1a}-0.37\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5523ms 1.4875ms 672.2850 Ops/s 671.9196 Ops/s $\color{#35bf28}+0.05\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7422ms 0.6907ms 1.4479 KOps/s 1.4456 KOps/s $\color{#35bf28}+0.16\%$
test_dqn_speed[False-None] 1.5945ms 1.5119ms 661.3978 Ops/s 666.6180 Ops/s $\color{#d91a1a}-0.78\%$
test_dqn_speed[False-backward] 2.2062ms 2.1406ms 467.1655 Ops/s 460.3535 Ops/s $\color{#35bf28}+1.48\%$
test_dqn_speed[True-None] 0.6443ms 0.5715ms 1.7496 KOps/s 1.8117 KOps/s $\color{#d91a1a}-3.42\%$
test_dqn_speed[True-backward] 1.2365ms 1.1871ms 842.3948 Ops/s 925.6328 Ops/s $\textbf{\color{#d91a1a}-8.99\%}$
test_dqn_speed[reduce-overhead-None] 0.6806ms 0.5655ms 1.7683 KOps/s 1.7171 KOps/s $\color{#35bf28}+2.98\%$
test_ddpg_speed[False-None] 3.1944ms 2.8315ms 353.1686 Ops/s 347.1468 Ops/s $\color{#35bf28}+1.73\%$
test_ddpg_speed[False-backward] 4.6008ms 4.2123ms 237.4011 Ops/s 243.1628 Ops/s $\color{#d91a1a}-2.37\%$
test_ddpg_speed[True-None] 1.3675ms 1.2750ms 784.3157 Ops/s 778.2771 Ops/s $\color{#35bf28}+0.78\%$
test_ddpg_speed[True-backward] 2.5399ms 2.4659ms 405.5391 Ops/s 426.9548 Ops/s $\textbf{\color{#d91a1a}-5.02\%}$
test_ddpg_speed[reduce-overhead-None] 1.3825ms 1.2997ms 769.3867 Ops/s 759.4937 Ops/s $\color{#35bf28}+1.30\%$
test_sac_speed[False-None] 8.7860ms 8.2126ms 121.7635 Ops/s 121.1040 Ops/s $\color{#35bf28}+0.54\%$
test_sac_speed[False-backward] 11.9442ms 11.5003ms 86.9544 Ops/s 87.9898 Ops/s $\color{#d91a1a}-1.18\%$
test_sac_speed[True-None] 1.8606ms 1.7603ms 568.0996 Ops/s 568.3600 Ops/s $\color{#d91a1a}-0.05\%$
test_sac_speed[True-backward] 3.5677ms 3.4925ms 286.3285 Ops/s 285.5088 Ops/s $\color{#35bf28}+0.29\%$
test_sac_speed[reduce-overhead-None] 19.2667ms 10.9511ms 91.3146 Ops/s 82.7557 Ops/s $\textbf{\color{#35bf28}+10.34\%}$
test_redq_deprec_speed[False-None] 10.0982ms 9.2071ms 108.6118 Ops/s 108.9016 Ops/s $\color{#d91a1a}-0.27\%$
test_redq_deprec_speed[False-backward] 13.2265ms 12.6468ms 79.0716 Ops/s 79.5146 Ops/s $\color{#d91a1a}-0.56\%$
test_redq_deprec_speed[True-None] 2.7729ms 2.4845ms 402.4933 Ops/s 402.7102 Ops/s $\color{#d91a1a}-0.05\%$
test_redq_deprec_speed[True-backward] 4.4177ms 4.2215ms 236.8810 Ops/s 236.9568 Ops/s $\color{#d91a1a}-0.03\%$
test_redq_deprec_speed[reduce-overhead-None] 15.9015ms 9.7175ms 102.9074 Ops/s 103.0663 Ops/s $\color{#d91a1a}-0.15\%$
test_td3_speed[False-None] 8.4445ms 8.1029ms 123.4124 Ops/s 124.1218 Ops/s $\color{#d91a1a}-0.57\%$
test_td3_speed[False-backward] 11.4524ms 10.7168ms 93.3117 Ops/s 93.5678 Ops/s $\color{#d91a1a}-0.27\%$
test_td3_speed[True-None] 1.5958ms 1.5736ms 635.4808 Ops/s 636.0620 Ops/s $\color{#d91a1a}-0.09\%$
test_td3_speed[True-backward] 3.2221ms 3.1542ms 317.0339 Ops/s 313.1120 Ops/s $\color{#35bf28}+1.25\%$
test_td3_speed[reduce-overhead-None] 46.1053ms 23.6915ms 42.2092 Ops/s 41.3820 Ops/s $\color{#35bf28}+2.00\%$
test_cql_speed[False-None] 17.2756ms 16.9834ms 58.8810 Ops/s 58.6626 Ops/s $\color{#35bf28}+0.37\%$
test_cql_speed[False-backward] 23.0615ms 22.6084ms 44.2314 Ops/s 44.1509 Ops/s $\color{#35bf28}+0.18\%$
test_cql_speed[True-None] 3.2605ms 3.1523ms 317.2287 Ops/s 313.9966 Ops/s $\color{#35bf28}+1.03\%$
test_cql_speed[True-backward] 5.3890ms 5.2006ms 192.2858 Ops/s 185.0527 Ops/s $\color{#35bf28}+3.91\%$
test_cql_speed[reduce-overhead-None] 19.0078ms 11.8886ms 84.1143 Ops/s 84.4314 Ops/s $\color{#d91a1a}-0.38\%$
test_a2c_speed[False-None] 4.0201ms 3.1928ms 313.2004 Ops/s 312.7209 Ops/s $\color{#35bf28}+0.15\%$
test_a2c_speed[False-backward] 6.5045ms 6.3632ms 157.1528 Ops/s 156.6608 Ops/s $\color{#35bf28}+0.31\%$
test_a2c_speed[True-None] 1.7651ms 1.3026ms 767.7146 Ops/s 773.0855 Ops/s $\color{#d91a1a}-0.69\%$
test_a2c_speed[True-backward] 3.1078ms 3.0351ms 329.4755 Ops/s 328.3458 Ops/s $\color{#35bf28}+0.34\%$
test_a2c_speed[reduce-overhead-None] 1.0320ms 0.9536ms 1.0487 KOps/s 1.0478 KOps/s $\color{#35bf28}+0.09\%$
test_ppo_speed[False-None] 4.0465ms 3.8102ms 262.4561 Ops/s 263.3869 Ops/s $\color{#d91a1a}-0.35\%$
test_ppo_speed[False-backward] 7.5949ms 7.1509ms 139.8434 Ops/s 138.5100 Ops/s $\color{#35bf28}+0.96\%$
test_ppo_speed[True-None] 1.5327ms 1.3883ms 720.2924 Ops/s 730.9135 Ops/s $\color{#d91a1a}-1.45\%$
test_ppo_speed[True-backward] 3.3728ms 3.1948ms 313.0083 Ops/s 311.1927 Ops/s $\color{#35bf28}+0.58\%$
test_ppo_speed[reduce-overhead-None] 1.0703ms 1.0179ms 982.4040 Ops/s 957.1338 Ops/s $\color{#35bf28}+2.64\%$
test_reinforce_speed[False-None] 2.4220ms 2.2544ms 443.5841 Ops/s 440.4614 Ops/s $\color{#35bf28}+0.71\%$
test_reinforce_speed[False-backward] 3.6757ms 3.3906ms 294.9359 Ops/s 292.4441 Ops/s $\color{#35bf28}+0.85\%$
test_reinforce_speed[True-None] 1.3039ms 1.2269ms 815.0544 Ops/s 811.9252 Ops/s $\color{#35bf28}+0.39\%$
test_reinforce_speed[True-backward] 3.1137ms 3.0068ms 332.5760 Ops/s 332.1172 Ops/s $\color{#35bf28}+0.14\%$
test_reinforce_speed[reduce-overhead-None] 17.2783ms 9.5487ms 104.7258 Ops/s 108.7011 Ops/s $\color{#d91a1a}-3.66\%$
test_iql_speed[False-None] 10.1142ms 9.3334ms 107.1426 Ops/s 107.0234 Ops/s $\color{#35bf28}+0.11\%$
test_iql_speed[False-backward] 14.1801ms 13.3737ms 74.7734 Ops/s 74.7342 Ops/s $\color{#35bf28}+0.05\%$
test_iql_speed[True-None] 2.3143ms 2.1144ms 472.9488 Ops/s 471.3198 Ops/s $\color{#35bf28}+0.35\%$
test_iql_speed[True-backward] 4.9999ms 4.7769ms 209.3415 Ops/s 212.6807 Ops/s $\color{#d91a1a}-1.57\%$
test_iql_speed[reduce-overhead-None] 17.7117ms 10.4191ms 95.9776 Ops/s 95.9974 Ops/s $\color{#d91a1a}-0.02\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.8903ms 5.7351ms 174.3652 Ops/s 174.0445 Ops/s $\color{#35bf28}+0.18\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.6738ms 0.2886ms 3.4646 KOps/s 2.8889 KOps/s $\textbf{\color{#35bf28}+19.93\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5136ms 0.2634ms 3.7961 KOps/s 3.0151 KOps/s $\textbf{\color{#35bf28}+25.91\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8018ms 5.5716ms 179.4828 Ops/s 178.3086 Ops/s $\color{#35bf28}+0.66\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.5729ms 0.2762ms 3.6203 KOps/s 3.1468 KOps/s $\textbf{\color{#35bf28}+15.05\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5411ms 0.2604ms 3.8401 KOps/s 3.2477 KOps/s $\textbf{\color{#35bf28}+18.24\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6210ms 1.2400ms 806.4714 Ops/s 800.3424 Ops/s $\color{#35bf28}+0.77\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.3820ms 1.1569ms 864.4040 Ops/s 849.2294 Ops/s $\color{#35bf28}+1.79\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.9814ms 5.8521ms 170.8782 Ops/s 173.1097 Ops/s $\color{#d91a1a}-1.29\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2505ms 0.4626ms 2.1616 KOps/s 1.9998 KOps/s $\textbf{\color{#35bf28}+8.09\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6766ms 0.4758ms 2.1015 KOps/s 2.0742 KOps/s $\color{#35bf28}+1.32\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.8068ms 5.6856ms 175.8815 Ops/s 177.3140 Ops/s $\color{#d91a1a}-0.81\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8130ms 0.3235ms 3.0912 KOps/s 3.5716 KOps/s $\textbf{\color{#d91a1a}-13.45\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4646ms 0.2652ms 3.7703 KOps/s 3.8370 KOps/s $\color{#d91a1a}-1.74\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9055ms 5.6531ms 176.8935 Ops/s 177.3956 Ops/s $\color{#d91a1a}-0.28\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8865ms 0.3475ms 2.8774 KOps/s 3.5817 KOps/s $\textbf{\color{#d91a1a}-19.66\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6841ms 0.3515ms 2.8453 KOps/s 3.8321 KOps/s $\textbf{\color{#d91a1a}-25.75\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8912ms 5.7928ms 172.6284 Ops/s 173.7971 Ops/s $\color{#d91a1a}-0.67\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.8814ms 0.4946ms 2.0218 KOps/s 2.3231 KOps/s $\textbf{\color{#d91a1a}-12.97\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7359ms 0.4773ms 2.0953 KOps/s 2.4201 KOps/s $\textbf{\color{#d91a1a}-13.42\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.6055s 17.0219ms 58.7478 Ops/s 52.4601 Ops/s $\textbf{\color{#35bf28}+11.99\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.4752ms 1.9828ms 504.3375 Ops/s 540.9307 Ops/s $\textbf{\color{#d91a1a}-6.76\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 10.7319ms 1.3855ms 721.7605 Ops/s 788.5837 Ops/s $\textbf{\color{#d91a1a}-8.47\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 8.0216ms 4.9784ms 200.8664 Ops/s 201.4418 Ops/s $\color{#d91a1a}-0.29\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.2846ms 1.8924ms 528.4217 Ops/s 559.8437 Ops/s $\textbf{\color{#d91a1a}-5.61\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.0814ms 0.9202ms 1.0867 KOps/s 785.1668 Ops/s $\textbf{\color{#35bf28}+38.40\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.9664ms 5.1203ms 195.3002 Ops/s 193.0806 Ops/s $\color{#35bf28}+1.15\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.1841ms 1.9914ms 502.1647 Ops/s 480.3435 Ops/s $\color{#35bf28}+4.54\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.7624ms 1.1510ms 868.8248 Ops/s 853.6703 Ops/s $\color{#35bf28}+1.78\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.8668ms 35.6752ms 28.0307 Ops/s 27.6821 Ops/s $\color{#35bf28}+1.26\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.1765ms 17.8056ms 56.1620 Ops/s 55.0616 Ops/s $\color{#35bf28}+2.00\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.6366ms 36.6107ms 27.3144 Ops/s 26.7079 Ops/s $\color{#35bf28}+2.27\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.5884ms 18.0861ms 55.2911 Ops/s 53.8886 Ops/s $\color{#35bf28}+2.60\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 39.9117ms 38.3637ms 26.0663 Ops/s 25.8648 Ops/s $\color{#35bf28}+0.78\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.8645ms 19.5813ms 51.0691 Ops/s 50.3507 Ops/s $\color{#35bf28}+1.43\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8777ms 0.2155ms 4.6401 KOps/s 4.4581 KOps/s $\color{#35bf28}+4.08\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6176ms 1.4346ms 697.0601 Ops/s 717.5726 Ops/s $\color{#d91a1a}-2.86\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7865ms 2.3812ms 419.9602 Ops/s 427.2028 Ops/s $\color{#d91a1a}-1.70\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.2630ms 2.9834ms 335.1881 Ops/s 337.5695 Ops/s $\color{#d91a1a}-0.71\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2471ms 0.1587ms 6.3021 KOps/s 6.2245 KOps/s $\color{#35bf28}+1.25\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.5740ms 0.2201ms 4.5434 KOps/s 3.7067 KOps/s $\textbf{\color{#35bf28}+22.57\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9258ms 1.7650ms 566.5664 Ops/s 573.1812 Ops/s $\color{#d91a1a}-1.15\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5525ms 1.3632ms 733.5902 Ops/s 717.2477 Ops/s $\color{#35bf28}+2.28\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2693ms 1.1256ms 888.3942 Ops/s 889.7392 Ops/s $\color{#d91a1a}-0.15\%$
test_collector_stack_then_write[100-img_shape1-atari] 4.6971ms 3.5793ms 279.3866 Ops/s 276.8953 Ops/s $\color{#35bf28}+0.90\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.2440ms 5.8144ms 171.9869 Ops/s 171.2313 Ops/s $\color{#35bf28}+0.44\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4388ms 6.9666ms 143.5419 Ops/s 143.8070 Ops/s $\color{#d91a1a}-0.18\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4345ms 0.2709ms 3.6914 KOps/s 3.6851 KOps/s $\color{#35bf28}+0.17\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6899ms 1.5302ms 653.4943 Ops/s 645.3392 Ops/s $\color{#35bf28}+1.26\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6490ms 2.4933ms 401.0807 Ops/s 415.2368 Ops/s $\color{#d91a1a}-3.41\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.5910ms 3.1675ms 315.7018 Ops/s 316.8657 Ops/s $\color{#d91a1a}-0.37\%$
test_collector_without_rb[100-img_shape0-atari] 32.6684ms 32.2726ms 30.9860 Ops/s 31.3243 Ops/s $\color{#d91a1a}-1.08\%$
test_collector_without_rb[200-img_shape1-large_batch] 63.7183ms 63.2599ms 15.8078 Ops/s 15.8606 Ops/s $\color{#d91a1a}-0.33\%$
test_collector_with_rb[100-img_shape0-atari] 37.4057ms 36.7150ms 27.2369 Ops/s 27.3827 Ops/s $\color{#d91a1a}-0.53\%$
test_collector_with_rb[200-img_shape1-large_batch] 71.7634ms 71.1853ms 14.0478 Ops/s 14.0278 Ops/s $\color{#35bf28}+0.14\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 54.6722ms 54.4680ms 18.3594 Ops/s 18.2879 Ops/s $\color{#35bf28}+0.39\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1088s 0.1085s 9.2185 Ops/s 9.1248 Ops/s $\color{#35bf28}+1.03\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 56.7828ms 56.5164ms 17.6940 Ops/s 17.7418 Ops/s $\color{#d91a1a}-0.27\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1132s 0.1124s 8.8986 Ops/s 8.8959 Ops/s $\color{#35bf28}+0.03\%$
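
The last column of each row appears to be the relative difference between the two throughput columns. The sketch below is a minimal check of that reading. It assumes the fourth column is this PR's throughput and the fifth is the baseline's, which is an assumption because the column headers are not restated here. `relative_change` is a hypothetical helper name, and the inputs are copied from three of the `test_rb_*` rows with KOps/s converted to Ops/s.

```python
def relative_change(ops_pr: float, ops_base: float) -> float:
    """Percent change of the PR throughput relative to the baseline throughput."""
    return (ops_pr - ops_base) / ops_base * 100.0


# (PR Ops/s, baseline Ops/s), copied from the rows above.
# Assumption: column 4 is the PR and column 5 is the baseline.
rows = {
    "test_rb_sample[...LazyMemmapStorage-RandomSampler-10000]": (3464.6, 2888.9),
    "test_rb_populate[...LazyTensorStorage-SamplerWithoutReplacement-400]": (1086.7, 785.1668),
    "test_rb_iterate[...LazyTensorStorage-SamplerWithoutReplacement-10000]": (2845.3, 3832.1),
}

if __name__ == "__main__":
    for name, (pr, base) in rows.items():
        print(f"{name}: {relative_change(pr, base):+.2f}%")
```

Under that assumption the three rows reproduce the reported +19.93%, +38.40% and -25.75%. Rows with very small changes can differ in the last digit, since the displayed throughputs are already rounded.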

vmoens added a commit that referenced this pull request Feb 16, 2026
…izations

- Three-tier backend system: `backend` (global default), `env_backend`
  (env pool override), `policy_backend` (transport override), mirroring
  the device parameter pattern.
- Lock-free SlotTransport: per-env slots with no shared lock, replacing
  ThreadingTransport as the default for in-process threading.
- min_batch_size parameter for InferenceServer to accumulate requests.
- Batch drain from result queue (get_nowait after first blocking get).
- Remove redundant .copy() in ProcessorAsyncEnvPool._env_exec.

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 5b0282d
Pull-Request: #3511
Co-authored-by: Cursor <cursoragent@cursor.com>
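
Two of the bullets in the commit message above can be sketched with the Python standard library alone. This is an illustrative sketch, not the TorchRL implementation: `resolve_backends` and `drain_batch` are hypothetical helper names, the `"threading"` and `"multiprocessing"` strings are placeholder values, and `max_batch_size` is a parameter added here for the example. The first function shows the "global default plus per-component override" idea behind `backend`, `env_backend` and `policy_backend`. The second shows a blocking `get` followed by `get_nowait` calls, with an optional minimum batch size.

```python
import queue


def resolve_backends(backend, env_backend=None, policy_backend=None):
    """Return (env, policy) backends; None means "inherit the global default".

    Hypothetical helper illustrating the three-tier pattern only.
    """
    env = backend if env_backend is None else env_backend
    policy = backend if policy_backend is None else policy_backend
    return env, policy


def drain_batch(q, min_batch_size=1, max_batch_size=64, timeout=None):
    """Block for the first item, then drain what is already queued.

    If fewer than ``min_batch_size`` items are available, block again until
    the minimum is reached. A blocking get that exceeds ``timeout`` raises
    ``queue.Empty``. Hypothetical helper, not TorchRL code.
    """
    batch = [q.get(timeout=timeout)]  # first get blocks until work arrives
    while len(batch) < max_batch_size:
        try:
            batch.append(q.get_nowait())  # take items that are already waiting
        except queue.Empty:
            if len(batch) >= min_batch_size:
                break  # enough items collected: stop waiting
            batch.append(q.get(timeout=timeout))  # below the minimum: block again
    return batch


if __name__ == "__main__":
    print(resolve_backends("threading", policy_backend="multiprocessing"))
    q = queue.Queue()
    for i in range(5):
        q.put(i)
    print(drain_batch(q, max_batch_size=3))
    print(drain_batch(q))
```

Draining after a single blocking `get` avoids paying one wake-up per item when several results are already queued. The minimum batch size trades a little latency for larger batches.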
vmoens added a commit that referenced this pull request Feb 17, 2026
…izations

ghstack-source-id: d7fc567
Pull-Request: #3511
vmoens added a commit that referenced this pull request Feb 20, 2026
…izations

ghstack-source-id: 3e3cd93
Pull-Request: #3511
github-actions bot added the Documentation label (Improvements or additions to documentation) on Feb 20, 2026
vmoens added a commit that referenced this pull request Feb 21, 2026
…izations

ghstack-source-id: 3e3cd93
Pull-Request: #3511
vmoens added a commit that referenced this pull request Feb 21, 2026
…izations (#3511)

ghstack-source-id: 3e3cd93
Pull-Request: #3511
vmoens merged commit 2d201bf into gh/vmoens/242/base on Feb 21, 2026
114 of 116 checks passed
vmoens deleted the gh/vmoens/242/head branch on February 21, 2026 21:12

Labels

- Benchmarks: rl/benchmark changes
- CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
- Collectors
- Documentation: Improvements or additions to documentation
- Examples
- Feature: New feature
- Modules
