[Perf] ParallelEnv: replace mp.Event with shared-memory done flags #3457
Merged
vmoens merged 2 commits into gh/vmoens/218/base on Feb 7, 2026
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3457
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit 1643a0b with merge base ab49b59.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 81.0465μs | 80.1412μs | 12.4780 KOps/s | 12.3741 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1407ms | 0.1400ms | 7.1427 KOps/s | 7.0182 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1178s | 0.1175s | 8.5126 Ops/s | 8.4855 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.5658μs | 2.5600μs | 390.6195 KOps/s | 394.9232 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 37.6820μs | 37.3161μs | 26.7981 KOps/s | 26.2366 KOps/s | |
| test_simple | 0.5529s | 0.5517s | 1.8125 Ops/s | 1.7310 Ops/s | |
| test_transformed | 1.2510s | 1.1548s | 0.8660 Ops/s | 0.8554 Ops/s | |
| test_serial | 1.7015s | 1.6961s | 0.5896 Ops/s | 0.5770 Ops/s | |
| test_parallel | 1.1461s | 1.0526s | 0.9500 Ops/s | 0.9411 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 87.4210μs | 44.4850μs | 22.4795 KOps/s | 21.7650 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 0.4357ms | 25.0102μs | 39.9837 KOps/s | 38.3738 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 0.4337ms | 25.6375μs | 39.0054 KOps/s | 39.2977 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 40.1310μs | 13.6844μs | 73.0762 KOps/s | 71.3605 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 0.4736ms | 47.7857μs | 20.9267 KOps/s | 20.6553 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 0.4433ms | 27.6432μs | 36.1753 KOps/s | 35.4633 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 0.4407ms | 27.4630μs | 36.4127 KOps/s | 34.7211 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 52.3900μs | 16.4380μs | 60.8348 KOps/s | 58.8232 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 0.4677ms | 50.0795μs | 19.9683 KOps/s | 19.3590 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 0.4521ms | 30.4839μs | 32.8042 KOps/s | 31.7955 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 0.4519ms | 27.9433μs | 35.7867 KOps/s | 34.9332 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 46.4600μs | 16.5974μs | 60.2503 KOps/s | 58.4306 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 0.4714ms | 52.7507μs | 18.9571 KOps/s | 18.3440 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 0.4516ms | 33.1597μs | 30.1571 KOps/s | 29.0350 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 0.4498ms | 30.2158μs | 33.0953 KOps/s | 31.9561 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 51.1910μs | 19.2299μs | 52.0024 KOps/s | 50.0472 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 97.3910μs | 49.8318μs | 20.0675 KOps/s | 19.6765 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 62.8310μs | 30.6069μs | 32.6723 KOps/s | 31.7326 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.4280ms | 31.8613μs | 31.3861 KOps/s | 31.3191 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 46.6910μs | 18.3906μs | 54.3755 KOps/s | 53.4349 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 87.5210μs | 53.4194μs | 18.7198 KOps/s | 18.5525 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 60.8410μs | 33.2997μs | 30.0303 KOps/s | 29.0813 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 64.8410μs | 34.2821μs | 29.1698 KOps/s | 28.5283 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 61.8100μs | 20.9360μs | 47.7645 KOps/s | 46.3734 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 92.8710μs | 55.8433μs | 17.9072 KOps/s | 17.6481 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 90.2310μs | 36.1535μs | 27.6598 KOps/s | 26.8893 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 68.4400μs | 34.1711μs | 29.2645 KOps/s | 28.7717 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 52.9200μs | 21.0896μs | 47.4167 KOps/s | 46.4705 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 90.6610μs | 58.8598μs | 16.9895 KOps/s | 16.9048 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 66.3510μs | 38.5631μs | 25.9315 KOps/s | 25.1312 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 79.9510μs | 36.2237μs | 27.6063 KOps/s | 26.8043 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 51.2900μs | 23.1204μs | 43.2519 KOps/s | 41.5762 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8746s | 0.7766s | 1.2877 Ops/s | 1.2772 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7373s | 0.6377s | 1.5681 Ops/s | 1.5602 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.8039s | 1.7110s | 0.5845 Ops/s | 0.5847 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5396s | 1.4622s | 0.6839 Ops/s | 0.6768 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0339s | 1.9448s | 0.5142 Ops/s | 0.5107 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7953s | 1.7170s | 0.5824 Ops/s | 0.5792 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7964s | 4.6849s | 0.2135 Ops/s | 0.2108 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5404s | 4.4691s | 0.2238 Ops/s | 0.2211 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9801s | 1.9109s | 0.5233 Ops/s | 0.5188 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7845s | 1.6277s | 0.6144 Ops/s | 0.6131 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 12.1420ms | 10.9418ms | 91.3923 Ops/s | 94.4008 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 19.8166ms | 13.7035ms | 72.9741 Ops/s | 55.9455 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2402ms | 0.1360ms | 7.3552 KOps/s | 7.7533 KOps/s | |
| test_values[td1_return_estimate-False-False] | 29.6793ms | 28.8758ms | 34.6310 Ops/s | 35.3518 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 18.4331ms | 15.1262ms | 66.1105 Ops/s | 54.8850 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 43.1755ms | 42.6642ms | 23.4388 Ops/s | 23.6781 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 18.0284ms | 12.3827ms | 80.7578 Ops/s | 55.2984 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.5956ms | 9.4844ms | 105.4365 Ops/s | 105.6019 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7349ms | 1.5792ms | 633.2303 Ops/s | 653.8908 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4937ms | 0.4381ms | 2.2824 KOps/s | 2.2962 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 30.4718ms | 29.9030ms | 33.4415 Ops/s | 31.9575 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.8257ms | 1.7145ms | 583.2589 Ops/s | 581.8817 Ops/s | |
| test_dqn_speed[False-None] | 1.6580ms | 1.4053ms | 711.6048 Ops/s | 705.0253 Ops/s | |
| test_dqn_speed[False-backward] | 1.9885ms | 1.9228ms | 520.0775 Ops/s | 513.6461 Ops/s | |
| test_dqn_speed[True-None] | 0.6680ms | 0.5483ms | 1.8239 KOps/s | 1.7496 KOps/s | |
| test_dqn_speed[True-backward] | 1.0424ms | 1.0154ms | 984.8253 Ops/s | 890.3364 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6549ms | 0.5471ms | 1.8279 KOps/s | 1.7906 KOps/s | |
| test_ddpg_speed[False-None] | 3.2707ms | 2.8732ms | 348.0419 Ops/s | 349.2912 Ops/s | |
| test_ddpg_speed[False-backward] | 4.2245ms | 4.0994ms | 243.9391 Ops/s | 245.8617 Ops/s | |
| test_ddpg_speed[True-None] | 1.5608ms | 1.4108ms | 708.8270 Ops/s | 678.0647 Ops/s | |
| test_ddpg_speed[True-backward] | 2.4897ms | 2.4319ms | 411.2035 Ops/s | 379.6240 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.5674ms | 1.4295ms | 699.5505 Ops/s | 690.8347 Ops/s | |
| test_sac_speed[False-None] | 8.7360ms | 8.0534ms | 124.1709 Ops/s | 120.9643 Ops/s | |
| test_sac_speed[False-backward] | 11.8521ms | 11.2811ms | 88.6436 Ops/s | 87.8442 Ops/s | |
| test_sac_speed[True-None] | 2.3124ms | 2.2002ms | 454.5059 Ops/s | 457.2900 Ops/s | |
| test_sac_speed[True-backward] | 4.1551ms | 4.0431ms | 247.3379 Ops/s | 203.0807 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.3258ms | 2.1814ms | 458.4134 Ops/s | 445.3448 Ops/s | |
| test_redq_speed[False-None] | 10.8304ms | 10.2997ms | 97.0902 Ops/s | 87.9269 Ops/s | |
| test_redq_speed[False-backward] | 18.3138ms | 17.5713ms | 56.9111 Ops/s | 55.3423 Ops/s | |
| test_redq_speed[True-None] | 4.6406ms | 4.4605ms | 224.1893 Ops/s | 221.5255 Ops/s | |
| test_redq_speed[True-backward] | 9.9633ms | 9.6097ms | 104.0618 Ops/s | 98.8074 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 4.6995ms | 4.4872ms | 222.8539 Ops/s | 224.1008 Ops/s | |
| test_redq_deprec_speed[False-None] | 11.4638ms | 10.9529ms | 91.3003 Ops/s | 89.5019 Ops/s | |
| test_redq_deprec_speed[False-backward] | 16.0887ms | 15.7838ms | 63.3563 Ops/s | 62.4139 Ops/s | |
| test_redq_deprec_speed[True-None] | 4.0751ms | 3.7104ms | 269.5095 Ops/s | 268.4680 Ops/s | |
| test_redq_deprec_speed[True-backward] | 7.8902ms | 7.6742ms | 130.3065 Ops/s | 129.3405 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 4.0286ms | 3.6696ms | 272.5124 Ops/s | 270.7919 Ops/s | |
| test_td3_speed[False-None] | 8.2193ms | 8.0979ms | 123.4884 Ops/s | 123.8560 Ops/s | |
| test_td3_speed[False-backward] | 11.4823ms | 10.9679ms | 91.1750 Ops/s | 90.1581 Ops/s | |
| test_td3_speed[True-None] | 1.9005ms | 1.8641ms | 536.4539 Ops/s | 540.1615 Ops/s | |
| test_td3_speed[True-backward] | 3.8317ms | 3.6945ms | 270.6738 Ops/s | 250.5550 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.8583ms | 1.8289ms | 546.7881 Ops/s | 548.4806 Ops/s | |
| test_cql_speed[False-None] | 28.8438ms | 26.0854ms | 38.3356 Ops/s | 38.0473 Ops/s | |
| test_cql_speed[False-backward] | 35.8561ms | 35.1597ms | 28.4416 Ops/s | 27.8238 Ops/s | |
| test_cql_speed[True-None] | 13.4990ms | 12.3675ms | 80.8568 Ops/s | 79.7581 Ops/s | |
| test_cql_speed[True-backward] | 18.4289ms | 17.7859ms | 56.2243 Ops/s | 54.9368 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 12.9161ms | 12.5227ms | 79.8548 Ops/s | 78.8935 Ops/s | |
| test_a2c_speed[False-None] | 5.8576ms | 5.4651ms | 182.9799 Ops/s | 185.7094 Ops/s | |
| test_a2c_speed[False-backward] | 12.5219ms | 11.8998ms | 84.0352 Ops/s | 83.2200 Ops/s | |
| test_a2c_speed[True-None] | 4.0082ms | 3.7266ms | 268.3445 Ops/s | 258.9982 Ops/s | |
| test_a2c_speed[True-backward] | 9.0050ms | 8.6398ms | 115.7433 Ops/s | 110.1452 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 3.9105ms | 3.7783ms | 264.6687 Ops/s | 261.5434 Ops/s | |
| test_ppo_speed[False-None] | 6.2680ms | 5.9764ms | 167.3240 Ops/s | 169.2697 Ops/s | |
| test_ppo_speed[False-backward] | 12.8040ms | 12.5112ms | 79.9283 Ops/s | 78.5734 Ops/s | |
| test_ppo_speed[True-None] | 3.8658ms | 3.6738ms | 272.1977 Ops/s | 263.8098 Ops/s | |
| test_ppo_speed[True-backward] | 8.7197ms | 8.5105ms | 117.5013 Ops/s | 114.6995 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 3.9583ms | 3.7775ms | 264.7246 Ops/s | 270.8703 Ops/s | |
| test_reinforce_speed[False-None] | 4.9557ms | 4.6816ms | 213.6000 Ops/s | 220.2278 Ops/s | |
| test_reinforce_speed[False-backward] | 7.8066ms | 7.5222ms | 132.9392 Ops/s | 135.3226 Ops/s | |
| test_reinforce_speed[True-None] | 3.0265ms | 2.8622ms | 349.3772 Ops/s | 332.5637 Ops/s | |
| test_reinforce_speed[True-backward] | 8.0508ms | 7.7478ms | 129.0695 Ops/s | 127.5424 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.2158ms | 2.8725ms | 348.1294 Ops/s | 342.2515 Ops/s | |
| test_iql_speed[False-None] | 25.5059ms | 20.3149ms | 49.2248 Ops/s | 47.7275 Ops/s | |
| test_iql_speed[False-backward] | 36.1447ms | 30.5996ms | 32.6802 Ops/s | 32.4758 Ops/s | |
| test_iql_speed[True-None] | 8.7730ms | 8.5094ms | 117.5164 Ops/s | 113.3086 Ops/s | |
| test_iql_speed[True-backward] | 16.8638ms | 16.5818ms | 60.3071 Ops/s | 58.8871 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 8.7732ms | 8.6163ms | 116.0585 Ops/s | 114.9414 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2927ms | 6.1215ms | 163.3595 Ops/s | 162.5048 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.8361ms | 0.3842ms | 2.6031 KOps/s | 3.2467 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6886ms | 0.3743ms | 2.6715 KOps/s | 3.5466 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.0536ms | 5.7918ms | 172.6576 Ops/s | 168.7717 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.2549ms | 0.3130ms | 3.1947 KOps/s | 3.5230 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6289ms | 0.3083ms | 3.2435 KOps/s | 3.7462 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.5875ms | 1.3268ms | 753.6763 Ops/s | 779.3732 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5372ms | 1.2809ms | 780.6889 Ops/s | 830.2367 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.5047ms | 6.1143ms | 163.5499 Ops/s | 166.4931 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9773ms | 0.4376ms | 2.2854 KOps/s | 2.2923 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6771ms | 0.4401ms | 2.2723 KOps/s | 2.3912 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1574ms | 5.8739ms | 170.2454 Ops/s | 168.1950 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.0946ms | 0.3063ms | 3.2648 KOps/s | 3.1040 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5129ms | 0.2887ms | 3.4644 KOps/s | 3.2120 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1032ms | 5.8292ms | 171.5510 Ops/s | 169.5318 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.1580ms | 0.3343ms | 2.9916 KOps/s | 2.7968 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5699ms | 0.3130ms | 3.1946 KOps/s | 2.7473 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1190ms | 5.9906ms | 166.9287 Ops/s | 165.9599 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.8426ms | 0.5095ms | 1.9628 KOps/s | 2.0519 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7349ms | 0.4919ms | 2.0329 KOps/s | 2.3189 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.4438ms | 5.0442ms | 198.2471 Ops/s | 57.0051 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 4.2254ms | 2.0537ms | 486.9346 Ops/s | 504.7626 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 3.2490ms | 0.9202ms | 1.0867 KOps/s | 1.1455 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5427s | 15.8711ms | 63.0077 Ops/s | 194.3943 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 12.8461ms | 1.9755ms | 506.1908 Ops/s | 511.7632 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 7.1460ms | 1.2166ms | 821.9544 Ops/s | 771.7766 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.6902ms | 5.1823ms | 192.9656 Ops/s | 59.4087 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 6.0448ms | 1.9468ms | 513.6626 Ops/s | 463.1263 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.1993ms | 1.0188ms | 981.5095 Ops/s | 971.2701 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 39.0299ms | 36.3277ms | 27.5272 Ops/s | 27.3445 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 20.9415ms | 18.7068ms | 53.4566 Ops/s | 53.6611 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.0860ms | 37.4539ms | 26.6995 Ops/s | 26.6721 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.5218ms | 18.6830ms | 53.5247 Ops/s | 53.0189 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.7692ms | 39.1663ms | 25.5322 Ops/s | 25.0887 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.5623ms | 20.2601ms | 49.3580 Ops/s | 48.9594 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8605ms | 0.2193ms | 4.5598 KOps/s | 4.5437 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.6060ms | 1.4038ms | 712.3432 Ops/s | 720.3182 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.6216ms | 2.3095ms | 432.9929 Ops/s | 429.0231 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.1310ms | 2.9628ms | 337.5164 Ops/s | 335.8331 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2457ms | 0.1351ms | 7.4001 KOps/s | 7.1221 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3489ms | 0.2068ms | 4.8350 KOps/s | 4.7050 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9507ms | 1.7646ms | 566.7126 Ops/s | 552.7737 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.4366ms | 1.2943ms | 772.6049 Ops/s | 781.0504 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.2487ms | 1.1323ms | 883.1777 Ops/s | 880.2170 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.7212ms | 3.5174ms | 284.3044 Ops/s | 270.2823 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.1869ms | 5.7361ms | 174.3335 Ops/s | 177.9472 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.4777ms | 7.0511ms | 141.8220 Ops/s | 142.0237 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4299ms | 0.2740ms | 3.6494 KOps/s | 3.6159 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.8194ms | 1.5363ms | 650.8994 Ops/s | 654.9538 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.6531ms | 2.4547ms | 407.3782 Ops/s | 412.1187 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.2677ms | 3.1290ms | 319.5905 Ops/s | 319.6383 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 34.6747ms | 34.1786ms | 29.2581 Ops/s | 28.9553 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 67.4536ms | 67.1721ms | 14.8871 Ops/s | 14.7717 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 39.3759ms | 38.7225ms | 25.8247 Ops/s | 25.4660 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 76.3121ms | 75.8431ms | 13.1851 Ops/s | 13.1130 Ops/s |
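
The Change column in the table above is empty. As a rough sketch, a relative change could be derived from the Ops and Ops on Repo HEAD columns as shown here. The helper names `parse_ops` and `relative_change` are hypothetical and not part of the benchmark tooling, and the formula (this PR minus HEAD, divided by HEAD) is an assumption about what the column is meant to report.

```python
# Hypothetical helpers: they only illustrate the arithmetic on the table cells.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}  # the two units that appear in the table


def parse_ops(cell: str) -> float:
    """Convert a cell such as '2.6031 KOps/s' or '0.9500 Ops/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percent change of this PR's throughput relative to repo HEAD (assumed definition)."""
    new, base = parse_ops(ops), parse_ops(ops_head)
    return (new - base) / base * 100.0


# (Ops, Ops on Repo HEAD) pairs copied verbatim from rows of the table above.
rows = {
    "test_parallel": ("0.9500 Ops/s", "0.9411 Ops/s"),
    "test_non_tensor_env_rollout_speed[1000-parallel-buffers-True]": (
        "0.5233 Ops/s",
        "0.5188 Ops/s",
    ),
    "test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]": (
        "2.6031 KOps/s",
        "3.2467 KOps/s",
    ),
}

for name, (ops, head) in rows.items():
    # Positive values mean the PR run was faster than the HEAD run.
    print(f"{name}: {relative_change(ops, head):+.2f}%")
```

Each row reflects a single benchmark run per side, so this arithmetic alone cannot separate a real effect of the change from run-to-run noise.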
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 80.8382μs | 79.6934μs | 12.5481 KOps/s | 12.3866 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1403ms | 0.1397ms | 7.1589 KOps/s | 7.2104 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1090s | 0.1084s | 9.2256 Ops/s | 9.2166 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.6605μs | 2.6423μs | 378.4554 KOps/s | 398.4340 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 37.7355μs | 37.3775μs | 26.7540 KOps/s | 26.9845 KOps/s | |
| test_simple | 0.7943s | 0.7936s | 1.2601 Ops/s | 1.2158 Ops/s | |
| test_transformed | 1.5400s | 1.4468s | 0.6912 Ops/s | 0.6849 Ops/s | |
| test_serial | 2.4023s | 2.3091s | 0.4331 Ops/s | 0.4282 Ops/s | |
| test_parallel | 1.9112s | 1.8155s | 0.5508 Ops/s | 0.5588 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.3509ms | 45.2075μs | 22.1202 KOps/s | 22.2553 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 52.5530μs | 26.0523μs | 38.3843 KOps/s | 39.6968 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 51.4030μs | 25.2723μs | 39.5690 KOps/s | 40.2784 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 42.0720μs | 13.9047μs | 71.9182 KOps/s | 72.2986 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 85.9540μs | 48.6854μs | 20.5400 KOps/s | 21.1257 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 60.2120μs | 28.4222μs | 35.1838 KOps/s | 35.9445 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 57.3330μs | 28.1594μs | 35.5121 KOps/s | 36.0964 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 48.5620μs | 16.7845μs | 59.5789 KOps/s | 60.1937 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 98.5450μs | 50.9300μs | 19.6348 KOps/s | 20.0581 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 60.5630μs | 31.4052μs | 31.8418 KOps/s | 32.6044 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 58.3130μs | 27.9525μs | 35.7750 KOps/s | 36.0097 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 47.8930μs | 16.6840μs | 59.9378 KOps/s | 60.0176 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 87.5840μs | 54.0836μs | 18.4899 KOps/s | 18.9614 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 64.8330μs | 34.0842μs | 29.3391 KOps/s | 29.9009 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 74.4630μs | 30.8246μs | 32.4416 KOps/s | 33.4409 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 47.6920μs | 19.3347μs | 51.7205 KOps/s | 51.0028 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 91.6340μs | 50.8954μs | 19.6481 KOps/s | 20.0800 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 62.9040μs | 30.7334μs | 32.5379 KOps/s | 31.8054 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.5204ms | 34.0583μs | 29.3615 KOps/s | 31.8804 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 46.7630μs | 18.6173μs | 53.7136 KOps/s | 54.2945 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 90.5040μs | 54.9290μs | 18.2053 KOps/s | 19.0179 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 73.6740μs | 34.1416μs | 29.2898 KOps/s | 29.6728 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 77.2540μs | 34.9535μs | 28.6095 KOps/s | 29.7091 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 52.6630μs | 21.1213μs | 47.3455 KOps/s | 47.3338 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 91.9950μs | 56.5261μs | 17.6909 KOps/s | 17.7258 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 69.6840μs | 36.8453μs | 27.1405 KOps/s | 27.2548 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 68.3830μs | 34.1380μs | 29.2928 KOps/s | 29.5216 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 56.8530μs | 20.7002μs | 48.3088 KOps/s | 47.6112 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.1105ms | 58.0139μs | 17.2373 KOps/s | 17.2646 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 72.3030μs | 39.0746μs | 25.5921 KOps/s | 25.9743 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 68.2230μs | 36.8050μs | 27.1702 KOps/s | 28.0622 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 65.8430μs | 23.5263μs | 42.5057 KOps/s | 41.9800 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8707s | 0.7650s | 1.3072 Ops/s | 1.2996 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7266s | 0.6316s | 1.5833 Ops/s | 1.5741 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7508s | 1.6720s | 0.5981 Ops/s | 0.5948 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5362s | 1.4536s | 0.6879 Ops/s | 0.6875 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0040s | 1.9287s | 0.5185 Ops/s | 0.5199 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7916s | 1.7062s | 0.5861 Ops/s | 0.5867 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7971s | 4.6338s | 0.2158 Ops/s | 0.2127 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5571s | 4.3985s | 0.2274 Ops/s | 0.2229 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9989s | 1.9251s | 0.5194 Ops/s | 0.5158 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6975s | 1.6190s | 0.6176 Ops/s | 0.6171 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 20.4550ms | 19.9514ms | 50.1219 Ops/s | 49.2009 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1300s | 3.5155ms | 284.4516 Ops/s | 287.8051 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1077ms | 81.6428μs | 12.2485 KOps/s | 12.2697 KOps/s | |
| test_values[td1_return_estimate-False-False] | 47.6179ms | 47.2143ms | 21.1800 Ops/s | 20.7927 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.2874ms | 1.0781ms | 927.5725 Ops/s | 922.9813 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 78.1378ms | 77.4834ms | 12.9060 Ops/s | 12.8051 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2728ms | 1.0757ms | 929.6240 Ops/s | 926.1803 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 20.5676ms | 20.3099ms | 49.2371 Ops/s | 48.4035 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0098ms | 0.7445ms | 1.3432 KOps/s | 1.3305 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7326ms | 0.6751ms | 1.4812 KOps/s | 1.4854 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5578ms | 1.4858ms | 673.0188 Ops/s | 673.6103 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7622ms | 0.6883ms | 1.4529 KOps/s | 1.4503 KOps/s | |
| test_dqn_speed[False-None] | 1.6304ms | 1.5293ms | 653.8955 Ops/s | 659.2008 Ops/s | |
| test_dqn_speed[False-backward] | 2.2210ms | 2.1622ms | 462.4839 Ops/s | 462.3999 Ops/s | |
| test_dqn_speed[True-None] | 1.0852ms | 0.5789ms | 1.7273 KOps/s | 1.6608 KOps/s | |
| test_dqn_speed[True-backward] | 1.1696ms | 1.1015ms | 907.8635 Ops/s | 812.1106 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6900ms | 0.5928ms | 1.6869 KOps/s | 1.6470 KOps/s | |
| test_ddpg_speed[False-None] | 3.2980ms | 2.8965ms | 345.2457 Ops/s | 349.6749 Ops/s | |
| test_ddpg_speed[False-backward] | 4.6574ms | 4.1074ms | 243.4651 Ops/s | 236.1135 Ops/s | |
| test_ddpg_speed[True-None] | 1.4433ms | 1.3389ms | 746.9005 Ops/s | 746.3871 Ops/s | |
| test_ddpg_speed[True-backward] | 2.4678ms | 2.3788ms | 420.3876 Ops/s | 391.0994 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.4573ms | 1.3651ms | 732.5401 Ops/s | 731.2055 Ops/s | |
| test_sac_speed[False-None] | 8.8845ms | 8.2799ms | 120.7744 Ops/s | 121.9224 Ops/s | |
| test_sac_speed[False-backward] | 11.6098ms | 11.1510ms | 89.6783 Ops/s | 88.4161 Ops/s | |
| test_sac_speed[True-None] | 1.9618ms | 1.8424ms | 542.7711 Ops/s | 538.5805 Ops/s | |
| test_sac_speed[True-backward] | 3.5558ms | 3.4541ms | 289.5072 Ops/s | 271.7500 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 20.1502ms | 11.1029ms | 90.0667 Ops/s | 82.1603 Ops/s | |
| test_redq_deprec_speed[False-None] | 9.8627ms | 9.2510ms | 108.0964 Ops/s | 106.3392 Ops/s | |
| test_redq_deprec_speed[False-backward] | 12.8369ms | 12.2846ms | 81.4029 Ops/s | 79.0927 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.6513ms | 2.5552ms | 391.3636 Ops/s | 389.9723 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.2783ms | 4.1167ms | 242.9145 Ops/s | 227.1056 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 16.3514ms | 9.9248ms | 100.7581 Ops/s | 100.1937 Ops/s | |
| test_td3_speed[False-None] | 8.3844ms | 8.1586ms | 122.5704 Ops/s | 123.0100 Ops/s | |
| test_td3_speed[False-backward] | 10.9146ms | 10.4473ms | 95.7183 Ops/s | 93.5386 Ops/s | |
| test_td3_speed[True-None] | 1.7242ms | 1.6818ms | 594.5895 Ops/s | 597.2951 Ops/s | |
| test_td3_speed[True-backward] | 3.3845ms | 3.2637ms | 306.3977 Ops/s | 300.3268 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 83.5770ms | 25.0231ms | 39.9630 Ops/s | 40.7721 Ops/s | |
| test_cql_speed[False-None] | 17.4671ms | 17.1339ms | 58.3640 Ops/s | 58.2716 Ops/s | |
| test_cql_speed[False-backward] | 23.1514ms | 22.6211ms | 44.2064 Ops/s | 44.2589 Ops/s | |
| test_cql_speed[True-None] | 3.7919ms | 3.3801ms | 295.8473 Ops/s | 299.7021 Ops/s | |
| test_cql_speed[True-backward] | 5.9554ms | 5.4412ms | 183.7821 Ops/s | 176.4599 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 19.2258ms | 12.0785ms | 82.7917 Ops/s | 83.8080 Ops/s | |
| test_a2c_speed[False-None] | 4.0214ms | 3.2524ms | 307.4679 Ops/s | 310.9154 Ops/s | |
| test_a2c_speed[False-backward] | 6.1634ms | 6.0237ms | 166.0120 Ops/s | 159.6938 Ops/s | |
| test_a2c_speed[True-None] | 1.5209ms | 1.3497ms | 740.9267 Ops/s | 733.6127 Ops/s | |
| test_a2c_speed[True-backward] | 3.0542ms | 2.9763ms | 335.9894 Ops/s | 317.9956 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.1549ms | 0.9913ms | 1.0088 KOps/s | 1.0176 KOps/s | |
| test_ppo_speed[False-None] | 3.9439ms | 3.8090ms | 262.5335 Ops/s | 262.1957 Ops/s | |
| test_ppo_speed[False-backward] | 7.2616ms | 6.8280ms | 146.4566 Ops/s | 141.2602 Ops/s | |
| test_ppo_speed[True-None] | 1.5503ms | 1.4492ms | 690.0414 Ops/s | 694.9424 Ops/s | |
| test_ppo_speed[True-backward] | 3.2082ms | 3.1038ms | 322.1867 Ops/s | 299.4491 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.5147ms | 1.0609ms | 942.5975 Ops/s | 922.5984 Ops/s | |
| test_reinforce_speed[False-None] | 2.7099ms | 2.2683ms | 440.8671 Ops/s | 429.0186 Ops/s | |
| test_reinforce_speed[False-backward] | 3.7210ms | 3.2581ms | 306.9292 Ops/s | 302.6965 Ops/s | |
| test_reinforce_speed[True-None] | 1.7440ms | 1.3071ms | 765.0705 Ops/s | 742.4942 Ops/s | |
| test_reinforce_speed[True-backward] | 3.3126ms | 2.9265ms | 341.7008 Ops/s | 336.2993 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 17.5102ms | 9.5657ms | 104.5404 Ops/s | 104.3868 Ops/s | |
| test_iql_speed[False-None] | 9.8437ms | 9.3523ms | 106.9252 Ops/s | 106.5305 Ops/s | |
| test_iql_speed[False-backward] | 13.3524ms | 12.8868ms | 77.5987 Ops/s | 76.0391 Ops/s | |
| test_iql_speed[True-None] | 2.4377ms | 2.2238ms | 449.6745 Ops/s | 444.1160 Ops/s | |
| test_iql_speed[True-backward] | 4.9113ms | 4.7427ms | 210.8482 Ops/s | 200.2140 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 17.9525ms | 10.6067ms | 94.2801 Ops/s | 94.6668 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1944ms | 6.0280ms | 165.8934 Ops/s | 168.2051 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.5862ms | 0.2822ms | 3.5442 KOps/s | 3.5400 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7066ms | 0.2695ms | 3.7112 KOps/s | 3.7644 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.2616ms | 5.9119ms | 169.1490 Ops/s | 176.1013 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.5948ms | 0.3234ms | 3.0926 KOps/s | 2.8034 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6181ms | 0.3316ms | 3.0159 KOps/s | 2.9586 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6593ms | 1.4479ms | 690.6750 Ops/s | 719.0616 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6148ms | 1.3521ms | 739.6115 Ops/s | 785.8308 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1877ms | 5.9445ms | 168.2241 Ops/s | 170.3845 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.0439ms | 0.4181ms | 2.3916 KOps/s | 1.9682 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.5968ms | 0.4030ms | 2.4815 KOps/s | 2.4570 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0227ms | 5.8622ms | 170.5851 Ops/s | 168.6049 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0381ms | 0.3052ms | 3.2765 KOps/s | 3.5040 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5211ms | 0.2906ms | 3.4410 KOps/s | 3.7690 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.0874ms | 5.8179ms | 171.8838 Ops/s | 170.4174 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.8128ms | 0.2774ms | 3.6049 KOps/s | 2.9914 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6867ms | 0.2609ms | 3.8335 KOps/s | 3.4081 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1512ms | 5.9898ms | 166.9492 Ops/s | 166.3304 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.5726s | 1.2165ms | 822.0350 Ops/s | 2.3223 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6925ms | 0.4814ms | 2.0775 KOps/s | 2.4597 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.8055ms | 5.1076ms | 195.7876 Ops/s | 199.1206 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 12.2433ms | 2.1712ms | 460.5651 Ops/s | 435.3415 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.2270ms | 1.1400ms | 877.1868 Ops/s | 1.0069 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 8.5582ms | 5.0889ms | 196.5074 Ops/s | 50.3994 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 4.0449ms | 1.8637ms | 536.5739 Ops/s | 538.7943 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 12.4661ms | 1.4023ms | 713.1086 Ops/s | 774.8509 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.5487s | 16.2858ms | 61.4031 Ops/s | 187.1406 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.2028ms | 1.9755ms | 506.2118 Ops/s | 469.3880 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.9547ms | 1.1166ms | 895.5824 Ops/s | 928.9691 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.4807ms | 35.4474ms | 28.2108 Ops/s | 28.0715 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.5689ms | 17.6958ms | 56.5107 Ops/s | 55.5459 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.0885ms | 37.0694ms | 26.9764 Ops/s | 26.9734 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.9155ms | 18.1371ms | 55.1355 Ops/s | 53.1076 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 41.4180ms | 39.5800ms | 25.2653 Ops/s | 25.5521 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.8332ms | 20.3652ms | 49.1035 Ops/s | 50.1452 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8704ms | 0.2198ms | 4.5497 KOps/s | 4.5459 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 2.2890ms | 1.4068ms | 710.8204 Ops/s | 693.5190 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.7076ms | 2.3281ms | 429.5309 Ops/s | 428.0676 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.0700ms | 2.9342ms | 340.8083 Ops/s | 339.7671 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2511ms | 0.1638ms | 6.1053 KOps/s | 6.1439 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3699ms | 0.2296ms | 4.3545 KOps/s | 4.3892 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9907ms | 1.8410ms | 543.1863 Ops/s | 543.6777 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.6066ms | 1.2998ms | 769.3454 Ops/s | 705.9498 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.3389ms | 1.1589ms | 862.8930 Ops/s | 869.8857 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.8711ms | 3.6412ms | 274.6355 Ops/s | 268.2138 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 5.9968ms | 5.8368ms | 171.3278 Ops/s | 173.3952 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.5920ms | 7.3951ms | 135.2245 Ops/s | 132.5882 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4169ms | 0.2720ms | 3.6770 KOps/s | 3.6659 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6543ms | 1.5094ms | 662.5063 Ops/s | 653.1470 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.5817ms | 2.4128ms | 414.4629 Ops/s | 409.4435 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.4268ms | 3.1204ms | 320.4718 Ops/s | 318.0432 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 34.0815ms | 33.6226ms | 29.7419 Ops/s | 29.4402 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 67.2309ms | 65.9359ms | 15.1662 Ops/s | 15.0203 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 38.8321ms | 38.0020ms | 26.3144 Ops/s | 26.1608 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 76.6722ms | 75.3424ms | 13.2727 Ops/s | 13.0495 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 59.3239ms | 58.4965ms | 17.0950 Ops/s | 17.6099 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.8276s | 0.1937s | 5.1625 Ops/s | 8.8412 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 61.6666ms | 59.8250ms | 16.7154 Ops/s | 17.0329 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1191s | 0.1180s | 8.4715 Ops/s | 8.5335 Ops/s |
vmoens added a commit that referenced this pull request on Feb 7, 2026
Replace multiprocessing.Event (futex-based syscalls) with multiprocessing.RawArray shared-memory byte flags for worker-to-parent completion signaling on the hot path (step_and_maybe_reset).

- _start_workers: creates shm_done_flags RawArray, passes to workers
- _wait_for_workers: spin-polls done_flags instead of Event.wait()
- Worker: _signal_done() closure writes shm_done_flags[idx]=1
- _shutdown_workers: uses _wait_for_workers instead of Event.wait()

Measured impact:
- 10% FPS improvement (7,737 -> 8,509 fps) on H200 with 8 workers
- 28% reduction in penv.wait_for_workers overhead (2,622us -> 1,891us)
- ParallelEnv.close() fixed from 80s timeout to ~0.9s

Co-authored-by: Cursor <[email protected]>
ghstack-source-id: f29522a
Pull-Request: #3457
Co-authored-by: Cursor <[email protected]>
Stack from ghstack (oldest at bottom):
Replace multiprocessing.Event (futex-based syscalls) with
multiprocessing.RawArray shared-memory byte flags for worker-to-parent
completion signaling on the hot path (step_and_maybe_reset).
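Measured impact:
- 10% FPS improvement (7,737 -> 8,509 fps) on H200 with 8 workers
- 28% reduction in penv.wait_for_workers overhead (2,622us -> 1,891us)
- ParallelEnv.close() fixed from 80s timeout to ~0.9s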
Co-authored-by: Cursor [email protected]
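The signaling scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `worker`, `wait_for_workers`, and the timeout handling are hypothetical stand-ins, and only `multiprocessing.RawArray` and the one-byte-per-worker idea come from the description.

```python
import multiprocessing as mp
import time


def worker(idx, done_flags):
    """Hypothetical worker: do some work, then raise this worker's flag."""
    time.sleep(0.01 * (idx + 1))  # stand-in for an environment step
    # Plain byte store into shared memory; no lock and no Event.set().
    done_flags[idx] = 1


def wait_for_workers(done_flags, timeout=5.0):
    """Spin-poll until every flag is set, then clear them for the next step."""
    deadline = time.monotonic() + timeout
    while not all(done_flags):
        if time.monotonic() > deadline:
            raise TimeoutError("workers did not signal completion")
    for i in range(len(done_flags)):
        done_flags[i] = 0


if __name__ == "__main__":
    num_workers = 4
    # One unsigned byte per worker, zero-initialized, visible to all processes.
    done_flags = mp.RawArray("B", num_workers)
    procs = [
        mp.Process(target=worker, args=(i, done_flags))
        for i in range(num_workers)
    ]
    for p in procs:
        p.start()
    wait_for_workers(done_flags)
    for p in procs:
        p.join()
    print("all workers signaled; flags reset to", list(done_flags))
```

The trade-off in a scheme like this is that the parent busy-waits and occupies a core while polling, in exchange for avoiding a blocking wait on the hot path.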