[Feature] AsyncBatchedCollector: coordinator loop and direct submission mode #3499
Merged
vmoens merged 7 commits into gh/vmoens/241/base on Feb 21, 2026
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3499
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures as of commit 7cd1f9a with merge base 266e4aa.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
vmoens added a commit that referenced this pull request on Feb 12, 2026
…on mode

Rewrite the AsyncBatchedCollector to use a coordinator thread that pipelines env stepping and batched inference without a global sync barrier. Add a `direct=True` mode where each env thread submits directly to the InferenceServer, eliminating the coordinator thread and its serialization overhead.

Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
- AsyncBatchedCollector direct: 3183 fps (+72% vs coordinator)
- AsyncBatchedCollector threading: 1850 fps (coordinator mode)
- AsyncBatchedCollector mp: 1042 fps (coordinator mode)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 225d2a4
Pull-Request: #3499
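
The direct submission pattern described in the commit message can be illustrated with a small toy. This is only a sketch under stated assumptions, not the TorchRL implementation: `ToyInferenceServer`, `env_worker`, and `run_direct` are hypothetical names, the "policy" is a plain Python function, and the "env" is an integer counter. It shows each env thread submitting its own observation and blocking only on its own result, while the server batches whatever requests are already queued.

```python
import queue
import threading
from concurrent.futures import Future


class ToyInferenceServer:
    """Hypothetical batching server; a stand-in, not TorchRL's InferenceServer."""

    def __init__(self, policy, max_batch_size=8, poll_timeout=0.01):
        self.policy = policy              # maps a list of obs to a list of actions
        self.max_batch_size = max_batch_size
        self.poll_timeout = poll_timeout
        self.batch_sizes = []             # recorded for inspection only
        self._requests = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._serve, daemon=True)

    def start(self):
        self._thread.start()

    def submit(self, obs):
        """Called from env threads: enqueue one observation, get a Future back."""
        fut = Future()
        self._requests.put((obs, fut))
        return fut

    def shutdown(self):
        self._stop.set()
        self._thread.join()

    def _serve(self):
        while not self._stop.is_set():
            try:
                batch = [self._requests.get(timeout=self.poll_timeout)]
            except queue.Empty:
                continue
            # Batch whatever is already queued; never wait for every env.
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self._requests.get_nowait())
                except queue.Empty:
                    break
            actions = self.policy([obs for obs, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, fut), action in zip(batch, actions):
                fut.set_result(action)


def env_worker(server, num_steps, trajectory):
    """One env thread: submit obs, block on its own action, step a toy env."""
    obs = 0
    for _ in range(num_steps):
        action = server.submit(obs).result(timeout=5)
        obs += action                     # toy dynamics: obs accumulates actions
        trajectory.append(obs)


def run_direct(num_envs=4, num_steps=5):
    server = ToyInferenceServer(policy=lambda batch: [1] * len(batch))
    server.start()
    trajectories = [[] for _ in range(num_envs)]
    threads = [
        threading.Thread(target=env_worker, args=(server, num_steps, traj))
        for traj in trajectories
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    return trajectories, server.batch_sizes


trajectories, batch_sizes = run_direct()
print(trajectories[0], sum(batch_sizes))
```

Because every env thread waits on its own `Future`, a slow env delays only itself. In this toy, batch sizes vary from run to run depending on thread timing, which is the expected trade-off of not using a global barrier.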
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 78.1155μs | 77.5567μs | 12.8938 KOps/s | 12.7413 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1344ms | 0.1340ms | 7.4625 KOps/s | 7.3419 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1023s | 0.1018s | 9.8186 Ops/s | 9.9132 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.5062μs | 2.4907μs | 401.4858 KOps/s | 406.9183 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 38.6334μs | 36.9422μs | 27.0693 KOps/s | 25.6151 KOps/s | |
| test_simple | 0.5374s | 0.5305s | 1.8851 Ops/s | 1.8093 Ops/s | |
| test_transformed | 1.0549s | 1.0502s | 0.9522 Ops/s | 0.9316 Ops/s | |
| test_serial | 1.6111s | 1.6069s | 0.6223 Ops/s | 0.6134 Ops/s | |
| test_parallel | 0.9859s | 0.9824s | 1.0179 Ops/s | 0.9874 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.2581ms | 40.8456μs | 24.4824 KOps/s | 23.8212 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 0.4478ms | 23.2164μs | 43.0730 KOps/s | 43.5799 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 0.4439ms | 22.8189μs | 43.8233 KOps/s | 43.0113 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 41.5820μs | 12.5214μs | 79.8634 KOps/s | 78.5072 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 0.4582ms | 43.7141μs | 22.8759 KOps/s | 22.8082 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 0.4361ms | 25.2168μs | 39.6561 KOps/s | 39.1957 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 0.4449ms | 25.5651μs | 39.1158 KOps/s | 38.9387 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 44.1620μs | 15.1310μs | 66.0895 KOps/s | 65.0148 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 0.4628ms | 47.3479μs | 21.1203 KOps/s | 21.0736 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 0.4356ms | 27.7887μs | 35.9859 KOps/s | 35.7661 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 59.1920μs | 26.0264μs | 38.4225 KOps/s | 38.4337 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 0.4298ms | 15.1573μs | 65.9746 KOps/s | 64.9223 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 0.4764ms | 49.5713μs | 20.1730 KOps/s | 20.3421 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 0.4492ms | 30.7643μs | 32.5053 KOps/s | 33.0550 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 64.1920μs | 28.4180μs | 35.1890 KOps/s | 35.1667 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 0.4489ms | 17.8986μs | 55.8702 KOps/s | 56.0103 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 0.4630ms | 46.8786μs | 21.3317 KOps/s | 21.5301 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 0.4469ms | 28.2396μs | 35.4113 KOps/s | 35.6774 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.5538ms | 29.2775μs | 34.1560 KOps/s | 33.5061 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 0.4510ms | 16.9882μs | 58.8643 KOps/s | 58.8825 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 0.4787ms | 49.3964μs | 20.2444 KOps/s | 20.5861 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 67.3630μs | 30.6673μs | 32.6080 KOps/s | 33.1130 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 0.4520ms | 31.8016μs | 31.4450 KOps/s | 31.4790 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 0.4389ms | 19.4405μs | 51.4390 KOps/s | 51.5015 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 0.4657ms | 51.2791μs | 19.5011 KOps/s | 19.6221 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 0.4457ms | 33.1295μs | 30.1846 KOps/s | 30.5331 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 67.8230μs | 31.9895μs | 31.2603 KOps/s | 31.2267 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 0.4319ms | 19.3519μs | 51.6745 KOps/s | 51.8123 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.4730ms | 53.9023μs | 18.5521 KOps/s | 18.8605 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 0.4613ms | 35.5262μs | 28.1482 KOps/s | 28.4980 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 76.4630μs | 34.1793μs | 29.2575 KOps/s | 29.4910 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 0.4363ms | 21.9395μs | 45.5799 KOps/s | 45.9109 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8177s | 0.7197s | 1.3895 Ops/s | 1.3758 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.6855s | 0.5874s | 1.7025 Ops/s | 1.6825 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.6656s | 1.5772s | 0.6340 Ops/s | 0.6200 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4471s | 1.3655s | 0.7324 Ops/s | 0.7149 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9016s | 1.8170s | 0.5503 Ops/s | 0.5414 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7005s | 1.6156s | 0.6190 Ops/s | 0.6139 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.5703s | 4.4675s | 0.2238 Ops/s | 0.2205 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.4122s | 4.3684s | 0.2289 Ops/s | 0.2287 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.8755s | 1.7960s | 0.5568 Ops/s | 0.5425 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6301s | 1.5405s | 0.6491 Ops/s | 0.6421 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 10.3016ms | 9.7274ms | 102.8021 Ops/s | 101.6885 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 19.7089ms | 17.4989ms | 57.1463 Ops/s | 56.9791 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2342ms | 0.1296ms | 7.7161 KOps/s | 7.7397 KOps/s | |
| test_values[td1_return_estimate-False-False] | 28.6400ms | 26.6397ms | 37.5380 Ops/s | 37.8918 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 17.8761ms | 17.5053ms | 57.1257 Ops/s | 56.8852 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 39.6623ms | 38.9814ms | 25.6532 Ops/s | 24.6362 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 17.8578ms | 17.5059ms | 57.1237 Ops/s | 56.7381 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.6067ms | 8.5458ms | 117.0161 Ops/s | 111.8374 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7899ms | 1.4990ms | 667.1307 Ops/s | 703.6703 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4920ms | 0.4085ms | 2.4479 KOps/s | 2.3718 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 34.5905ms | 34.3663ms | 29.0983 Ops/s | 28.9262 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 2.0465ms | 1.7007ms | 587.9847 Ops/s | 587.1227 Ops/s | |
| test_dqn_speed[False-None] | 1.6176ms | 1.3535ms | 738.8024 Ops/s | 738.1115 Ops/s | |
| test_dqn_speed[False-backward] | 1.9316ms | 1.8591ms | 537.8870 Ops/s | 523.1212 Ops/s | |
| test_dqn_speed[True-None] | 0.5648ms | 0.5214ms | 1.9178 KOps/s | 1.8817 KOps/s | |
| test_dqn_speed[True-backward] | 1.0218ms | 0.9700ms | 1.0310 KOps/s | 861.1486 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.8838ms | 0.5201ms | 1.9229 KOps/s | 1.8964 KOps/s | |
| test_ddpg_speed[False-None] | 3.0930ms | 2.7618ms | 362.0800 Ops/s | 362.1365 Ops/s | |
| test_ddpg_speed[False-backward] | 4.3556ms | 3.9489ms | 253.2357 Ops/s | 254.1964 Ops/s | |
| test_ddpg_speed[True-None] | 1.5727ms | 1.3467ms | 742.5791 Ops/s | 735.4795 Ops/s | |
| test_ddpg_speed[True-backward] | 2.3687ms | 2.3197ms | 431.0861 Ops/s | 357.3871 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.7244ms | 1.3420ms | 745.1511 Ops/s | 737.7784 Ops/s | |
| test_sac_speed[False-None] | 8.3662ms | 7.7745ms | 128.6250 Ops/s | 129.9813 Ops/s | |
| test_sac_speed[False-backward] | 11.1295ms | 10.9355ms | 91.4450 Ops/s | 91.7214 Ops/s | |
| test_sac_speed[True-None] | 2.3803ms | 2.0712ms | 482.8116 Ops/s | 485.3676 Ops/s | |
| test_sac_speed[True-backward] | 3.9844ms | 3.8835ms | 257.5013 Ops/s | 250.5610 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.3648ms | 2.0510ms | 487.5592 Ops/s | 480.2474 Ops/s | |
| test_redq_speed[False-None] | 14.5650ms | 10.3376ms | 96.7340 Ops/s | 93.6453 Ops/s | |
| test_redq_speed[False-backward] | 18.6387ms | 17.4539ms | 57.2938 Ops/s | 57.8800 Ops/s | |
| test_redq_speed[True-None] | 4.6487ms | 4.1873ms | 238.8157 Ops/s | 243.6965 Ops/s | |
| test_redq_speed[True-backward] | 10.0744ms | 9.4500ms | 105.8200 Ops/s | 105.5043 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 4.4124ms | 4.1404ms | 241.5225 Ops/s | 247.6528 Ops/s | |
| test_redq_deprec_speed[False-None] | 11.2115ms | 10.7788ms | 92.7749 Ops/s | 95.0957 Ops/s | |
| test_redq_deprec_speed[False-backward] | 16.0447ms | 15.4909ms | 64.5542 Ops/s | 65.9920 Ops/s | |
| test_redq_deprec_speed[True-None] | 3.8240ms | 3.5503ms | 281.6648 Ops/s | 272.4641 Ops/s | |
| test_redq_deprec_speed[True-backward] | 7.4958ms | 7.3366ms | 136.3023 Ops/s | 129.9791 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 3.6714ms | 3.4569ms | 289.2790 Ops/s | 285.7621 Ops/s | |
| test_td3_speed[False-None] | 8.7321ms | 7.8842ms | 126.8362 Ops/s | 127.3417 Ops/s | |
| test_td3_speed[False-backward] | 11.9294ms | 10.8136ms | 92.4761 Ops/s | 93.7784 Ops/s | |
| test_td3_speed[True-None] | 1.7734ms | 1.7352ms | 576.2916 Ops/s | 572.0680 Ops/s | |
| test_td3_speed[True-backward] | 3.7126ms | 3.5025ms | 285.5105 Ops/s | 264.3758 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.7536ms | 1.7114ms | 584.3129 Ops/s | 585.8395 Ops/s | |
| test_cql_speed[False-None] | 30.1240ms | 25.9847ms | 38.4842 Ops/s | 38.3491 Ops/s | |
| test_cql_speed[False-backward] | 39.2054ms | 34.9661ms | 28.5991 Ops/s | 28.5107 Ops/s | |
| test_cql_speed[True-None] | 15.0734ms | 12.2529ms | 81.6133 Ops/s | 79.8910 Ops/s | |
| test_cql_speed[True-backward] | 18.1692ms | 17.8690ms | 55.9629 Ops/s | 54.8742 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 14.7153ms | 12.3392ms | 81.0425 Ops/s | 82.9582 Ops/s | |
| test_a2c_speed[False-None] | 5.5299ms | 5.2977ms | 188.7608 Ops/s | 188.6835 Ops/s | |
| test_a2c_speed[False-backward] | 11.8083ms | 11.6282ms | 85.9978 Ops/s | 86.1883 Ops/s | |
| test_a2c_speed[True-None] | 4.3226ms | 3.6640ms | 272.9229 Ops/s | 271.9795 Ops/s | |
| test_a2c_speed[True-backward] | 8.6659ms | 8.3524ms | 119.7254 Ops/s | 119.4123 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 3.9774ms | 3.6290ms | 275.5607 Ops/s | 274.8269 Ops/s | |
| test_ppo_speed[False-None] | 6.0422ms | 5.7169ms | 174.9193 Ops/s | 171.4088 Ops/s | |
| test_ppo_speed[False-backward] | 12.6286ms | 12.1875ms | 82.0513 Ops/s | 82.4182 Ops/s | |
| test_ppo_speed[True-None] | 4.0122ms | 3.5410ms | 282.4038 Ops/s | 280.5500 Ops/s | |
| test_ppo_speed[True-backward] | 8.5546ms | 8.1628ms | 122.5064 Ops/s | 120.8308 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 3.8301ms | 3.5300ms | 283.2829 Ops/s | 282.4688 Ops/s | |
| test_reinforce_speed[False-None] | 4.8782ms | 4.4604ms | 224.1959 Ops/s | 224.8047 Ops/s | |
| test_reinforce_speed[False-backward] | 7.5448ms | 7.2533ms | 137.8689 Ops/s | 138.8455 Ops/s | |
| test_reinforce_speed[True-None] | 3.1403ms | 2.7680ms | 361.2697 Ops/s | 357.2876 Ops/s | |
| test_reinforce_speed[True-backward] | 7.6305ms | 7.4481ms | 134.2622 Ops/s | 131.8388 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.2291ms | 2.7457ms | 364.2052 Ops/s | 348.1170 Ops/s | |
| test_iql_speed[False-None] | 25.0708ms | 19.7731ms | 50.5738 Ops/s | 50.0290 Ops/s | |
| test_iql_speed[False-backward] | 30.4544ms | 29.6276ms | 33.7523 Ops/s | 33.2741 Ops/s | |
| test_iql_speed[True-None] | 8.4752ms | 8.0769ms | 123.8102 Ops/s | 117.6196 Ops/s | |
| test_iql_speed[True-backward] | 16.4753ms | 15.9724ms | 62.6082 Ops/s | 60.3867 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 8.4184ms | 8.1826ms | 122.2106 Ops/s | 119.7381 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9802ms | 5.8340ms | 171.4077 Ops/s | 169.8527 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.7111ms | 0.3247ms | 3.0801 KOps/s | 3.0343 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.4951ms | 0.2545ms | 3.9288 KOps/s | 3.6813 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.7795ms | 5.5443ms | 180.3667 Ops/s | 178.5162 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.4542ms | 0.3021ms | 3.3101 KOps/s | 3.3135 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6416ms | 0.2924ms | 3.4202 KOps/s | 3.9152 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6771ms | 1.3692ms | 730.3439 Ops/s | 821.0243 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6220ms | 1.2723ms | 785.9686 Ops/s | 867.6025 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 12.5545ms | 5.8837ms | 169.9609 Ops/s | 175.8822 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.0005ms | 0.4435ms | 2.2550 KOps/s | 2.3854 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6465ms | 0.4015ms | 2.4908 KOps/s | 2.5172 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9452ms | 5.5903ms | 178.8799 Ops/s | 178.9095 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.5949ms | 0.3166ms | 3.1586 KOps/s | 3.3289 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6366ms | 0.3583ms | 2.7913 KOps/s | 3.5798 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.7902ms | 5.5777ms | 179.2842 Ops/s | 177.5473 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7171ms | 0.3580ms | 2.7934 KOps/s | 3.1753 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 9.3546ms | 0.3511ms | 2.8484 KOps/s | 3.1842 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.8228ms | 5.7543ms | 173.7841 Ops/s | 172.8912 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.9905ms | 0.5100ms | 1.9608 KOps/s | 2.1819 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7127ms | 0.4912ms | 2.0357 KOps/s | 2.1768 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.2816ms | 4.8652ms | 205.5414 Ops/s | 202.2927 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 9.6153ms | 2.1702ms | 460.7893 Ops/s | 460.3537 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 3.2008ms | 0.8813ms | 1.1347 KOps/s | 1.1150 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5422s | 15.7844ms | 63.3536 Ops/s | 60.1070 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.8147ms | 1.7388ms | 575.1182 Ops/s | 524.6732 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 7.4080ms | 1.1674ms | 856.5797 Ops/s | 857.1443 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.5545ms | 5.1185ms | 195.3712 Ops/s | 192.0085 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 12.7026ms | 2.0156ms | 496.1344 Ops/s | 493.5029 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.2912ms | 1.0188ms | 981.5234 Ops/s | 992.2286 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.0078ms | 35.0213ms | 28.5540 Ops/s | 28.2650 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.5622ms | 17.7677ms | 56.2819 Ops/s | 56.1805 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 38.7093ms | 36.0754ms | 27.7197 Ops/s | 27.3741 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.3189ms | 17.8626ms | 55.9828 Ops/s | 55.2290 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 39.4385ms | 37.9110ms | 26.3775 Ops/s | 25.5299 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.7363ms | 19.7309ms | 50.6819 Ops/s | 50.3148 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8985ms | 0.2150ms | 4.6513 KOps/s | 4.5106 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.5659ms | 1.3911ms | 718.8581 Ops/s | 714.5053 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.7089ms | 2.2875ms | 437.1520 Ops/s | 415.5390 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.1361ms | 2.9079ms | 343.8850 Ops/s | 340.5614 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2095ms | 0.1344ms | 7.4409 KOps/s | 7.4610 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3315ms | 0.1907ms | 5.2427 KOps/s | 5.1841 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9269ms | 1.7710ms | 564.6612 Ops/s | 574.3546 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.4331ms | 1.2756ms | 783.9314 Ops/s | 764.8303 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.2156ms | 1.0876ms | 919.4575 Ops/s | 919.9018 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 7.2812ms | 3.4681ms | 288.3435 Ops/s | 284.4180 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 10.9369ms | 5.6652ms | 176.5151 Ops/s | 178.8494 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.3144ms | 6.8304ms | 146.4034 Ops/s | 146.5443 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4279ms | 0.2706ms | 3.6960 KOps/s | 3.7176 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.7002ms | 1.5163ms | 659.5193 Ops/s | 664.5383 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.7915ms | 2.4099ms | 414.9593 Ops/s | 396.6624 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.2289ms | 3.1000ms | 322.5857 Ops/s | 320.1489 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 32.8773ms | 31.8503ms | 31.3968 Ops/s | 30.8932 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 64.6618ms | 63.6129ms | 15.7201 Ops/s | 15.8866 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 37.5098ms | 36.3318ms | 27.5241 Ops/s | 27.6394 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 93.3708ms | 72.1759ms | 13.8550 Ops/s | 14.0710 Ops/s |
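
The Change column is empty in the table above. As a rough aid for reading it, the sketch below computes a relative change from the Ops and Ops on Repo HEAD cells, normalizing the mixed `Ops/s` and `KOps/s` units first. The formula `(ops - head) / head` is an assumption, since the report does not state how its Change column is derived, and `parse_ops` and `relative_change` are hypothetical helper names.

```python
# Unit multipliers for the two units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '1.0310 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Assumed definition: fractional difference of this PR versus repo HEAD."""
    new, base = parse_ops(ops), parse_ops(ops_head)
    return (new - base) / base


# Row test_dqn_speed[True-backward] from the table above (note the mixed units).
change = relative_change("1.0310 KOps/s", "861.1486 Ops/s")
print(f"{change:+.1%}")
```

Most rows differ from HEAD by a few percent in either direction, so single-run differences of this size are hard to distinguish from noise.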
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 83.7445μs | 82.1958μs | 12.1661 KOps/s | 12.4615 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1464ms | 0.1453ms | 6.8804 KOps/s | 7.1089 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1162s | 0.1160s | 8.6211 Ops/s | 8.9317 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.5754μs | 2.5612μs | 390.4389 KOps/s | 364.4169 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 40.6848μs | 40.3940μs | 24.7562 KOps/s | 26.8071 KOps/s | |
| test_simple | 0.8024s | 0.7900s | 1.2659 Ops/s | 1.2158 Ops/s | |
| test_transformed | 1.4009s | 1.3850s | 0.7220 Ops/s | 0.7045 Ops/s | |
| test_serial | 2.3123s | 2.3015s | 0.4345 Ops/s | 0.4256 Ops/s | |
| test_parallel | 1.9125s | 1.8394s | 0.5437 Ops/s | 0.5515 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.4738ms | 42.7527μs | 23.3903 KOps/s | 23.7642 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 47.5810μs | 23.4606μs | 42.6246 KOps/s | 41.5905 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 0.4466ms | 23.4248μs | 42.6898 KOps/s | 41.8668 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 0.4474ms | 13.0056μs | 76.8901 KOps/s | 75.6332 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 70.4810μs | 45.0852μs | 22.1802 KOps/s | 21.9029 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 0.4505ms | 25.9696μs | 38.5065 KOps/s | 37.9471 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 0.4735ms | 26.2203μs | 38.1384 KOps/s | 37.6935 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 39.1110μs | 15.8850μs | 62.9526 KOps/s | 62.7788 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 0.4732ms | 47.6611μs | 20.9815 KOps/s | 20.8594 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 0.4486ms | 29.3931μs | 34.0216 KOps/s | 34.0316 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 0.4452ms | 26.6950μs | 37.4602 KOps/s | 37.2311 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 38.5110μs | 15.8386μs | 63.1371 KOps/s | 62.8156 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 0.4782ms | 50.7500μs | 19.7044 KOps/s | 19.5954 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 0.4504ms | 31.7666μs | 31.4796 KOps/s | 31.6489 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 0.4517ms | 29.2495μs | 34.1886 KOps/s | 34.4846 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 54.0910μs | 18.5194μs | 53.9975 KOps/s | 53.8423 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 0.4620ms | 48.2062μs | 20.7442 KOps/s | 20.6581 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 0.4465ms | 29.3697μs | 34.0487 KOps/s | 34.4107 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.4987ms | 30.4695μs | 32.8197 KOps/s | 33.1710 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 48.5610μs | 17.6964μs | 56.5086 KOps/s | 57.0479 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 0.4725ms | 50.6177μs | 19.7559 KOps/s | 19.6057 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 0.4497ms | 31.6960μs | 31.5497 KOps/s | 31.3628 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 0.4503ms | 32.2575μs | 31.0005 KOps/s | 30.6966 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 79.7810μs | 20.2208μs | 49.4540 KOps/s | 49.4621 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 0.4720ms | 53.1469μs | 18.8158 KOps/s | 18.6794 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 0.4486ms | 34.6921μs | 28.8250 KOps/s | 29.8381 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 0.4503ms | 32.6992μs | 30.5818 KOps/s | 30.1242 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 41.8810μs | 20.0713μs | 49.8224 KOps/s | 49.3617 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.4755ms | 54.4680μs | 18.3594 KOps/s | 18.1921 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 0.4477ms | 36.6029μs | 27.3203 KOps/s | 27.5305 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 0.4515ms | 34.3341μs | 29.1255 KOps/s | 28.7724 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 52.0410μs | 22.6347μs | 44.1800 KOps/s | 43.5621 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7171s | 0.7136s | 1.4014 Ops/s | 1.3248 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7023s | 0.6054s | 1.6518 Ops/s | 1.6131 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7134s | 1.6381s | 0.6105 Ops/s | 0.6033 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4892s | 1.4114s | 0.7085 Ops/s | 0.6981 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9556s | 1.8771s | 0.5327 Ops/s | 0.5240 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7373s | 1.6590s | 0.6028 Ops/s | 0.5944 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7132s | 4.6468s | 0.2152 Ops/s | 0.2147 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.7018s | 4.4525s | 0.2246 Ops/s | 0.2234 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9732s | 1.8933s | 0.5282 Ops/s | 0.5208 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7078s | 1.6052s | 0.6230 Ops/s | 0.6214 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 22.4518ms | 21.9486ms | 45.5609 Ops/s | 46.9300 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1324s | 3.5685ms | 280.2307 Ops/s | 258.9641 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1155ms | 84.6289μs | 11.8163 KOps/s | 11.7560 KOps/s | |
| test_values[td1_return_estimate-False-False] | 53.4028ms | 52.1658ms | 19.1696 Ops/s | 19.9124 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.3559ms | 1.0988ms | 910.0591 Ops/s | 903.3574 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 86.4269ms | 80.1401ms | 12.4781 Ops/s | 12.1033 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.3093ms | 1.0780ms | 927.6393 Ops/s | 907.0933 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 20.7733ms | 20.4725ms | 48.8461 Ops/s | 46.6614 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0565ms | 0.7680ms | 1.3021 KOps/s | 1.2836 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7215ms | 0.6741ms | 1.4836 KOps/s | 1.4426 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5402ms | 1.4926ms | 669.9659 Ops/s | 661.7279 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7324ms | 0.6923ms | 1.4444 KOps/s | 1.3816 KOps/s | |
| test_dqn_speed[False-None] | 1.6383ms | 1.5392ms | 649.7003 Ops/s | 640.3019 Ops/s | |
| test_dqn_speed[False-backward] | 2.2396ms | 2.1738ms | 460.0342 Ops/s | 451.1104 Ops/s | |
| test_dqn_speed[True-None] | 0.8625ms | 0.5534ms | 1.8072 KOps/s | 1.7639 KOps/s | |
| test_dqn_speed[True-backward] | 1.2290ms | 1.2001ms | 833.2355 Ops/s | 817.1911 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6225ms | 0.5744ms | 1.7409 KOps/s | 1.6799 KOps/s | |
| test_ddpg_speed[False-None] | 3.2747ms | 2.8850ms | 346.6260 Ops/s | 343.2619 Ops/s | |
| test_ddpg_speed[False-backward] | 4.6998ms | 4.2831ms | 233.4764 Ops/s | 231.3746 Ops/s | |
| test_ddpg_speed[True-None] | 1.3841ms | 1.3030ms | 767.4717 Ops/s | 756.5933 Ops/s | |
| test_ddpg_speed[True-backward] | 2.5427ms | 2.4971ms | 400.4602 Ops/s | 389.8693 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.4215ms | 1.3342ms | 749.5398 Ops/s | 741.0665 Ops/s | |
| test_sac_speed[False-None] | 9.0175ms | 8.3756ms | 119.3939 Ops/s | 117.6073 Ops/s | |
| test_sac_speed[False-backward] | 12.7724ms | 11.6970ms | 85.4918 Ops/s | 84.6934 Ops/s | |
| test_sac_speed[True-None] | 1.8849ms | 1.7863ms | 559.8267 Ops/s | 547.5621 Ops/s | |
| test_sac_speed[True-backward] | 3.6020ms | 3.5428ms | 282.2662 Ops/s | 275.1622 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 19.5920ms | 11.0757ms | 90.2874 Ops/s | 82.8869 Ops/s | |
| test_redq_deprec_speed[False-None] | 10.0113ms | 9.3259ms | 107.2287 Ops/s | 105.6522 Ops/s | |
| test_redq_deprec_speed[False-backward] | 13.1821ms | 12.7946ms | 78.1580 Ops/s | 77.5644 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.7313ms | 2.5007ms | 399.8870 Ops/s | 388.3130 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.3515ms | 4.2497ms | 235.3115 Ops/s | 228.1679 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 16.3049ms | 9.9188ms | 100.8184 Ops/s | 101.0334 Ops/s | |
| test_td3_speed[False-None] | 8.3562ms | 8.2088ms | 121.8212 Ops/s | 120.4402 Ops/s | |
| test_td3_speed[False-backward] | 11.2911ms | 10.8449ms | 92.2094 Ops/s | 91.2542 Ops/s | |
| test_td3_speed[True-None] | 1.7134ms | 1.6590ms | 602.7742 Ops/s | 615.0788 Ops/s | |
| test_td3_speed[True-backward] | 3.2528ms | 3.1971ms | 312.7852 Ops/s | 302.8144 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 47.9191ms | 24.5874ms | 40.6713 Ops/s | 40.1845 Ops/s | |
| test_cql_speed[False-None] | 17.8126ms | 17.3082ms | 57.7762 Ops/s | 56.9336 Ops/s | |
| test_cql_speed[False-backward] | 23.4146ms | 22.9417ms | 43.5887 Ops/s | 42.9751 Ops/s | |
| test_cql_speed[True-None] | 3.3736ms | 3.2022ms | 312.2806 Ops/s | 307.4202 Ops/s | |
| test_cql_speed[True-backward] | 6.0816ms | 5.4807ms | 182.4597 Ops/s | 179.3940 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 18.6150ms | 11.5423ms | 86.6381 Ops/s | 83.5218 Ops/s | |
| test_a2c_speed[False-None] | 3.4158ms | 3.2463ms | 308.0463 Ops/s | 302.1774 Ops/s | |
| test_a2c_speed[False-backward] | 6.8582ms | 6.4498ms | 155.0446 Ops/s | 152.3943 Ops/s | |
| test_a2c_speed[True-None] | 1.4204ms | 1.3184ms | 758.5070 Ops/s | 755.4022 Ops/s | |
| test_a2c_speed[True-backward] | 3.1791ms | 3.0826ms | 324.4028 Ops/s | 336.5985 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.0317ms | 0.9689ms | 1.0321 KOps/s | 1.0362 KOps/s | |
| test_ppo_speed[False-None] | 4.0440ms | 3.8649ms | 258.7370 Ops/s | 254.0896 Ops/s | |
| test_ppo_speed[False-backward] | 7.7965ms | 7.2486ms | 137.9574 Ops/s | 138.3706 Ops/s | |
| test_ppo_speed[True-None] | 1.4672ms | 1.4124ms | 708.0253 Ops/s | 707.7801 Ops/s | |
| test_ppo_speed[True-backward] | 3.2691ms | 3.2242ms | 310.1587 Ops/s | 302.5475 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.1306ms | 1.0452ms | 956.7391 Ops/s | 937.7425 Ops/s | |
| test_reinforce_speed[False-None] | 2.3952ms | 2.2936ms | 435.9864 Ops/s | 428.9260 Ops/s | |
| test_reinforce_speed[False-backward] | 3.5554ms | 3.4428ms | 290.4577 Ops/s | 285.3651 Ops/s | |
| test_reinforce_speed[True-None] | 1.3218ms | 1.2629ms | 791.8421 Ops/s | 789.6926 Ops/s | |
| test_reinforce_speed[True-backward] | 3.0923ms | 3.0298ms | 330.0579 Ops/s | 324.0054 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 17.4619ms | 9.5924ms | 104.2493 Ops/s | 105.0339 Ops/s | |
| test_iql_speed[False-None] | 10.0297ms | 9.4011ms | 106.3700 Ops/s | 104.0996 Ops/s | |
| test_iql_speed[False-backward] | 13.8378ms | 13.4444ms | 74.3806 Ops/s | 73.1800 Ops/s | |
| test_iql_speed[True-None] | 2.2319ms | 2.1366ms | 468.0327 Ops/s | 460.1777 Ops/s | |
| test_iql_speed[True-backward] | 5.2928ms | 4.8152ms | 207.6756 Ops/s | 205.4794 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 18.2853ms | 10.5794ms | 94.5237 Ops/s | 94.9783 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.3767ms | 5.9856ms | 167.0680 Ops/s | 165.0444 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9293ms | 0.3454ms | 2.8951 KOps/s | 3.4881 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5842ms | 0.3493ms | 2.8629 KOps/s | 2.6595 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9834ms | 5.7711ms | 173.2773 Ops/s | 169.0471 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.6833ms | 0.3636ms | 2.7504 KOps/s | 3.0647 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5559ms | 0.3427ms | 2.9183 KOps/s | 3.5646 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7128ms | 1.4388ms | 695.0211 Ops/s | 710.4399 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5499ms | 1.3558ms | 737.5958 Ops/s | 741.4830 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1874ms | 6.0634ms | 164.9240 Ops/s | 164.0442 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7188ms | 0.4674ms | 2.1396 KOps/s | 2.1865 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8005ms | 0.4467ms | 2.2388 KOps/s | 1.9831 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9757ms | 5.8867ms | 169.8739 Ops/s | 168.3239 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.8550ms | 0.3234ms | 3.0918 KOps/s | 2.8141 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7121ms | 0.3168ms | 3.1569 KOps/s | 2.9328 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1685ms | 5.8222ms | 171.7550 Ops/s | 169.3093 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.5924ms | 0.2820ms | 3.5463 KOps/s | 3.1188 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5415ms | 0.3427ms | 2.9182 KOps/s | 3.4584 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1823ms | 6.0363ms | 165.6653 Ops/s | 164.8446 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8179ms | 0.4648ms | 2.1514 KOps/s | 1.9169 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7939ms | 0.4591ms | 2.1779 KOps/s | 2.1799 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.5876s | 16.7588ms | 59.6703 Ops/s | 51.6828 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 11.5998ms | 2.0883ms | 478.8482 Ops/s | 533.0775 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 7.2839ms | 1.2858ms | 777.6997 Ops/s | 1.0281 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 6.5718ms | 5.0663ms | 197.3824 Ops/s | 194.0368 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 13.2261ms | 2.0781ms | 481.2059 Ops/s | 488.6475 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.9819ms | 1.2227ms | 817.8717 Ops/s | 1.0066 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.5334s | 15.9182ms | 62.8212 Ops/s | 185.4838 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.3350ms | 2.1301ms | 469.4646 Ops/s | 511.6881 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 12.0203ms | 1.5718ms | 636.1970 Ops/s | 816.4758 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 38.0802ms | 35.9684ms | 27.8022 Ops/s | 26.8908 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.8673ms | 18.4138ms | 54.3072 Ops/s | 52.6068 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.3254ms | 37.5190ms | 26.6532 Ops/s | 26.4449 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.5473ms | 18.6411ms | 53.6449 Ops/s | 52.7065 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 48.4334ms | 39.7921ms | 25.1306 Ops/s | 24.8962 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.5745ms | 20.1939ms | 49.5199 Ops/s | 48.3164 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8834ms | 0.2254ms | 4.4361 KOps/s | 4.3112 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.6962ms | 1.4343ms | 697.2209 Ops/s | 685.4818 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.5838ms | 2.3706ms | 421.8290 Ops/s | 433.0922 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.2127ms | 2.9811ms | 335.4433 Ops/s | 333.8206 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2663ms | 0.1667ms | 5.9983 KOps/s | 5.8212 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3576ms | 0.2306ms | 4.3374 KOps/s | 4.0581 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 2.2652ms | 1.8695ms | 534.8937 Ops/s | 561.4589 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.6364ms | 1.4039ms | 712.2965 Ops/s | 708.9309 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.3249ms | 1.1621ms | 860.4784 Ops/s | 852.1998 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.9038ms | 3.6513ms | 273.8715 Ops/s | 268.7875 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 10.7455ms | 5.9690ms | 167.5328 Ops/s | 170.6738 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.6356ms | 7.4494ms | 134.2386 Ops/s | 139.0672 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4338ms | 0.2743ms | 3.6458 KOps/s | 3.5904 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.7576ms | 1.5450ms | 647.2667 Ops/s | 642.8983 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.8958ms | 2.4741ms | 404.1807 Ops/s | 410.0503 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.4939ms | 3.1753ms | 314.9268 Ops/s | 315.0125 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 33.7506ms | 33.2481ms | 30.0769 Ops/s | 29.3722 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 67.2205ms | 65.3301ms | 15.3069 Ops/s | 14.9779 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 38.2399ms | 37.2759ms | 26.8270 Ops/s | 26.1363 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 73.3180ms | 72.8512ms | 13.7266 Ops/s | 13.2485 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 56.1165ms | 55.1131ms | 18.1445 Ops/s | 17.6635 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1099s | 0.1095s | 9.1296 Ops/s | 8.7967 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 57.3732ms | 56.9226ms | 17.5677 Ops/s | 16.9302 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.7793s | 0.1877s | 5.3280 Ops/s | 8.5050 Ops/s |
vmoens added a commit that referenced this pull request on Feb 12, 2026

…on mode

Rewrite the AsyncBatchedCollector to use a coordinator thread that pipelines env stepping and batched inference without a global sync barrier. Add a `direct=True` mode where each env thread submits directly to the InferenceServer, eliminating the coordinator thread and its serialization overhead.

Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
AsyncBatchedCollector direct: 3183 fps (+72% vs coordinator)
AsyncBatchedCollector threading: 1850 fps (coordinator mode)
AsyncBatchedCollector mp: 1042 fps (coordinator mode)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: c4d370a
Pull-Request: #3499
vmoens added a commit that referenced this pull request on Feb 13, 2026

…on mode

Rewrite the AsyncBatchedCollector to use a coordinator thread that pipelines env stepping and batched inference without a global sync barrier. Add a `direct=True` mode where each env thread submits directly to the InferenceServer, eliminating the coordinator thread and its serialization overhead.

Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
AsyncBatchedCollector direct: 3183 fps (+72% vs coordinator)
AsyncBatchedCollector threading: 1850 fps (coordinator mode)
AsyncBatchedCollector mp: 1042 fps (coordinator mode)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 784abfc
Pull-Request: #3499
Co-authored-by: Cursor <cursoragent@cursor.com>
vmoens added a commit that referenced this pull request on Feb 14, 2026

…on mode

Rewrite the AsyncBatchedCollector to use a coordinator thread that pipelines env stepping and batched inference without a global sync barrier. Add a `direct=True` mode where each env thread submits directly to the InferenceServer, eliminating the coordinator thread and its serialization overhead.

Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
AsyncBatchedCollector direct: 3183 fps (+72% vs coordinator)
AsyncBatchedCollector threading: 1850 fps (coordinator mode)
AsyncBatchedCollector mp: 1042 fps (coordinator mode)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 270d3e3
Pull-Request: #3499
Co-authored-by: Cursor <cursoragent@cursor.com>
vmoens added a commit that referenced this pull request on Feb 21, 2026

…on mode (#3499)

Rewrite the AsyncBatchedCollector to use a coordinator thread that pipelines env stepping and batched inference without a global sync barrier. Add a `direct=True` mode where each env thread submits directly to the InferenceServer, eliminating the coordinator thread and its serialization overhead.

Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
AsyncBatchedCollector direct: 3183 fps (+72% vs coordinator)
AsyncBatchedCollector threading: 1850 fps (coordinator mode)
AsyncBatchedCollector mp: 1042 fps (coordinator mode)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 861a24d
Pull-Request: #3499
Co-authored-by: Cursor <cursoragent@cursor.com>
Stack from ghstack (oldest at bottom):
Rewrite the AsyncBatchedCollector to use a coordinator thread that
pipelines env stepping and batched inference without a global sync
barrier. Add a `direct=True` mode where each env thread submits
directly to the InferenceServer, eliminating the coordinator thread
and its serialization overhead.
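
To illustrate the direct submission pattern described above, here is a minimal standard-library sketch. It is not the torchrl implementation: `ToyInferenceServer`, `env_worker`, `collect`, the integer "observations", and the toy policy are all hypothetical names and simplifications chosen for this example. Each env thread submits its observation straight to the server and blocks on a `Future`, so no coordinator thread sits between the envs and the batched policy call.

```python
import queue
import threading
from concurrent.futures import Future


class ToyInferenceServer:
    """Hypothetical stand-in for an inference server (not the torchrl class).

    It drains whatever requests are pending, runs the policy once on the
    whole batch, and resolves one Future per request.
    """

    def __init__(self, policy, max_batch_size=8):
        self.policy = policy
        self.max_batch_size = max_batch_size
        self.requests = queue.Queue()
        self.batch_sizes = []  # recorded only for inspection
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def submit(self, obs):
        """Called directly from env threads; returns a Future for the action."""
        fut = Future()
        self.requests.put((obs, fut))
        return fut

    def shutdown(self):
        self.requests.put(None)  # sentinel: stop serving
        self._thread.join()

    def _serve(self):
        while True:
            item = self.requests.get()  # block until a first request arrives
            if item is None:
                return
            batch, stop = [item], False
            # Opportunistically add requests that are already waiting.
            while len(batch) < self.max_batch_size:
                try:
                    nxt = self.requests.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            self.batch_sizes.append(len(batch))
            actions = self.policy([obs for obs, _ in batch])  # one batched call
            for (_, fut), action in zip(batch, actions):
                fut.set_result(action)
            if stop:
                return


def env_worker(env_id, server, num_steps, out):
    """Toy env loop: the next observation is simply the chosen action."""
    obs = 0
    for _ in range(num_steps):
        action = server.submit(obs).result()  # direct submission, no coordinator
        out.append((env_id, obs, action))
        obs = action


def collect(num_envs=4, num_steps=5):
    # Toy policy: action = observation + 1, applied to the whole batch.
    server = ToyInferenceServer(policy=lambda batch: [o + 1 for o in batch])
    transitions = []
    threads = [
        threading.Thread(target=env_worker, args=(i, server, num_steps, transitions))
        for i in range(num_envs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    return transitions, server.batch_sizes
```

In this sketch a fast env is never held back by a slow one: each thread resubmits as soon as its own `Future` resolves, and the server batches whatever happens to be queued at that moment. Batch sizes therefore vary from run to run, while the per-env trajectories stay deterministic.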
Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
AsyncBatchedCollector direct: 3183 fps (+72% vs coordinator)
AsyncBatchedCollector threading: 1850 fps (coordinator mode)
AsyncBatchedCollector mp: 1042 fps (coordinator mode)
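
As a quick check on the arithmetic, the +72% figure follows from the direct and threading throughputs listed above. The snippet below is only a sketch; the `relative_gain` helper and the dictionary keys are names invented for this example.

```python
# Throughputs (frames per second) as reported in the benchmark above.
fps = {"direct": 3183, "threading": 1850, "mp": 1042}


def relative_gain(new, base):
    """Percentage improvement of `new` over `base`."""
    return (new / base - 1.0) * 100.0


gain_direct_vs_threading = relative_gain(fps["direct"], fps["threading"])
print(f"direct vs threading coordinator: +{gain_direct_vs_threading:.0f}%")  # +72%
```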
Co-authored-by: Cursor <cursoragent@cursor.com>