[Perf] ParallelEnv: fast-path device transfer in step_and_maybe_reset #3458
Merged
vmoens merged 2 commits into gh/vmoens/219/base on Feb 7, 2026
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3458
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit fbd36ac with merge base ab49b59:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This was referenced Feb 6, 2026
Contributor

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 79.6692μs | 77.9475μs | 12.8291 KOps/s | 12.4608 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1350ms | 0.1338ms | 7.4716 KOps/s | 7.4192 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1034s | 0.1029s | 9.7151 Ops/s | 9.4630 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.5026μs | 2.4950μs | 400.7975 KOps/s | 411.2053 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 36.6250μs | 36.4308μs | 27.4493 KOps/s | 28.0176 KOps/s | |
| test_simple | 0.5349s | 0.5336s | 1.8740 Ops/s | 1.7884 Ops/s | |
| test_transformed | 1.2342s | 1.1398s | 0.8774 Ops/s | 0.8867 Ops/s | |
| test_serial | 1.6457s | 1.6359s | 0.6113 Ops/s | 0.6045 Ops/s | |
| test_parallel | 1.1388s | 1.0326s | 0.9685 Ops/s | 0.9595 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.1444ms | 42.4077μs | 23.5806 KOps/s | 22.8043 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 59.2430μs | 24.2949μs | 41.1609 KOps/s | 40.1519 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 49.2520μs | 24.1839μs | 41.3497 KOps/s | 41.0573 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 45.2520μs | 13.4230μs | 74.4989 KOps/s | 73.8747 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 89.3550μs | 45.8145μs | 21.8272 KOps/s | 21.6723 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 53.9930μs | 26.3342μs | 37.9735 KOps/s | 36.6220 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 65.5840μs | 26.3149μs | 38.0013 KOps/s | 37.4400 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 50.3520μs | 16.1666μs | 61.8560 KOps/s | 61.3442 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 88.1940μs | 49.7113μs | 20.1162 KOps/s | 20.4208 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 58.2630μs | 29.5847μs | 33.8012 KOps/s | 33.0411 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 67.4540μs | 27.3006μs | 36.6292 KOps/s | 37.4326 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 39.5920μs | 16.1264μs | 62.0102 KOps/s | 61.2831 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 88.4840μs | 51.3285μs | 19.4824 KOps/s | 19.3171 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 64.0930μs | 31.2260μs | 32.0246 KOps/s | 30.3707 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 63.2040μs | 29.3154μs | 34.1117 KOps/s | 33.8551 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 55.0920μs | 18.9501μs | 52.7703 KOps/s | 52.6343 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 94.5140μs | 49.6910μs | 20.1244 KOps/s | 20.4505 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 70.3740μs | 29.0420μs | 34.4329 KOps/s | 33.1163 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.4272ms | 31.3209μs | 31.9276 KOps/s | 31.9722 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 41.4720μs | 18.1100μs | 55.2181 KOps/s | 55.6443 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 93.7440μs | 51.5562μs | 19.3963 KOps/s | 19.1971 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 58.8330μs | 32.3998μs | 30.8644 KOps/s | 29.9179 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 69.8440μs | 33.3225μs | 30.0097 KOps/s | 30.1544 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 49.6820μs | 20.1732μs | 49.5707 KOps/s | 47.8866 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 90.3650μs | 54.8598μs | 18.2283 KOps/s | 18.1527 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 69.8930μs | 34.9549μs | 28.6083 KOps/s | 27.7935 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 80.5530μs | 33.7250μs | 29.6516 KOps/s | 29.6125 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 52.3830μs | 20.8657μs | 47.9255 KOps/s | 48.0495 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.1048ms | 56.5161μs | 17.6941 KOps/s | 17.6051 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 69.8330μs | 37.3332μs | 26.7858 KOps/s | 26.2823 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 67.1730μs | 35.3855μs | 28.2601 KOps/s | 27.7153 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 58.7830μs | 22.8091μs | 43.8421 KOps/s | 43.7439 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8559s | 0.7536s | 1.3270 Ops/s | 1.3269 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7187s | 0.6208s | 1.6109 Ops/s | 1.6135 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7250s | 1.6475s | 0.6070 Ops/s | 0.6065 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5042s | 1.4254s | 0.7015 Ops/s | 0.7016 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9705s | 1.8883s | 0.5296 Ops/s | 0.5267 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7456s | 1.6693s | 0.5990 Ops/s | 0.5943 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6265s | 4.5720s | 0.2187 Ops/s | 0.2214 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.4134s | 4.3793s | 0.2283 Ops/s | 0.2275 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9515s | 1.8531s | 0.5396 Ops/s | 0.5115 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6841s | 1.5989s | 0.6254 Ops/s | 0.6311 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 9.7836ms | 9.6252ms | 103.8937 Ops/s | 102.7139 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 19.9723ms | 17.4518ms | 57.3008 Ops/s | 55.9789 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2308ms | 0.1252ms | 7.9844 KOps/s | 7.9000 KOps/s | |
| test_values[td1_return_estimate-False-False] | 26.4377ms | 25.9841ms | 38.4850 Ops/s | 38.1172 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 18.5143ms | 17.5894ms | 56.8523 Ops/s | 55.7263 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 38.8565ms | 38.4967ms | 25.9762 Ops/s | 25.5848 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 18.7263ms | 17.5847ms | 56.8676 Ops/s | 56.2933 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.7018ms | 8.5146ms | 117.4458 Ops/s | 117.0619 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.9149ms | 1.4357ms | 696.5108 Ops/s | 643.0232 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5331ms | 0.4123ms | 2.4257 KOps/s | 2.3745 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 38.0940ms | 34.2844ms | 29.1678 Ops/s | 31.9422 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 2.1665ms | 1.7219ms | 580.7655 Ops/s | 586.4523 Ops/s | |
| test_dqn_speed[False-None] | 1.5047ms | 1.3580ms | 736.3681 Ops/s | 719.6174 Ops/s | |
| test_dqn_speed[False-backward] | 1.9221ms | 1.8712ms | 534.4129 Ops/s | 520.4364 Ops/s | |
| test_dqn_speed[True-None] | 1.0711ms | 0.5367ms | 1.8633 KOps/s | 1.7590 KOps/s | |
| test_dqn_speed[True-backward] | 1.0326ms | 0.9880ms | 1.0121 KOps/s | 908.8394 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6315ms | 0.5252ms | 1.9040 KOps/s | 1.8520 KOps/s | |
| test_ddpg_speed[False-None] | 3.1630ms | 2.7769ms | 360.1096 Ops/s | 366.8750 Ops/s | |
| test_ddpg_speed[False-backward] | 4.1396ms | 4.0007ms | 249.9570 Ops/s | 251.1995 Ops/s | |
| test_ddpg_speed[True-None] | 1.5530ms | 1.3850ms | 722.0227 Ops/s | 704.3901 Ops/s | |
| test_ddpg_speed[True-backward] | 2.4030ms | 2.3588ms | 423.9515 Ops/s | 421.3855 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.6103ms | 1.3788ms | 725.2847 Ops/s | 722.2930 Ops/s | |
| test_sac_speed[False-None] | 8.4472ms | 7.8887ms | 126.7636 Ops/s | 126.3622 Ops/s | |
| test_sac_speed[False-backward] | 11.6807ms | 11.1550ms | 89.6457 Ops/s | 89.7176 Ops/s | |
| test_sac_speed[True-None] | 2.3109ms | 2.1398ms | 467.3243 Ops/s | 452.8961 Ops/s | |
| test_sac_speed[True-backward] | 4.1259ms | 4.0056ms | 249.6491 Ops/s | 247.7975 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.5114ms | 2.1274ms | 470.0502 Ops/s | 451.0358 Ops/s | |
| test_redq_speed[False-None] | 10.9571ms | 10.3404ms | 96.7081 Ops/s | 94.0680 Ops/s | |
| test_redq_speed[False-backward] | 21.1370ms | 17.8732ms | 55.9498 Ops/s | 58.1914 Ops/s | |
| test_redq_speed[True-None] | 4.7080ms | 4.3575ms | 229.4869 Ops/s | 233.4952 Ops/s | |
| test_redq_speed[True-backward] | 10.0526ms | 9.7874ms | 102.1724 Ops/s | 106.3557 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 4.8358ms | 4.4267ms | 225.9009 Ops/s | 229.2342 Ops/s | |
| test_redq_deprec_speed[False-None] | 11.4406ms | 10.9557ms | 91.2765 Ops/s | 93.8165 Ops/s | |
| test_redq_deprec_speed[False-backward] | 16.3431ms | 15.8397ms | 63.1326 Ops/s | 65.3660 Ops/s | |
| test_redq_deprec_speed[True-None] | 3.9539ms | 3.6685ms | 272.5886 Ops/s | 276.8422 Ops/s | |
| test_redq_deprec_speed[True-backward] | 7.7721ms | 7.5325ms | 132.7576 Ops/s | 132.2917 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 4.0160ms | 3.6286ms | 275.5846 Ops/s | 280.3592 Ops/s | |
| test_td3_speed[False-None] | 8.4857ms | 8.1268ms | 123.0490 Ops/s | 127.9793 Ops/s | |
| test_td3_speed[False-backward] | 11.3370ms | 10.8253ms | 92.3760 Ops/s | 93.8692 Ops/s | |
| test_td3_speed[True-None] | 1.8734ms | 1.8325ms | 545.7151 Ops/s | 553.6461 Ops/s | |
| test_td3_speed[True-backward] | 3.7296ms | 3.6377ms | 274.8996 Ops/s | 251.4885 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.8335ms | 1.7875ms | 559.4409 Ops/s | 558.7315 Ops/s | |
| test_cql_speed[False-None] | 28.1171ms | 25.7215ms | 38.8779 Ops/s | 39.5531 Ops/s | |
| test_cql_speed[False-backward] | 37.7470ms | 34.9665ms | 28.5988 Ops/s | 28.9108 Ops/s | |
| test_cql_speed[True-None] | 12.7156ms | 12.3408ms | 81.0320 Ops/s | 80.2824 Ops/s | |
| test_cql_speed[True-backward] | 18.8840ms | 18.3804ms | 54.4058 Ops/s | 56.5348 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 12.6358ms | 12.3643ms | 80.8777 Ops/s | 80.9382 Ops/s | |
| test_a2c_speed[False-None] | 5.6001ms | 5.2970ms | 188.7879 Ops/s | 192.9837 Ops/s | |
| test_a2c_speed[False-backward] | 12.0435ms | 11.7763ms | 84.9163 Ops/s | 85.8763 Ops/s | |
| test_a2c_speed[True-None] | 4.0743ms | 3.6855ms | 271.3360 Ops/s | 282.0821 Ops/s | |
| test_a2c_speed[True-backward] | 8.7649ms | 8.5557ms | 116.8812 Ops/s | 113.0888 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 4.1630ms | 3.7101ms | 269.5370 Ops/s | 270.6534 Ops/s | |
| test_ppo_speed[False-None] | 6.2237ms | 5.9214ms | 168.8788 Ops/s | 174.5965 Ops/s | |
| test_ppo_speed[False-backward] | 12.9143ms | 12.6021ms | 79.3516 Ops/s | 81.9815 Ops/s | |
| test_ppo_speed[True-None] | 4.0235ms | 3.6531ms | 273.7376 Ops/s | 275.6998 Ops/s | |
| test_ppo_speed[True-backward] | 8.7936ms | 8.4534ms | 118.2962 Ops/s | 120.4557 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 3.7893ms | 3.6076ms | 277.1945 Ops/s | 275.2368 Ops/s | |
| test_reinforce_speed[False-None] | 4.9983ms | 4.5216ms | 221.1610 Ops/s | 221.5471 Ops/s | |
| test_reinforce_speed[False-backward] | 7.5683ms | 7.3612ms | 135.8483 Ops/s | 137.1550 Ops/s | |
| test_reinforce_speed[True-None] | 3.2599ms | 2.8657ms | 348.9522 Ops/s | 344.6285 Ops/s | |
| test_reinforce_speed[True-backward] | 7.9996ms | 7.7570ms | 128.9163 Ops/s | 130.3903 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.2933ms | 2.8770ms | 347.5852 Ops/s | 344.0248 Ops/s | |
| test_iql_speed[False-None] | 23.3709ms | 19.5683ms | 51.1031 Ops/s | 50.2563 Ops/s | |
| test_iql_speed[False-backward] | 36.4573ms | 30.4454ms | 32.8457 Ops/s | 33.2830 Ops/s | |
| test_iql_speed[True-None] | 9.1074ms | 8.5445ms | 117.0338 Ops/s | 116.0876 Ops/s | |
| test_iql_speed[True-backward] | 17.0279ms | 16.7774ms | 59.6040 Ops/s | 60.6434 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 9.9940ms | 8.5899ms | 116.4154 Ops/s | 113.1260 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9811ms | 5.8348ms | 171.3849 Ops/s | 170.2374 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.7233ms | 0.3718ms | 2.6893 KOps/s | 3.5998 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5539ms | 0.2986ms | 3.3491 KOps/s | 3.8635 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9908ms | 5.6236ms | 177.8206 Ops/s | 176.4864 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8983ms | 0.3267ms | 3.0607 KOps/s | 3.6901 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6354ms | 0.2860ms | 3.4963 KOps/s | 3.9302 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6200ms | 1.3398ms | 746.3673 Ops/s | 820.6186 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6783ms | 1.2864ms | 777.3415 Ops/s | 878.4665 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.4219ms | 5.8556ms | 170.7762 Ops/s | 173.3214 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.0613ms | 0.4677ms | 2.1381 KOps/s | 2.3484 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8191ms | 0.4809ms | 2.0796 KOps/s | 2.4992 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.8217ms | 5.6501ms | 176.9890 Ops/s | 177.1088 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.8505ms | 0.3621ms | 2.7620 KOps/s | 3.3654 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5378ms | 0.3508ms | 2.8503 KOps/s | 3.7923 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.8438ms | 5.5778ms | 179.2806 Ops/s | 177.7050 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.8205ms | 0.3608ms | 2.7713 KOps/s | 3.6426 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5586ms | 0.3450ms | 2.8984 KOps/s | 2.9173 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.8418ms | 5.7338ms | 174.4053 Ops/s | 173.3857 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9416ms | 0.5067ms | 1.9736 KOps/s | 2.2528 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6684ms | 0.4903ms | 2.0397 KOps/s | 2.2289 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.3856ms | 4.9675ms | 201.3099 Ops/s | 59.0645 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 5.2321ms | 2.1518ms | 464.7323 Ops/s | 511.5933 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.1538ms | 1.2032ms | 831.0991 Ops/s | 1.1344 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5531s | 16.1676ms | 61.8520 Ops/s | 196.4381 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 4.2261ms | 1.7955ms | 556.9581 Ops/s | 533.1057 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.0446ms | 0.8517ms | 1.1741 KOps/s | 792.0274 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 9.4670ms | 5.2038ms | 192.1689 Ops/s | 59.6910 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 9.0958ms | 2.0641ms | 484.4639 Ops/s | 528.5379 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.2263ms | 0.9956ms | 1.0044 KOps/s | 958.6802 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.0664ms | 34.9089ms | 28.6460 Ops/s | 28.5139 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.2390ms | 17.6433ms | 56.6788 Ops/s | 56.4156 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 38.9538ms | 36.4270ms | 27.4522 Ops/s | 27.6354 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.6444ms | 18.0549ms | 55.3868 Ops/s | 54.9392 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 39.2744ms | 37.7138ms | 26.5155 Ops/s | 26.4797 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 20.5928ms | 19.3631ms | 51.6445 Ops/s | 51.4096 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8371ms | 0.2146ms | 4.6600 KOps/s | 4.5355 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.7745ms | 1.4174ms | 705.5208 Ops/s | 715.1122 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.7466ms | 2.3238ms | 430.3206 Ops/s | 416.5543 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.0803ms | 2.9234ms | 342.0659 Ops/s | 341.5278 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2078ms | 0.1300ms | 7.6922 KOps/s | 7.6976 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3282ms | 0.1810ms | 5.5236 KOps/s | 5.1793 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9403ms | 1.7644ms | 566.7723 Ops/s | 573.5394 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5837ms | 1.2805ms | 780.9634 Ops/s | 764.8914 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.2418ms | 1.0863ms | 920.5940 Ops/s | 916.8938 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.6867ms | 3.5071ms | 285.1383 Ops/s | 286.4685 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.0627ms | 5.6249ms | 177.7824 Ops/s | 180.7879 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.0356ms | 6.8929ms | 145.0758 Ops/s | 141.5778 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4243ms | 0.2693ms | 3.7136 KOps/s | 3.7026 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6989ms | 1.5296ms | 653.7488 Ops/s | 668.6264 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.8282ms | 2.4438ms | 409.1915 Ops/s | 399.4314 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.2904ms | 3.1412ms | 318.3507 Ops/s | 320.5043 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 33.7639ms | 33.1667ms | 30.1508 Ops/s | 30.3116 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 66.9921ms | 65.4223ms | 15.2853 Ops/s | 15.3618 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 38.5026ms | 37.6547ms | 26.5571 Ops/s | 26.5951 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 74.4906ms | 73.6932ms | 13.5698 Ops/s | 13.6345 Ops/s |
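
The Change column came through empty in this capture. In the rows checked, Ops is the reciprocal of Mean (for example, 1 / 77.9475 μs ≈ 12.83 KOps/s for `test_tensor_to_bytestream_speed[pickle]`), which matches how pytest-benchmark reports OPS, assuming that is the tool behind these tables. The sketch below is a hypothetical helper, not part of TorchRL or of the benchmark bot: it parses one table row and recomputes a relative throughput change against repo HEAD, using the assumed convention that a positive value means this PR ran faster.

```python
import re

# Throughput units seen in the Ops columns, scaled to operations per second.
OPS_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}
# Time units seen in the Max/Mean columns, scaled to seconds.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted for "μs".
TIME_SCALE = {"s": 1.0, "ms": 1e-3, "\u03bcs": 1e-6, "\u00b5s": 1e-6}


def parse_quantity(cell: str, scales: dict) -> float:
    """Convert a cell such as '12.8291 KOps/s' or '0.1350ms' to base units."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*(\S+)\s*", cell)
    if match is None or match.group(2) not in scales:
        raise ValueError(f"cannot parse {cell!r}")
    return float(match.group(1)) * scales[match.group(2)]


def parse_row(row: str) -> tuple:
    """Return (name, mean_seconds, ops, ops_head) from one benchmark table row."""
    # Drop the outer pipes, split on the inner ones, and ignore the empty Change cell.
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    name, _max, mean, ops, ops_head = cells[:5]
    return (
        name,
        parse_quantity(mean, TIME_SCALE),
        parse_quantity(ops, OPS_SCALE),
        parse_quantity(ops_head, OPS_SCALE),
    )


def relative_change(ops: float, ops_head: float) -> float:
    """Throughput change of this PR vs. repo HEAD, in percent (positive = faster)."""
    return 100.0 * (ops - ops_head) / ops_head


row = "| test_tensor_to_bytestream_speed[pickle] | 79.6692\u03bcs | 77.9475\u03bcs | 12.8291 KOps/s | 12.4608 KOps/s | |"
name, mean_s, ops, ops_head = parse_row(row)
print(name, round(1.0 / mean_s), round(relative_change(ops, ops_head), 2))
```

Applied to the `test_tensor_to_bytestream_speed[pickle]` row of the first table, `relative_change` gives about +2.96% (12.8291 vs 12.4608 KOps/s).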

Contributor

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 83.9278μs | 81.6100μs | 12.2534 KOps/s | 12.4390 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1413ms | 0.1408ms | 7.1042 KOps/s | 7.1823 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1091s | 0.1088s | 9.1940 Ops/s | 9.1406 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.7011μs | 2.6956μs | 370.9682 KOps/s | 372.4892 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 37.4997μs | 37.2678μs | 26.8328 KOps/s | 26.1360 KOps/s | |
| test_simple | 0.8022s | 0.7978s | 1.2534 Ops/s | 1.2236 Ops/s | |
| test_transformed | 1.5459s | 1.4466s | 0.6913 Ops/s | 0.6872 Ops/s | |
| test_serial | 2.3907s | 2.3101s | 0.4329 Ops/s | 0.4322 Ops/s | |
| test_parallel | 1.9089s | 1.8084s | 0.5530 Ops/s | 0.5539 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.2568ms | 45.3964μs | 22.0282 KOps/s | 21.9010 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 49.7110μs | 25.5063μs | 39.2060 KOps/s | 39.7021 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 65.1920μs | 24.8562μs | 40.2314 KOps/s | 39.5159 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 64.0710μs | 14.1194μs | 70.8247 KOps/s | 71.0671 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 83.6020μs | 47.7708μs | 20.9333 KOps/s | 20.8730 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 68.0720μs | 27.8268μs | 35.9366 KOps/s | 35.8279 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 61.4710μs | 27.3773μs | 36.5266 KOps/s | 36.6631 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 54.6410μs | 16.6506μs | 60.0581 KOps/s | 60.0576 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 78.7610μs | 50.2292μs | 19.9087 KOps/s | 20.0103 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 63.7620μs | 31.0479μs | 32.2083 KOps/s | 32.5707 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 68.4510μs | 27.5946μs | 36.2390 KOps/s | 36.1455 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 43.0010μs | 16.7498μs | 59.7023 KOps/s | 59.1109 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 90.2720μs | 52.3084μs | 19.1174 KOps/s | 18.4665 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 65.6120μs | 32.9826μs | 30.3190 KOps/s | 29.2894 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 67.4320μs | 29.8287μs | 33.5248 KOps/s | 33.0590 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 44.5210μs | 19.2390μs | 51.9779 KOps/s | 50.3711 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 84.4320μs | 51.1544μs | 19.5487 KOps/s | 19.9300 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 62.1210μs | 30.8492μs | 32.4158 KOps/s | 32.0654 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.3246ms | 32.2782μs | 30.9807 KOps/s | 31.1233 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 49.2110μs | 18.4495μs | 54.2020 KOps/s | 53.9321 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 83.2820μs | 53.4655μs | 18.7037 KOps/s | 18.8229 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 69.0110μs | 33.2397μs | 30.0845 KOps/s | 29.9010 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 77.3910μs | 34.4001μs | 29.0697 KOps/s | 29.8660 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 58.5010μs | 20.8886μs | 47.8729 KOps/s | 47.4674 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 93.8020μs | 56.4749μs | 17.7070 KOps/s | 17.9712 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 65.6310μs | 36.5024μs | 27.3955 KOps/s | 27.4907 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 72.3610μs | 34.0082μs | 29.4047 KOps/s | 28.9270 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 63.7210μs | 21.1617μs | 47.2552 KOps/s | 47.6593 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 97.1320μs | 58.0753μs | 17.2190 KOps/s | 17.3696 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 68.0710μs | 39.0007μs | 25.6406 KOps/s | 25.8101 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 75.6810μs | 36.0346μs | 27.7511 KOps/s | 27.3769 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 46.8910μs | 23.8776μs | 41.8802 KOps/s | 41.7286 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8677s | 0.7684s | 1.3014 Ops/s | 1.3028 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7301s | 0.6334s | 1.5789 Ops/s | 1.5854 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7567s | 1.6741s | 0.5973 Ops/s | 0.5945 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5272s | 1.4508s | 0.6893 Ops/s | 0.6849 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9947s | 1.9162s | 0.5219 Ops/s | 0.5159 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7751s | 1.6898s | 0.5918 Ops/s | 0.5875 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7951s | 4.6458s | 0.2152 Ops/s | 0.2161 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.6379s | 4.4896s | 0.2227 Ops/s | 0.2225 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9814s | 1.8819s | 0.5314 Ops/s | 0.5216 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7196s | 1.6172s | 0.6184 Ops/s | 0.6164 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 20.6052ms | 20.0933ms | 49.7680 Ops/s | 49.7009 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1445s | 3.7984ms | 263.2655 Ops/s | 269.6266 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1070ms | 81.5733μs | 12.2589 KOps/s | 12.1981 KOps/s | |
| test_values[td1_return_estimate-False-False] | 49.3758ms | 47.4641ms | 21.0686 Ops/s | 20.9779 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.2836ms | 1.0745ms | 930.6544 Ops/s | 924.5530 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 81.2111ms | 78.1385ms | 12.7978 Ops/s | 12.7387 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2582ms | 1.0709ms | 933.7830 Ops/s | 927.2909 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 20.6400ms | 20.3010ms | 49.2586 Ops/s | 48.8625 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0161ms | 0.7445ms | 1.3432 KOps/s | 1.3315 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7173ms | 0.6666ms | 1.5002 KOps/s | 1.4864 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5706ms | 1.4838ms | 673.9261 Ops/s | 674.6681 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7292ms | 0.6827ms | 1.4648 KOps/s | 1.4530 KOps/s | |
| test_dqn_speed[False-None] | 1.6357ms | 1.5273ms | 654.7505 Ops/s | 654.4231 Ops/s | |
| test_dqn_speed[False-backward] | 2.1934ms | 2.1481ms | 465.5320 Ops/s | 464.8767 Ops/s | |
| test_dqn_speed[True-None] | 0.7221ms | 0.5644ms | 1.7718 KOps/s | 1.7529 KOps/s | |
| test_dqn_speed[True-backward] | 1.3254ms | 1.2113ms | 825.5448 Ops/s | 902.1882 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6483ms | 0.5863ms | 1.7055 KOps/s | 1.6475 KOps/s | |
| test_ddpg_speed[False-None] | 3.2684ms | 2.8713ms | 348.2720 Ops/s | 345.6010 Ops/s | |
| test_ddpg_speed[False-backward] | 4.6371ms | 4.2636ms | 234.5438 Ops/s | 242.3560 Ops/s | |
| test_ddpg_speed[True-None] | 1.5078ms | 1.3284ms | 752.7580 Ops/s | 747.9624 Ops/s | |
| test_ddpg_speed[True-backward] | 2.6184ms | 2.5349ms | 394.4929 Ops/s | 416.8307 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.4895ms | 1.3581ms | 736.3285 Ops/s | 737.3089 Ops/s | |
| test_sac_speed[False-None] | 8.7908ms | 8.2569ms | 121.1110 Ops/s | 121.2842 Ops/s | |
| test_sac_speed[False-backward] | 11.8945ms | 11.3710ms | 87.9428 Ops/s | 89.6973 Ops/s | |
| test_sac_speed[True-None] | 1.9660ms | 1.8412ms | 543.1228 Ops/s | 545.0603 Ops/s | |
| test_sac_speed[True-backward] | 3.6537ms | 3.5870ms | 278.7814 Ops/s | 274.1408 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 18.5390ms | 10.7757ms | 92.8015 Ops/s | 80.7939 Ops/s | |
| test_redq_deprec_speed[False-None] | 10.0329ms | 9.2026ms | 108.6650 Ops/s | 106.3963 Ops/s | |
| test_redq_deprec_speed[False-backward] | 12.9545ms | 12.4773ms | 80.1457 Ops/s | 78.7387 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.7228ms | 2.5510ms | 391.9972 Ops/s | 395.3414 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.6510ms | 4.2910ms | 233.0451 Ops/s | 235.4562 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 16.2935ms | 9.8528ms | 101.4939 Ops/s | 100.5334 Ops/s | |
| test_td3_speed[False-None] | 8.2025ms | 8.0920ms | 123.5782 Ops/s | 122.7390 Ops/s | |
| test_td3_speed[False-backward] | 11.0353ms | 10.5865ms | 94.4602 Ops/s | 95.0467 Ops/s | |
| test_td3_speed[True-None] | 1.6868ms | 1.6572ms | 603.4258 Ops/s | 603.0164 Ops/s | |
| test_td3_speed[True-backward] | 3.3126ms | 3.2521ms | 307.4892 Ops/s | 314.3987 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 83.4214ms | 24.8254ms | 40.2813 Ops/s | 39.8455 Ops/s | |
| test_cql_speed[False-None] | 17.3609ms | 17.1205ms | 58.4094 Ops/s | 57.8359 Ops/s | |
| test_cql_speed[False-backward] | 23.3668ms | 22.5260ms | 44.3931 Ops/s | 44.6785 Ops/s | |
| test_cql_speed[True-None] | 3.9117ms | 3.3351ms | 299.8371 Ops/s | 301.1543 Ops/s | |
| test_cql_speed[True-backward] | 5.8753ms | 5.5186ms | 181.2057 Ops/s | 184.5320 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 18.9448ms | 12.0463ms | 83.0128 Ops/s | 83.5172 Ops/s | |
| test_a2c_speed[False-None] | 4.2679ms | 3.2321ms | 309.3919 Ops/s | 309.1116 Ops/s | |
| test_a2c_speed[False-backward] | 6.5833ms | 6.2274ms | 160.5814 Ops/s | 158.7673 Ops/s | |
| test_a2c_speed[True-None] | 1.4250ms | 1.3536ms | 738.7562 Ops/s | 734.4902 Ops/s | |
| test_a2c_speed[True-backward] | 3.8978ms | 3.1324ms | 319.2437 Ops/s | 319.9171 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.0833ms | 0.9963ms | 1.0038 KOps/s | 1.0208 KOps/s | |
| test_ppo_speed[False-None] | 3.9445ms | 3.7950ms | 263.5017 Ops/s | 260.6994 Ops/s | |
| test_ppo_speed[False-backward] | 7.3520ms | 6.9834ms | 143.1964 Ops/s | 143.0382 Ops/s | |
| test_ppo_speed[True-None] | 1.5231ms | 1.4503ms | 689.5038 Ops/s | 699.4508 Ops/s | |
| test_ppo_speed[True-backward] | 3.3485ms | 3.2833ms | 304.5745 Ops/s | 318.8231 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.5063ms | 1.0573ms | 945.8311 Ops/s | 940.0537 Ops/s | |
| test_reinforce_speed[False-None] | 2.6755ms | 2.2490ms | 444.6417 Ops/s | 435.8025 Ops/s | |
| test_reinforce_speed[False-backward] | 3.8233ms | 3.3667ms | 297.0238 Ops/s | 294.1458 Ops/s | |
| test_reinforce_speed[True-None] | 1.4044ms | 1.3079ms | 764.5825 Ops/s | 764.3224 Ops/s | |
| test_reinforce_speed[True-backward] | 3.1580ms | 3.0445ms | 328.4616 Ops/s | 327.5412 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 17.5608ms | 9.6211ms | 103.9386 Ops/s | 102.8966 Ops/s | |
| test_iql_speed[False-None] | 9.8896ms | 9.2863ms | 107.6851 Ops/s | 106.4654 Ops/s | |
| test_iql_speed[False-backward] | 13.0039ms | 12.8202ms | 78.0017 Ops/s | 77.0028 Ops/s | |
| test_iql_speed[True-None] | 2.6280ms | 2.2110ms | 452.2741 Ops/s | 450.4900 Ops/s | |
| test_iql_speed[True-backward] | 5.0469ms | 4.8977ms | 204.1757 Ops/s | 203.1529 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 17.8182ms | 10.4342ms | 95.8386 Ops/s | 92.8798 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1263ms | 5.9767ms | 167.3170 Ops/s | 167.4734 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7976ms | 0.3643ms | 2.7453 KOps/s | 3.6209 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6310ms | 0.2922ms | 3.4225 KOps/s | 3.8394 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.3806ms | 5.7910ms | 172.6811 Ops/s | 175.2266 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.9243ms | 0.3259ms | 3.0686 KOps/s | 2.7888 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5642ms | 0.3147ms | 3.1781 KOps/s | 2.9437 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6878ms | 1.2916ms | 774.2198 Ops/s | 731.6972 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6659ms | 1.1426ms | 875.2053 Ops/s | 772.2239 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.4050ms | 5.9770ms | 167.3080 Ops/s | 165.9263 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.0433ms | 0.4504ms | 2.2205 KOps/s | 2.0584 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6146ms | 0.4248ms | 2.3539 KOps/s | 2.1735 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2966ms | 5.8692ms | 170.3809 Ops/s | 168.6440 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9388ms | 0.3597ms | 2.7802 KOps/s | 3.5726 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5707ms | 0.3446ms | 2.9017 KOps/s | 3.3050 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.3700ms | 5.7991ms | 172.4398 Ops/s | 170.6877 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0841ms | 0.2742ms | 3.6475 KOps/s | 2.9914 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.4547ms | 0.2610ms | 3.8316 KOps/s | 3.2264 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.4368ms | 6.0036ms | 166.5678 Ops/s | 165.7261 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.5644s | 1.3123ms | 762.0095 Ops/s | 2.1024 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6746ms | 0.4894ms | 2.0435 KOps/s | 2.4272 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.5774ms | 5.1157ms | 195.4784 Ops/s | 196.0941 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 8.5087ms | 1.9715ms | 507.2308 Ops/s | 439.6075 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 9.7156ms | 1.2871ms | 776.9621 Ops/s | 998.7247 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 6.9391ms | 5.0270ms | 198.9249 Ops/s | 49.8469 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 4.0630ms | 1.8018ms | 555.0058 Ops/s | 479.8795 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.0415ms | 0.9119ms | 1.0966 KOps/s | 810.0927 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.5375s | 16.0212ms | 62.4174 Ops/s | 185.7242 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.1428ms | 1.9396ms | 515.5643 Ops/s | 466.1791 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.2430ms | 1.1219ms | 891.3702 Ops/s | 919.0262 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.4228ms | 35.1599ms | 28.4415 Ops/s | 28.0314 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.3446ms | 17.7536ms | 56.3266 Ops/s | 55.7369 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.2737ms | 36.7617ms | 27.2022 Ops/s | 27.1048 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.6009ms | 17.9238ms | 55.7916 Ops/s | 55.0019 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.2746ms | 38.4836ms | 25.9851 Ops/s | 25.7638 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 20.7312ms | 19.5503ms | 51.1501 Ops/s | 50.9672 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8738ms | 0.2183ms | 4.5815 KOps/s | 4.6464 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.5600ms | 1.4119ms | 708.2603 Ops/s | 708.5095 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.5242ms | 2.2994ms | 434.8922 Ops/s | 424.1399 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.3289ms | 2.8933ms | 345.6202 Ops/s | 342.6124 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.6349ms | 0.1604ms | 6.2363 KOps/s | 6.1801 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3890ms | 0.2200ms | 4.5449 KOps/s | 4.3750 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.8597ms | 1.7040ms | 586.8392 Ops/s | 547.3499 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5455ms | 1.3859ms | 721.5325 Ops/s | 731.6624 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.5775ms | 1.1599ms | 862.1680 Ops/s | 871.5197 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 4.1063ms | 3.5830ms | 279.0939 Ops/s | 277.5560 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 6.2708ms | 5.8211ms | 171.7894 Ops/s | 171.8828 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.7923ms | 7.3700ms | 135.6847 Ops/s | 136.4020 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4162ms | 0.2703ms | 3.6996 KOps/s | 3.6741 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.9716ms | 1.5255ms | 655.5331 Ops/s | 653.4148 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.8831ms | 2.3910ms | 418.2402 Ops/s | 434.9448 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.5477ms | 3.1027ms | 322.3037 Ops/s | 318.9266 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 34.7905ms | 33.6178ms | 29.7462 Ops/s | 29.5790 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 67.4346ms | 66.1010ms | 15.1284 Ops/s | 15.0111 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 39.7313ms | 37.9364ms | 26.3599 Ops/s | 26.3699 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 75.6723ms | 73.8230ms | 13.5459 Ops/s | 13.1207 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 0.7618s | 95.2890ms | 10.4944 Ops/s | 17.3836 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1155s | 0.1125s | 8.8853 Ops/s | 8.7001 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 61.5329ms | 58.4443ms | 17.1103 Ops/s | 16.8825 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1169s | 0.1159s | 8.6256 Ops/s | 8.4938 Ops/s | |

vmoens added a commit that referenced this pull request on Feb 7, 2026:

Optimise the output-reading phase of step_and_maybe_reset when shared memory and target device are both known and different (the common CPU-shared -> CUDA case).

- When shared_device is not None and shared_device != device: use a single td.to(device) instead of _fast_apply with per-tensor check. Since .to() already creates new tensors, the extra .clone() is unnecessary.
- Keep the _fast_apply fallback for the mixed-device case.
- Move _sync_w2m() into a conditional - only called when a cross-device transfer actually happened.

Co-authored-by: Cursor <[email protected]>
ghstack-source-id: 07aba16
Pull-Request: #3458
Stack from ghstack (oldest at bottom):
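Optimise the output-reading phase of step_and_maybe_reset when shared
memory and target device are both known and different (the common
CPU-shared -> CUDA case).

- When shared_device is not None and shared_device != device: use a
  single td.to(device) instead of _fast_apply with per-tensor check.
  Since .to() already creates new tensors, the extra .clone() is
  unnecessary.
- Keep the _fast_apply fallback for the mixed-device case.
- Move _sync_w2m() into a conditional - only called when a cross-device
  transfer actually happened.

Co-authored-by: Cursor <[email protected]>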
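The dispatch described above can be sketched in plain Python so it runs without torch or tensordict installed. This is a hypothetical illustration, not the ParallelEnv code: `FakeTensor`, `read_outputs` and the `sync` callback stand in for the real tensors, the TensorDict, and `_sync_w2m()`. The fallback rule (clone leaves already on the target device, move the rest) is an assumption about what "per-tensor check" means. It rests on `torch.Tensor.to` returning `self` when no conversion is needed, which is why a clone is only required when nothing actually moves.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor: a device tag plus a payload."""

    device: str
    data: list = field(default_factory=list)

    def to(self, device: str) -> "FakeTensor":
        # Mirrors torch.Tensor.to: return self when already on `device`,
        # otherwise return a new object (a copy) on the target device.
        if device == self.device:
            return self
        return FakeTensor(device, list(self.data))

    def clone(self) -> "FakeTensor":
        return FakeTensor(self.device, list(self.data))


def read_outputs(
    shared: dict[str, FakeTensor],
    shared_device: Optional[str],
    device: str,
    sync: Callable[[], None],
) -> dict[str, FakeTensor]:
    """Copy worker outputs out of shared buffers onto `device` (sketch)."""
    if shared_device is not None and shared_device != device:
        # Fast path: one bulk transfer (td.to(device) in the PR). Every leaf
        # changes device, so .to() already yields fresh objects; no .clone().
        out = {key: t.to(device) for key, t in shared.items()}
        moved = True
    else:
        # Fallback (mixed or matching devices): per-tensor check. Leaves
        # already on `device` would alias the shared buffer, so clone them.
        out = {}
        moved = False
        for key, t in shared.items():
            if t.device == device:
                out[key] = t.clone()
            else:
                out[key] = t.to(device)
                moved = True
    if moved:
        # Only synchronise when a cross-device copy actually happened.
        sync()
    return out


if __name__ == "__main__":
    calls = []
    shared = {"obs": FakeTensor("cpu", [1, 2])}
    out = read_outputs(shared, "cpu", "cuda:0", lambda: calls.append("sync"))
    print(out["obs"].device, len(calls))  # cuda:0 1
```

The design choice the sketch highlights: when the shared device is known and differs from the target, the per-leaf check is redundant, because every leaf must move and every move already produces a new buffer.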