[Example] Dreamer: SerialEnv mode and collector compile config #3459
vmoens wants to merge 3 commits into gh/vmoens/220/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3459
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 3821082 with merge base 73b853b:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.

| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
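
The prefix-to-label mapping above can be illustrated with a small lookup. This is only a sketch: the labeling bot's actual implementation is not shown in this thread, `labels_for_title` is a hypothetical helper name, and the case-insensitive matching is an assumption rather than documented behavior.

```python
import re

# Prefix -> label pairs copied from the table above (keys lowercased).
PREFIX_TO_LABEL = {
    "bugfix": "BugFix",
    "feature": "Feature",
    "doc": "Documentation",
    "docs": "Documentation",
    "refactor": "Refactoring",
    "ci": "CI",
    "test": "Tests",
    "tests": "Tests",
    "environment": "Environments",
    "environments": "Environments",
    "data": "Data",
    "performance": "Performance",
    "perf": "Performance",
    "bc-breaking": "bc breaking",
    "deprecation": "Deprecation",
}


def labels_for_title(title):
    """Return the labels for every leading [Prefix] tag in a PR title.

    Hypothetical helper: matching is case-insensitive by assumption, and
    unknown prefixes are skipped without raising.
    """
    labels = []
    rest = title.lstrip()
    while True:
        match = re.match(r"\[([^\]]+)\]\s*", rest)
        if match is None:
            break
        label = PREFIX_TO_LABEL.get(match.group(1).strip().lower())
        if label is not None and label not in labels:
            labels.append(label)
        rest = rest[match.end():]
    return labels
```

Under these assumptions, a title whose prefix is not in the table (such as `[Example]`) yields an empty list.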

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 81.7410μs | 80.1110μs | 12.4827 KOps/s | 11.8077 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1470ms | 0.1424ms | 7.0244 KOps/s | 7.1205 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1299s | 0.1293s | 7.7327 Ops/s | 8.5998 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.7699μs | 2.7612μs | 362.1553 KOps/s | 377.1386 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 39.5596μs | 38.7450μs | 25.8098 KOps/s | 25.0637 KOps/s | |
| test_simple | 0.5612s | 0.5555s | 1.8002 Ops/s | 1.7081 Ops/s | |
| test_transformed | 1.2543s | 1.1635s | 0.8595 Ops/s | 0.8560 Ops/s | |
| test_serial | 1.7110s | 1.7048s | 0.5866 Ops/s | 0.5827 Ops/s | |
| test_parallel | 1.1492s | 1.0574s | 0.9457 Ops/s | 0.9398 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.3413ms | 44.0442μs | 22.7045 KOps/s | 22.8973 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 48.6910μs | 25.0129μs | 39.9794 KOps/s | 38.9386 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 56.4110μs | 24.6183μs | 40.6201 KOps/s | 39.4179 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 44.1310μs | 13.7164μs | 72.9053 KOps/s | 71.9032 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 76.1420μs | 47.5843μs | 21.0153 KOps/s | 20.9140 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 58.2110μs | 27.5543μs | 36.2919 KOps/s | 34.9772 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 64.0010μs | 27.6647μs | 36.1472 KOps/s | 35.7622 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 43.2110μs | 16.3884μs | 61.0187 KOps/s | 59.6555 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 81.9920μs | 50.2065μs | 19.9178 KOps/s | 19.3674 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 55.9010μs | 30.5469μs | 32.7365 KOps/s | 31.7125 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 60.0910μs | 27.8068μs | 35.9625 KOps/s | 36.3456 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 45.6010μs | 16.5727μs | 60.3402 KOps/s | 59.5256 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 82.2710μs | 52.4528μs | 19.0648 KOps/s | 18.7822 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 66.4320μs | 32.8256μs | 30.4640 KOps/s | 29.6421 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 64.6810μs | 29.9604μs | 33.3774 KOps/s | 32.5733 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 53.9010μs | 19.1915μs | 52.1063 KOps/s | 52.2897 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 87.1620μs | 50.2927μs | 19.8836 KOps/s | 19.8621 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 58.5120μs | 30.6854μs | 32.5888 KOps/s | 32.0102 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.3809ms | 31.9376μs | 31.3110 KOps/s | 31.5699 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 46.9910μs | 18.2160μs | 54.8969 KOps/s | 54.1606 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 0.1555ms | 50.4988μs | 19.8025 KOps/s | 18.9769 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 56.7210μs | 32.8086μs | 30.4798 KOps/s | 29.6532 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 67.1310μs | 34.0519μs | 29.3670 KOps/s | 29.3418 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 46.2810μs | 20.8353μs | 47.9955 KOps/s | 47.4971 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 93.3620μs | 55.1586μs | 18.1295 KOps/s | 17.8470 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 83.2520μs | 36.1576μs | 27.6567 KOps/s | 27.3643 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 61.5020μs | 33.2416μs | 30.0828 KOps/s | 29.3152 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 0.1666ms | 20.6902μs | 48.3320 KOps/s | 46.9658 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 93.2920μs | 56.6288μs | 17.6589 KOps/s | 17.0071 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 71.9310μs | 38.4141μs | 26.0321 KOps/s | 25.4714 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 71.0110μs | 35.5468μs | 28.1320 KOps/s | 27.0656 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 48.9710μs | 23.3010μs | 42.9167 KOps/s | 42.4548 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8647s | 0.7693s | 1.2998 Ops/s | 1.2911 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7307s | 0.6360s | 1.5723 Ops/s | 1.5705 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7553s | 1.6842s | 0.5938 Ops/s | 0.5896 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5451s | 1.4653s | 0.6824 Ops/s | 0.6816 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0221s | 1.9424s | 0.5148 Ops/s | 0.5129 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7976s | 1.7189s | 0.5818 Ops/s | 0.5832 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.8158s | 4.6914s | 0.2132 Ops/s | 0.2115 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.6322s | 4.5295s | 0.2208 Ops/s | 0.2217 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.0622s | 1.9210s | 0.5206 Ops/s | 0.5161 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7980s | 1.6369s | 0.6109 Ops/s | 0.6186 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 11.3584ms | 11.1766ms | 89.4729 Ops/s | 93.9097 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 13.5288ms | 11.0435ms | 90.5510 Ops/s | 87.0922 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2251ms | 0.1312ms | 7.6202 KOps/s | 7.6364 KOps/s | |
| test_values[td1_return_estimate-False-False] | 30.9869ms | 30.5566ms | 32.7262 Ops/s | 34.4845 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 11.5082ms | 11.1257ms | 89.8821 Ops/s | 91.0159 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 45.9731ms | 45.4234ms | 22.0151 Ops/s | 23.2892 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 12.0987ms | 11.1584ms | 89.6182 Ops/s | 91.3964 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 10.1036ms | 9.9930ms | 100.0700 Ops/s | 104.9391 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7729ms | 1.5773ms | 633.9755 Ops/s | 660.0740 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4753ms | 0.4368ms | 2.2894 KOps/s | 2.2645 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 29.8874ms | 26.3122ms | 38.0051 Ops/s | 52.9964 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.9294ms | 1.7043ms | 586.7545 Ops/s | 585.2096 Ops/s | |
| test_dqn_speed[False-None] | 1.6185ms | 1.4363ms | 696.2538 Ops/s | 706.1845 Ops/s | |
| test_dqn_speed[False-backward] | 2.0209ms | 1.9470ms | 513.6005 Ops/s | 515.7591 Ops/s | |
| test_dqn_speed[True-None] | 0.6156ms | 0.5559ms | 1.7989 KOps/s | 1.7906 KOps/s | |
| test_dqn_speed[True-backward] | 1.0605ms | 1.0212ms | 979.2357 Ops/s | 976.8853 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6847ms | 0.5457ms | 1.8327 KOps/s | 1.8058 KOps/s | |
| test_ddpg_speed[False-None] | 3.2790ms | 2.8861ms | 346.4921 Ops/s | 347.7361 Ops/s | |
| test_ddpg_speed[False-backward] | 4.3483ms | 4.1231ms | 242.5363 Ops/s | 242.7113 Ops/s | |
| test_ddpg_speed[True-None] | 1.6953ms | 1.4366ms | 696.0703 Ops/s | 692.5522 Ops/s | |
| test_ddpg_speed[True-backward] | 2.5386ms | 2.4508ms | 408.0247 Ops/s | 410.2820 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.5495ms | 1.4136ms | 707.3969 Ops/s | 702.1881 Ops/s | |
| test_sac_speed[False-None] | 8.7401ms | 8.1585ms | 122.5722 Ops/s | 123.1197 Ops/s | |
| test_sac_speed[False-backward] | 11.8714ms | 11.4042ms | 87.6868 Ops/s | 87.4200 Ops/s | |
| test_sac_speed[True-None] | 2.7424ms | 2.1696ms | 460.9122 Ops/s | 458.9877 Ops/s | |
| test_sac_speed[True-backward] | 4.2288ms | 4.0953ms | 244.1846 Ops/s | 192.3427 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.3106ms | 2.1751ms | 459.7581 Ops/s | 458.2367 Ops/s | |
| test_redq_speed[False-None] | 10.9768ms | 10.5546ms | 94.7455 Ops/s | 94.7674 Ops/s | |
| test_redq_speed[False-backward] | 18.8013ms | 17.9815ms | 55.6128 Ops/s | 56.1285 Ops/s | |
| test_redq_speed[True-None] | 4.6380ms | 4.4670ms | 223.8661 Ops/s | 223.6544 Ops/s | |
| test_redq_speed[True-backward] | 10.1771ms | 9.8866ms | 101.1471 Ops/s | 102.1692 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 4.7154ms | 4.4657ms | 223.9292 Ops/s | 199.2463 Ops/s | |
| test_redq_deprec_speed[False-None] | 11.8687ms | 11.3612ms | 88.0192 Ops/s | 89.3784 Ops/s | |
| test_redq_deprec_speed[False-backward] | 16.9184ms | 16.2971ms | 61.3606 Ops/s | 62.2166 Ops/s | |
| test_redq_deprec_speed[True-None] | 3.8817ms | 3.7220ms | 268.6698 Ops/s | 266.9444 Ops/s | |
| test_redq_deprec_speed[True-backward] | 8.0258ms | 7.7428ms | 129.1525 Ops/s | 122.8908 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 3.8021ms | 3.6709ms | 272.4156 Ops/s | 260.6343 Ops/s | |
| test_td3_speed[False-None] | 8.4243ms | 8.1741ms | 122.3383 Ops/s | 123.3898 Ops/s | |
| test_td3_speed[False-backward] | 11.6151ms | 11.1034ms | 90.0625 Ops/s | 90.9083 Ops/s | |
| test_td3_speed[True-None] | 1.9337ms | 1.8666ms | 535.7390 Ops/s | 528.4883 Ops/s | |
| test_td3_speed[True-backward] | 4.2554ms | 3.7410ms | 267.3084 Ops/s | 248.9343 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.9590ms | 1.8378ms | 544.1347 Ops/s | 548.7210 Ops/s | |
| test_cql_speed[False-None] | 29.8942ms | 26.5482ms | 37.6674 Ops/s | 37.9808 Ops/s | |
| test_cql_speed[False-backward] | 38.7504ms | 35.9161ms | 27.8426 Ops/s | 28.2025 Ops/s | |
| test_cql_speed[True-None] | 15.1352ms | 12.5406ms | 79.7408 Ops/s | 77.7450 Ops/s | |
| test_cql_speed[True-backward] | 19.1692ms | 18.6593ms | 53.5925 Ops/s | 55.6573 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 12.7334ms | 12.4776ms | 80.1435 Ops/s | 78.5062 Ops/s | |
| test_a2c_speed[False-None] | 5.6693ms | 5.5191ms | 181.1901 Ops/s | 182.0806 Ops/s | |
| test_a2c_speed[False-backward] | 12.3122ms | 12.0686ms | 82.8597 Ops/s | 84.1177 Ops/s | |
| test_a2c_speed[True-None] | 3.8989ms | 3.7355ms | 267.7043 Ops/s | 265.7272 Ops/s | |
| test_a2c_speed[True-backward] | 8.8251ms | 8.6450ms | 115.6743 Ops/s | 116.0023 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 3.8591ms | 3.7484ms | 266.7771 Ops/s | 268.5529 Ops/s | |
| test_ppo_speed[False-None] | 6.1927ms | 6.0086ms | 166.4293 Ops/s | 166.2338 Ops/s | |
| test_ppo_speed[False-backward] | 12.9567ms | 12.7169ms | 78.6353 Ops/s | 79.2761 Ops/s | |
| test_ppo_speed[True-None] | 3.8365ms | 3.6737ms | 272.2051 Ops/s | 271.8522 Ops/s | |
| test_ppo_speed[True-backward] | 8.6897ms | 8.4869ms | 117.8291 Ops/s | 118.7639 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 3.7716ms | 3.6537ms | 273.6915 Ops/s | 271.1671 Ops/s | |
| test_reinforce_speed[False-None] | 4.8410ms | 4.5380ms | 220.3624 Ops/s | 214.2805 Ops/s | |
| test_reinforce_speed[False-backward] | 7.5555ms | 7.3755ms | 135.5849 Ops/s | 134.5215 Ops/s | |
| test_reinforce_speed[True-None] | 3.0462ms | 2.9174ms | 342.7754 Ops/s | 339.4848 Ops/s | |
| test_reinforce_speed[True-backward] | 8.0220ms | 7.8211ms | 127.8591 Ops/s | 129.7306 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.0381ms | 2.8991ms | 344.9292 Ops/s | 343.5962 Ops/s | |
| test_iql_speed[False-None] | 27.5565ms | 20.4973ms | 48.7869 Ops/s | 49.8437 Ops/s | |
| test_iql_speed[False-backward] | 35.2710ms | 30.7283ms | 32.5433 Ops/s | 32.8520 Ops/s | |
| test_iql_speed[True-None] | 9.1084ms | 8.6123ms | 116.1124 Ops/s | 112.3479 Ops/s | |
| test_iql_speed[True-backward] | 17.2291ms | 16.8512ms | 59.3429 Ops/s | 59.1445 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 8.8885ms | 8.6157ms | 116.0665 Ops/s | 114.3327 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.3028ms | 6.1018ms | 163.8863 Ops/s | 163.8213 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.7714ms | 0.3466ms | 2.8855 KOps/s | 3.1267 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6283ms | 0.3297ms | 3.0328 KOps/s | 3.1944 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.0963ms | 5.8669ms | 170.4484 Ops/s | 172.1881 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0310ms | 0.3505ms | 2.8531 KOps/s | 2.9835 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6762ms | 0.3239ms | 3.0876 KOps/s | 2.6879 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6291ms | 1.3730ms | 728.3142 Ops/s | 757.4304 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5728ms | 1.3381ms | 747.3514 Ops/s | 723.9300 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 10.1301ms | 6.1958ms | 161.3988 Ops/s | 166.2646 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1254ms | 0.4674ms | 2.1394 KOps/s | 1.8436 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8532ms | 0.4415ms | 2.2648 KOps/s | 1.9271 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9695ms | 5.8853ms | 169.9162 Ops/s | 171.2749 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.7188ms | 0.3383ms | 2.9556 KOps/s | 3.2083 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5723ms | 0.3379ms | 2.9595 KOps/s | 3.0070 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1122ms | 5.8063ms | 172.2280 Ops/s | 171.8255 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0519ms | 0.3340ms | 2.9944 KOps/s | 3.5086 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.4834ms | 0.2783ms | 3.5934 KOps/s | 3.7519 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.2538ms | 6.0161ms | 166.2213 Ops/s | 166.8205 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8130ms | 0.4814ms | 2.0771 KOps/s | 2.0306 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8426ms | 0.4574ms | 2.1864 KOps/s | 2.0963 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.3536ms | 4.9479ms | 202.1050 Ops/s | 57.6747 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 4.3855ms | 2.0668ms | 483.8384 Ops/s | 507.8408 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.1001ms | 0.8947ms | 1.1177 KOps/s | 813.7066 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5197s | 15.4070ms | 64.9054 Ops/s | 197.8571 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 7.2794ms | 1.9561ms | 511.2244 Ops/s | 474.8699 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 10.9920ms | 1.2768ms | 783.2338 Ops/s | 861.9290 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 7.7634ms | 5.2174ms | 191.6664 Ops/s | 187.4113 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 8.8676ms | 2.0428ms | 489.5236 Ops/s | 75.5616 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 3.2167ms | 1.0795ms | 926.3354 Ops/s | 919.4630 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 38.2428ms | 36.5261ms | 27.3777 Ops/s | 27.3976 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 20.3084ms | 18.5369ms | 53.9464 Ops/s | 54.1727 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 41.9362ms | 37.8741ms | 26.4033 Ops/s | 26.7128 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.6333ms | 18.9445ms | 52.7858 Ops/s | 53.2730 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 41.4222ms | 40.0446ms | 24.9722 Ops/s | 25.5252 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.4390ms | 20.3014ms | 49.2578 Ops/s | 49.9747 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8864ms | 0.2208ms | 4.5289 KOps/s | 4.5930 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.7912ms | 1.4152ms | 706.5938 Ops/s | 723.6730 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.4846ms | 2.3198ms | 431.0799 Ops/s | 421.0712 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.1372ms | 2.9332ms | 340.9283 Ops/s | 342.5039 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2217ms | 0.1402ms | 7.1302 KOps/s | 7.4652 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3415ms | 0.1805ms | 5.5388 KOps/s | 5.5075 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9190ms | 1.7815ms | 561.3092 Ops/s | 554.9994 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5160ms | 1.3460ms | 742.9452 Ops/s | 756.2999 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.2723ms | 1.1355ms | 880.6319 Ops/s | 902.3457 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.8077ms | 3.5875ms | 278.7469 Ops/s | 285.7362 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 5.8826ms | 5.6679ms | 176.4335 Ops/s | 172.8745 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.5767ms | 7.3328ms | 136.3731 Ops/s | 138.6792 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4339ms | 0.2772ms | 3.6074 KOps/s | 3.6289 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.7396ms | 1.5279ms | 654.4769 Ops/s | 671.6455 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.6659ms | 2.4674ms | 405.2875 Ops/s | 400.4949 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3292ms | 3.1411ms | 318.3576 Ops/s | 320.5882 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 35.4735ms | 34.5200ms | 28.9687 Ops/s | 29.2869 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 68.6543ms | 67.9917ms | 14.7077 Ops/s | 14.8069 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 40.2360ms | 39.2800ms | 25.4583 Ops/s | 25.7892 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 77.0070ms | 76.3268ms | 13.1016 Ops/s | 13.0740 Ops/s |
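
The Change column is empty in the table as captured here. A reader who wants to compare the Ops and Ops on Repo HEAD columns could do so with a sketch like the one below. The relative-difference formula and the helper names `parse_ops` and `relative_change` are assumptions for illustration; how the benchmark bot itself fills that column is not shown in this thread.

```python
# Multipliers for the throughput units that appear in the "Ops" columns.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(cell):
    """Convert a cell such as '12.4827 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops, ops_head):
    """Relative throughput change of the PR versus repo HEAD (0.05 means +5%)."""
    head = parse_ops(ops_head)
    return (parse_ops(ops) - head) / head


# Values taken from the test_rb_populate[...LazyTensorStorage-RandomSampler-400] row.
change = relative_change("1.1177 KOps/s", "813.7066 Ops/s")
print(f"{change:+.1%}")
```

Normalizing units first matters because some rows mix `KOps/s` and `Ops/s` between the two columns.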

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 83.3901μs | 82.1304μs | 12.1758 KOps/s | 12.1174 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1459ms | 0.1433ms | 6.9799 KOps/s | 7.1534 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1129s | 0.1105s | 9.0471 Ops/s | 8.9686 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.6846μs | 2.6821μs | 372.8414 KOps/s | 386.1623 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 38.6420μs | 38.2161μs | 26.1670 KOps/s | 25.4016 KOps/s | |
| test_simple | 0.8094s | 0.7969s | 1.2548 Ops/s | 1.2246 Ops/s | |
| test_transformed | 1.5478s | 1.4531s | 0.6882 Ops/s | 0.6889 Ops/s | |
| test_serial | 2.4553s | 2.3507s | 0.4254 Ops/s | 0.4315 Ops/s | |
| test_parallel | 1.9378s | 1.8336s | 0.5454 Ops/s | 0.5594 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.2920ms | 45.9434μs | 21.7659 KOps/s | 22.9598 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 59.3710μs | 26.7569μs | 37.3735 KOps/s | 40.2417 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 69.0810μs | 27.7016μs | 36.0990 KOps/s | 41.2409 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 55.4610μs | 15.4360μs | 64.7838 KOps/s | 72.7343 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 98.2220μs | 50.7368μs | 19.7096 KOps/s | 21.2441 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 65.8910μs | 29.8539μs | 33.4964 KOps/s | 36.9442 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 77.6710μs | 29.9795μs | 33.3562 KOps/s | 36.7128 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 49.8410μs | 17.3663μs | 57.5829 KOps/s | 60.5743 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 96.2320μs | 51.8533μs | 19.2852 KOps/s | 20.1089 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 70.7110μs | 31.1665μs | 32.0858 KOps/s | 33.0069 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 72.1710μs | 30.2844μs | 33.0203 KOps/s | 36.5667 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 55.9610μs | 17.3473μs | 57.6458 KOps/s | 60.7066 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 96.5410μs | 55.7850μs | 17.9260 KOps/s | 18.9857 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 91.6310μs | 34.0875μs | 29.3363 KOps/s | 30.5064 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 78.9310μs | 32.5479μs | 30.7240 KOps/s | 33.8877 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 53.5110μs | 20.4833μs | 48.8202 KOps/s | 52.1178 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 93.4720μs | 51.9464μs | 19.2506 KOps/s | 19.9794 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 73.9610μs | 31.6654μs | 31.5802 KOps/s | 32.5241 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.2124ms | 33.8917μs | 29.5057 KOps/s | 31.8018 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 61.9910μs | 19.0898μs | 52.3841 KOps/s | 55.1736 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 99.4920μs | 54.8228μs | 18.2406 KOps/s | 18.6754 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 85.1910μs | 34.2082μs | 29.2328 KOps/s | 29.9187 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 79.5810μs | 35.4132μs | 28.2381 KOps/s | 29.7675 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 58.6010μs | 21.7067μs | 46.0687 KOps/s | 48.1553 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 0.1051ms | 57.0572μs | 17.5263 KOps/s | 18.0189 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 0.1090ms | 36.5707μs | 27.3443 KOps/s | 27.6662 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 76.1510μs | 35.2295μs | 28.3853 KOps/s | 29.5687 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 53.6010μs | 21.1352μs | 47.3145 KOps/s | 48.4825 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.1036ms | 57.9856μs | 17.2457 KOps/s | 17.5320 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 78.1110μs | 39.3857μs | 25.3899 KOps/s | 25.8471 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 0.1149ms | 36.7539μs | 27.2080 KOps/s | 27.6898 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 58.9210μs | 23.5483μs | 42.4659 KOps/s | 43.0791 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8917s | 0.8006s | 1.2491 Ops/s | 1.2912 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7372s | 0.6480s | 1.5432 Ops/s | 1.5713 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7950s | 1.7222s | 0.5807 Ops/s | 0.5940 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.6069s | 1.4982s | 0.6674 Ops/s | 0.6834 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0503s | 1.9745s | 0.5065 Ops/s | 0.5117 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.8599s | 1.7745s | 0.5635 Ops/s | 0.5858 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6750s | 4.6258s | 0.2162 Ops/s | 0.2133 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5221s | 4.4293s | 0.2258 Ops/s | 0.2261 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9984s | 1.9124s | 0.5229 Ops/s | 0.5277 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7275s | 1.6163s | 0.6187 Ops/s | 0.6241 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 22.0253ms | 21.4814ms | 46.5519 Ops/s | 49.1741 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1256s | 3.4345ms | 291.1648 Ops/s | 278.6910 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1112ms | 84.2399μs | 11.8709 KOps/s | 11.9303 KOps/s | |
| test_values[td1_return_estimate-False-False] | 52.0021ms | 50.7898ms | 19.6890 Ops/s | 20.4149 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.3171ms | 1.0908ms | 916.7956 Ops/s | 918.5752 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 84.5117ms | 83.2585ms | 12.0108 Ops/s | 12.4889 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.3094ms | 1.1027ms | 906.8574 Ops/s | 923.2966 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 22.3545ms | 21.9552ms | 45.5474 Ops/s | 48.5765 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0388ms | 0.7587ms | 1.3180 KOps/s | 1.3129 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.8674ms | 0.7061ms | 1.4162 KOps/s | 1.4722 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.6187ms | 1.5064ms | 663.8208 Ops/s | 670.6609 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7529ms | 0.6975ms | 1.4337 KOps/s | 1.4371 KOps/s | |
| test_dqn_speed[False-None] | 1.7041ms | 1.5472ms | 646.3451 Ops/s | 649.6435 Ops/s | |
| test_dqn_speed[False-backward] | 2.3802ms | 2.1709ms | 460.6458 Ops/s | 460.3754 Ops/s | |
| test_dqn_speed[True-None] | 1.0536ms | 0.5706ms | 1.7525 KOps/s | 1.6742 KOps/s | |
| test_dqn_speed[True-backward] | 1.1467ms | 1.1056ms | 904.4799 Ops/s | 825.0757 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.7038ms | 0.6206ms | 1.6114 KOps/s | 1.6010 KOps/s | |
| test_ddpg_speed[False-None] | 3.2902ms | 2.9663ms | 337.1202 Ops/s | 343.6536 Ops/s | |
| test_ddpg_speed[False-backward] | 4.5528ms | 4.1504ms | 240.9431 Ops/s | 233.5103 Ops/s | |
| test_ddpg_speed[True-None] | 1.4697ms | 1.3341ms | 749.5615 Ops/s | 748.2629 Ops/s | |
| test_ddpg_speed[True-backward] | 2.5604ms | 2.3965ms | 417.2786 Ops/s | 393.8400 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.4862ms | 1.3723ms | 728.7253 Ops/s | 735.0869 Ops/s | |
| test_sac_speed[False-None] | 8.9377ms | 8.3842ms | 119.2723 Ops/s | 121.0274 Ops/s | |
| test_sac_speed[False-backward] | 11.9099ms | 11.2920ms | 88.5585 Ops/s | 87.4422 Ops/s | |
| test_sac_speed[True-None] | 2.0029ms | 1.8395ms | 543.6170 Ops/s | 518.7265 Ops/s | |
| test_sac_speed[True-backward] | 4.0438ms | 3.6400ms | 274.7287 Ops/s | 273.9519 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 19.4361ms | 10.8105ms | 92.5024 Ops/s | 82.3908 Ops/s | |
| test_redq_deprec_speed[False-None] | 9.8169ms | 9.2927ms | 107.6114 Ops/s | 107.6450 Ops/s | |
| test_redq_deprec_speed[False-backward] | 13.1573ms | 12.5969ms | 79.3849 Ops/s | 79.2559 Ops/s | |
| test_redq_deprec_speed[True-None] | 3.0185ms | 2.5384ms | 393.9556 Ops/s | 378.2631 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.7879ms | 4.3661ms | 229.0357 Ops/s | 238.4026 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 15.7665ms | 9.7917ms | 102.1278 Ops/s | 101.7915 Ops/s | |
| test_td3_speed[False-None] | 47.6986ms | 8.5883ms | 116.4378 Ops/s | 121.1402 Ops/s | |
| test_td3_speed[False-backward] | 11.6227ms | 10.8133ms | 92.4786 Ops/s | 92.6127 Ops/s | |
| test_td3_speed[True-None] | 1.8041ms | 1.7405ms | 574.5575 Ops/s | 607.7198 Ops/s | |
| test_td3_speed[True-backward] | 3.3791ms | 3.2934ms | 303.6384 Ops/s | 303.3251 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 61.7145ms | 24.6218ms | 40.6145 Ops/s | 41.0374 Ops/s | |
| test_cql_speed[False-None] | 18.2252ms | 17.3298ms | 57.7042 Ops/s | 58.0475 Ops/s | |
| test_cql_speed[False-backward] | 23.5662ms | 22.8809ms | 43.7046 Ops/s | 30.6999 Ops/s | |
| test_cql_speed[True-None] | 3.4639ms | 3.3063ms | 302.4557 Ops/s | 303.1505 Ops/s | |
| test_cql_speed[True-backward] | 5.9315ms | 5.5642ms | 179.7217 Ops/s | 177.2865 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 0.6855s | 15.3647ms | 65.0841 Ops/s | 83.9618 Ops/s | |
| test_a2c_speed[False-None] | 3.9086ms | 3.2454ms | 308.1251 Ops/s | 305.2437 Ops/s | |
| test_a2c_speed[False-backward] | 6.7586ms | 6.3641ms | 157.1318 Ops/s | 157.7261 Ops/s | |
| test_a2c_speed[True-None] | 1.4368ms | 1.3437ms | 744.2286 Ops/s | 742.5772 Ops/s | |
| test_a2c_speed[True-backward] | 3.1941ms | 3.1419ms | 318.2836 Ops/s | 333.1554 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.1513ms | 0.9961ms | 1.0040 KOps/s | 997.8429 Ops/s | |
| test_ppo_speed[False-None] | 4.0161ms | 3.8605ms | 259.0317 Ops/s | 258.0381 Ops/s | |
| test_ppo_speed[False-backward] | 7.5356ms | 7.1541ms | 139.7801 Ops/s | 145.0088 Ops/s | |
| test_ppo_speed[True-None] | 1.6841ms | 1.4563ms | 686.6682 Ops/s | 696.7777 Ops/s | |
| test_ppo_speed[True-backward] | 3.5405ms | 3.2973ms | 303.2752 Ops/s | 301.4842 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.4779ms | 1.0517ms | 950.8275 Ops/s | 924.5459 Ops/s | |
| test_reinforce_speed[False-None] | 2.7361ms | 2.2986ms | 435.0519 Ops/s | 434.9750 Ops/s | |
| test_reinforce_speed[False-backward] | 3.8311ms | 3.4185ms | 292.5296 Ops/s | 294.4584 Ops/s | |
| test_reinforce_speed[True-None] | 1.7445ms | 1.3051ms | 766.2258 Ops/s | 773.7379 Ops/s | |
| test_reinforce_speed[True-backward] | 3.1248ms | 3.0695ms | 325.7908 Ops/s | 317.6479 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 0.5711s | 10.6680ms | 93.7387 Ops/s | 106.4299 Ops/s | |
| test_iql_speed[False-None] | 10.2864ms | 9.4346ms | 105.9925 Ops/s | 105.7386 Ops/s | |
| test_iql_speed[False-backward] | 13.7915ms | 13.3819ms | 74.7279 Ops/s | 74.7336 Ops/s | |
| test_iql_speed[True-None] | 2.6205ms | 2.2064ms | 453.2288 Ops/s | 450.6690 Ops/s | |
| test_iql_speed[True-backward] | 5.3311ms | 4.8963ms | 204.2340 Ops/s | 201.8753 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 17.6590ms | 10.4587ms | 95.6137 Ops/s | 75.2150 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.3947ms | 5.9221ms | 168.8600 Ops/s | 165.4683 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8274ms | 0.2859ms | 3.4975 KOps/s | 3.5285 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5004ms | 0.2679ms | 3.7322 KOps/s | 3.7556 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1536ms | 5.6629ms | 176.5895 Ops/s | 171.2533 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.2617ms | 0.3062ms | 3.2662 KOps/s | 3.2958 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5386ms | 0.3277ms | 3.0513 KOps/s | 3.2465 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7956ms | 1.2802ms | 781.1041 Ops/s | 798.1099 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5987ms | 1.1754ms | 850.7477 Ops/s | 842.5013 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.2880ms | 5.8340ms | 171.4085 Ops/s | 165.2467 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.0205ms | 0.4351ms | 2.2982 KOps/s | 2.3115 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8570ms | 0.4136ms | 2.4177 KOps/s | 1.9157 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1959ms | 5.7891ms | 172.7372 Ops/s | 169.2744 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.1399ms | 0.2868ms | 3.4872 KOps/s | 2.8300 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.4809ms | 0.2694ms | 3.7114 KOps/s | 2.9800 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.3614ms | 5.7748ms | 173.1674 Ops/s | 171.7937 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1085ms | 0.2845ms | 3.5150 KOps/s | 3.1034 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.4640ms | 0.2755ms | 3.6301 KOps/s | 3.2973 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1405ms | 5.9785ms | 167.2672 Ops/s | 166.1122 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1860ms | 0.4439ms | 2.2526 KOps/s | 651.5792 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6156ms | 0.4278ms | 2.3374 KOps/s | 2.1723 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.3896ms | 4.9519ms | 201.9420 Ops/s | 197.6712 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 4.4183ms | 2.1155ms | 472.7009 Ops/s | 509.4766 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.5744ms | 0.9879ms | 1.0123 KOps/s | 1.0493 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5903s | 16.8055ms | 59.5045 Ops/s | 196.3271 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 10.0593ms | 1.9974ms | 500.6590 Ops/s | 496.7615 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 10.1696ms | 1.3348ms | 749.1814 Ops/s | 1.0401 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 7.1843ms | 5.2340ms | 191.0584 Ops/s | 51.7900 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 10.3481ms | 2.0908ms | 478.2971 Ops/s | 496.0294 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.1753ms | 1.1833ms | 845.1254 Ops/s | 878.3999 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.8482ms | 35.8719ms | 27.8770 Ops/s | 27.5673 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.9941ms | 18.3848ms | 54.3928 Ops/s | 55.1962 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 42.4274ms | 37.1638ms | 26.9079 Ops/s | 26.5029 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 0.5527s | 29.3649ms | 34.0543 Ops/s | 54.1843 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.8875ms | 39.0580ms | 25.6029 Ops/s | 25.2755 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.4856ms | 19.9496ms | 50.1263 Ops/s | 50.4413 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8789ms | 0.2258ms | 4.4280 KOps/s | 4.5522 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.8347ms | 1.3535ms | 738.8081 Ops/s | 686.3056 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.9824ms | 2.3704ms | 421.8714 Ops/s | 431.3540 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.3994ms | 2.9444ms | 339.6242 Ops/s | 337.7618 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2513ms | 0.1670ms | 5.9890 KOps/s | 6.0917 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.6687ms | 0.2283ms | 4.3803 KOps/s | 4.2318 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9704ms | 1.8577ms | 538.2895 Ops/s | 534.6035 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5954ms | 1.4035ms | 712.4836 Ops/s | 739.4067 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.5918ms | 1.1654ms | 858.0637 Ops/s | 871.6496 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.7084ms | 3.6215ms | 276.1310 Ops/s | 273.0836 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 5.9718ms | 5.8792ms | 170.0925 Ops/s | 175.1174 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.6591ms | 7.4954ms | 133.4159 Ops/s | 142.1125 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4600ms | 0.2717ms | 3.6806 KOps/s | 3.6133 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6904ms | 1.4646ms | 682.7867 Ops/s | 637.4237 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.6728ms | 2.4971ms | 400.4624 Ops/s | 409.7371 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3224ms | 3.1857ms | 313.8984 Ops/s | 317.6811 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 35.1205ms | 34.3045ms | 29.1507 Ops/s | 29.3452 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 69.6800ms | 68.1851ms | 14.6660 Ops/s | 14.9899 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 39.9600ms | 39.2535ms | 25.4755 Ops/s | 25.8875 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 78.7476ms | 77.2681ms | 12.9419 Ops/s | 13.3243 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 59.9985ms | 59.4346ms | 16.8252 Ops/s | 17.1584 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1186s | 0.1175s | 8.5112 Ops/s | 8.7291 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 61.9554ms | 60.9052ms | 16.4190 Ops/s | 16.9203 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1233s | 0.1224s | 8.1697 Ops/s | 8.5299 Ops/s |
Replace multiprocessing.Event (futex-based syscalls) with multiprocessing.RawArray shared-memory byte flags for worker-to-parent completion signaling on the hot path (step_and_maybe_reset).
- _start_workers: creates shm_done_flags RawArray, passes to workers
- _wait_for_workers: spin-polls done_flags instead of Event.wait()
- Worker: _signal_done() closure writes shm_done_flags[idx]=1
- _shutdown_workers: uses _wait_for_workers instead of Event.wait()
Measured impact:
- 10% FPS improvement (7,737 -> 8,509 fps) on H200 with 8 workers
- 28% reduction in penv.wait_for_workers overhead (2,622us -> 1,891us)
- ParallelEnv.close() fixed from 80s timeout to ~0.9s
Co-authored-by: Cursor <[email protected]>
ghstack-source-id: f29522a
Pull-Request: #3457
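The signaling scheme described in the commit above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the PR: the helper names `signal_done`, `wait_for_workers` and `_worker`, the timeout handling, and the flag reset at the end of the wait are all assumptions made for the example.

```python
import multiprocessing as mp
import time


def signal_done(done_flags, idx):
    """Worker side: mark slot `idx` as finished with a plain shared-memory write."""
    done_flags[idx] = 1


def wait_for_workers(done_flags, timeout=5.0):
    """Parent side: spin-poll the byte flags instead of blocking on an Event.

    Returns True if every flag was set before `timeout` seconds elapsed,
    in which case the flags are cleared for the next step. Returns False
    on timeout and leaves the flags untouched.
    """
    deadline = time.monotonic() + timeout
    pending = set(range(len(done_flags)))
    while pending:
        # Keep only the workers that have not signalled yet.
        pending = {i for i in pending if done_flags[i] == 0}
        if pending and time.monotonic() > deadline:
            return False
    for i in range(len(done_flags)):
        done_flags[i] = 0  # reset so the same array can be reused
    return True


def _worker(done_flags, idx):
    time.sleep(0.01 * (idx + 1))  # stand-in for an environment step
    signal_done(done_flags, idx)


if __name__ == "__main__":
    num_workers = 4
    flags = mp.RawArray("b", num_workers)  # zero-initialised signed bytes, no lock
    procs = [mp.Process(target=_worker, args=(flags, i)) for i in range(num_workers)]
    for p in procs:
        p.start()
    print("all workers done:", wait_for_workers(flags))
    for p in procs:
        p.join()
```

Spin-polling trades CPU time in the parent for lower wake-up latency, which is presumably why the commit restricts it to the hot path.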
Optimise the output-reading phase of step_and_maybe_reset when shared memory and target device are both known and different (the common CPU-shared -> CUDA case).
- When shared_device is not None and shared_device != device: use a single td.to(device) instead of _fast_apply with per-tensor check. Since .to() already creates new tensors, the extra .clone() is unnecessary.
- Keep the _fast_apply fallback for the mixed-device case.
- Move _sync_w2m() into a conditional: only called when a cross-device transfer actually happened.
Co-authored-by: Cursor <[email protected]>
ghstack-source-id: 07aba16
Pull-Request: #3458
Add two configuration knobs to the Dreamer example:
1. env.parallel_env_mode ("parallel" | "serial"): switches the train
environment between ParallelEnv (uses IPC) and SerialEnv (no IPC
overhead, better for cheap envs or when GPU contention between
collector workers degrades throughput).
2. collector.compile block (enabled, backend, cudagraphs): passes
compilation settings to MultiCollector via compile_policy and
cudagraph_policy kwargs, enabling torch.compile + CUDA graphs for
the policy in collector workers.
Parallel mode remains the default.
Co-authored-by: Cursor <[email protected]>
3dce31f to 3821082
Add two configuration knobs to the Dreamer example:
1. env.parallel_env_mode ("parallel" | "serial"): switches the train
environment between ParallelEnv (uses IPC) and SerialEnv (no IPC
overhead, better for cheap envs or debugging).
2. collector.compile block (enabled, backend, cudagraphs): passes
compilation settings to MultiCollector via compile_policy and
cudagraph_policy kwargs, enabling torch.compile + CUDA graphs for
the policy in collector workers.
Co-authored-by: Cursor <[email protected]>
ghstack-source-id: 4876b37
Pull-Request: #3459
Rebasing
Rebase failed.
@torchrlbot rebase
Rebasing
Rebase failed.
Add two configuration knobs to the Dreamer example:
1. env.parallel_env_mode ("parallel" | "serial"): switches the train environment between ParallelEnv (uses IPC) and SerialEnv (no IPC overhead, better for cheap envs or when GPU contention between collector workers degrades throughput). Parallel mode remains the default.
2. collector.compile block (enabled, backend, cudagraphs): passes compilation settings to MultiCollector via compile_policy and cudagraph_policy kwargs, enabling torch.compile + CUDA graphs for the policy in collector workers.
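As a rough illustration of how the two knobs might be consumed, the sketch below maps a config object to an environment class name and to collector keyword arguments. It is a guess at the shape of the logic, not the code in this PR: `CompileCfg`, `env_class_name`, `collector_compile_kwargs`, the default backend value, and the choice to ignore `cudagraphs` when `enabled` is false are all hypothetical. Only the config keys and the `compile_policy` / `cudagraph_policy` kwarg names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class CompileCfg:
    """Hypothetical mirror of the collector.compile config block."""
    enabled: bool = False
    backend: str = "inductor"  # assumed default, not taken from the PR
    cudagraphs: bool = False


def env_class_name(mode: str) -> str:
    """Pick the batched-env class for env.parallel_env_mode."""
    if mode == "parallel":
        return "ParallelEnv"  # one process per env, communicates over IPC
    if mode == "serial":
        return "SerialEnv"  # all envs stepped in the calling process
    raise ValueError(f"unknown parallel_env_mode: {mode!r}")


def collector_compile_kwargs(cfg: CompileCfg) -> dict:
    """Translate the compile block into collector keyword arguments."""
    if not cfg.enabled:
        return {}  # leave the collector's defaults untouched
    return {
        "compile_policy": {"backend": cfg.backend},
        "cudagraph_policy": cfg.cudagraphs,
    }


if __name__ == "__main__":
    print(env_class_name("serial"))
    print(collector_compile_kwargs(CompileCfg(enabled=True, cudagraphs=True)))
```

The returned dictionary would then be splatted into the collector constructor, so a disabled compile block leaves the call unchanged.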