[Example] Dreamer: distributed prof profiling integration #3461
vmoens wants to merge 2 commits into gh/vmoens/222/base
Integrate the prof distributed profiler into the Dreamer training loop and collector workers for coordinated cross-process profiling.

- dreamer_utils.py: add create_prof_handle(), extend DreamerProfiler with prof_handle param, step(), shm_name property, finish() cleanup. Add prof_shm_name param to make_collector with PROF_SHM_NAME env var.
- dreamer.py: create prof_handle early, pass shm_name to collector, wrap training phases with _prof_context (sample, world_model, actor, value, weight_update), call profiler.finish() at cleanup.
- config.yaml: add profiling.distributed block, raise total_optim_steps to 70 for prof window.
- _runner.py: worker reads PROF_SHM_NAME/PROF_ENABLED env vars and calls prof.prepare() to join profiling. Wraps rollout in prof context.

Co-authored-by: Cursor <[email protected]>
ghstack-source-id: 9fcc2e8
Pull-Request: #3461
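To illustrate the pattern described above (a worker reads `PROF_SHM_NAME`/`PROF_ENABLED` and training phases are wrapped in a profiling context), here is a minimal standalone sketch. It does not use the prof library or reproduce this PR's code: `read_prof_env`, `PhaseTimer`, and `prof_context` are hypothetical names, the accepted truthy values for `PROF_ENABLED` are an assumption, and the timer is a plain wall-clock stand-in for a real profiler handle.

```python
import os
import time
from contextlib import contextmanager, nullcontext


def read_prof_env(environ=None):
    """Return (enabled, shm_name) from PROF_ENABLED / PROF_SHM_NAME.

    Assumption: profiling is on only if PROF_ENABLED is truthy AND a
    shared-memory name was provided by the parent process.
    """
    environ = os.environ if environ is None else environ
    shm_name = environ.get("PROF_SHM_NAME")
    flag = environ.get("PROF_ENABLED", "0").lower() in {"1", "true", "yes"}
    return flag and shm_name is not None, shm_name


class PhaseTimer:
    """Hypothetical stand-in for a profiler handle: wall time per phase."""

    def __init__(self):
        self.records = []  # list of (phase_name, seconds)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.records.append((name, time.perf_counter() - start))


def prof_context(handle, name):
    """Return a phase context, or a no-op context when profiling is off."""
    return handle.phase(name) if handle is not None else nullcontext()


# Usage: wrap each training phase named in the PR description.
enabled, shm_name = read_prof_env({"PROF_ENABLED": "1", "PROF_SHM_NAME": "demo"})
handle = PhaseTimer() if enabled else None
for phase_name in ("sample", "world_model", "actor", "value", "weight_update"):
    with prof_context(handle, phase_name):
        pass  # the real phase body (sampling, loss, optimizer step) goes here
```

Falling back to `nullcontext()` keeps the training loop identical whether or not profiling is enabled, so the disabled path adds no branching inside each phase.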
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3461
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 81a8672 with merge base ab49b59:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
Integrate the prof distributed profiler into the Dreamer training loop and collector workers for coordinated cross-process profiling.

- dreamer_utils.py: add create_prof_handle(), extend DreamerProfiler with prof_handle param, step(), shm_name property, finish() cleanup. Add prof_shm_name param to make_collector with PROF_SHM_NAME env var.
- dreamer.py: create prof_handle early, pass shm_name to collector, wrap training phases with _prof_context (sample, world_model, actor, value, weight_update), call profiler.finish() at cleanup.
- config.yaml: add profiling.distributed block, raise total_optim_steps to 70 for prof window.
- _runner.py: worker reads PROF_SHM_NAME/PROF_ENABLED env vars and calls prof.prepare() to join profiling. Wraps rollout in prof context.

Co-authored-by: Cursor <[email protected]>
ghstack-source-id: fb5d3fa
Pull-Request: #3461

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 81.9714μs | 80.4178μs | 12.4351 KOps/s | 12.5940 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1396ms | 0.1391ms | 7.1882 KOps/s | 7.1587 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1199s | 0.1196s | 8.3610 Ops/s | 8.9507 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.5592μs | 2.5532μs | 391.6659 KOps/s | 378.9859 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 37.5680μs | 37.3735μs | 26.7569 KOps/s | 25.9256 KOps/s | |
| test_simple | 0.5491s | 0.5470s | 1.8281 Ops/s | 1.7340 Ops/s | |
| test_transformed | 1.1308s | 1.1279s | 0.8866 Ops/s | 0.8638 Ops/s | |
| test_serial | 1.6755s | 1.6728s | 0.5978 Ops/s | 0.5899 Ops/s | |
| test_parallel | 1.1615s | 1.0616s | 0.9419 Ops/s | 0.9476 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.2114ms | 43.7915μs | 22.8355 KOps/s | 21.9813 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 53.9420μs | 24.6790μs | 40.5203 KOps/s | 39.1385 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 56.3220μs | 24.6898μs | 40.5026 KOps/s | 39.4965 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 45.6420μs | 13.5929μs | 73.5681 KOps/s | 71.9130 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 83.3530μs | 46.9908μs | 21.2808 KOps/s | 20.9800 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 53.2120μs | 27.1959μs | 36.7702 KOps/s | 35.6000 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 60.4530μs | 26.8183μs | 37.2880 KOps/s | 36.0685 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 43.8910μs | 16.3734μs | 61.0747 KOps/s | 59.9966 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 82.5230μs | 50.3530μs | 19.8598 KOps/s | 19.8513 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 68.0320μs | 30.3181μs | 32.9836 KOps/s | 32.3942 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 68.5630μs | 27.2062μs | 36.7563 KOps/s | 35.9173 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 44.2220μs | 16.3452μs | 61.1802 KOps/s | 59.7286 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 81.0630μs | 51.2073μs | 19.5285 KOps/s | 18.9437 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 63.3820μs | 33.1215μs | 30.1919 KOps/s | 29.8508 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 56.5020μs | 30.1524μs | 33.1649 KOps/s | 33.0617 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 62.2030μs | 19.2947μs | 51.8278 KOps/s | 51.9971 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 88.5730μs | 50.1685μs | 19.9328 KOps/s | 19.6631 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 56.3820μs | 30.1107μs | 33.2108 KOps/s | 32.3797 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.3249ms | 32.3444μs | 30.9172 KOps/s | 31.7798 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 49.5420μs | 18.4213μs | 54.2851 KOps/s | 53.5546 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 79.5320μs | 52.4181μs | 19.0774 KOps/s | 18.1223 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 70.5220μs | 33.1365μs | 30.1782 KOps/s | 29.5167 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 72.8330μs | 33.6570μs | 29.7115 KOps/s | 28.7029 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 54.6720μs | 21.1092μs | 47.3728 KOps/s | 45.5479 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 91.3530μs | 55.7630μs | 17.9330 KOps/s | 17.8681 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 67.3620μs | 36.1521μs | 27.6609 KOps/s | 26.0794 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 78.5330μs | 34.3335μs | 29.1261 KOps/s | 29.2575 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 50.5710μs | 20.5631μs | 48.6307 KOps/s | 47.9002 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 95.0130μs | 57.1223μs | 17.5063 KOps/s | 17.3075 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 72.6030μs | 38.1447μs | 26.2160 KOps/s | 24.5534 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 70.7020μs | 35.6375μs | 28.0603 KOps/s | 27.2804 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 49.9320μs | 23.4060μs | 42.7241 KOps/s | 41.8769 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7540s | 0.7498s | 1.3337 Ops/s | 1.2791 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7288s | 0.6361s | 1.5722 Ops/s | 1.5642 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7572s | 1.6784s | 0.5958 Ops/s | 0.5918 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5364s | 1.4610s | 0.6845 Ops/s | 0.6768 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0170s | 1.9380s | 0.5160 Ops/s | 0.5138 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.8055s | 1.7162s | 0.5827 Ops/s | 0.5821 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.8064s | 4.6847s | 0.2135 Ops/s | 0.2164 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5416s | 4.4418s | 0.2251 Ops/s | 0.2233 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9825s | 1.8756s | 0.5332 Ops/s | 0.5290 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7365s | 1.6196s | 0.6174 Ops/s | 0.6173 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 11.3490ms | 10.5744ms | 94.5681 Ops/s | 94.9608 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 13.3687ms | 11.1628ms | 89.5836 Ops/s | 56.0668 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2454ms | 0.1392ms | 7.1850 KOps/s | 7.6702 KOps/s | |
| test_values[td1_return_estimate-False-False] | 30.0891ms | 28.6312ms | 34.9269 Ops/s | 34.8131 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 12.3284ms | 11.2213ms | 89.1164 Ops/s | 55.4030 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 43.9387ms | 42.3223ms | 23.6282 Ops/s | 23.4010 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 11.4967ms | 11.1779ms | 89.4624 Ops/s | 55.2546 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.5669ms | 9.4397ms | 105.9355 Ops/s | 106.1150 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7128ms | 1.4620ms | 683.9894 Ops/s | 648.7300 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5293ms | 0.4329ms | 2.3101 KOps/s | 2.2752 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 23.0991ms | 22.6817ms | 44.0885 Ops/s | 28.7245 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 2.1763ms | 1.7392ms | 574.9699 Ops/s | 575.7714 Ops/s | |
| test_dqn_speed[False-None] | 1.5369ms | 1.4186ms | 704.9047 Ops/s | 709.0873 Ops/s | |
| test_dqn_speed[False-backward] | 2.0576ms | 1.9146ms | 522.3048 Ops/s | 521.5697 Ops/s | |
| test_dqn_speed[True-None] | 0.9526ms | 0.5647ms | 1.7709 KOps/s | 1.7898 KOps/s | |
| test_dqn_speed[True-backward] | 1.0518ms | 1.0095ms | 990.6108 Ops/s | 956.9437 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.9382ms | 0.5464ms | 1.8301 KOps/s | 1.8115 KOps/s | |
| test_ddpg_speed[False-None] | 3.2321ms | 2.8543ms | 350.3449 Ops/s | 351.3500 Ops/s | |
| test_ddpg_speed[False-backward] | 4.2083ms | 4.0510ms | 246.8537 Ops/s | 247.5544 Ops/s | |
| test_ddpg_speed[True-None] | 1.8064ms | 1.4204ms | 704.0450 Ops/s | 696.7728 Ops/s | |
| test_ddpg_speed[True-backward] | 2.5242ms | 2.4000ms | 416.6660 Ops/s | 359.6709 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.5067ms | 1.4151ms | 706.6742 Ops/s | 705.9793 Ops/s | |
| test_sac_speed[False-None] | 8.9498ms | 8.0501ms | 124.2217 Ops/s | 123.4766 Ops/s | |
| test_sac_speed[False-backward] | 11.7339ms | 11.2702ms | 88.7292 Ops/s | 88.2035 Ops/s | |
| test_sac_speed[True-None] | 2.5568ms | 2.1410ms | 467.0633 Ops/s | 452.9656 Ops/s | |
| test_sac_speed[True-backward] | 4.1907ms | 4.0056ms | 249.6516 Ops/s | 237.3599 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.2362ms | 2.1406ms | 467.1546 Ops/s | 454.6438 Ops/s | |
| test_redq_speed[False-None] | 11.0735ms | 10.4411ms | 95.7753 Ops/s | 89.2847 Ops/s | |
| test_redq_speed[False-backward] | 18.5904ms | 17.5305ms | 57.0434 Ops/s | 57.4974 Ops/s | |
| test_redq_speed[True-None] | 4.6685ms | 4.3776ms | 228.4378 Ops/s | 220.0057 Ops/s | |
| test_redq_speed[True-backward] | 10.0418ms | 9.4883ms | 105.3933 Ops/s | 102.7289 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 4.6276ms | 4.3939ms | 227.5907 Ops/s | 233.3204 Ops/s | |
| test_redq_deprec_speed[False-None] | 11.6888ms | 10.9540ms | 91.2913 Ops/s | 90.3695 Ops/s | |
| test_redq_deprec_speed[False-backward] | 16.3804ms | 15.8416ms | 63.1251 Ops/s | 63.1792 Ops/s | |
| test_redq_deprec_speed[True-None] | 3.9467ms | 3.5893ms | 278.6087 Ops/s | 270.5826 Ops/s | |
| test_redq_deprec_speed[True-backward] | 7.6970ms | 7.3856ms | 135.3992 Ops/s | 128.6004 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 3.8376ms | 3.5450ms | 282.0890 Ops/s | 209.2363 Ops/s | |
| test_td3_speed[False-None] | 8.1192ms | 8.0519ms | 124.1942 Ops/s | 122.4665 Ops/s | |
| test_td3_speed[False-backward] | 11.3557ms | 10.9572ms | 91.2640 Ops/s | 91.2569 Ops/s | |
| test_td3_speed[True-None] | 1.9121ms | 1.8286ms | 546.8723 Ops/s | 530.3264 Ops/s | |
| test_td3_speed[True-backward] | 3.7411ms | 3.5937ms | 278.2648 Ops/s | 268.3287 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.8140ms | 1.7867ms | 559.6858 Ops/s | 550.4837 Ops/s | |
| test_cql_speed[False-None] | 28.5991ms | 25.9187ms | 38.5822 Ops/s | 38.3861 Ops/s | |
| test_cql_speed[False-backward] | 35.5012ms | 34.9471ms | 28.6147 Ops/s | 27.9983 Ops/s | |
| test_cql_speed[True-None] | 12.5895ms | 12.2032ms | 81.9457 Ops/s | 79.6172 Ops/s | |
| test_cql_speed[True-backward] | 22.8980ms | 19.0495ms | 52.4948 Ops/s | 54.1337 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 12.4942ms | 12.1998ms | 81.9683 Ops/s | 80.3684 Ops/s | |
| test_a2c_speed[False-None] | 5.6766ms | 5.2866ms | 189.1580 Ops/s | 190.1216 Ops/s | |
| test_a2c_speed[False-backward] | 11.7866ms | 11.4974ms | 86.9765 Ops/s | 86.7902 Ops/s | |
| test_a2c_speed[True-None] | 3.8402ms | 3.6646ms | 272.8840 Ops/s | 249.4012 Ops/s | |
| test_a2c_speed[True-backward] | 8.7819ms | 8.5023ms | 117.6148 Ops/s | 117.3691 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 4.1320ms | 3.7185ms | 268.9263 Ops/s | 268.8408 Ops/s | |
| test_ppo_speed[False-None] | 6.3440ms | 5.9664ms | 167.6064 Ops/s | 172.6181 Ops/s | |
| test_ppo_speed[False-backward] | 12.8446ms | 12.5237ms | 79.8484 Ops/s | 82.5908 Ops/s | |
| test_ppo_speed[True-None] | 3.8051ms | 3.6258ms | 275.8015 Ops/s | 274.5920 Ops/s | |
| test_ppo_speed[True-backward] | 8.5872ms | 8.3270ms | 120.0916 Ops/s | 119.8195 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 4.1099ms | 3.6293ms | 275.5357 Ops/s | 276.4191 Ops/s | |
| test_reinforce_speed[False-None] | 4.7382ms | 4.4461ms | 224.9162 Ops/s | 224.8792 Ops/s | |
| test_reinforce_speed[False-backward] | 7.4397ms | 7.2591ms | 137.7579 Ops/s | 137.0032 Ops/s | |
| test_reinforce_speed[True-None] | 3.3218ms | 2.8318ms | 353.1330 Ops/s | 346.6011 Ops/s | |
| test_reinforce_speed[True-backward] | 7.9572ms | 7.7236ms | 129.4726 Ops/s | 130.3618 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.3917ms | 2.8786ms | 347.3884 Ops/s | 329.0131 Ops/s | |
| test_iql_speed[False-None] | 20.5886ms | 19.8018ms | 50.5005 Ops/s | 49.4309 Ops/s | |
| test_iql_speed[False-backward] | 30.8469ms | 30.1955ms | 33.1175 Ops/s | 33.1167 Ops/s | |
| test_iql_speed[True-None] | 10.1337ms | 8.5815ms | 116.5297 Ops/s | 124.8100 Ops/s | |
| test_iql_speed[True-backward] | 16.8548ms | 16.4580ms | 60.7609 Ops/s | 65.0262 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 8.7686ms | 8.4892ms | 117.7964 Ops/s | 124.4472 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2981ms | 6.1095ms | 163.6782 Ops/s | 162.7361 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.7427ms | 0.3283ms | 3.0464 KOps/s | 3.5111 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6078ms | 0.3098ms | 3.2280 KOps/s | 3.3479 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1695ms | 5.9103ms | 169.1970 Ops/s | 169.4494 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0979ms | 0.3409ms | 2.9334 KOps/s | 3.5959 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6830ms | 0.2906ms | 3.4413 KOps/s | 3.7554 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6234ms | 1.3482ms | 741.7349 Ops/s | 781.6581 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6129ms | 1.2799ms | 781.3063 Ops/s | 831.5050 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.5821ms | 6.2098ms | 161.0359 Ops/s | 166.3845 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.8910ms | 0.4528ms | 2.2085 KOps/s | 2.1058 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6669ms | 0.4510ms | 2.2172 KOps/s | 2.3662 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0343ms | 5.9319ms | 168.5810 Ops/s | 169.1624 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9126ms | 0.2811ms | 3.5569 KOps/s | 2.8756 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.4932ms | 0.2635ms | 3.7955 KOps/s | 3.7588 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.2174ms | 5.9519ms | 168.0136 Ops/s | 171.6379 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7580ms | 0.3400ms | 2.9412 KOps/s | 3.6017 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.4775ms | 0.2904ms | 3.4435 KOps/s | 3.7553 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.2133ms | 6.0734ms | 164.6524 Ops/s | 167.4929 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.2860ms | 0.4653ms | 2.1492 KOps/s | 2.1560 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6956ms | 0.4515ms | 2.2147 KOps/s | 2.3217 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.4152ms | 4.9674ms | 201.3108 Ops/s | 57.1616 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 8.9539ms | 2.1538ms | 464.2880 Ops/s | 546.4382 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 10.2039ms | 1.2687ms | 788.1990 Ops/s | 913.4330 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5439s | 15.8645ms | 63.0337 Ops/s | 196.6234 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.8493ms | 1.8154ms | 550.8380 Ops/s | 564.7911 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 8.8579ms | 1.1923ms | 838.7207 Ops/s | 1.1377 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.7149ms | 5.2759ms | 189.5429 Ops/s | 59.1997 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 11.0944ms | 2.0709ms | 482.8840 Ops/s | 521.9777 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 3.5603ms | 1.0817ms | 924.4834 Ops/s | 962.4025 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.7485ms | 35.5808ms | 28.1050 Ops/s | 27.7201 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 20.0912ms | 18.2182ms | 54.8901 Ops/s | 55.2792 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 41.5286ms | 36.9829ms | 27.0395 Ops/s | 26.5401 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.7997ms | 18.3343ms | 54.5427 Ops/s | 55.1110 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.6718ms | 38.7394ms | 25.8135 Ops/s | 25.1131 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.6333ms | 20.0846ms | 49.7893 Ops/s | 49.8894 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8535ms | 0.2195ms | 4.5561 KOps/s | 4.3975 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.6452ms | 1.3896ms | 719.6240 Ops/s | 722.5334 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.6357ms | 2.3896ms | 418.4725 Ops/s | 420.4122 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.1042ms | 2.9287ms | 341.4426 Ops/s | 346.7655 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2154ms | 0.1329ms | 7.5248 KOps/s | 7.4564 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3627ms | 0.1847ms | 5.4130 KOps/s | 5.0930 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.8900ms | 1.7354ms | 576.2464 Ops/s | 585.6858 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.4558ms | 1.3051ms | 766.2198 Ops/s | 786.1812 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.2360ms | 1.1327ms | 882.8553 Ops/s | 892.7334 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 7.6040ms | 3.6612ms | 273.1361 Ops/s | 280.8150 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 10.4480ms | 5.7237ms | 174.7123 Ops/s | 171.3989 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 15.0928ms | 7.1324ms | 140.2058 Ops/s | 136.8342 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.5009ms | 0.2762ms | 3.6202 KOps/s | 3.6312 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6548ms | 1.5129ms | 660.9723 Ops/s | 669.9642 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.6262ms | 2.5023ms | 399.6355 Ops/s | 402.3489 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.2585ms | 3.1306ms | 319.4291 Ops/s | 321.1255 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 35.3018ms | 34.8078ms | 28.7292 Ops/s | 28.7168 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 68.5428ms | 68.3163ms | 14.6378 Ops/s | 14.5689 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 43.6421ms | 43.0471ms | 23.2304 Ops/s | 23.3731 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 0.1077s | 86.6539ms | 11.5402 Ops/s | 11.9794 Ops/s |
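
The Change column in the table above is empty. As a rough aid for reading it, the sketch below computes a relative difference between the Ops and Ops on Repo HEAD columns for a single row. The helper names `parse_ops` and `relative_change` are hypothetical, and defining the change as `(ops - head) / head` is an assumption, not necessarily what the benchmark bot reports.

```python
# Unit multipliers for the two throughput units that appear in the table.
_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(cell):
    """Convert a cell such as '12.4351 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * _SCALE[unit]


def relative_change(row):
    """Return (test_name, relative change of Ops versus Ops on Repo HEAD)."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    ops, head = parse_ops(cells[3]), parse_ops(cells[4])
    return cells[0], (ops - head) / head


row = "| test_simple | 0.5491s | 0.5470s | 1.8281 Ops/s | 1.7340 Ops/s | |"
name, change = relative_change(row)
print(f"{name}: {change:+.1%}")
```

A positive value means the PR branch measured more operations per second than repo HEAD for that test. Single-run differences of a few percent are within typical benchmark noise, so only large swings are worth investigating.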

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 79.7302μs | 78.6843μs | 12.7090 KOps/s | 12.7862 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1417ms | 0.1390ms | 7.1960 KOps/s | 7.3818 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1076s | 0.1052s | 9.5028 Ops/s | 9.5499 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.4537μs | 2.4461μs | 408.8122 KOps/s | 408.9838 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 38.7597μs | 37.1732μs | 26.9011 KOps/s | 27.3002 KOps/s | |
| test_simple | 0.7800s | 0.7789s | 1.2838 Ops/s | 1.2364 Ops/s | |
| test_transformed | 1.5125s | 1.4199s | 0.7043 Ops/s | 0.6930 Ops/s | |
| test_serial | 2.3707s | 2.2772s | 0.4391 Ops/s | 0.4266 Ops/s | |
| test_parallel | 1.9546s | 1.8576s | 0.5383 Ops/s | 0.5437 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.2658ms | 43.6062μs | 22.9325 KOps/s | 22.3227 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 49.1710μs | 25.0229μs | 39.9634 KOps/s | 39.9852 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 67.9810μs | 24.2887μs | 41.1714 KOps/s | 41.3346 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 41.2400μs | 13.4824μs | 74.1708 KOps/s | 75.6136 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 77.8510μs | 46.7818μs | 21.3758 KOps/s | 21.9292 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 68.6610μs | 27.1009μs | 36.8991 KOps/s | 37.0535 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 53.0610μs | 27.3194μs | 36.6041 KOps/s | 37.3614 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 44.1310μs | 16.2117μs | 61.6838 KOps/s | 60.9351 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 99.4910μs | 49.8634μs | 20.0548 KOps/s | 20.2444 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 63.5510μs | 30.0311μs | 33.2988 KOps/s | 32.8250 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 56.5010μs | 27.0921μs | 36.9111 KOps/s | 36.5858 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 50.4710μs | 16.3000μs | 61.3497 KOps/s | 61.5019 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 91.0720μs | 51.4239μs | 19.4462 KOps/s | 19.4152 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 98.2720μs | 32.1518μs | 31.1024 KOps/s | 30.6289 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 61.6100μs | 29.7844μs | 33.5747 KOps/s | 33.9981 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 51.4410μs | 18.9064μs | 52.8920 KOps/s | 52.8714 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 86.0910μs | 49.4534μs | 20.2211 KOps/s | 20.5868 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 61.5910μs | 30.1782μs | 33.1365 KOps/s | 33.2673 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.3422ms | 30.7947μs | 32.4731 KOps/s | 32.8806 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 60.3000μs | 18.0111μs | 55.5213 KOps/s | 56.3973 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 81.4810μs | 52.3299μs | 19.1095 KOps/s | 19.3300 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 63.7910μs | 32.6421μs | 30.6353 KOps/s | 30.9460 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 71.6210μs | 33.4908μs | 29.8590 KOps/s | 30.8141 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 47.9410μs | 20.7483μs | 48.1968 KOps/s | 49.3681 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 86.7110μs | 54.8677μs | 18.2257 KOps/s | 18.2194 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 80.3110μs | 35.2581μs | 28.3623 KOps/s | 28.2548 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 67.7510μs | 33.5490μs | 29.8072 KOps/s | 29.8033 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 68.9110μs | 20.6498μs | 48.4266 KOps/s | 49.5428 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 91.1610μs | 56.6479μs | 17.6529 KOps/s | 17.8347 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 65.6610μs | 37.2144μs | 26.8713 KOps/s | 26.4944 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 67.7010μs | 35.2198μs | 28.3931 KOps/s | 28.3440 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 51.2300μs | 22.8150μs | 43.8309 KOps/s | 44.6531 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8534s | 0.7639s | 1.3090 Ops/s | 1.3267 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7187s | 0.6234s | 1.6040 Ops/s | 1.6103 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7367s | 1.6446s | 0.6081 Ops/s | 0.6077 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4944s | 1.4183s | 0.7051 Ops/s | 0.6989 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9684s | 1.8873s | 0.5299 Ops/s | 0.5270 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7447s | 1.6655s | 0.6004 Ops/s | 0.5956 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6796s | 4.5663s | 0.2190 Ops/s | 0.2152 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.4785s | 4.4016s | 0.2272 Ops/s | 0.2244 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9794s | 1.8922s | 0.5285 Ops/s | 0.5262 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6777s | 1.5889s | 0.6294 Ops/s | 0.6212 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 21.4074ms | 20.7367ms | 48.2238 Ops/s | 48.5099 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1385s | 3.6941ms | 270.7011 Ops/s | 287.6128 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1140ms | 81.9209μs | 12.2069 KOps/s | 11.8899 KOps/s | |
| test_values[td1_return_estimate-False-False] | 52.0084ms | 49.9712ms | 20.0115 Ops/s | 20.5872 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.3740ms | 1.0864ms | 920.5035 Ops/s | 916.4313 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 79.6306ms | 79.0792ms | 12.6456 Ops/s | 12.5459 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2884ms | 1.0847ms | 921.8726 Ops/s | 917.6236 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 20.9546ms | 20.7761ms | 48.1323 Ops/s | 48.0376 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0647ms | 0.7511ms | 1.3313 KOps/s | 1.3112 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7418ms | 0.6788ms | 1.4732 KOps/s | 1.4637 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5288ms | 1.4898ms | 671.2185 Ops/s | 668.1889 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.8129ms | 0.6930ms | 1.4429 KOps/s | 1.4285 KOps/s | |
| test_dqn_speed[False-None] | 1.5949ms | 1.5154ms | 659.8942 Ops/s | 653.2036 Ops/s | |
| test_dqn_speed[False-backward] | 2.3956ms | 2.1737ms | 460.0397 Ops/s | 453.4731 Ops/s | |
| test_dqn_speed[True-None] | 1.1564ms | 0.5614ms | 1.7812 KOps/s | 1.7688 KOps/s | |
| test_dqn_speed[True-backward] | 1.1301ms | 1.0928ms | 915.0575 Ops/s | 909.1358 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.7440ms | 0.5795ms | 1.7257 KOps/s | 1.6625 KOps/s | |
| test_ddpg_speed[False-None] | 3.2263ms | 2.8679ms | 348.6892 Ops/s | 348.8748 Ops/s | |
| test_ddpg_speed[False-backward] | 4.6257ms | 4.2010ms | 238.0405 Ops/s | 239.6936 Ops/s | |
| test_ddpg_speed[True-None] | 1.3662ms | 1.3029ms | 767.5388 Ops/s | 759.6154 Ops/s | |
| test_ddpg_speed[True-backward] | 2.6152ms | 2.3788ms | 420.3881 Ops/s | 395.8590 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.5376ms | 1.3567ms | 737.0580 Ops/s | 733.2170 Ops/s | |
| test_sac_speed[False-None] | 9.1393ms | 8.4387ms | 118.5011 Ops/s | 119.7708 Ops/s | |
| test_sac_speed[False-backward] | 12.0666ms | 11.5029ms | 86.9347 Ops/s | 87.3794 Ops/s | |
| test_sac_speed[True-None] | 1.8534ms | 1.8016ms | 555.0469 Ops/s | 552.0576 Ops/s | |
| test_sac_speed[True-backward] | 3.4713ms | 3.4154ms | 292.7938 Ops/s | 275.9848 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 19.5643ms | 11.0288ms | 90.6721 Ops/s | 82.1177 Ops/s | |
| test_redq_deprec_speed[False-None] | 9.8630ms | 9.2749ms | 107.8184 Ops/s | 106.5011 Ops/s | |
| test_redq_deprec_speed[False-backward] | 13.0648ms | 12.4553ms | 80.2873 Ops/s | 77.1979 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.6596ms | 2.5229ms | 396.3769 Ops/s | 396.4184 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.5795ms | 4.2927ms | 232.9516 Ops/s | 228.9328 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 16.3617ms | 9.8832ms | 101.1817 Ops/s | 100.7439 Ops/s | |
| test_td3_speed[False-None] | 8.2081ms | 8.1300ms | 123.0013 Ops/s | 120.1896 Ops/s | |
| test_td3_speed[False-backward] | 12.4791ms | 10.9977ms | 90.9283 Ops/s | 91.6869 Ops/s | |
| test_td3_speed[True-None] | 1.7305ms | 1.6391ms | 610.0739 Ops/s | 592.4491 Ops/s | |
| test_td3_speed[True-backward] | 3.3213ms | 3.2410ms | 308.5515 Ops/s | 300.4419 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 72.3890ms | 24.7920ms | 40.3355 Ops/s | 40.6983 Ops/s | |
| test_cql_speed[False-None] | 17.3035ms | 17.0478ms | 58.6585 Ops/s | 57.9531 Ops/s | |
| test_cql_speed[False-backward] | 22.8615ms | 22.4263ms | 44.5905 Ops/s | 43.2893 Ops/s | |
| test_cql_speed[True-None] | 3.4561ms | 3.2446ms | 308.2056 Ops/s | 306.3068 Ops/s | |
| test_cql_speed[True-backward] | 5.9536ms | 5.3766ms | 185.9904 Ops/s | 179.5191 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 19.5312ms | 11.9722ms | 83.5271 Ops/s | 83.6281 Ops/s | |
| test_a2c_speed[False-None] | 4.0005ms | 3.2113ms | 311.3957 Ops/s | 307.5976 Ops/s | |
| test_a2c_speed[False-backward] | 6.5720ms | 6.2236ms | 160.6780 Ops/s | 154.6403 Ops/s | |
| test_a2c_speed[True-None] | 1.3772ms | 1.3002ms | 769.1123 Ops/s | 741.8681 Ops/s | |
| test_a2c_speed[True-backward] | 3.1153ms | 2.9702ms | 336.6741 Ops/s | 322.1185 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.0546ms | 0.9752ms | 1.0254 KOps/s | 1.0343 KOps/s | |
| test_ppo_speed[False-None] | 4.0429ms | 3.8793ms | 257.7768 Ops/s | 256.4023 Ops/s | |
| test_ppo_speed[False-backward] | 7.4520ms | 7.0350ms | 142.1471 Ops/s | 137.2273 Ops/s | |
| test_ppo_speed[True-None] | 1.4911ms | 1.4128ms | 707.8307 Ops/s | 710.2691 Ops/s | |
| test_ppo_speed[True-backward] | 3.2865ms | 3.2372ms | 308.9124 Ops/s | 302.8261 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.1037ms | 1.0291ms | 971.6951 Ops/s | 949.0906 Ops/s | |
| test_reinforce_speed[False-None] | 2.4146ms | 2.2636ms | 441.7722 Ops/s | 435.8453 Ops/s | |
| test_reinforce_speed[False-backward] | 3.8636ms | 3.4418ms | 290.5418 Ops/s | 289.4021 Ops/s | |
| test_reinforce_speed[True-None] | 1.4444ms | 1.2750ms | 784.3392 Ops/s | 771.1695 Ops/s | |
| test_reinforce_speed[True-backward] | 3.0641ms | 3.0216ms | 330.9528 Ops/s | 327.0702 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 17.5721ms | 9.5031ms | 105.2292 Ops/s | 104.4530 Ops/s | |
| test_iql_speed[False-None] | 9.9641ms | 9.3900ms | 106.4965 Ops/s | 105.5151 Ops/s | |
| test_iql_speed[False-backward] | 13.8885ms | 13.4048ms | 74.6002 Ops/s | 73.7726 Ops/s | |
| test_iql_speed[True-None] | 2.3561ms | 2.1589ms | 463.2091 Ops/s | 456.2250 Ops/s | |
| test_iql_speed[True-backward] | 5.1153ms | 4.8117ms | 207.8271 Ops/s | 202.5516 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 18.2412ms | 10.5763ms | 94.5507 Ops/s | 94.4060 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2469ms | 5.7540ms | 173.7925 Ops/s | 173.5837 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9689ms | 0.2815ms | 3.5526 KOps/s | 3.0118 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5151ms | 0.2652ms | 3.7705 KOps/s | 3.2519 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.8722ms | 5.5397ms | 180.5141 Ops/s | 181.7990 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.5273ms | 0.2931ms | 3.4118 KOps/s | 3.2901 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6565ms | 0.3060ms | 3.2682 KOps/s | 3.2746 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6957ms | 1.2694ms | 787.7550 Ops/s | 785.5102 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.4422ms | 1.1905ms | 839.9995 Ops/s | 843.7813 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1995ms | 5.7503ms | 173.9045 Ops/s | 176.4271 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.3271ms | 0.5208ms | 1.9201 KOps/s | 2.2984 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7188ms | 0.5031ms | 1.9876 KOps/s | 2.4311 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1610ms | 5.6825ms | 175.9798 Ops/s | 173.7787 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7530ms | 0.3541ms | 2.8239 KOps/s | 3.5642 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6045ms | 0.3401ms | 2.9406 KOps/s | 3.8058 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.2045ms | 5.6275ms | 177.6973 Ops/s | 177.6494 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1376ms | 0.3523ms | 2.8388 KOps/s | 3.3293 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5416ms | 0.3398ms | 2.9428 KOps/s | 3.6793 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.0688ms | 5.7928ms | 172.6279 Ops/s | 170.7655 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.9738ms | 0.5299ms | 1.8872 KOps/s | 2.1507 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6174ms | 0.4160ms | 2.4037 KOps/s | 2.3177 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.6080s | 17.2109ms | 58.1026 Ops/s | 199.5611 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 9.4041ms | 2.0445ms | 489.1091 Ops/s | 433.0783 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 10.4729ms | 1.3188ms | 758.2758 Ops/s | 1.0600 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 7.5527ms | 5.1836ms | 192.9150 Ops/s | 50.4632 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9898ms | 1.8153ms | 550.8852 Ops/s | 528.8491 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.0937ms | 0.9505ms | 1.0521 KOps/s | 868.0691 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 8.3841ms | 5.4134ms | 184.7253 Ops/s | 188.0729 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.2034ms | 1.9038ms | 525.2550 Ops/s | 507.0348 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 3.8913ms | 1.1681ms | 856.0703 Ops/s | 930.5498 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.6822ms | 35.7817ms | 27.9472 Ops/s | 27.5141 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 0.5867s | 29.5420ms | 33.8502 Ops/s | 54.0830 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 39.5651ms | 36.6801ms | 27.2627 Ops/s | 26.9916 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.7069ms | 18.1734ms | 55.0255 Ops/s | 53.7949 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.3450ms | 38.1327ms | 26.2242 Ops/s | 25.6627 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.3706ms | 19.8437ms | 50.3938 Ops/s | 50.3783 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8727ms | 0.2150ms | 4.6501 KOps/s | 4.3202 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.9024ms | 1.4402ms | 694.3281 Ops/s | 695.1602 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.8296ms | 2.2987ms | 435.0303 Ops/s | 428.5054 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.4655ms | 2.9646ms | 337.3140 Ops/s | 332.7779 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.3172ms | 0.1653ms | 6.0488 KOps/s | 6.0895 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3493ms | 0.2318ms | 4.3140 KOps/s | 4.3761 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 2.0105ms | 1.8523ms | 539.8592 Ops/s | 545.7263 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.8751ms | 1.4152ms | 706.6159 Ops/s | 709.4145 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.5713ms | 1.1156ms | 896.4015 Ops/s | 889.9915 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 4.0076ms | 3.5782ms | 279.4734 Ops/s | 266.5387 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.2621ms | 5.9898ms | 166.9499 Ops/s | 174.0638 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.9761ms | 7.0882ms | 141.0804 Ops/s | 139.5463 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.7353ms | 0.2739ms | 3.6512 KOps/s | 3.6915 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 2.0138ms | 1.5637ms | 639.5094 Ops/s | 642.2523 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.8762ms | 2.4231ms | 412.6955 Ops/s | 415.9917 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.5916ms | 3.1790ms | 314.5635 Ops/s | 313.4528 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 35.7897ms | 34.4612ms | 29.0181 Ops/s | 29.0867 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 67.9170ms | 67.1821ms | 14.8849 Ops/s | 14.8704 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 43.5685ms | 42.0977ms | 23.7543 Ops/s | 23.9272 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 83.9824ms | 82.6520ms | 12.0989 Ops/s | 12.2174 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 57.0012ms | 56.7959ms | 17.6069 Ops/s | 17.5768 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1186s | 0.1142s | 8.7557 Ops/s | 4.7531 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 62.4753ms | 62.0260ms | 16.1223 Ops/s | 15.9481 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1295s | 0.1249s | 8.0094 Ops/s | 7.9673 Ops/s |

Closing: prof instrumentation not needed in the final stack.
Stack from ghstack (oldest at bottom):
Integrate the prof distributed profiler into the Dreamer training loop
and collector workers for coordinated cross-process profiling.
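
- dreamer_utils.py: add create_prof_handle(), extend DreamerProfiler
  with prof_handle param, step(), shm_name property, finish() cleanup.
  Add prof_shm_name param to make_collector with PROF_SHM_NAME env var.
- dreamer.py: create prof_handle early, pass shm_name to collector,
  wrap training phases with _prof_context (sample, world_model, actor,
  value, weight_update), call profiler.finish() at cleanup.
- config.yaml: add profiling.distributed block, raise total_optim_steps
  to 70 for prof window.
- _runner.py: worker reads PROF_SHM_NAME/PROF_ENABLED env vars and
  calls prof.prepare() to join profiling. Wraps rollout in prof context.
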
Co-authored-by: Cursor [email protected]
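
For readers unfamiliar with the pattern described above, here is a minimal, standard-library-only sketch of how a worker might decide whether to join profiling from its environment, and how training phases can be wrapped in a context that becomes a no-op when profiling is off. It does not use the prof library or the PR's actual code: `worker_profiling_config`, `PhaseTimer`, `prof_context`, and `train_step` are hypothetical names, and the set of values treated as "enabled" for `PROF_ENABLED` is an assumption. Only the two environment variable names and the five phase names come from the description.

```python
import os
import time
from contextlib import contextmanager, nullcontext


def worker_profiling_config(environ=None):
    """Return the shared-memory name if profiling is requested, else None.

    Assumption: PROF_ENABLED is truthy when set to "1", "true" or "yes".
    """
    environ = os.environ if environ is None else environ
    shm_name = environ.get("PROF_SHM_NAME")
    enabled = environ.get("PROF_ENABLED", "0").lower() in ("1", "true", "yes")
    return shm_name if (enabled and shm_name) else None


class PhaseTimer:
    """Hypothetical stand-in for a profiler handle: sums wall-clock time per phase."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed


def prof_context(timer, name):
    """Wrap a phase when a timer is present; otherwise do nothing."""
    return timer.phase(name) if timer is not None else nullcontext()


def train_step(timer=None):
    # Phase names taken from the PR description; bodies are placeholders.
    for name in ("sample", "world_model", "actor", "value", "weight_update"):
        with prof_context(timer, name):
            pass
```

Returning `nullcontext()` when no handle exists keeps the training loop free of `if profiling:` branches, which is presumably the motivation for a helper like `_prof_context`, though the PR text does not spell that out.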