[CI] Separate GPU and CPU tests with pytest markers#3404

Merged: vmoens merged 6 commits into gh/vmoens/207/base from gh/vmoens/207/head on Jan 29, 2026

vmoens (Collaborator) commented Jan 28, 2026

Stack from ghstack (oldest at bottom):

Add @pytest.mark.gpu to tests that require CUDA, and update run_all.sh
to filter tests according to whether the job runs on a GPU or a CPU machine.

Changes:

  • Register 'gpu' marker in pytest.ini and conftest.py
  • Add @pytest.mark.gpu to ~30 tests that explicitly require CUDA
  • Update run_all.sh to use GPU_MARKER_FILTER:
    • GPU jobs (CU_VERSION != cpu): run only @pytest.mark.gpu tests
    • CPU jobs (CU_VERSION = cpu): run all tests except @pytest.mark.gpu
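
As a rough illustration of the first two items in the list above (not the actual TorchRL code), a marker can be registered from conftest.py and applied to a CUDA-only test as shown below. The helper `_cuda_available` and the test name `test_tensor_on_cuda` are hypothetical, and the `skipif` guard is an added assumption so that the test is skipped rather than failing on a machine without CUDA.

```python
# conftest.py-style sketch: register the 'gpu' marker and use it on a test.
import pytest


def pytest_configure(config):
    # Registering the marker avoids "unknown marker" warnings
    # (or errors when pytest runs with --strict-markers).
    config.addinivalue_line(
        "markers", "gpu: test requires a CUDA device (select with -m gpu)"
    )


def _cuda_available() -> bool:
    """Hypothetical helper: True only if torch imports and sees a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


@pytest.mark.gpu
@pytest.mark.skipif(not _cuda_available(), reason="CUDA device required")
def test_tensor_on_cuda():
    import torch

    x = torch.ones(3, device="cuda")
    assert x.is_cuda
```

With the marker in place, `pytest -m gpu` selects only the marked tests and `pytest -m "not gpu"` selects everything else.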

This cuts GPU machine usage: the expensive GPU runners execute only the tests
that require a GPU (~30 tests instead of ~2000+). Tests that can run on either
device run on CPU machines only.

The optimization can be disabled by setting TORCHRL_GPU_FILTER=0.
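
The selection rule described above can be sketched in Python. This mirrors the behaviour stated in the description, not the actual contents of run_all.sh. The function name `gpu_marker_filter` is hypothetical, and treating an unset `CU_VERSION` as `cpu` and an unset `TORCHRL_GPU_FILTER` as enabled are assumptions.

```python
import os


def gpu_marker_filter(env=None):
    """Return the extra pytest arguments for the current CI job (sketch)."""
    env = os.environ if env is None else env

    # TORCHRL_GPU_FILTER=0 disables the optimization: no marker filtering.
    if env.get("TORCHRL_GPU_FILTER", "1") == "0":
        return []

    # CPU jobs run everything except GPU-marked tests.
    # Assumption: a missing CU_VERSION is treated like "cpu".
    if env.get("CU_VERSION", "cpu") == "cpu":
        return ["-m", "not gpu"]

    # GPU jobs run only the GPU-marked tests.
    return ["-m", "gpu"]


if __name__ == "__main__":
    print(gpu_marker_filter())
```

The returned list could be appended to a pytest invocation, for example `pytest.main(gpu_marker_filter() + ["test/"])`, where the `test/` path is only a placeholder.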

vmoens added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: ca9778d
Pull-Request: #3404
pytorch-bot bot commented Jan 28, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3404

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions bot added the CI label (Has to do with CI setup, e.g. wheels & builds, tests...) Jan 28, 2026
meta-cla bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) Jan 28, 2026
vmoens added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: 9235913
Pull-Request: #3404
vmoens added a commit that referenced this pull request Jan 28, 2026
ghstack-source-id: 69148f2
Pull-Request: #3404
vmoens added a commit that referenced this pull request Jan 29, 2026
ghstack-source-id: cf14d1e
Pull-Request: #3404
@vmoens vmoens merged commit edc1d74 into gh/vmoens/207/base Jan 29, 2026
71 of 74 checks passed
@vmoens vmoens deleted the gh/vmoens/207/head branch January 29, 2026 09:58
github-actions (Contributor) commented

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 153. Improved: 5. Worsened: 15.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_tensor_to_bytestream_speed[pickle] | 79.7458μs | 78.7644μs | 12.6961 KOps/s | 12.1608 KOps/s | +4.40% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1362ms | 0.1354ms | 7.3865 KOps/s | 7.1133 KOps/s | +3.84% |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1073s | 0.1070s | 9.3453 Ops/s | 9.1166 Ops/s | +2.51% |
| test_tensor_to_bytestream_speed[numpy] | 2.5347μs | 2.5238μs | 396.2214 KOps/s | 385.2515 KOps/s | +2.85% |
| test_tensor_to_bytestream_speed[safetensors] | 37.5882μs | 37.2559μs | 26.8414 KOps/s | 26.3834 KOps/s | +1.74% |
| test_simple | 0.6586s | 0.5711s | 1.7510 Ops/s | 1.7686 Ops/s | -1.00% |
| test_transformed | 1.2262s | 1.1319s | 0.8835 Ops/s | 0.8800 Ops/s | +0.40% |
| test_serial | 1.6694s | 1.6632s | 0.6012 Ops/s | 0.6115 Ops/s | -1.67% |
| test_parallel | 1.2430s | 1.1564s | 0.8647 Ops/s | 0.8955 Ops/s | -3.43% |
| test_step_mdp_speed[True-True-True-True-True] | 0.3181ms | 43.9394μs | 22.7586 KOps/s | 23.4200 KOps/s | -2.82% |
| test_step_mdp_speed[True-True-True-True-False] | 64.4130μs | 23.6698μs | 42.2479 KOps/s | 40.8300 KOps/s | +3.47% |
| test_step_mdp_speed[True-True-True-False-True] | 77.1840μs | 24.0562μs | 41.5694 KOps/s | 40.3185 KOps/s | +3.10% |
| test_step_mdp_speed[True-True-True-False-False] | 50.2230μs | 13.1823μs | 75.8591 KOps/s | 73.8273 KOps/s | +2.75% |
| test_step_mdp_speed[True-True-False-True-True] | 90.5840μs | 46.8965μs | 21.3236 KOps/s | 21.3943 KOps/s | -0.33% |
| test_step_mdp_speed[True-True-False-True-False] | 89.6540μs | 26.6049μs | 37.5871 KOps/s | 36.9902 KOps/s | +1.61% |
| test_step_mdp_speed[True-True-False-False-True] | 69.3730μs | 26.9723μs | 37.0751 KOps/s | 36.0685 KOps/s | +2.79% |
| test_step_mdp_speed[True-True-False-False-False] | 91.3940μs | 15.8294μs | 63.1734 KOps/s | 62.5141 KOps/s | +1.05% |
| test_step_mdp_speed[True-False-True-True-True] | 95.3840μs | 49.3414μs | 20.2669 KOps/s | 20.1023 KOps/s | +0.82% |
| test_step_mdp_speed[True-False-True-True-False] | 62.3430μs | 29.5902μs | 33.7950 KOps/s | 33.6851 KOps/s | +0.33% |
| test_step_mdp_speed[True-False-True-False-True] | 67.2730μs | 27.2613μs | 36.6821 KOps/s | 36.7805 KOps/s | -0.27% |
| test_step_mdp_speed[True-False-True-False-False] | 38.1810μs | 15.9333μs | 62.7618 KOps/s | 62.7226 KOps/s | +0.06% |
| test_step_mdp_speed[True-False-False-True-True] | 81.7140μs | 51.0812μs | 19.5767 KOps/s | 19.5334 KOps/s | +0.22% |
| test_step_mdp_speed[True-False-False-True-False] | 0.1040ms | 31.2477μs | 32.0023 KOps/s | 31.4143 KOps/s | +1.87% |
| test_step_mdp_speed[True-False-False-False-True] | 58.8730μs | 29.1867μs | 34.2622 KOps/s | 33.5304 KOps/s | +2.18% |
| test_step_mdp_speed[True-False-False-False-False] | 43.1620μs | 18.4371μs | 54.2386 KOps/s | 53.2319 KOps/s | +1.89% |
| test_step_mdp_speed[False-True-True-True-True] | 98.0240μs | 48.6721μs | 20.5457 KOps/s | 20.2126 KOps/s | +1.65% |
| test_step_mdp_speed[False-True-True-True-False] | 60.0530μs | 29.6028μs | 33.7806 KOps/s | 33.0017 KOps/s | +2.36% |
| test_step_mdp_speed[False-True-True-False-True] | 59.1930μs | 30.2142μs | 33.0970 KOps/s | 32.0660 KOps/s | +3.22% |
| test_step_mdp_speed[False-True-True-False-False] | 51.3730μs | 17.6511μs | 56.6537 KOps/s | 55.2539 KOps/s | +2.53% |
| test_step_mdp_speed[False-True-False-True-True] | 2.7691ms | 52.0368μs | 19.2172 KOps/s | 18.9454 KOps/s | +1.43% |
| test_step_mdp_speed[False-True-False-True-False] | 70.0830μs | 32.2494μs | 31.0083 KOps/s | 30.9747 KOps/s | +0.11% |
| test_step_mdp_speed[False-True-False-False-True] | 67.8030μs | 33.1833μs | 30.1356 KOps/s | 30.2052 KOps/s | -0.23% |
| test_step_mdp_speed[False-True-False-False-False] | 44.3720μs | 20.1629μs | 49.5960 KOps/s | 48.1949 KOps/s | +2.91% |
| test_step_mdp_speed[False-False-True-True-True] | 84.1730μs | 53.2782μs | 18.7694 KOps/s | 18.2036 KOps/s | +3.11% |
| test_step_mdp_speed[False-False-True-True-False] | 72.8740μs | 34.6547μs | 28.8561 KOps/s | 28.2991 KOps/s | +1.97% |
| test_step_mdp_speed[False-False-True-False-True] | 63.7530μs | 32.8793μs | 30.4143 KOps/s | 29.5497 KOps/s | +2.93% |
| test_step_mdp_speed[False-False-True-False-False] | 51.8220μs | 20.2601μs | 49.3582 KOps/s | 48.6020 KOps/s | +1.56% |
| test_step_mdp_speed[False-False-False-True-True] | 90.5840μs | 55.8016μs | 17.9206 KOps/s | 17.5884 KOps/s | +1.89% |
| test_step_mdp_speed[False-False-False-True-False] | 66.7630μs | 37.1242μs | 26.9366 KOps/s | 26.4261 KOps/s | +1.93% |
| test_step_mdp_speed[False-False-False-False-True] | 68.5730μs | 34.6067μs | 28.8961 KOps/s | 28.1961 KOps/s | +2.48% |
| test_step_mdp_speed[False-False-False-False-False] | 51.9930μs | 22.4750μs | 44.4939 KOps/s | 43.6647 KOps/s | +1.90% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8414s | 0.7511s | 1.3314 Ops/s | 1.3242 Ops/s | +0.54% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7061s | 0.6147s | 1.6269 Ops/s | 1.6094 Ops/s | +1.09% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7153s | 1.6386s | 0.6103 Ops/s | 0.6064 Ops/s | +0.64% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4942s | 1.4136s | 0.7074 Ops/s | 0.6840 Ops/s | +3.42% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9498s | 1.8777s | 0.5326 Ops/s | 0.5320 Ops/s | +0.11% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7321s | 1.6644s | 0.6008 Ops/s | 0.6023 Ops/s | -0.25% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6654s | 4.5895s | 0.2179 Ops/s | 0.2185 Ops/s | -0.27% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.6075s | 4.4718s | 0.2236 Ops/s | 0.2244 Ops/s | -0.34% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.1526s | 1.9594s | 0.5104 Ops/s | 0.5113 Ops/s | -0.19% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7688s | 1.6660s | 0.6002 Ops/s | 0.6161 Ops/s | -2.58% |
| test_values[generalized_advantage_estimate-True-True] | 10.3463ms | 10.0425ms | 99.5765 Ops/s | 96.6993 Ops/s | +2.98% |
| test_values[vec_generalized_advantage_estimate-True-True] | 19.9151ms | 17.9202ms | 55.8031 Ops/s | 87.0201 Ops/s | **-35.87%** |
| test_values[td0_return_estimate-False-False] | 0.2111ms | 0.1300ms | 7.6926 KOps/s | 7.3921 KOps/s | +4.07% |
| test_values[td1_return_estimate-False-False] | 29.0190ms | 27.7722ms | 36.0072 Ops/s | 35.1554 Ops/s | +2.42% |
| test_values[vec_td1_return_estimate-False-False] | 18.6609ms | 17.9382ms | 55.7471 Ops/s | 87.1853 Ops/s | **-36.06%** |
| test_values[td_lambda_return_estimate-True-False] | 42.7938ms | 40.9298ms | 24.4321 Ops/s | 23.7836 Ops/s | +2.73% |
| test_values[vec_td_lambda_return_estimate-True-False] | 19.1589ms | 17.9539ms | 55.6981 Ops/s | 87.1528 Ops/s | **-36.09%** |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.1294ms | 8.8981ms | 112.3829 Ops/s | 108.6886 Ops/s | +3.40% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7867ms | 1.5293ms | 653.9132 Ops/s | 648.4828 Ops/s | +0.84% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5539ms | 0.4315ms | 2.3177 KOps/s | 2.3599 KOps/s | -1.79% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 35.0872ms | 34.6872ms | 28.8291 Ops/s | 34.3806 Ops/s | **-16.15%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 2.1539ms | 1.7425ms | 573.9039 Ops/s | 570.3284 Ops/s | +0.63% |
| test_dqn_speed[False-None] | 1.7797ms | 1.3836ms | 722.7352 Ops/s | 720.6519 Ops/s | +0.29% |
| test_dqn_speed[False-backward] | 1.9768ms | 1.9113ms | 523.2015 Ops/s | 527.2238 Ops/s | -0.76% |
| test_dqn_speed[True-None] | 0.7230ms | 0.5515ms | 1.8131 KOps/s | 1.8831 KOps/s | -3.71% |
| test_dqn_speed[True-backward] | 1.0171ms | 0.9857ms | 1.0145 KOps/s | 993.4563 Ops/s | +2.12% |
| test_dqn_speed[reduce-overhead-None] | 0.9276ms | 0.5186ms | 1.9282 KOps/s | 1.8924 KOps/s | +1.89% |
| test_ddpg_speed[False-None] | 0.1994s | 3.4279ms | 291.7196 Ops/s | 352.7294 Ops/s | **-17.30%** |
| test_ddpg_speed[False-backward] | 4.1190ms | 4.0230ms | 248.5734 Ops/s | 245.3491 Ops/s | +1.31% |
| test_ddpg_speed[True-None] | 1.8642ms | 1.3840ms | 722.5543 Ops/s | 710.8613 Ops/s | +1.64% |
| test_ddpg_speed[True-backward] | 2.4051ms | 2.3536ms | 424.8863 Ops/s | 377.5588 Ops/s | **+12.54%** |
| test_ddpg_speed[reduce-overhead-None] | 1.8593ms | 1.3748ms | 727.3951 Ops/s | 713.6841 Ops/s | +1.92% |
| test_sac_speed[False-None] | 8.4930ms | 7.9338ms | 126.0431 Ops/s | 126.1598 Ops/s | -0.09% |
| test_sac_speed[False-backward] | 11.7698ms | 11.1666ms | 89.5525 Ops/s | 89.1815 Ops/s | +0.42% |
| test_sac_speed[True-None] | 2.5603ms | 2.1414ms | 466.9815 Ops/s | 446.5895 Ops/s | +4.57% |
| test_sac_speed[True-backward] | 4.0994ms | 3.9799ms | 251.2617 Ops/s | 245.1600 Ops/s | +2.49% |
| test_sac_speed[reduce-overhead-None] | 2.2613ms | 2.1195ms | 471.8017 Ops/s | 473.4617 Ops/s | -0.35% |
| test_redq_speed[False-None] | 10.7342ms | 10.2855ms | 97.2240 Ops/s | 97.2453 Ops/s | -0.02% |
| test_redq_speed[False-backward] | 18.6299ms | 17.8758ms | 55.9417 Ops/s | 56.5696 Ops/s | -1.11% |
| test_redq_speed[True-None] | 4.9486ms | 4.4841ms | 223.0100 Ops/s | 225.9144 Ops/s | -1.29% |
| test_redq_speed[True-backward] | 9.9930ms | 9.8203ms | 101.8296 Ops/s | 101.5085 Ops/s | +0.32% |
| test_redq_speed[reduce-overhead-None] | 4.9551ms | 4.3776ms | 228.4359 Ops/s | 224.5306 Ops/s | +1.74% |
| test_redq_deprec_speed[False-None] | 11.5050ms | 10.9223ms | 91.5556 Ops/s | 93.3891 Ops/s | -1.96% |
| test_redq_deprec_speed[False-backward] | 15.9715ms | 15.6964ms | 63.7088 Ops/s | 65.2134 Ops/s | -2.31% |
| test_redq_deprec_speed[True-None] | 4.2118ms | 3.6960ms | 270.5601 Ops/s | 269.8903 Ops/s | +0.25% |
| test_redq_deprec_speed[True-backward] | 7.8395ms | 7.6472ms | 130.7675 Ops/s | 110.1815 Ops/s | **+18.68%** |
| test_redq_deprec_speed[reduce-overhead-None] | 4.1677ms | 3.6131ms | 276.7696 Ops/s | 276.5828 Ops/s | +0.07% |
| test_td3_speed[False-None] | 8.1824ms | 7.9941ms | 125.0922 Ops/s | 124.6787 Ops/s | +0.33% |
| test_td3_speed[False-backward] | 11.3045ms | 10.8514ms | 92.1538 Ops/s | 91.5590 Ops/s | +0.65% |
| test_td3_speed[True-None] | 1.8796ms | 1.8333ms | 545.4560 Ops/s | 544.5575 Ops/s | +0.17% |
| test_td3_speed[True-backward] | 3.7719ms | 3.6315ms | 275.3696 Ops/s | 270.3886 Ops/s | +1.84% |
| test_td3_speed[reduce-overhead-None] | 1.8512ms | 1.7914ms | 558.2139 Ops/s | 549.7468 Ops/s | +1.54% |
| test_cql_speed[False-None] | 30.5857ms | 26.3159ms | 37.9998 Ops/s | 37.8281 Ops/s | +0.45% |
| test_cql_speed[False-backward] | 39.1317ms | 35.9281ms | 27.8334 Ops/s | 28.2406 Ops/s | -1.44% |
| test_cql_speed[True-None] | 13.1357ms | 12.5117ms | 79.9251 Ops/s | 79.2654 Ops/s | +0.83% |
| test_cql_speed[True-backward] | 18.8385ms | 18.3638ms | 54.4551 Ops/s | 53.7966 Ops/s | +1.22% |
| test_cql_speed[reduce-overhead-None] | 12.6274ms | 12.4015ms | 80.6355 Ops/s | 79.8477 Ops/s | +0.99% |
| test_a2c_speed[False-None] | 5.9510ms | 5.4176ms | 184.5853 Ops/s | 185.8306 Ops/s | -0.67% |
| test_a2c_speed[False-backward] | 12.1481ms | 11.8700ms | 84.2463 Ops/s | 84.3664 Ops/s | -0.14% |
| test_a2c_speed[True-None] | 4.1581ms | 3.7296ms | 268.1248 Ops/s | 272.9750 Ops/s | -1.78% |
| test_a2c_speed[True-backward] | 8.7930ms | 8.6284ms | 115.8969 Ops/s | 115.2585 Ops/s | +0.55% |
| test_a2c_speed[reduce-overhead-None] | 3.8018ms | 3.6715ms | 272.3684 Ops/s | 271.9262 Ops/s | +0.16% |
| test_ppo_speed[False-None] | 6.0184ms | 5.8590ms | 170.6771 Ops/s | 171.0868 Ops/s | -0.24% |
| test_ppo_speed[False-backward] | 12.7847ms | 12.4689ms | 80.1995 Ops/s | 79.6875 Ops/s | +0.64% |
| test_ppo_speed[True-None] | 4.1483ms | 3.6218ms | 276.1077 Ops/s | 274.8823 Ops/s | +0.45% |
| test_ppo_speed[True-backward] | 8.8566ms | 8.4329ms | 118.5837 Ops/s | 117.8953 Ops/s | +0.58% |
| test_ppo_speed[reduce-overhead-None] | 3.6891ms | 3.5903ms | 278.5315 Ops/s | 279.8719 Ops/s | -0.48% |
| test_reinforce_speed[False-None] | 4.7130ms | 4.5568ms | 219.4542 Ops/s | 221.8469 Ops/s | -1.08% |
| test_reinforce_speed[False-backward] | 7.6957ms | 7.4227ms | 134.7212 Ops/s | 136.2955 Ops/s | -1.16% |
| test_reinforce_speed[True-None] | 3.3148ms | 2.8536ms | 350.4300 Ops/s | 351.1335 Ops/s | -0.20% |
| test_reinforce_speed[True-backward] | 8.3960ms | 7.8699ms | 127.0657 Ops/s | 123.9436 Ops/s | +2.52% |
| test_reinforce_speed[reduce-overhead-None] | 3.3867ms | 2.8666ms | 348.8429 Ops/s | 339.3555 Ops/s | +2.80% |
| test_iql_speed[False-None] | 26.0278ms | 20.1633ms | 49.5952 Ops/s | 50.9086 Ops/s | -2.58% |
| test_iql_speed[False-backward] | 31.8075ms | 30.5269ms | 32.7580 Ops/s | 32.9171 Ops/s | -0.48% |
| test_iql_speed[True-None] | 9.1210ms | 8.4998ms | 117.6504 Ops/s | 113.3626 Ops/s | +3.78% |
| test_iql_speed[True-backward] | 17.0691ms | 16.7441ms | 59.7224 Ops/s | 59.4958 Ops/s | +0.38% |
| test_iql_speed[reduce-overhead-None] | 8.9173ms | 8.5259ms | 117.2903 Ops/s | 116.7007 Ops/s | +0.51% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1753ms | 5.7254ms | 174.6603 Ops/s | 172.7807 Ops/s | +1.09% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.0638ms | 0.3664ms | 2.7290 KOps/s | 3.4273 KOps/s | **-20.37%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6527ms | 0.3504ms | 2.8536 KOps/s | 3.0078 KOps/s | **-5.13%** |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1209ms | 5.5518ms | 180.1206 Ops/s | 177.6370 Ops/s | +1.40% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0768ms | 0.3512ms | 2.8471 KOps/s | 3.5876 KOps/s | **-20.64%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6030ms | 0.3325ms | 3.0075 KOps/s | 3.5191 KOps/s | **-14.54%** |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.8540ms | 1.4035ms | 712.5088 Ops/s | 783.5141 Ops/s | **-9.06%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5718ms | 1.3201ms | 757.5459 Ops/s | 843.4427 Ops/s | **-10.18%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 11.5870ms | 5.9087ms | 169.2423 Ops/s | 173.8227 Ops/s | -2.64% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.0786ms | 0.4446ms | 2.2491 KOps/s | 2.1542 KOps/s | +4.40% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6866ms | 0.4696ms | 2.1296 KOps/s | 2.1518 KOps/s | -1.03% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0120ms | 5.5958ms | 178.7059 Ops/s | 176.7325 Ops/s | +1.12% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.8290ms | 0.2795ms | 3.5779 KOps/s | 3.2332 KOps/s | **+10.66%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7194ms | 0.2997ms | 3.3369 KOps/s | 3.3565 KOps/s | -0.59% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.8631ms | 5.5513ms | 180.1375 Ops/s | 177.3186 Ops/s | +1.59% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.6974ms | 0.3104ms | 3.2212 KOps/s | 3.4338 KOps/s | **-6.19%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6107ms | 0.3780ms | 2.6454 KOps/s | 3.2952 KOps/s | **-19.72%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.9668ms | 5.7574ms | 173.6882 Ops/s | 171.8121 Ops/s | +1.09% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8422ms | 0.4566ms | 2.1903 KOps/s | 2.1681 KOps/s | +1.03% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6335ms | 0.4438ms | 2.2533 KOps/s | 2.2980 KOps/s | -1.94% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.4907ms | 4.9947ms | 200.2121 Ops/s | 197.6926 Ops/s | +1.27% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 13.0781ms | 1.9263ms | 519.1269 Ops/s | 522.6645 Ops/s | -0.68% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.0615ms | 0.8715ms | 1.1474 KOps/s | 1.1454 KOps/s | +0.17% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 11.1087ms | 5.0680ms | 197.3166 Ops/s | 200.7695 Ops/s | -1.72% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 6.6574ms | 1.8840ms | 530.7949 Ops/s | 559.6953 Ops/s | **-5.16%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 11.8139ms | 1.2668ms | 789.4028 Ops/s | 1.0622 KOps/s | **-25.68%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.5495s | 16.0993ms | 62.1146 Ops/s | 57.2830 Ops/s | **+8.43%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.2288ms | 1.8630ms | 536.7811 Ops/s | 493.1920 Ops/s | **+8.84%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.8559ms | 1.0063ms | 993.7112 Ops/s | 953.8081 Ops/s | +4.18% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.9772ms | 35.6010ms | 28.0891 Ops/s | 27.9380 Ops/s | +0.54% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 20.3285ms | 18.5851ms | 53.8066 Ops/s | 55.6752 Ops/s | -3.36% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.4316ms | 36.9538ms | 27.0608 Ops/s | 26.7861 Ops/s | +1.03% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.1200ms | 18.4398ms | 54.2306 Ops/s | 53.0771 Ops/s | +2.17% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 39.9897ms | 38.6119ms | 25.8988 Ops/s | 25.6690 Ops/s | +0.90% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.8360ms | 20.2602ms | 49.3579 Ops/s | 50.5338 Ops/s | -2.33% |
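
The Change column is consistent with the relative difference between Ops and Ops on Repo HEAD, which the sketch below reproduces for two rows of the CPU table. The bold rows in that table (5 positive, 15 negative) match the Improved and Worsened counts in the summary. The 5% cutoff used here is an assumption inferred from the rows, not a documented setting, and the function names are hypothetical.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput relative to repo HEAD (same units for both)."""
    return (ops - ops_head) / ops_head * 100.0


def is_flagged(change_pct: float, threshold_pct: float = 5.0) -> bool:
    # Assumed rule: a row counts as improved/worsened when |change| exceeds 5%.
    return abs(change_pct) > threshold_pct


# Values copied from the CPU table above.
pickle_change = relative_change(12.6961, 12.1608)   # test_tensor_to_bytestream_speed[pickle]
vec_gae_change = relative_change(55.8031, 87.0201)  # test_values[vec_generalized_advantage_estimate-True-True]

print(f"{pickle_change:+.2f}% flagged={is_flagged(pickle_change)}")    # +4.40% flagged=False
print(f"{vec_gae_change:+.2f}% flagged={is_flagged(vec_gae_change)}")  # -35.87% flagged=True
```

Rows that mix units (for example KOps/s against Ops/s) need both values converted to the same unit before applying the formula.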

github-actions (Contributor) commented

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 148. Improved: 5. Worsened: 16.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_tensor_to_bytestream_speed[pickle] | 82.8024μs | 81.8418μs | 12.2187 KOps/s | 11.7695 KOps/s | +3.82% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1426ms | 0.1415ms | 7.0691 KOps/s | 7.0060 KOps/s | +0.90% |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1410s | 0.1407s | 7.1083 Ops/s | 7.4691 Ops/s | -4.83% |
| test_tensor_to_bytestream_speed[numpy] | 2.9965μs | 2.9887μs | 334.5893 KOps/s | 360.6524 KOps/s | **-7.23%** |
| test_tensor_to_bytestream_speed[safetensors] | 38.6287μs | 37.6219μs | 26.5803 KOps/s | 26.0331 KOps/s | +2.10% |
| test_simple | 0.9236s | 0.8323s | 1.2015 Ops/s | 1.2044 Ops/s | -0.25% |
| test_transformed | 1.5700s | 1.4806s | 0.6754 Ops/s | 0.6819 Ops/s | -0.94% |
| test_serial | 2.4604s | 2.3816s | 0.4199 Ops/s | 0.4273 Ops/s | -1.73% |
| test_parallel | 2.0451s | 1.9911s | 0.5022 Ops/s | 0.5010 Ops/s | +0.24% |
| test_step_mdp_speed[True-True-True-True-True] | 0.5136ms | 45.7529μs | 21.8565 KOps/s | 22.2012 KOps/s | -1.55% |
| test_step_mdp_speed[True-True-True-True-False] | 59.0010μs | 25.5918μs | 39.0750 KOps/s | 39.8280 KOps/s | -1.89% |
| test_step_mdp_speed[True-True-True-False-True] | 58.3010μs | 25.4601μs | 39.2771 KOps/s | 38.8573 KOps/s | +1.08% |
| test_step_mdp_speed[True-True-True-False-False] | 42.9210μs | 14.1835μs | 70.5046 KOps/s | 71.8991 KOps/s | -1.94% |
| test_step_mdp_speed[True-True-False-True-True] | 0.1126ms | 48.3943μs | 20.6636 KOps/s | 20.5590 KOps/s | +0.51% |
| test_step_mdp_speed[True-True-False-True-False] | 59.0410μs | 28.5428μs | 35.0351 KOps/s | 35.6319 KOps/s | -1.68% |
| test_step_mdp_speed[True-True-False-False-True] | 63.5810μs | 28.7795μs | 34.7470 KOps/s | 34.6821 KOps/s | +0.19% |
| test_step_mdp_speed[True-True-False-False-False] | 49.0900μs | 16.8504μs | 59.3456 KOps/s | 59.5810 KOps/s | -0.40% |
| test_step_mdp_speed[True-False-True-True-True] | 92.9920μs | 51.6284μs | 19.3692 KOps/s | 19.3127 KOps/s | +0.29% |
| test_step_mdp_speed[True-False-True-True-False] | 87.2810μs | 30.9973μs | 32.2608 KOps/s | 31.8517 KOps/s | +1.28% |
| test_step_mdp_speed[True-False-True-False-True] | 90.9110μs | 28.0665μs | 35.6296 KOps/s | 34.7931 KOps/s | +2.40% |
| test_step_mdp_speed[True-False-True-False-False] | 45.2310μs | 16.9053μs | 59.1531 KOps/s | 59.8684 KOps/s | -1.19% |
| test_step_mdp_speed[True-False-False-True-True] | 96.2810μs | 53.4106μs | 18.7229 KOps/s | 18.6257 KOps/s | +0.52% |
| test_step_mdp_speed[True-False-False-True-False] | 80.5220μs | 34.1048μs | 29.3214 KOps/s | 29.6568 KOps/s | -1.13% |
| test_step_mdp_speed[True-False-False-False-True] | 58.3910μs | 31.2116μs | 32.0394 KOps/s | 32.3637 KOps/s | -1.00% |
| test_step_mdp_speed[True-False-False-False-False] | 52.4210μs | 19.4361μs | 51.4505 KOps/s | 51.2961 KOps/s | +0.30% |
| test_step_mdp_speed[False-True-True-True-True] | 94.9610μs | 52.1881μs | 19.1614 KOps/s | 19.4082 KOps/s | -1.27% |
| test_step_mdp_speed[False-True-True-True-False] | 59.7710μs | 31.2726μs | 31.9769 KOps/s | 32.3232 KOps/s | -1.07% |
| test_step_mdp_speed[False-True-True-False-True] | 63.4110μs | 31.7935μs | 31.4529 KOps/s | 31.2443 KOps/s | +0.67% |
| test_step_mdp_speed[False-True-True-False-False] | 50.5810μs | 18.7040μs | 53.4644 KOps/s | 54.4821 KOps/s | -1.87% |
| test_step_mdp_speed[False-True-False-True-True] | 2.6159ms | 54.4950μs | 18.3503 KOps/s | 18.5782 KOps/s | -1.23% |
| test_step_mdp_speed[False-True-False-True-False] | 70.0110μs | 33.6268μs | 29.7382 KOps/s | 29.4582 KOps/s | +0.95% |
| test_step_mdp_speed[False-True-False-False-True] | 62.8910μs | 34.9338μs | 28.6255 KOps/s | 28.5643 KOps/s | +0.21% |
| test_step_mdp_speed[False-True-False-False-False] | 51.3710μs | 21.6372μs | 46.2166 KOps/s | 47.7558 KOps/s | -3.22% |
| test_step_mdp_speed[False-False-True-True-True] | 86.7420μs | 57.2007μs | 17.4823 KOps/s | 17.5052 KOps/s | -0.13% |
| test_step_mdp_speed[False-False-True-True-False] | 70.5110μs | 36.7504μs | 27.2106 KOps/s | 27.0980 KOps/s | +0.42% |
| test_step_mdp_speed[False-False-True-False-True] | 98.1120μs | 34.5790μs | 28.9193 KOps/s | 28.3049 KOps/s | +2.17% |
| test_step_mdp_speed[False-False-True-False-False] | 53.5110μs | 21.4399μs | 46.6420 KOps/s | 47.1360 KOps/s | -1.05% |
| test_step_mdp_speed[False-False-False-True-True] | 98.9510μs | 58.7076μs | 17.0336 KOps/s | 16.8323 KOps/s | +1.20% |
| test_step_mdp_speed[False-False-False-True-False] | 77.0010μs | 39.6038μs | 25.2501 KOps/s | 25.2296 KOps/s | +0.08% |
| test_step_mdp_speed[False-False-False-False-True] | 72.6810μs | 36.7955μs | 27.1772 KOps/s | 26.6994 KOps/s | +1.79% |
| test_step_mdp_speed[False-False-False-False-False] | 52.9010μs | 24.0841μs | 41.5211 KOps/s | 42.0901 KOps/s | -1.35% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7638s | 0.7601s | 1.3155 Ops/s | 1.2855 Ops/s | +2.34% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7325s | 0.6411s | 1.5599 Ops/s | 1.5549 Ops/s | +0.32% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7799s | 1.7023s | 0.5874 Ops/s | 0.5891 Ops/s | -0.28% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5457s | 1.4713s | 0.6797 Ops/s | 0.6810 Ops/s | -0.19% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0450s | 1.9649s | 0.5089 Ops/s | 0.5131 Ops/s | -0.81% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.8117s | 1.7333s | 0.5769 Ops/s | 0.5802 Ops/s | -0.57% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.9341s | 4.7332s | 0.2113 Ops/s | 0.2104 Ops/s | +0.43% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.6519s | 4.5278s | 0.2209 Ops/s | 0.2214 Ops/s | -0.25% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.0785s | 1.9819s | 0.5046 Ops/s | 0.5046 Ops/s | -0.01% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.8526s | 1.7105s | 0.5846 Ops/s | 0.5912 Ops/s | -1.12% |
| test_values[generalized_advantage_estimate-True-True] | 22.0257ms | 21.5869ms | 46.3244 Ops/s | 47.8013 Ops/s | -3.09% |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1436s | 3.8205ms | 261.7492 Ops/s | 267.6292 Ops/s | -2.20% |
| test_values[td0_return_estimate-False-False] | 0.1114ms | 86.1417μs | 11.6088 KOps/s | 11.8577 KOps/s | -2.10% |
| test_values[td1_return_estimate-False-False] | 51.8995ms | 51.3880ms | 19.4598 Ops/s | 19.6203 Ops/s | -0.82% |
| test_values[vec_td1_return_estimate-False-False] | 1.3694ms | 1.1160ms | 896.0600 Ops/s | 908.4098 Ops/s | -1.36% |
| test_values[td_lambda_return_estimate-True-False] | 84.4318ms | 83.9200ms | 11.9161 Ops/s | 11.9588 Ops/s | -0.36% |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.3264ms | 1.1099ms | 900.9758 Ops/s | 912.9208 Ops/s | -1.31% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 22.2324ms | 21.7815ms | 45.9105 Ops/s | 47.0632 Ops/s | -2.45% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0747ms | 0.7844ms | 1.2748 KOps/s | 1.2934 KOps/s | -1.44% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7579ms | 0.7015ms | 1.4254 KOps/s | 1.4036 KOps/s | +1.56% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5917ms | 1.5170ms | 659.2168 Ops/s | 666.4424 Ops/s | -1.08% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7702ms | 0.7191ms | 1.3906 KOps/s | 1.4302 KOps/s | -2.76% |
| test_dqn_speed[False-None] | 2.0225ms | 1.5663ms | 638.4276 Ops/s | 640.7204 Ops/s | -0.36% |
| test_dqn_speed[False-backward] | 2.3123ms | 2.2255ms | 449.3409 Ops/s | 455.7208 Ops/s | -1.40% |
| test_dqn_speed[True-None] | 1.0585ms | 0.5732ms | 1.7447 KOps/s | 1.8226 KOps/s | -4.28% |
| test_dqn_speed[True-backward] | 1.1369ms | 1.0881ms | 918.9994 Ops/s | 889.2959 Ops/s | +3.34% |
| test_dqn_speed[reduce-overhead-None] | 0.6700ms | 0.5739ms | 1.7423 KOps/s | 1.7000 KOps/s | +2.49% |
| test_ddpg_speed[False-None] | 3.6484ms | 2.9865ms | 334.8368 Ops/s | 336.8814 Ops/s | -0.61% |
test_ddpg_speed[False-backward] 4.5944ms 4.2813ms 233.5739 Ops/s 226.8627 Ops/s $\color{#35bf28}+2.96\%$
test_ddpg_speed[True-None] 1.3979ms 1.3143ms 760.8437 Ops/s 767.7349 Ops/s $\color{#d91a1a}-0.90\%$
test_ddpg_speed[True-backward] 2.5336ms 2.4154ms 414.0159 Ops/s 421.1570 Ops/s $\color{#d91a1a}-1.70\%$
test_ddpg_speed[reduce-overhead-None] 1.4541ms 1.3522ms 739.5223 Ops/s 748.1429 Ops/s $\color{#d91a1a}-1.15\%$
test_sac_speed[False-None] 8.9843ms 8.5365ms 117.1446 Ops/s 117.3656 Ops/s $\color{#d91a1a}-0.19\%$
test_sac_speed[False-backward] 12.0419ms 11.5430ms 86.6327 Ops/s 84.9420 Ops/s $\color{#35bf28}+1.99\%$
test_sac_speed[True-None] 2.1611ms 1.8211ms 549.1283 Ops/s 554.4245 Ops/s $\color{#d91a1a}-0.96\%$
test_sac_speed[True-backward] 3.5245ms 3.4581ms 289.1764 Ops/s 279.1938 Ops/s $\color{#35bf28}+3.58\%$
test_sac_speed[reduce-overhead-None] 18.9077ms 10.7406ms 93.1050 Ops/s 93.8383 Ops/s $\color{#d91a1a}-0.78\%$
test_redq_deprec_speed[False-None] 10.2597ms 9.6004ms 104.1624 Ops/s 105.4119 Ops/s $\color{#d91a1a}-1.19\%$
test_redq_deprec_speed[False-backward] 13.2375ms 12.7300ms 78.5545 Ops/s 77.6617 Ops/s $\color{#35bf28}+1.15\%$
test_redq_deprec_speed[True-None] 2.6734ms 2.5703ms 389.0549 Ops/s 391.2554 Ops/s $\color{#d91a1a}-0.56\%$
test_redq_deprec_speed[True-backward] 4.3364ms 4.1966ms 238.2904 Ops/s 243.0146 Ops/s $\color{#d91a1a}-1.94\%$
test_redq_deprec_speed[reduce-overhead-None] 15.6841ms 9.6290ms 103.8532 Ops/s 105.5690 Ops/s $\color{#d91a1a}-1.63\%$
test_td3_speed[False-None] 8.8346ms 8.4656ms 118.1248 Ops/s 117.6089 Ops/s $\color{#35bf28}+0.44\%$
test_td3_speed[False-backward] 11.6279ms 10.9119ms 91.6432 Ops/s 92.5118 Ops/s $\color{#d91a1a}-0.94\%$
test_td3_speed[True-None] 1.7150ms 1.6442ms 608.2041 Ops/s 621.8021 Ops/s $\color{#d91a1a}-2.19\%$
test_td3_speed[True-backward] 3.1702ms 3.1079ms 321.7616 Ops/s 323.0305 Ops/s $\color{#d91a1a}-0.39\%$
test_td3_speed[reduce-overhead-None] 80.4918ms 24.0528ms 41.5752 Ops/s 43.1723 Ops/s $\color{#d91a1a}-3.70\%$
test_cql_speed[False-None] 18.6702ms 17.9424ms 55.7338 Ops/s 57.0931 Ops/s $\color{#d91a1a}-2.38\%$
test_cql_speed[False-backward] 23.7047ms 23.2389ms 43.0313 Ops/s 43.8002 Ops/s $\color{#d91a1a}-1.76\%$
test_cql_speed[True-None] 3.3306ms 3.2525ms 307.4538 Ops/s 310.8470 Ops/s $\color{#d91a1a}-1.09\%$
test_cql_speed[True-backward] 5.9367ms 5.5183ms 181.2167 Ops/s 181.4506 Ops/s $\color{#d91a1a}-0.13\%$
test_cql_speed[reduce-overhead-None] 18.8211ms 11.7649ms 84.9989 Ops/s 86.0599 Ops/s $\color{#d91a1a}-1.23\%$
test_a2c_speed[False-None] 4.2909ms 3.3603ms 297.5937 Ops/s 304.8047 Ops/s $\color{#d91a1a}-2.37\%$
test_a2c_speed[False-backward] 7.0399ms 6.6219ms 151.0141 Ops/s 154.4998 Ops/s $\color{#d91a1a}-2.26\%$
test_a2c_speed[True-None] 1.8325ms 1.3497ms 740.9166 Ops/s 749.7705 Ops/s $\color{#d91a1a}-1.18\%$
test_a2c_speed[True-backward] 3.2184ms 3.1363ms 318.8507 Ops/s 323.9000 Ops/s $\color{#d91a1a}-1.56\%$
test_a2c_speed[reduce-overhead-None] 1.4067ms 0.9887ms 1.0114 KOps/s 1.0227 KOps/s $\color{#d91a1a}-1.10\%$
test_ppo_speed[False-None] 4.3692ms 3.9944ms 250.3475 Ops/s 256.7391 Ops/s $\color{#d91a1a}-2.49\%$
test_ppo_speed[False-backward] 7.8314ms 7.3941ms 135.2429 Ops/s 136.1851 Ops/s $\color{#d91a1a}-0.69\%$
test_ppo_speed[True-None] 1.6284ms 1.4263ms 701.1060 Ops/s 709.1185 Ops/s $\color{#d91a1a}-1.13\%$
test_ppo_speed[True-backward] 3.3075ms 3.2608ms 306.6707 Ops/s 327.0132 Ops/s $\textbf{\color{#d91a1a}-6.22\%}$
test_ppo_speed[reduce-overhead-None] 1.1090ms 1.0335ms 967.5838 Ops/s 933.3878 Ops/s $\color{#35bf28}+3.66\%$
test_reinforce_speed[False-None] 2.4412ms 2.3482ms 425.8623 Ops/s 431.0415 Ops/s $\color{#d91a1a}-1.20\%$
test_reinforce_speed[False-backward] 3.9558ms 3.5141ms 284.5644 Ops/s 289.8118 Ops/s $\color{#d91a1a}-1.81\%$
test_reinforce_speed[True-None] 1.3578ms 1.2664ms 789.6590 Ops/s 797.3933 Ops/s $\color{#d91a1a}-0.97\%$
test_reinforce_speed[True-backward] 3.1271ms 3.0794ms 324.7340 Ops/s 325.8129 Ops/s $\color{#d91a1a}-0.33\%$
test_reinforce_speed[reduce-overhead-None] 0.4676s 10.1178ms 98.8353 Ops/s 97.4161 Ops/s $\color{#35bf28}+1.46\%$
test_iql_speed[False-None] 10.1575ms 9.7047ms 103.0431 Ops/s 103.5774 Ops/s $\color{#d91a1a}-0.52\%$
test_iql_speed[False-backward] 14.2866ms 13.8114ms 72.4040 Ops/s 72.9256 Ops/s $\color{#d91a1a}-0.72\%$
test_iql_speed[True-None] 2.2988ms 2.1961ms 455.3485 Ops/s 453.8979 Ops/s $\color{#35bf28}+0.32\%$
test_iql_speed[True-backward] 5.1795ms 4.8793ms 204.9489 Ops/s 206.8884 Ops/s $\color{#d91a1a}-0.94\%$
test_iql_speed[reduce-overhead-None] 17.1818ms 10.1425ms 98.5951 Ops/s 75.7660 Ops/s $\textbf{\color{#35bf28}+30.13\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4583ms 5.9738ms 167.3971 Ops/s 166.0390 Ops/s $\color{#35bf28}+0.82\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8896ms 0.2923ms 3.4214 KOps/s 3.4521 KOps/s $\color{#d91a1a}-0.89\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5598ms 0.2966ms 3.3719 KOps/s 3.6583 KOps/s $\textbf{\color{#d91a1a}-7.83\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1964ms 5.8204ms 171.8105 Ops/s 170.2316 Ops/s $\color{#35bf28}+0.93\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0512ms 0.3174ms 3.1509 KOps/s 3.5038 KOps/s $\textbf{\color{#d91a1a}-10.07\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6433ms 0.3107ms 3.2187 KOps/s 3.6221 KOps/s $\textbf{\color{#d91a1a}-11.14\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.8302ms 1.4887ms 671.7466 Ops/s 768.3889 Ops/s $\textbf{\color{#d91a1a}-12.58\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6017ms 1.3586ms 736.0666 Ops/s 786.1774 Ops/s $\textbf{\color{#d91a1a}-6.37\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0885ms 5.9956ms 166.7881 Ops/s 165.2669 Ops/s $\color{#35bf28}+0.92\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2822ms 0.5536ms 1.8062 KOps/s 2.1515 KOps/s $\textbf{\color{#d91a1a}-16.05\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7359ms 0.5066ms 1.9739 KOps/s 2.1935 KOps/s $\textbf{\color{#d91a1a}-10.01\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0547ms 5.8318ms 171.4725 Ops/s 168.9109 Ops/s $\color{#35bf28}+1.52\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8589ms 0.3772ms 2.6511 KOps/s 2.8591 KOps/s $\textbf{\color{#d91a1a}-7.28\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6226ms 0.3644ms 2.7440 KOps/s 2.6841 KOps/s $\color{#35bf28}+2.23\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0420ms 5.7708ms 173.2865 Ops/s 170.1449 Ops/s $\color{#35bf28}+1.85\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.5771ms 0.2883ms 3.4685 KOps/s 3.4860 KOps/s $\color{#d91a1a}-0.50\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5768ms 0.3482ms 2.8718 KOps/s 3.7223 KOps/s $\textbf{\color{#d91a1a}-22.85\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2021ms 5.9836ms 167.1239 Ops/s 166.1120 Ops/s $\color{#35bf28}+0.61\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.7601ms 0.5106ms 1.9586 KOps/s 2.1704 KOps/s $\textbf{\color{#d91a1a}-9.76\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7682ms 0.5023ms 1.9909 KOps/s 2.2034 KOps/s $\textbf{\color{#d91a1a}-9.64\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.6103s 17.3642ms 57.5897 Ops/s 50.6984 Ops/s $\textbf{\color{#35bf28}+13.59\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 32.2410ms 2.4181ms 413.5497 Ops/s 491.3184 Ops/s $\textbf{\color{#d91a1a}-15.83\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 10.1338ms 1.3107ms 762.9572 Ops/s 1.0406 KOps/s $\textbf{\color{#d91a1a}-26.68\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.5669ms 5.2939ms 188.8971 Ops/s 191.4335 Ops/s $\color{#d91a1a}-1.32\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.0374ms 1.8089ms 552.8273 Ops/s 498.8845 Ops/s $\textbf{\color{#35bf28}+10.81\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.9821ms 1.3044ms 766.6576 Ops/s 751.7478 Ops/s $\color{#35bf28}+1.98\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5634s 16.6828ms 59.9419 Ops/s 183.3766 Ops/s $\textbf{\color{#d91a1a}-67.31\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 11.6269ms 2.1753ms 459.6972 Ops/s 61.1396 Ops/s $\textbf{\color{#35bf28}+651.88\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.2347ms 1.0768ms 928.6998 Ops/s 840.4383 Ops/s $\textbf{\color{#35bf28}+10.50\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.7468ms 36.9444ms 27.0677 Ops/s 26.7994 Ops/s $\color{#35bf28}+1.00\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.3431ms 18.6929ms 53.4962 Ops/s 53.5792 Ops/s $\color{#d91a1a}-0.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.0633ms 37.8010ms 26.4543 Ops/s 25.4597 Ops/s $\color{#35bf28}+3.91\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.7049ms 18.7708ms 53.2742 Ops/s 52.5604 Ops/s $\color{#35bf28}+1.36\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.0404ms 39.5088ms 25.3108 Ops/s 24.9039 Ops/s $\color{#35bf28}+1.63\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.3045ms 20.1714ms 49.5751 Ops/s 48.8105 Ops/s $\color{#35bf28}+1.57\%$
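
The last column of each row appears to be the relative change of the first throughput figure against the second. The column headers are not visible in this part of the table, so that reading is an assumption. The sketch below checks it against two rows. `relative_change` is a hypothetical helper written for this note, not part of the benchmark tooling.

```python
def relative_change(current_ops: float, baseline_ops: float) -> float:
    """Return the percentage change of current_ops relative to baseline_ops."""
    return (current_ops - baseline_ops) / baseline_ops * 100.0


# Throughput pairs copied from two rows of the table (both in Ops/s or both in KOps/s).
# Assumption: the first figure is the PR run and the second is the baseline.
rows = {
    "test_step_mdp_speed[False-False-True-False-False]": (46.6420, 47.1360),
    "test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]": (459.6972, 61.1396),
}

for name, (current, baseline) in rows.items():
    print(f"{name}: {relative_change(current, baseline):+.2f}%")
```

Rows that mix units (for example `762.9572 Ops/s` against `1.0406 KOps/s`) need both values converted to the same unit before the comparison.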

Labels

- CI: Has to do with CI setup (e.g. wheels & builds, tests...)
- CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.