[CI] Add checkbox options for wheel build variants (CPU/CUDA/ROCm) #3451

Merged
vmoens merged 1 commit into main from ci/cpu-only-wheel-builds
Feb 7, 2026

@vmoens (Collaborator) commented Feb 5, 2026

Summary

This PR adds workflow_dispatch inputs to select which wheel variants to build, presented as tickable checkboxes in the GitHub Actions UI:

  • build_cpu: Build CPU wheels (default: true)
  • build_cuda: Build CUDA wheels (default: false)
  • build_rocm: Build ROCm wheels, Linux only (default: false)

The default is CPU-only since TorchRL is a pure Python library and doesn't require CUDA-specific wheels. This significantly speeds up release dry-runs.

Changes

Updated workflows:

  • build-wheels-linux.yml: CPU, CUDA, ROCm options
  • build-wheels-windows.yml: CPU, CUDA options
  • build-wheels-m1.yml: CPU option only (macOS ARM64)
  • build-wheels-aarch64-linux.yml: CPU option only (no CUDA support)
  • release.yml: Updated to use boolean checkboxes instead of choice dropdown

How it works:

  • For workflow_dispatch (manual trigger): Boolean checkboxes appear in the UI
  • For workflow_call (from release.yml): String inputs (with-cpu, with-cuda, with-rocm) with values enable/disable
  • Parameters are passed to pytorch/test-infra's generate_binary_build_matrix.yml
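
The mapping between the two input styles can be sketched in a few lines. This is only an illustration, assuming the checkbox booleans are translated one-to-one into the enable/disable strings. The actual workflows do this inside the YAML files, and the function name `to_call_inputs` is hypothetical.

```python
from typing import Dict


def to_call_inputs(
    build_cpu: bool = True,    # default from the PR description
    build_cuda: bool = False,  # default from the PR description
    build_rocm: bool = False,  # default from the PR description (Linux only)
) -> Dict[str, str]:
    """Translate workflow_dispatch checkboxes into workflow_call string inputs.

    Hypothetical helper: it mirrors the enable/disable convention described
    above and is not code taken from the workflows themselves.
    """

    def flag(selected: bool) -> str:
        return "enable" if selected else "disable"

    return {
        "with-cpu": flag(build_cpu),
        "with-cuda": flag(build_cuda),
        "with-rocm": flag(build_rocm),
    }


if __name__ == "__main__":
    # With the defaults, only the CPU variant is enabled.
    print(to_call_inputs())
    # Selecting CUDA as well, as in the test plan below.
    print(to_call_inputs(build_cuda=True))
```

Under this assumption, leaving every checkbox at its default requests a CPU-only matrix, which is the fast path intended for release dry-runs.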

Test plan

  • Trigger Build Linux Wheels workflow manually with CPU-only selected
  • Verify only CPU matrix is generated
  • Trigger with CUDA+CPU selected and verify both are generated

Made with Cursor

This adds workflow_dispatch inputs to select which wheel variants to build:
- build_cpu: Build CPU wheels (default: true)
- build_cuda: Build CUDA wheels (default: false)
- build_rocm: Build ROCm wheels, Linux only (default: false)

For workflow_call (used by release.yml), string inputs are used:
- with-cpu, with-cuda, with-rocm (enable/disable)

Default is CPU-only since TorchRL is a pure Python library and doesn't
require CUDA-specific wheels. This significantly speeds up dry-run builds.

Updated workflows:
- build-wheels-linux.yml: CPU, CUDA, ROCm options
- build-wheels-windows.yml: CPU, CUDA options
- build-wheels-m1.yml: CPU option only (macOS ARM64)
- build-wheels-aarch64-linux.yml: CPU option only (no CUDA support)
- release.yml: Updated to use boolean checkboxes instead of choice dropdown

Co-authored-by: Cursor <[email protected]>
@vmoens added the ciflow/binaries/all label (Build all binaries) on Feb 5, 2026
pytorch-bot bot commented Feb 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3451

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 77facdf with merge base 838410c:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on Feb 5, 2026
github-actions bot added the CI label (Has to do with CI setup, e.g. wheels & builds, tests...) on Feb 5, 2026
github-actions bot (Contributor) commented Feb 5, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: 15. Worsened: 10.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 79.8447μs 78.8353μs 12.6847 KOps/s 12.6983 KOps/s $\color{#d91a1a}-0.11\%$
test_tensor_to_bytestream_speed[torch.save] 0.1366ms 0.1362ms 7.3399 KOps/s 7.3011 KOps/s $\color{#35bf28}+0.53\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1068s 0.1065s 9.3877 Ops/s 8.9122 Ops/s $\textbf{\color{#35bf28}+5.34\%}$
test_tensor_to_bytestream_speed[numpy] 2.5975μs 2.5938μs 385.5301 KOps/s 395.1288 KOps/s $\color{#d91a1a}-2.43\%$
test_tensor_to_bytestream_speed[safetensors] 36.8427μs 36.5947μs 27.3264 KOps/s 27.1537 KOps/s $\color{#35bf28}+0.64\%$
test_simple 0.5333s 0.5324s 1.8784 Ops/s 1.7931 Ops/s $\color{#35bf28}+4.76\%$
test_transformed 1.2242s 1.1279s 0.8866 Ops/s 0.8843 Ops/s $\color{#35bf28}+0.26\%$
test_serial 1.6342s 1.6302s 0.6134 Ops/s 0.6020 Ops/s $\color{#35bf28}+1.90\%$
test_parallel 1.2927s 1.1235s 0.8901 Ops/s 0.8527 Ops/s $\color{#35bf28}+4.38\%$
test_step_mdp_speed[True-True-True-True-True] 0.3449ms 42.9646μs 23.2750 KOps/s 23.0492 KOps/s $\color{#35bf28}+0.98\%$
test_step_mdp_speed[True-True-True-True-False] 49.1230μs 24.6958μs 40.4927 KOps/s 41.2690 KOps/s $\color{#d91a1a}-1.88\%$
test_step_mdp_speed[True-True-True-False-True] 56.6340μs 24.0387μs 41.5997 KOps/s 40.9726 KOps/s $\color{#35bf28}+1.53\%$
test_step_mdp_speed[True-True-True-False-False] 46.6630μs 13.3866μs 74.7018 KOps/s 73.9781 KOps/s $\color{#35bf28}+0.98\%$
test_step_mdp_speed[True-True-False-True-True] 73.5450μs 46.0279μs 21.7259 KOps/s 21.3749 KOps/s $\color{#35bf28}+1.64\%$
test_step_mdp_speed[True-True-False-True-False] 55.7040μs 27.0459μs 36.9742 KOps/s 37.0572 KOps/s $\color{#d91a1a}-0.22\%$
test_step_mdp_speed[True-True-False-False-True] 61.7140μs 26.9707μs 37.0773 KOps/s 36.7467 KOps/s $\color{#35bf28}+0.90\%$
test_step_mdp_speed[True-True-False-False-False] 41.2730μs 16.2076μs 61.6996 KOps/s 61.2215 KOps/s $\color{#35bf28}+0.78\%$
test_step_mdp_speed[True-False-True-True-True] 88.0460μs 49.5425μs 20.1847 KOps/s 19.7825 KOps/s $\color{#35bf28}+2.03\%$
test_step_mdp_speed[True-False-True-True-False] 63.7740μs 30.4406μs 32.8509 KOps/s 32.7237 KOps/s $\color{#35bf28}+0.39\%$
test_step_mdp_speed[True-False-True-False-True] 59.3530μs 27.1284μs 36.8617 KOps/s 36.5515 KOps/s $\color{#35bf28}+0.85\%$
test_step_mdp_speed[True-False-True-False-False] 48.5530μs 16.1577μs 61.8900 KOps/s 61.1010 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[True-False-False-True-True] 84.4650μs 51.9475μs 19.2502 KOps/s 19.0662 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-False-False-True-False] 59.6140μs 33.0042μs 30.2991 KOps/s 30.3528 KOps/s $\color{#d91a1a}-0.18\%$
test_step_mdp_speed[True-False-False-False-True] 60.7040μs 29.6193μs 33.7618 KOps/s 33.3623 KOps/s $\color{#35bf28}+1.20\%$
test_step_mdp_speed[True-False-False-False-False] 58.3140μs 18.9450μs 52.7844 KOps/s 52.8347 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[False-True-True-True-True] 79.1250μs 49.3805μs 20.2509 KOps/s 20.0779 KOps/s $\color{#35bf28}+0.86\%$
test_step_mdp_speed[False-True-True-True-False] 97.9960μs 29.9223μs 33.4199 KOps/s 32.6247 KOps/s $\color{#35bf28}+2.44\%$
test_step_mdp_speed[False-True-True-False-True] 2.3715ms 31.3438μs 31.9042 KOps/s 32.2207 KOps/s $\color{#d91a1a}-0.98\%$
test_step_mdp_speed[False-True-True-False-False] 45.9430μs 18.0257μs 55.4765 KOps/s 55.0989 KOps/s $\color{#35bf28}+0.69\%$
test_step_mdp_speed[False-True-False-True-True] 88.0560μs 51.4606μs 19.4323 KOps/s 19.1347 KOps/s $\color{#35bf28}+1.56\%$
test_step_mdp_speed[False-True-False-True-False] 66.8740μs 32.9565μs 30.3430 KOps/s 30.6622 KOps/s $\color{#d91a1a}-1.04\%$
test_step_mdp_speed[False-True-False-False-True] 71.6750μs 33.0428μs 30.2638 KOps/s 29.9263 KOps/s $\color{#35bf28}+1.13\%$
test_step_mdp_speed[False-True-False-False-False] 57.2840μs 20.6939μs 48.3235 KOps/s 48.3186 KOps/s $\color{#35bf28}+0.01\%$
test_step_mdp_speed[False-False-True-True-True] 87.1250μs 53.8717μs 18.5626 KOps/s 18.3314 KOps/s $\color{#35bf28}+1.26\%$
test_step_mdp_speed[False-False-True-True-False] 69.5440μs 35.5965μs 28.0927 KOps/s 28.3818 KOps/s $\color{#d91a1a}-1.02\%$
test_step_mdp_speed[False-False-True-False-True] 0.1027ms 33.0570μs 30.2508 KOps/s 29.6710 KOps/s $\color{#35bf28}+1.95\%$
test_step_mdp_speed[False-False-True-False-False] 52.5230μs 20.6415μs 48.4460 KOps/s 48.7416 KOps/s $\color{#d91a1a}-0.61\%$
test_step_mdp_speed[False-False-False-True-True] 90.4860μs 56.3804μs 17.7366 KOps/s 17.5245 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[False-False-False-True-False] 73.3850μs 37.8169μs 26.4432 KOps/s 26.4474 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[False-False-False-False-True] 69.5640μs 35.5739μs 28.1105 KOps/s 28.3097 KOps/s $\color{#d91a1a}-0.70\%$
test_step_mdp_speed[False-False-False-False-False] 87.4760μs 22.9740μs 43.5275 KOps/s 43.6606 KOps/s $\color{#d91a1a}-0.30\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8541s 0.7547s 1.3251 Ops/s 1.3143 Ops/s $\color{#35bf28}+0.82\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7187s 0.6215s 1.6090 Ops/s 1.5926 Ops/s $\color{#35bf28}+1.04\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7224s 1.6474s 0.6070 Ops/s 0.6042 Ops/s $\color{#35bf28}+0.46\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5068s 1.4296s 0.6995 Ops/s 0.6978 Ops/s $\color{#35bf28}+0.25\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9613s 1.8822s 0.5313 Ops/s 0.5257 Ops/s $\color{#35bf28}+1.06\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7471s 1.6680s 0.5995 Ops/s 0.5922 Ops/s $\color{#35bf28}+1.24\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6892s 4.6127s 0.2168 Ops/s 0.2181 Ops/s $\color{#d91a1a}-0.59\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5067s 4.4489s 0.2248 Ops/s 0.2240 Ops/s $\color{#35bf28}+0.33\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.1315s 1.9677s 0.5082 Ops/s 0.5146 Ops/s $\color{#d91a1a}-1.23\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7179s 1.6419s 0.6090 Ops/s 0.6054 Ops/s $\color{#35bf28}+0.61\%$
test_values[generalized_advantage_estimate-True-True] 10.5168ms 9.8487ms 101.5363 Ops/s 103.2040 Ops/s $\color{#d91a1a}-1.62\%$
test_values[vec_generalized_advantage_estimate-True-True] 21.7597ms 17.7560ms 56.3190 Ops/s 57.0249 Ops/s $\color{#d91a1a}-1.24\%$
test_values[td0_return_estimate-False-False] 0.2259ms 0.1342ms 7.4499 KOps/s 8.1524 KOps/s $\textbf{\color{#d91a1a}-8.62\%}$
test_values[td1_return_estimate-False-False] 26.7724ms 26.0506ms 38.3868 Ops/s 38.3025 Ops/s $\color{#35bf28}+0.22\%$
test_values[vec_td1_return_estimate-False-False] 22.4593ms 17.8354ms 56.0684 Ops/s 56.4710 Ops/s $\color{#d91a1a}-0.71\%$
test_values[td_lambda_return_estimate-True-False] 39.8524ms 38.3694ms 26.0624 Ops/s 25.9879 Ops/s $\color{#35bf28}+0.29\%$
test_values[vec_td_lambda_return_estimate-True-False] 21.0015ms 17.7674ms 56.2828 Ops/s 56.9443 Ops/s $\color{#d91a1a}-1.16\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.6783ms 8.6135ms 116.0970 Ops/s 116.2943 Ops/s $\color{#d91a1a}-0.17\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.6726ms 1.4528ms 688.3134 Ops/s 685.0953 Ops/s $\color{#35bf28}+0.47\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5167ms 0.4029ms 2.4822 KOps/s 2.4843 KOps/s $\color{#d91a1a}-0.08\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.4723ms 34.9485ms 28.6135 Ops/s 32.6057 Ops/s $\textbf{\color{#d91a1a}-12.24\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8281ms 1.6828ms 594.2454 Ops/s 588.5981 Ops/s $\color{#35bf28}+0.96\%$
test_dqn_speed[False-None] 1.4790ms 1.3671ms 731.4748 Ops/s 739.7090 Ops/s $\color{#d91a1a}-1.11\%$
test_dqn_speed[False-backward] 1.9378ms 1.8520ms 539.9617 Ops/s 540.4967 Ops/s $\color{#d91a1a}-0.10\%$
test_dqn_speed[True-None] 0.6465ms 0.5327ms 1.8772 KOps/s 1.7813 KOps/s $\textbf{\color{#35bf28}+5.39\%}$
test_dqn_speed[True-backward] 1.0403ms 0.9827ms 1.0176 KOps/s 821.7749 Ops/s $\textbf{\color{#35bf28}+23.84\%}$
test_dqn_speed[reduce-overhead-None] 0.6334ms 0.5437ms 1.8393 KOps/s 1.8370 KOps/s $\color{#35bf28}+0.13\%$
test_ddpg_speed[False-None] 3.2330ms 2.8714ms 348.2605 Ops/s 359.6526 Ops/s $\color{#d91a1a}-3.17\%$
test_ddpg_speed[False-backward] 4.0337ms 3.9526ms 252.9976 Ops/s 253.6366 Ops/s $\color{#d91a1a}-0.25\%$
test_ddpg_speed[True-None] 1.6349ms 1.4175ms 705.4829 Ops/s 711.0128 Ops/s $\color{#d91a1a}-0.78\%$
test_ddpg_speed[True-backward] 2.4790ms 2.3623ms 423.3212 Ops/s 354.8300 Ops/s $\textbf{\color{#35bf28}+19.30\%}$
test_ddpg_speed[reduce-overhead-None] 1.5265ms 1.3804ms 724.4054 Ops/s 716.7538 Ops/s $\color{#35bf28}+1.07\%$
test_sac_speed[False-None] 8.5984ms 7.9107ms 126.4118 Ops/s 128.7192 Ops/s $\color{#d91a1a}-1.79\%$
test_sac_speed[False-backward] 11.4794ms 10.9340ms 91.4578 Ops/s 91.8529 Ops/s $\color{#d91a1a}-0.43\%$
test_sac_speed[True-None] 2.2854ms 2.1202ms 471.6528 Ops/s 464.4811 Ops/s $\color{#35bf28}+1.54\%$
test_sac_speed[True-backward] 4.1330ms 3.9941ms 250.3716 Ops/s 220.4053 Ops/s $\textbf{\color{#35bf28}+13.60\%}$
test_sac_speed[reduce-overhead-None] 2.3045ms 2.1137ms 473.1081 Ops/s 452.8944 Ops/s $\color{#35bf28}+4.46\%$
test_redq_speed[False-None] 16.5442ms 10.8192ms 92.4279 Ops/s 85.0440 Ops/s $\textbf{\color{#35bf28}+8.68\%}$
test_redq_speed[False-backward] 18.3943ms 17.7742ms 56.2613 Ops/s 57.4033 Ops/s $\color{#d91a1a}-1.99\%$
test_redq_speed[True-None] 4.6303ms 4.4449ms 224.9761 Ops/s 220.3366 Ops/s $\color{#35bf28}+2.11\%$
test_redq_speed[True-backward] 10.1069ms 9.8611ms 101.4089 Ops/s 103.0878 Ops/s $\color{#d91a1a}-1.63\%$
test_redq_speed[reduce-overhead-None] 4.7784ms 4.4334ms 225.5630 Ops/s 219.9848 Ops/s $\color{#35bf28}+2.54\%$
test_redq_deprec_speed[False-None] 11.1714ms 10.7547ms 92.9824 Ops/s 92.4264 Ops/s $\color{#35bf28}+0.60\%$
test_redq_deprec_speed[False-backward] 15.7467ms 15.4282ms 64.8165 Ops/s 63.8367 Ops/s $\color{#35bf28}+1.53\%$
test_redq_deprec_speed[True-None] 4.0806ms 3.6775ms 271.9229 Ops/s 270.0849 Ops/s $\color{#35bf28}+0.68\%$
test_redq_deprec_speed[True-backward] 7.8540ms 7.5678ms 132.1385 Ops/s 125.7537 Ops/s $\textbf{\color{#35bf28}+5.08\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.2396ms 3.6538ms 273.6848 Ops/s 273.0911 Ops/s $\color{#35bf28}+0.22\%$
test_td3_speed[False-None] 8.0195ms 7.8456ms 127.4604 Ops/s 128.0432 Ops/s $\color{#d91a1a}-0.46\%$
test_td3_speed[False-backward] 11.1170ms 10.6248ms 94.1197 Ops/s 94.8116 Ops/s $\color{#d91a1a}-0.73\%$
test_td3_speed[True-None] 1.8612ms 1.8282ms 546.9752 Ops/s 536.3342 Ops/s $\color{#35bf28}+1.98\%$
test_td3_speed[True-backward] 3.7433ms 3.6207ms 276.1923 Ops/s 242.5286 Ops/s $\textbf{\color{#35bf28}+13.88\%}$
test_td3_speed[reduce-overhead-None] 1.8925ms 1.7728ms 564.0822 Ops/s 554.5962 Ops/s $\color{#35bf28}+1.71\%$
test_cql_speed[False-None] 29.6653ms 25.9123ms 38.5917 Ops/s 39.0474 Ops/s $\color{#d91a1a}-1.17\%$
test_cql_speed[False-backward] 38.3689ms 34.8713ms 28.6769 Ops/s 29.1523 Ops/s $\color{#d91a1a}-1.63\%$
test_cql_speed[True-None] 12.6496ms 12.3235ms 81.1461 Ops/s 80.1235 Ops/s $\color{#35bf28}+1.28\%$
test_cql_speed[True-backward] 18.7846ms 18.2694ms 54.7364 Ops/s 59.4833 Ops/s $\textbf{\color{#d91a1a}-7.98\%}$
test_cql_speed[reduce-overhead-None] 12.6821ms 12.4482ms 80.3329 Ops/s 79.9320 Ops/s $\color{#35bf28}+0.50\%$
test_a2c_speed[False-None] 5.4748ms 5.2547ms 190.3071 Ops/s 187.8660 Ops/s $\color{#35bf28}+1.30\%$
test_a2c_speed[False-backward] 11.8248ms 11.6533ms 85.8129 Ops/s 85.2589 Ops/s $\color{#35bf28}+0.65\%$
test_a2c_speed[True-None] 3.8325ms 3.7007ms 270.2217 Ops/s 256.0255 Ops/s $\textbf{\color{#35bf28}+5.54\%}$
test_a2c_speed[True-backward] 8.9549ms 8.5363ms 117.1475 Ops/s 111.4515 Ops/s $\textbf{\color{#35bf28}+5.11\%}$
test_a2c_speed[reduce-overhead-None] 3.8699ms 3.6837ms 271.4636 Ops/s 270.1411 Ops/s $\color{#35bf28}+0.49\%$
test_ppo_speed[False-None] 6.0145ms 5.8418ms 171.1793 Ops/s 168.8546 Ops/s $\color{#35bf28}+1.38\%$
test_ppo_speed[False-backward] 12.5565ms 12.3191ms 81.1746 Ops/s 81.1097 Ops/s $\color{#35bf28}+0.08\%$
test_ppo_speed[True-None] 3.7857ms 3.6284ms 275.6022 Ops/s 273.5647 Ops/s $\color{#35bf28}+0.74\%$
test_ppo_speed[True-backward] 8.5603ms 8.3022ms 120.4500 Ops/s 119.0861 Ops/s $\color{#35bf28}+1.15\%$
test_ppo_speed[reduce-overhead-None] 3.7087ms 3.6063ms 277.2894 Ops/s 276.7931 Ops/s $\color{#35bf28}+0.18\%$
test_reinforce_speed[False-None] 5.4980ms 4.5120ms 221.6335 Ops/s 221.4213 Ops/s $\color{#35bf28}+0.10\%$
test_reinforce_speed[False-backward] 7.4275ms 7.2432ms 138.0613 Ops/s 137.1018 Ops/s $\color{#35bf28}+0.70\%$
test_reinforce_speed[True-None] 3.0474ms 2.8821ms 346.9640 Ops/s 338.0506 Ops/s $\color{#35bf28}+2.64\%$
test_reinforce_speed[True-backward] 8.0289ms 7.7474ms 129.0749 Ops/s 117.1770 Ops/s $\textbf{\color{#35bf28}+10.15\%}$
test_reinforce_speed[reduce-overhead-None] 3.0522ms 2.8603ms 349.6162 Ops/s 335.7122 Ops/s $\color{#35bf28}+4.14\%$
test_iql_speed[False-None] 19.8915ms 19.2128ms 52.0488 Ops/s 51.1515 Ops/s $\color{#35bf28}+1.75\%$
test_iql_speed[False-backward] 35.3161ms 30.0253ms 33.3052 Ops/s 33.0502 Ops/s $\color{#35bf28}+0.77\%$
test_iql_speed[True-None] 8.7686ms 8.4577ms 118.2357 Ops/s 114.2159 Ops/s $\color{#35bf28}+3.52\%$
test_iql_speed[True-backward] 17.1746ms 16.6889ms 59.9201 Ops/s 59.9585 Ops/s $\color{#d91a1a}-0.06\%$
test_iql_speed[reduce-overhead-None] 8.7683ms 8.5209ms 117.3579 Ops/s 113.4285 Ops/s $\color{#35bf28}+3.46\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9885ms 5.8936ms 169.6766 Ops/s 171.8013 Ops/s $\color{#d91a1a}-1.24\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0001ms 0.2955ms 3.3842 KOps/s 3.0198 KOps/s $\textbf{\color{#35bf28}+12.07\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4711ms 0.2558ms 3.9090 KOps/s 3.3248 KOps/s $\textbf{\color{#35bf28}+17.57\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9077ms 5.7021ms 175.3740 Ops/s 177.0160 Ops/s $\color{#d91a1a}-0.93\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9263ms 0.3220ms 3.1052 KOps/s 3.6660 KOps/s $\textbf{\color{#d91a1a}-15.30\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5136ms 0.2849ms 3.5096 KOps/s 3.9217 KOps/s $\textbf{\color{#d91a1a}-10.51\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4623ms 1.2140ms 823.6975 Ops/s 826.0165 Ops/s $\color{#d91a1a}-0.28\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.3584ms 1.1262ms 887.9493 Ops/s 883.6538 Ops/s $\color{#35bf28}+0.49\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.8085ms 5.9856ms 167.0678 Ops/s 170.2359 Ops/s $\color{#d91a1a}-1.86\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.7655ms 0.4246ms 2.3552 KOps/s 2.3008 KOps/s $\color{#35bf28}+2.36\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6679ms 0.4126ms 2.4235 KOps/s 2.4487 KOps/s $\color{#d91a1a}-1.03\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.7807ms 5.6929ms 175.6580 Ops/s 175.3809 Ops/s $\color{#35bf28}+0.16\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0366ms 0.2772ms 3.6070 KOps/s 3.5874 KOps/s $\color{#35bf28}+0.55\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5107ms 0.3314ms 3.0178 KOps/s 3.6025 KOps/s $\textbf{\color{#d91a1a}-16.23\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8921ms 5.6660ms 176.4905 Ops/s 176.9980 Ops/s $\color{#d91a1a}-0.29\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1727ms 0.3294ms 3.0356 KOps/s 2.7041 KOps/s $\textbf{\color{#35bf28}+12.26\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5547ms 0.3297ms 3.0334 KOps/s 3.5598 KOps/s $\textbf{\color{#d91a1a}-14.79\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2846ms 5.8580ms 170.7057 Ops/s 171.8811 Ops/s $\color{#d91a1a}-0.68\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9333ms 0.4521ms 2.2120 KOps/s 650.2763 Ops/s $\textbf{\color{#35bf28}+240.16\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6527ms 0.4493ms 2.2256 KOps/s 2.1330 KOps/s $\color{#35bf28}+4.34\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4405ms 4.9698ms 201.2151 Ops/s 201.5604 Ops/s $\color{#d91a1a}-0.17\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 12.8160ms 1.9333ms 517.2388 Ops/s 586.3411 Ops/s $\textbf{\color{#d91a1a}-11.79\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.2399ms 0.8874ms 1.1269 KOps/s 1.1319 KOps/s $\color{#d91a1a}-0.44\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.5327ms 4.9299ms 202.8422 Ops/s 199.3997 Ops/s $\color{#35bf28}+1.73\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.9924ms 1.8304ms 546.3230 Ops/s 547.0981 Ops/s $\color{#d91a1a}-0.14\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.3457ms 0.8542ms 1.1707 KOps/s 1.1572 KOps/s $\color{#35bf28}+1.17\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5440s 16.0211ms 62.4178 Ops/s 59.9364 Ops/s $\color{#35bf28}+4.14\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.0561ms 2.0042ms 498.9416 Ops/s 531.6949 Ops/s $\textbf{\color{#d91a1a}-6.16\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.5121ms 1.0211ms 979.3366 Ops/s 964.4696 Ops/s $\color{#35bf28}+1.54\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 40.3526ms 35.4794ms 28.1854 Ops/s 28.1622 Ops/s $\color{#35bf28}+0.08\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.1435ms 17.6715ms 56.5884 Ops/s 56.6993 Ops/s $\color{#d91a1a}-0.20\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.0582ms 36.2599ms 27.5787 Ops/s 27.3043 Ops/s $\color{#35bf28}+1.00\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.8750ms 18.0076ms 55.5320 Ops/s 55.7609 Ops/s $\color{#d91a1a}-0.41\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 39.7831ms 37.9531ms 26.3483 Ops/s 25.9758 Ops/s $\color{#35bf28}+1.43\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.5407ms 19.3663ms 51.6362 Ops/s 52.3028 Ops/s $\color{#d91a1a}-1.27\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8353ms 0.2147ms 4.6574 KOps/s 4.6547 KOps/s $\color{#35bf28}+0.06\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6935ms 1.3756ms 726.9313 Ops/s 731.9877 Ops/s $\color{#d91a1a}-0.69\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7304ms 2.3078ms 433.3058 Ops/s 421.8570 Ops/s $\color{#35bf28}+2.71\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 2.9999ms 2.8619ms 349.4169 Ops/s 347.7104 Ops/s $\color{#35bf28}+0.49\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2161ms 0.1289ms 7.7574 KOps/s 7.7231 KOps/s $\color{#35bf28}+0.44\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.6018ms 0.1833ms 5.4544 KOps/s 5.6516 KOps/s $\color{#d91a1a}-3.49\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1516ms 1.7767ms 562.8380 Ops/s 572.6249 Ops/s $\color{#d91a1a}-1.71\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6768ms 1.3780ms 725.6914 Ops/s 763.4776 Ops/s $\color{#d91a1a}-4.95\%$
test_collector_stack_then_write[50-img_shape0-small] 1.6168ms 1.0899ms 917.5210 Ops/s 908.3056 Ops/s $\color{#35bf28}+1.01\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.6968ms 3.6163ms 276.5224 Ops/s 288.5424 Ops/s $\color{#d91a1a}-4.17\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.6556ms 5.5166ms 181.2722 Ops/s 173.3291 Ops/s $\color{#35bf28}+4.58\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.2926ms 6.9566ms 143.7482 Ops/s 139.3269 Ops/s $\color{#35bf28}+3.17\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4095ms 0.2666ms 3.7510 KOps/s 3.7338 KOps/s $\color{#35bf28}+0.46\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6048ms 1.4743ms 678.2792 Ops/s 672.4954 Ops/s $\color{#35bf28}+0.86\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8333ms 2.4205ms 413.1339 Ops/s 400.7692 Ops/s $\color{#35bf28}+3.09\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3341ms 3.0501ms 327.8629 Ops/s 324.1382 Ops/s $\color{#35bf28}+1.15\%$
test_collector_without_rb[100-img_shape0-atari] 33.0094ms 32.7091ms 30.5725 Ops/s 30.2886 Ops/s $\color{#35bf28}+0.94\%$
test_collector_without_rb[200-img_shape1-large_batch] 64.8979ms 64.6154ms 15.4762 Ops/s 15.3832 Ops/s $\color{#35bf28}+0.60\%$
test_collector_with_rb[100-img_shape0-atari] 37.8473ms 37.1278ms 26.9340 Ops/s 26.7012 Ops/s $\color{#35bf28}+0.87\%$
test_collector_with_rb[200-img_shape1-large_batch] 0.6624s 0.1156s 8.6533 Ops/s 13.6177 Ops/s $\textbf{\color{#d91a1a}-36.46\%}$
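
The Change column above appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, and the Improved/Worsened totals match the number of bold rows, all of which lie beyond roughly ±5%. The sketch below reproduces that arithmetic under those two assumptions. The formula and the 5% threshold are inferred from the rows, not taken from the benchmark tooling, and all function names are hypothetical.

```python
# Unit multipliers for the two throughput units that appear in the table.
_UNITS = {"Ops/s": 1.0, "KOps/s": 1_000.0}


def parse_ops(text: str) -> float:
    """Convert a cell such as '12.6847 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * _UNITS[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Percentage change of the PR throughput relative to repo HEAD (assumed formula)."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change using an assumed +/-5% significance threshold."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # Values copied from the test_tensor_to_bytestream_speed[pickle] row.
    change = relative_change("12.6847 KOps/s", "12.6983 KOps/s")
    print(f"{change:+.2f}% -> {classify(change)}")
```

Because the table prints rounded throughput values, recomputed percentages can differ from the reported ones in the last digit, for example when the two columns use different units.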

github-actions bot (Contributor) commented Feb 5, 2026

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: 12. Worsened: 14.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.4205μs 80.4073μs 12.4367 KOps/s 12.4880 KOps/s $\color{#d91a1a}-0.41\%$
test_tensor_to_bytestream_speed[torch.save] 0.1468ms 0.1456ms 6.8676 KOps/s 7.2572 KOps/s $\textbf{\color{#d91a1a}-5.37\%}$
test_tensor_to_bytestream_speed[untyped_storage] 0.1055s 0.1054s 9.4912 Ops/s 9.5493 Ops/s $\color{#d91a1a}-0.61\%$
test_tensor_to_bytestream_speed[numpy] 2.6048μs 2.5855μs 386.7772 KOps/s 400.6032 KOps/s $\color{#d91a1a}-3.45\%$
test_tensor_to_bytestream_speed[safetensors] 37.5730μs 37.4059μs 26.7337 KOps/s 26.2385 KOps/s $\color{#35bf28}+1.89\%$
test_simple 0.7971s 0.7893s 1.2669 Ops/s 1.2383 Ops/s $\color{#35bf28}+2.31\%$
test_transformed 1.5369s 1.4481s 0.6906 Ops/s 0.6920 Ops/s $\color{#d91a1a}-0.20\%$
test_serial 2.4465s 2.3229s 0.4305 Ops/s 0.4258 Ops/s $\color{#35bf28}+1.09\%$
test_parallel 2.0543s 1.9433s 0.5146 Ops/s 0.5015 Ops/s $\color{#35bf28}+2.62\%$
test_step_mdp_speed[True-True-True-True-True] 0.3738ms 44.7474μs 22.3476 KOps/s 22.6317 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[True-True-True-True-False] 57.5010μs 25.0772μs 39.8769 KOps/s 40.3732 KOps/s $\color{#d91a1a}-1.23\%$
test_step_mdp_speed[True-True-True-False-True] 53.2710μs 24.8601μs 40.2250 KOps/s 40.9017 KOps/s $\color{#d91a1a}-1.65\%$
test_step_mdp_speed[True-True-True-False-False] 50.3610μs 13.6949μs 73.0199 KOps/s 73.4920 KOps/s $\color{#d91a1a}-0.64\%$
test_step_mdp_speed[True-True-False-True-True] 77.2810μs 47.9695μs 20.8466 KOps/s 20.9460 KOps/s $\color{#d91a1a}-0.47\%$
test_step_mdp_speed[True-True-False-True-False] 72.6920μs 27.8170μs 35.9493 KOps/s 36.3510 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[True-True-False-False-True] 57.7210μs 27.8967μs 35.8466 KOps/s 36.5205 KOps/s $\color{#d91a1a}-1.85\%$
test_step_mdp_speed[True-True-False-False-False] 47.5110μs 16.7029μs 59.8697 KOps/s 60.2538 KOps/s $\color{#d91a1a}-0.64\%$
test_step_mdp_speed[True-False-True-True-True] 82.9420μs 50.3172μs 19.8739 KOps/s 19.7472 KOps/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[True-False-True-True-False] 57.7810μs 30.4147μs 32.8788 KOps/s 32.9579 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[True-False-True-False-True] 65.3810μs 27.6910μs 36.1129 KOps/s 36.7649 KOps/s $\color{#d91a1a}-1.77\%$
test_step_mdp_speed[True-False-True-False-False] 42.9810μs 16.6649μs 60.0063 KOps/s 60.6802 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[True-False-False-True-True] 85.0910μs 52.5693μs 19.0225 KOps/s 18.9973 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[True-False-False-True-False] 73.4710μs 33.3141μs 30.0173 KOps/s 30.3026 KOps/s $\color{#d91a1a}-0.94\%$
test_step_mdp_speed[True-False-False-False-True] 60.7010μs 30.4420μs 32.8493 KOps/s 33.3728 KOps/s $\color{#d91a1a}-1.57\%$
test_step_mdp_speed[True-False-False-False-False] 50.4010μs 19.1792μs 52.1397 KOps/s 52.2854 KOps/s $\color{#d91a1a}-0.28\%$
test_step_mdp_speed[False-True-True-True-True] 80.7620μs 50.2189μs 19.9128 KOps/s 19.7893 KOps/s $\color{#35bf28}+0.62\%$
test_step_mdp_speed[False-True-True-True-False] 76.7620μs 30.5112μs 32.7749 KOps/s 32.9109 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[False-True-True-False-True] 2.2935ms 31.7114μs 31.5344 KOps/s 31.9719 KOps/s $\color{#d91a1a}-1.37\%$
test_step_mdp_speed[False-True-True-False-False] 78.2620μs 18.2005μs 54.9435 KOps/s 54.8790 KOps/s $\color{#35bf28}+0.12\%$
test_step_mdp_speed[False-True-False-True-True] 91.8510μs 52.8610μs 18.9176 KOps/s 18.9789 KOps/s $\color{#d91a1a}-0.32\%$
test_step_mdp_speed[False-True-False-True-False] 60.3310μs 33.1861μs 30.1331 KOps/s 30.4104 KOps/s $\color{#d91a1a}-0.91\%$
test_step_mdp_speed[False-True-False-False-True] 61.7310μs 34.2854μs 29.1669 KOps/s 29.2313 KOps/s $\color{#d91a1a}-0.22\%$
test_step_mdp_speed[False-True-False-False-False] 48.9810μs 20.9176μs 47.8066 KOps/s 47.6947 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[False-False-True-True-True] 94.4210μs 55.6245μs 17.9777 KOps/s 17.9518 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[False-False-True-True-False] 63.6510μs 36.7334μs 27.2231 KOps/s 27.5773 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[False-False-True-False-True] 61.5210μs 33.9655μs 29.4416 KOps/s 29.0612 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[False-False-True-False-False] 47.2610μs 20.9381μs 47.7598 KOps/s 47.8897 KOps/s $\color{#d91a1a}-0.27\%$
test_step_mdp_speed[False-False-False-True-True] 0.1049ms 57.4914μs 17.3939 KOps/s 17.4145 KOps/s $\color{#d91a1a}-0.12\%$
test_step_mdp_speed[False-False-False-True-False] 67.3710μs 38.6126μs 25.8983 KOps/s 26.0426 KOps/s $\color{#d91a1a}-0.55\%$
test_step_mdp_speed[False-False-False-False-True] 73.9010μs 36.4115μs 27.4638 KOps/s 27.4703 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[False-False-False-False-False] 59.6910μs 23.4154μs 42.7069 KOps/s 42.8284 KOps/s $\color{#d91a1a}-0.28\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8672s 0.7735s 1.2928 Ops/s 1.3002 Ops/s $\color{#d91a1a}-0.57\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7309s 0.6346s 1.5758 Ops/s 1.5752 Ops/s $\color{#35bf28}+0.04\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7542s 1.6809s 0.5949 Ops/s 0.6002 Ops/s $\color{#d91a1a}-0.88\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5271s 1.4505s 0.6894 Ops/s 0.6919 Ops/s $\color{#d91a1a}-0.36\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0063s 1.9309s 0.5179 Ops/s 0.5226 Ops/s $\color{#d91a1a}-0.91\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7939s 1.7077s 0.5856 Ops/s 0.5903 Ops/s $\color{#d91a1a}-0.79\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.8815s 4.7051s 0.2125 Ops/s 0.2147 Ops/s $\color{#d91a1a}-1.02\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5182s 4.4889s 0.2228 Ops/s 0.2247 Ops/s $\color{#d91a1a}-0.85\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.1901s 2.0263s 0.4935 Ops/s 0.5093 Ops/s $\color{#d91a1a}-3.11\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.8044s 1.7171s 0.5824 Ops/s 0.5994 Ops/s $\color{#d91a1a}-2.84\%$
test_values[generalized_advantage_estimate-True-True] 20.3560ms 19.8157ms 50.4649 Ops/s 50.2415 Ops/s $\color{#35bf28}+0.44\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1312s 3.5365ms 282.7692 Ops/s 260.0067 Ops/s $\textbf{\color{#35bf28}+8.75\%}$
test_values[td0_return_estimate-False-False] 0.1056ms 81.0726μs 12.3346 KOps/s 12.2469 KOps/s $\color{#35bf28}+0.72\%$
test_values[td1_return_estimate-False-False] 47.8497ms 47.3225ms 21.1316 Ops/s 21.1371 Ops/s $\color{#d91a1a}-0.03\%$
test_values[vec_td1_return_estimate-False-False] 1.2888ms 1.0768ms 928.6867 Ops/s 930.9480 Ops/s $\color{#d91a1a}-0.24\%$
test_values[td_lambda_return_estimate-True-False] 82.7761ms 77.9436ms 12.8298 Ops/s 12.9116 Ops/s $\color{#d91a1a}-0.63\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2762ms 1.0739ms 931.1970 Ops/s 933.7830 Ops/s $\color{#d91a1a}-0.28\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.6964ms 20.0213ms 49.9469 Ops/s 49.8236 Ops/s $\color{#35bf28}+0.25\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0181ms 0.7426ms 1.3467 KOps/s 1.3352 KOps/s $\color{#35bf28}+0.86\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8111ms 0.6914ms 1.4464 KOps/s 1.4970 KOps/s $\color{#d91a1a}-3.37\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5397ms 1.4811ms 675.1576 Ops/s 675.3605 Ops/s $\color{#d91a1a}-0.03\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7563ms 0.7068ms 1.4149 KOps/s 1.4665 KOps/s $\color{#d91a1a}-3.52\%$
test_dqn_speed[False-None] 1.7042ms 1.5313ms 653.0193 Ops/s 654.1960 Ops/s $\color{#d91a1a}-0.18\%$
test_dqn_speed[False-backward] 2.2275ms 2.1776ms 459.2257 Ops/s 456.7836 Ops/s $\color{#35bf28}+0.53\%$
test_dqn_speed[True-None] 1.2691ms 0.5779ms 1.7304 KOps/s 1.7379 KOps/s $\color{#d91a1a}-0.43\%$
test_dqn_speed[True-backward] 1.1756ms 1.1136ms 897.9808 Ops/s 920.2857 Ops/s $\color{#d91a1a}-2.42\%$
test_dqn_speed[reduce-overhead-None] 0.6696ms 0.5993ms 1.6687 KOps/s 1.6668 KOps/s $\color{#35bf28}+0.11\%$
test_ddpg_speed[False-None] 3.4565ms 2.8966ms 345.2309 Ops/s 344.6884 Ops/s $\color{#35bf28}+0.16\%$
test_ddpg_speed[False-backward] 4.6512ms 4.2377ms 235.9763 Ops/s 236.9833 Ops/s $\color{#d91a1a}-0.42\%$
test_ddpg_speed[True-None] 1.5001ms 1.3191ms 758.1044 Ops/s 759.1369 Ops/s $\color{#d91a1a}-0.14\%$
test_ddpg_speed[True-backward] 2.4093ms 2.3506ms 425.4306 Ops/s 426.6445 Ops/s $\color{#d91a1a}-0.28\%$
test_ddpg_speed[reduce-overhead-None] 1.7581ms 1.3298ms 751.9844 Ops/s 753.1660 Ops/s $\color{#d91a1a}-0.16\%$
test_sac_speed[False-None] 8.9690ms 8.3981ms 119.0747 Ops/s 119.6360 Ops/s $\color{#d91a1a}-0.47\%$
test_sac_speed[False-backward] 12.0086ms 11.5877ms 86.2987 Ops/s 87.0131 Ops/s $\color{#d91a1a}-0.82\%$
test_sac_speed[True-None] 1.9902ms 1.7997ms 555.6627 Ops/s 554.8104 Ops/s $\color{#35bf28}+0.15\%$
test_sac_speed[True-backward] 3.5775ms 3.4267ms 291.8290 Ops/s 290.5914 Ops/s $\color{#35bf28}+0.43\%$
test_sac_speed[reduce-overhead-None] 19.6462ms 10.9994ms 90.9140 Ops/s 89.9366 Ops/s $\color{#35bf28}+1.09\%$
test_redq_deprec_speed[False-None] 10.0744ms 9.3597ms 106.8415 Ops/s 106.4448 Ops/s $\color{#35bf28}+0.37\%$
test_redq_deprec_speed[False-backward] 12.9058ms 12.5428ms 79.7267 Ops/s 80.0975 Ops/s $\color{#d91a1a}-0.46\%$
test_redq_deprec_speed[True-None] 2.9100ms 2.4981ms 400.3002 Ops/s 399.2739 Ops/s $\color{#35bf28}+0.26\%$
test_redq_deprec_speed[True-backward] 4.1513ms 4.0836ms 244.8813 Ops/s 237.3992 Ops/s $\color{#35bf28}+3.15\%$
test_redq_deprec_speed[reduce-overhead-None] 16.3911ms 10.0087ms 99.9132 Ops/s 100.1635 Ops/s $\color{#d91a1a}-0.25\%$
test_td3_speed[False-None] 8.5558ms 8.3365ms 119.9551 Ops/s 114.6523 Ops/s $\color{#35bf28}+4.63\%$
test_td3_speed[False-backward] 11.1594ms 10.6596ms 93.8120 Ops/s 91.9886 Ops/s $\color{#35bf28}+1.98\%$
test_td3_speed[True-None] 1.6736ms 1.6135ms 619.7695 Ops/s 618.7020 Ops/s $\color{#35bf28}+0.17\%$
test_td3_speed[True-backward] 3.1959ms 3.0676ms 325.9850 Ops/s 307.4743 Ops/s $\textbf{\color{#35bf28}+6.02\%}$
test_td3_speed[reduce-overhead-None] 72.8567ms 24.7258ms 40.4436 Ops/s 39.8571 Ops/s $\color{#35bf28}+1.47\%$
test_cql_speed[False-None] 18.0885ms 17.4007ms 57.4689 Ops/s 57.7281 Ops/s $\color{#d91a1a}-0.45\%$
test_cql_speed[False-backward] 23.9344ms 22.7337ms 43.9876 Ops/s 43.4164 Ops/s $\color{#35bf28}+1.32\%$
test_cql_speed[True-None] 3.5056ms 3.3514ms 298.3843 Ops/s 309.2655 Ops/s $\color{#d91a1a}-3.52\%$
test_cql_speed[True-backward] 5.8083ms 5.4539ms 183.3567 Ops/s 180.9588 Ops/s $\color{#35bf28}+1.33\%$
test_cql_speed[reduce-overhead-None] 19.0171ms 11.9491ms 83.6886 Ops/s 82.7305 Ops/s $\color{#35bf28}+1.16\%$
test_a2c_speed[False-None] 4.0511ms 3.2512ms 307.5744 Ops/s 308.3398 Ops/s $\color{#d91a1a}-0.25\%$
test_a2c_speed[False-backward] 6.8568ms 6.4394ms 155.2938 Ops/s 153.2819 Ops/s $\color{#35bf28}+1.31\%$
test_a2c_speed[True-None] 1.4823ms 1.3217ms 756.5998 Ops/s 749.0466 Ops/s $\color{#35bf28}+1.01\%$
test_a2c_speed[True-backward] 3.1956ms 3.1028ms 322.2913 Ops/s 337.2081 Ops/s $\color{#d91a1a}-4.42\%$
test_a2c_speed[reduce-overhead-None] 1.0587ms 0.9867ms 1.0135 KOps/s 1.0176 KOps/s $\color{#d91a1a}-0.41\%$
test_ppo_speed[False-None] 4.0960ms 3.8899ms 257.0774 Ops/s 262.3012 Ops/s $\color{#d91a1a}-1.99\%$
test_ppo_speed[False-backward] 7.4877ms 7.2995ms 136.9957 Ops/s 142.9171 Ops/s $\color{#d91a1a}-4.14\%$
test_ppo_speed[True-None] 1.6684ms 1.4263ms 701.0961 Ops/s 704.3291 Ops/s $\color{#d91a1a}-0.46\%$
test_ppo_speed[True-backward] 3.1278ms 3.0577ms 327.0387 Ops/s 305.6006 Ops/s $\textbf{\color{#35bf28}+7.02\%}$
test_ppo_speed[reduce-overhead-None] 1.1264ms 1.0387ms 962.7528 Ops/s 927.5657 Ops/s $\color{#35bf28}+3.79\%$
test_reinforce_speed[False-None] 2.4024ms 2.2827ms 438.0731 Ops/s 435.3758 Ops/s $\color{#35bf28}+0.62\%$
test_reinforce_speed[False-backward] 3.7417ms 3.3171ms 301.4643 Ops/s 288.1385 Ops/s $\color{#35bf28}+4.62\%$
test_reinforce_speed[True-None] 1.4037ms 1.2834ms 779.1863 Ops/s 791.2317 Ops/s $\color{#d91a1a}-1.52\%$
test_reinforce_speed[True-backward] 2.9071ms 2.8618ms 349.4315 Ops/s 335.4034 Ops/s $\color{#35bf28}+4.18\%$
test_reinforce_speed[reduce-overhead-None] 0.4515s 10.3703ms 96.4295 Ops/s 103.4872 Ops/s $\textbf{\color{#d91a1a}-6.82\%}$
test_iql_speed[False-None] 10.0428ms 9.5396ms 104.8258 Ops/s 105.3766 Ops/s $\color{#d91a1a}-0.52\%$
test_iql_speed[False-backward] 13.7422ms 13.2646ms 75.3888 Ops/s 75.5955 Ops/s $\color{#d91a1a}-0.27\%$
test_iql_speed[True-None] 2.4906ms 2.1634ms 462.2259 Ops/s 460.1564 Ops/s $\color{#35bf28}+0.45\%$
test_iql_speed[True-backward] 4.8566ms 4.6802ms 213.6641 Ops/s 208.7999 Ops/s $\color{#35bf28}+2.33\%$
test_iql_speed[reduce-overhead-None] 18.1877ms 10.6586ms 93.8212 Ops/s 95.3685 Ops/s $\color{#d91a1a}-1.62\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4236ms 6.0245ms 165.9895 Ops/s 167.2136 Ops/s $\color{#d91a1a}-0.73\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9586ms 0.3109ms 3.2167 KOps/s 3.0035 KOps/s $\textbf{\color{#35bf28}+7.10\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7119ms 0.3084ms 3.2422 KOps/s 3.5237 KOps/s $\textbf{\color{#d91a1a}-7.99\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1662ms 5.9065ms 169.3049 Ops/s 172.4523 Ops/s $\color{#d91a1a}-1.83\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6911ms 0.2806ms 3.5640 KOps/s 2.9128 KOps/s $\textbf{\color{#35bf28}+22.36\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5610ms 0.2618ms 3.8193 KOps/s 3.0514 KOps/s $\textbf{\color{#35bf28}+25.17\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6099ms 1.2561ms 796.1411 Ops/s 736.3407 Ops/s $\textbf{\color{#35bf28}+8.12\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6897ms 1.3100ms 763.3626 Ops/s 856.4126 Ops/s $\textbf{\color{#d91a1a}-10.87\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2498ms 6.0869ms 164.2883 Ops/s 169.1256 Ops/s $\color{#d91a1a}-2.86\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.3670ms 0.4910ms 2.0366 KOps/s 2.3183 KOps/s $\textbf{\color{#d91a1a}-12.15\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7360ms 0.5049ms 1.9804 KOps/s 2.4075 KOps/s $\textbf{\color{#d91a1a}-17.74\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0684ms 5.9709ms 167.4795 Ops/s 172.2128 Ops/s $\color{#d91a1a}-2.75\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.8464ms 0.3517ms 2.8433 KOps/s 3.5011 KOps/s $\textbf{\color{#d91a1a}-18.79\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6547ms 0.3429ms 2.9166 KOps/s 3.6854 KOps/s $\textbf{\color{#d91a1a}-20.86\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1843ms 5.9108ms 169.1824 Ops/s 173.0765 Ops/s $\color{#d91a1a}-2.25\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.5742ms 0.3002ms 3.3311 KOps/s 3.1577 KOps/s $\textbf{\color{#35bf28}+5.49\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7344ms 0.2981ms 3.3551 KOps/s 3.3390 KOps/s $\color{#35bf28}+0.48\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2421ms 6.1065ms 163.7596 Ops/s 166.8099 Ops/s $\color{#d91a1a}-1.83\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8270ms 0.5151ms 1.9412 KOps/s 628.8264 Ops/s $\textbf{\color{#35bf28}+208.71\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7613ms 0.5113ms 1.9558 KOps/s 1.9874 KOps/s $\color{#d91a1a}-1.59\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.6493s 17.9953ms 55.5700 Ops/s 195.2799 Ops/s $\textbf{\color{#d91a1a}-71.54\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 12.8473ms 2.0207ms 494.8884 Ops/s 561.8926 Ops/s $\textbf{\color{#d91a1a}-11.92\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.3088ms 1.2276ms 814.5903 Ops/s 1.0633 KOps/s $\textbf{\color{#d91a1a}-23.39\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.5479ms 5.0432ms 198.2868 Ops/s 195.1380 Ops/s $\color{#35bf28}+1.61\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 13.0155ms 2.1312ms 469.2155 Ops/s 523.9608 Ops/s $\textbf{\color{#d91a1a}-10.45\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.1807ms 1.1877ms 841.9945 Ops/s 1.0700 KOps/s $\textbf{\color{#d91a1a}-21.31\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5800s 16.7512ms 59.6974 Ops/s 50.9932 Ops/s $\textbf{\color{#35bf28}+17.07\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.3425ms 2.1123ms 473.4215 Ops/s 504.9190 Ops/s $\textbf{\color{#d91a1a}-6.24\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.8821ms 1.1050ms 905.0111 Ops/s 911.6802 Ops/s $\color{#d91a1a}-0.73\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.8016ms 35.7201ms 27.9954 Ops/s 27.4065 Ops/s $\color{#35bf28}+2.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9808ms 18.1314ms 55.1530 Ops/s 55.2283 Ops/s $\color{#d91a1a}-0.14\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.8093ms 36.9912ms 27.0334 Ops/s 26.9669 Ops/s $\color{#35bf28}+0.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.5393ms 18.4647ms 54.1575 Ops/s 54.8590 Ops/s $\color{#d91a1a}-1.28\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.5833ms 38.7985ms 25.7742 Ops/s 25.6723 Ops/s $\color{#35bf28}+0.40\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.7671ms 20.2500ms 49.3827 Ops/s 50.6132 Ops/s $\color{#d91a1a}-2.43\%$
test_storage_write_lazystack[50-img_shape0-small] 1.0006ms 0.2221ms 4.5016 KOps/s 4.4977 KOps/s $\color{#35bf28}+0.09\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.5845ms 1.4013ms 713.6302 Ops/s 725.8364 Ops/s $\color{#d91a1a}-1.68\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7472ms 2.3519ms 425.1898 Ops/s 432.4285 Ops/s $\color{#d91a1a}-1.67\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1111ms 2.9436ms 339.7219 Ops/s 336.1664 Ops/s $\color{#35bf28}+1.06\%$
test_storage_write_contiguous[50-img_shape0-small] 0.5140ms 0.1510ms 6.6210 KOps/s 6.5392 KOps/s $\color{#35bf28}+1.25\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3709ms 0.2003ms 4.9930 KOps/s 4.3056 KOps/s $\textbf{\color{#35bf28}+15.97\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1094ms 1.7096ms 584.9481 Ops/s 575.5107 Ops/s $\color{#35bf28}+1.64\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4558ms 1.2548ms 796.9086 Ops/s 713.0464 Ops/s $\textbf{\color{#35bf28}+11.76\%}$
test_collector_stack_then_write[50-img_shape0-small] 1.3163ms 1.1446ms 873.6625 Ops/s 872.4753 Ops/s $\color{#35bf28}+0.14\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7373ms 3.6305ms 275.4460 Ops/s 274.3290 Ops/s $\color{#35bf28}+0.41\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.8865ms 5.7090ms 175.1618 Ops/s 175.3049 Ops/s $\color{#d91a1a}-0.08\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 8.8719ms 7.1431ms 139.9955 Ops/s 140.4597 Ops/s $\color{#d91a1a}-0.33\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4511ms 0.2763ms 3.6187 KOps/s 3.6681 KOps/s $\color{#d91a1a}-1.35\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7248ms 1.5557ms 642.8094 Ops/s 670.5309 Ops/s $\color{#d91a1a}-4.13\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8950ms 2.4709ms 404.7124 Ops/s 407.9006 Ops/s $\color{#d91a1a}-0.78\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.5055ms 3.1685ms 315.6065 Ops/s 314.4630 Ops/s $\color{#35bf28}+0.36\%$
test_collector_without_rb[100-img_shape0-atari] 34.6886ms 34.1017ms 29.3240 Ops/s 29.1473 Ops/s $\color{#35bf28}+0.61\%$
test_collector_without_rb[200-img_shape1-large_batch] 68.0964ms 67.1940ms 14.8823 Ops/s 14.9753 Ops/s $\color{#d91a1a}-0.62\%$
test_collector_with_rb[100-img_shape0-atari] 39.6858ms 39.0313ms 25.6205 Ops/s 25.9016 Ops/s $\color{#d91a1a}-1.09\%$
test_collector_with_rb[200-img_shape1-large_batch] 77.6058ms 76.6970ms 13.0383 Ops/s 13.2611 Ops/s $\color{#d91a1a}-1.68\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 57.7638ms 57.2541ms 17.4660 Ops/s 17.9752 Ops/s $\color{#d91a1a}-2.83\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1151s 0.1144s 8.7393 Ops/s 9.0191 Ops/s $\color{#d91a1a}-3.10\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 59.3615ms 57.9909ms 17.2441 Ops/s 17.3108 Ops/s $\color{#d91a1a}-0.39\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1183s 0.1160s 8.6194 Ops/s 8.6667 Ops/s $\color{#d91a1a}-0.55\%$
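
The change column can be sanity-checked from the two throughput columns. The sketch below assumes, since the table header is not shown in this part of the page, that the third value in each row is the throughput on this PR, the fourth is the throughput on the repository HEAD, and the last is the relative difference between them. The helper names `to_ops_per_second` and `relative_change` are hypothetical and not part of the benchmark tooling.

```python
# Minimal sketch: recompute the "change" percentage from two throughput cells.
# Assumption: change = (PR throughput - HEAD throughput) / HEAD throughput.

# Units that appear in the rows above, mapped to operations per second.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_second(cell: str) -> float:
    """Convert a cell such as '1.9412 KOps/s' to plain operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(pr_cell: str, head_cell: str) -> float:
    """Return the percentage change of the PR throughput relative to HEAD."""
    pr = to_ops_per_second(pr_cell)
    head = to_ops_per_second(head_cell)
    return 100.0 * (pr - head) / head


# Row test_values[vec_generalized_advantage_estimate-True-True]
print(f"{relative_change('282.7692 Ops/s', '260.0067 Ops/s'):+.2f}%")

# Row with mixed units (KOps/s on one side, Ops/s on the other)
print(f"{relative_change('1.9412 KOps/s', '628.8264 Ops/s'):+.2f}%")
```

Because the table shows rounded throughputs, a recomputed percentage can differ from the reported one in the last digit.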

@vmoens vmoens merged commit bec4498 into main Feb 7, 2026
295 of 296 checks passed
@vmoens vmoens deleted the ci/cpu-only-wheel-builds branch February 7, 2026 21:51

Labels

- CI: Has to do with CI setup (e.g. wheels & builds, tests...)
- ciflow/binaries/all: Build all binaries
- ciflow/rocm-mi300
- CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
- module: rocm
