[Feature] Restructure LLM pip extras for backend flexibility #3436

Merged
vmoens merged 28 commits into gh/vmoens/216/base from gh/vmoens/216/head on Feb 3, 2026

vmoens (Collaborator) commented Feb 2, 2026

Stack from ghstack (oldest at bottom):


  • Add llm-vllm, llm-sglang, llm-all extras for backend selection
  • Base llm extra no longer includes inference backend
  • Update sglang_nccl.py to use SGLang's native NCCL utilities
  • Remove vLLM dependency from SGLang weight sync code
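
The split described in the list above can be pictured as a mapping from extra names to package lists. The following is only a sketch: the base package names, the helper `build_llm_extras`, and the choice to have each backend extra also pull in the base packages are assumptions made for illustration, not TorchRL's actual packaging configuration.

```python
# Hypothetical package lists; the real dependency lists are not shown in this thread.
LLM_BASE = ["transformers", "datasets"]               # assumed base LLM dependencies
BACKENDS = {"vllm": ["vllm"], "sglang": ["sglang"]}   # one package per backend (assumed)


def build_llm_extras(base, backends):
    """Build an extras mapping where the base extra carries no inference backend."""
    extras = {"llm": list(base)}
    for name, packages in backends.items():
        # Each backend-specific extra adds exactly one backend on top of the base.
        extras[f"llm-{name}"] = list(base) + list(packages)
    # The catch-all extra installs every backend.
    extras["llm-all"] = list(base) + [
        pkg for packages in backends.values() for pkg in packages
    ]
    return extras


extras = build_llm_extras(LLM_BASE, BACKENDS)
for extra_name in sorted(extras):
    print(extra_name, "->", extras[extra_name])
```

Keeping the base extra free of backends lets an SGLang-only environment avoid installing vLLM, which matches the intent stated in the list above.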

Users can now:

  • pip install torchrl[llm-vllm] for vLLM backend
  • pip install torchrl[llm-sglang] for SGLang backend
  • pip install torchrl[llm-all] for both backends
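
Because the backend is now chosen at install time, downstream code may want to check which backend is importable before using it. Below is a minimal sketch of such a check. The helper names `available_llm_backends` and `pick_backend` are hypothetical and not part of TorchRL; the sketch relies only on the standard library's `importlib.util.find_spec` and on the import names `vllm` and `sglang`.

```python
import importlib.util


def available_llm_backends(find_spec=importlib.util.find_spec):
    """Return the inference backends that can be imported in this environment."""
    candidates = ("vllm", "sglang")
    # find_spec returns None when a top-level module is not installed.
    return [name for name in candidates if find_spec(name) is not None]


def pick_backend(preferred=None, find_spec=importlib.util.find_spec):
    """Pick a backend, honoring an explicit preference when one is given."""
    found = available_llm_backends(find_spec)
    if preferred is not None:
        if preferred not in found:
            raise ImportError(
                f"Backend {preferred!r} is not installed; "
                f"try: pip install 'torchrl[llm-{preferred}]'"
            )
        return preferred
    if not found:
        raise ImportError(
            "No LLM inference backend found; install torchrl[llm-vllm], "
            "torchrl[llm-sglang], or torchrl[llm-all]."
        )
    return found[0]


print(available_llm_backends())
```

In shells that treat square brackets as glob characters, such as zsh, the extra must be quoted, for example `pip install "torchrl[llm-vllm]"`.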

Co-authored-by: Cursor <cursoragent@cursor.com>

[ghstack-poisoned]
pytorch-bot bot commented Feb 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3436

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 23 Pending, 1 Unrelated Failure

As of commit 7131f5b with merge base 7a0b1f9:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions bot (Contributor) commented Feb 2, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 153. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}19$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_tensor_to_bytestream_speed[pickle] 86.3563μs 84.4529μs 11.8409 KOps/s 11.8476 KOps/s $\color{#d91a1a}-0.06\%$
test_tensor_to_bytestream_speed[torch.save] 0.1398ms 0.1395ms 7.1683 KOps/s 6.8406 KOps/s $\color{#35bf28}+4.79\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1121s 0.1119s 8.9362 Ops/s 9.5083 Ops/s $\textbf{\color{#d91a1a}-6.02\%}$
test_tensor_to_bytestream_speed[numpy] 2.7090μs 2.7004μs 370.3152 KOps/s 391.1658 KOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_tensor_to_bytestream_speed[safetensors] 39.4581μs 39.2357μs 25.4870 KOps/s 24.7246 KOps/s $\color{#35bf28}+3.08\%$
test_simple 0.6696s 0.5770s 1.7332 Ops/s 1.7454 Ops/s $\color{#d91a1a}-0.70\%$
test_transformed 1.2511s 1.1602s 0.8619 Ops/s 0.8619 Ops/s $+0.00\%$
test_serial 1.7951s 1.7000s 0.5882 Ops/s 0.5899 Ops/s $\color{#d91a1a}-0.28\%$
test_parallel 1.2204s 1.1130s 0.8985 Ops/s 0.8209 Ops/s $\textbf{\color{#35bf28}+9.45\%}$
test_step_mdp_speed[True-True-True-True-True] 0.1448ms 43.9175μs 22.7700 KOps/s 21.9394 KOps/s $\color{#35bf28}+3.79\%$
test_step_mdp_speed[True-True-True-True-False] 55.7810μs 24.9456μs 40.0872 KOps/s 39.6706 KOps/s $\color{#35bf28}+1.05\%$
test_step_mdp_speed[True-True-True-False-True] 52.2610μs 25.0533μs 39.9149 KOps/s 39.4093 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[True-True-True-False-False] 43.9900μs 14.0193μs 71.3305 KOps/s 71.8140 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[True-True-False-True-True] 0.1197ms 47.9222μs 20.8671 KOps/s 20.6030 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[True-True-False-True-False] 64.2110μs 28.1712μs 35.4973 KOps/s 35.6281 KOps/s $\color{#d91a1a}-0.37\%$
test_step_mdp_speed[True-True-False-False-True] 59.6510μs 28.0606μs 35.6371 KOps/s 35.1667 KOps/s $\color{#35bf28}+1.34\%$
test_step_mdp_speed[True-True-False-False-False] 48.2610μs 16.5862μs 60.2911 KOps/s 59.3625 KOps/s $\color{#35bf28}+1.56\%$
test_step_mdp_speed[True-False-True-True-True] 89.0110μs 50.0762μs 19.9696 KOps/s 19.4019 KOps/s $\color{#35bf28}+2.93\%$
test_step_mdp_speed[True-False-True-True-False] 58.6700μs 30.8751μs 32.3885 KOps/s 32.2119 KOps/s $\color{#35bf28}+0.55\%$
test_step_mdp_speed[True-False-True-False-True] 95.2420μs 27.6182μs 36.2081 KOps/s 35.7457 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[True-False-True-False-False] 56.3510μs 16.4578μs 60.7614 KOps/s 59.5384 KOps/s $\color{#35bf28}+2.05\%$
test_step_mdp_speed[True-False-False-True-True] 0.2267ms 51.8646μs 19.2810 KOps/s 18.6232 KOps/s $\color{#35bf28}+3.53\%$
test_step_mdp_speed[True-False-False-True-False] 53.7410μs 32.7970μs 30.4906 KOps/s 29.5669 KOps/s $\color{#35bf28}+3.12\%$
test_step_mdp_speed[True-False-False-False-True] 54.0610μs 30.7424μs 32.5284 KOps/s 32.6918 KOps/s $\color{#d91a1a}-0.50\%$
test_step_mdp_speed[True-False-False-False-False] 51.8610μs 19.3304μs 51.7321 KOps/s 51.5346 KOps/s $\color{#35bf28}+0.38\%$
test_step_mdp_speed[False-True-True-True-True] 76.0910μs 49.4186μs 20.2353 KOps/s 19.4597 KOps/s $\color{#35bf28}+3.99\%$
test_step_mdp_speed[False-True-True-True-False] 59.2010μs 30.7145μs 32.5579 KOps/s 32.5459 KOps/s $\color{#35bf28}+0.04\%$
test_step_mdp_speed[False-True-True-False-True] 55.9710μs 31.8954μs 31.3525 KOps/s 31.5297 KOps/s $\color{#d91a1a}-0.56\%$
test_step_mdp_speed[False-True-True-False-False] 45.5400μs 18.4717μs 54.1368 KOps/s 53.9881 KOps/s $\color{#35bf28}+0.28\%$
test_step_mdp_speed[False-True-False-True-True] 3.0224ms 53.5931μs 18.6591 KOps/s 18.6585 KOps/s $+0.00\%$
test_step_mdp_speed[False-True-False-True-False] 92.3710μs 33.5857μs 29.7746 KOps/s 29.9413 KOps/s $\color{#d91a1a}-0.56\%$
test_step_mdp_speed[False-True-False-False-True] 63.6010μs 34.0820μs 29.3410 KOps/s 28.7516 KOps/s $\color{#35bf28}+2.05\%$
test_step_mdp_speed[False-True-False-False-False] 49.7410μs 21.1514μs 47.2782 KOps/s 47.2865 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[False-False-True-True-True] 92.8710μs 56.4409μs 17.7176 KOps/s 17.5832 KOps/s $\color{#35bf28}+0.76\%$
test_step_mdp_speed[False-False-True-True-False] 68.2510μs 35.7832μs 27.9460 KOps/s 27.4758 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[False-False-True-False-True] 66.0510μs 34.5364μs 28.9550 KOps/s 28.6787 KOps/s $\color{#35bf28}+0.96\%$
test_step_mdp_speed[False-False-True-False-False] 47.2110μs 20.9965μs 47.6271 KOps/s 47.1964 KOps/s $\color{#35bf28}+0.91\%$
test_step_mdp_speed[False-False-False-True-True] 89.8810μs 57.2096μs 17.4796 KOps/s 16.8633 KOps/s $\color{#35bf28}+3.65\%$
test_step_mdp_speed[False-False-False-True-False] 68.3010μs 38.0658μs 26.2703 KOps/s 25.3413 KOps/s $\color{#35bf28}+3.67\%$
test_step_mdp_speed[False-False-False-False-True] 71.3110μs 36.5076μs 27.3916 KOps/s 27.2473 KOps/s $\color{#35bf28}+0.53\%$
test_step_mdp_speed[False-False-False-False-False] 53.6010μs 23.5153μs 42.5255 KOps/s 42.2331 KOps/s $\color{#35bf28}+0.69\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8875s 0.7835s 1.2764 Ops/s 1.2961 Ops/s $\color{#d91a1a}-1.53\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7383s 0.6435s 1.5539 Ops/s 1.5838 Ops/s $\color{#d91a1a}-1.89\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7919s 1.7095s 0.5850 Ops/s 0.5955 Ops/s $\color{#d91a1a}-1.78\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5461s 1.4698s 0.6804 Ops/s 0.6874 Ops/s $\color{#d91a1a}-1.02\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0037s 1.9257s 0.5193 Ops/s 0.5204 Ops/s $\color{#d91a1a}-0.21\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7770s 1.7043s 0.5868 Ops/s 0.5835 Ops/s $\color{#35bf28}+0.55\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7286s 4.6443s 0.2153 Ops/s 0.2167 Ops/s $\color{#d91a1a}-0.65\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5381s 4.4350s 0.2255 Ops/s 0.2257 Ops/s $\color{#d91a1a}-0.11\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0739s 1.9737s 0.5067 Ops/s 0.5121 Ops/s $\color{#d91a1a}-1.07\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.8123s 1.6891s 0.5920 Ops/s 0.5963 Ops/s $\color{#d91a1a}-0.71\%$
test_values[generalized_advantage_estimate-True-True] 11.6126ms 10.5357ms 94.9157 Ops/s 100.2538 Ops/s $\textbf{\color{#d91a1a}-5.32\%}$
test_values[vec_generalized_advantage_estimate-True-True] 21.3395ms 17.7113ms 56.4612 Ops/s 56.5021 Ops/s $\color{#d91a1a}-0.07\%$
test_values[td0_return_estimate-False-False] 0.2522ms 0.1295ms 7.7209 KOps/s 8.0634 KOps/s $\color{#d91a1a}-4.25\%$
test_values[td1_return_estimate-False-False] 28.4201ms 27.7920ms 35.9815 Ops/s 37.3915 Ops/s $\color{#d91a1a}-3.77\%$
test_values[vec_td1_return_estimate-False-False] 18.2721ms 17.6709ms 56.5902 Ops/s 56.7848 Ops/s $\color{#d91a1a}-0.34\%$
test_values[td_lambda_return_estimate-True-False] 41.7806ms 41.3340ms 24.1932 Ops/s 24.9915 Ops/s $\color{#d91a1a}-3.19\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.6937ms 17.7332ms 56.3914 Ops/s 56.5569 Ops/s $\color{#d91a1a}-0.29\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.3481ms 9.2192ms 108.4687 Ops/s 113.8376 Ops/s $\color{#d91a1a}-4.72\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7922ms 1.5451ms 647.2041 Ops/s 641.8228 Ops/s $\color{#35bf28}+0.84\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4552ms 0.4135ms 2.4183 KOps/s 2.3933 KOps/s $\color{#35bf28}+1.05\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 34.8986ms 34.3923ms 29.0763 Ops/s 28.7156 Ops/s $\color{#35bf28}+1.26\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.1299ms 1.7135ms 583.6013 Ops/s 584.8342 Ops/s $\color{#d91a1a}-0.21\%$
test_dqn_speed[False-None] 1.7859ms 1.3902ms 719.3427 Ops/s 717.2837 Ops/s $\color{#35bf28}+0.29\%$
test_dqn_speed[False-backward] 1.9821ms 1.8980ms 526.8613 Ops/s 528.7114 Ops/s $\color{#d91a1a}-0.35\%$
test_dqn_speed[True-None] 0.9644ms 0.5499ms 1.8186 KOps/s 1.8470 KOps/s $\color{#d91a1a}-1.54\%$
test_dqn_speed[True-backward] 1.2205ms 1.1111ms 899.9883 Ops/s 1.0034 KOps/s $\textbf{\color{#d91a1a}-10.31\%}$
test_dqn_speed[reduce-overhead-None] 0.8190ms 0.5347ms 1.8703 KOps/s 1.8572 KOps/s $\color{#35bf28}+0.71\%$
test_ddpg_speed[False-None] 3.2243ms 2.8478ms 351.1524 Ops/s 354.1945 Ops/s $\color{#d91a1a}-0.86\%$
test_ddpg_speed[False-backward] 4.2137ms 4.0533ms 246.7126 Ops/s 248.4284 Ops/s $\color{#d91a1a}-0.69\%$
test_ddpg_speed[True-None] 1.8074ms 1.4057ms 711.4018 Ops/s 710.8789 Ops/s $\color{#35bf28}+0.07\%$
test_ddpg_speed[True-backward] 2.5434ms 2.4358ms 410.5464 Ops/s 345.7419 Ops/s $\textbf{\color{#35bf28}+18.74\%}$
test_ddpg_speed[reduce-overhead-None] 1.5440ms 1.4037ms 712.3934 Ops/s 705.3423 Ops/s $\color{#35bf28}+1.00\%$
test_sac_speed[False-None] 8.3493ms 7.9096ms 126.4291 Ops/s 126.3277 Ops/s $\color{#35bf28}+0.08\%$
test_sac_speed[False-backward] 11.5527ms 11.1381ms 89.7820 Ops/s 90.1679 Ops/s $\color{#d91a1a}-0.43\%$
test_sac_speed[True-None] 2.3541ms 2.1701ms 460.8122 Ops/s 459.6495 Ops/s $\color{#35bf28}+0.25\%$
test_sac_speed[True-backward] 4.2529ms 4.1161ms 242.9481 Ops/s 236.6947 Ops/s $\color{#35bf28}+2.64\%$
test_sac_speed[reduce-overhead-None] 2.3287ms 2.1561ms 463.8077 Ops/s 462.9135 Ops/s $\color{#35bf28}+0.19\%$
test_redq_speed[False-None] 11.0846ms 10.4318ms 95.8612 Ops/s 94.3373 Ops/s $\color{#35bf28}+1.62\%$
test_redq_speed[False-backward] 18.8790ms 17.8820ms 55.9221 Ops/s 55.8453 Ops/s $\color{#35bf28}+0.14\%$
test_redq_speed[True-None] 4.7309ms 4.5700ms 218.8195 Ops/s 228.3268 Ops/s $\color{#d91a1a}-4.16\%$
test_redq_speed[True-backward] 10.4464ms 9.9431ms 100.5721 Ops/s 102.7200 Ops/s $\color{#d91a1a}-2.09\%$
test_redq_speed[reduce-overhead-None] 4.8517ms 4.5995ms 217.4132 Ops/s 215.1848 Ops/s $\color{#35bf28}+1.04\%$
test_redq_deprec_speed[False-None] 12.3471ms 11.1584ms 89.6184 Ops/s 92.3751 Ops/s $\color{#d91a1a}-2.98\%$
test_redq_deprec_speed[False-backward] 16.1709ms 15.8721ms 63.0038 Ops/s 64.1587 Ops/s $\color{#d91a1a}-1.80\%$
test_redq_deprec_speed[True-None] 4.0509ms 3.8521ms 259.6017 Ops/s 274.4742 Ops/s $\textbf{\color{#d91a1a}-5.42\%}$
test_redq_deprec_speed[True-backward] 8.1964ms 7.8647ms 127.1498 Ops/s 130.7566 Ops/s $\color{#d91a1a}-2.76\%$
test_redq_deprec_speed[reduce-overhead-None] 4.0335ms 3.7526ms 266.4820 Ops/s 272.3300 Ops/s $\color{#d91a1a}-2.15\%$
test_td3_speed[False-None] 8.1893ms 7.9768ms 125.3633 Ops/s 124.4875 Ops/s $\color{#35bf28}+0.70\%$
test_td3_speed[False-backward] 11.2224ms 10.8350ms 92.2936 Ops/s 92.8734 Ops/s $\color{#d91a1a}-0.62\%$
test_td3_speed[True-None] 1.8957ms 1.8481ms 541.0938 Ops/s 535.8410 Ops/s $\color{#35bf28}+0.98\%$
test_td3_speed[True-backward] 3.8269ms 3.6999ms 270.2769 Ops/s 223.2777 Ops/s $\textbf{\color{#35bf28}+21.05\%}$
test_td3_speed[reduce-overhead-None] 1.9036ms 1.8165ms 550.5198 Ops/s 552.0309 Ops/s $\color{#d91a1a}-0.27\%$
test_cql_speed[False-None] 27.0460ms 26.2412ms 38.1080 Ops/s 38.0847 Ops/s $\color{#35bf28}+0.06\%$
test_cql_speed[False-backward] 38.7510ms 35.4365ms 28.2195 Ops/s 28.2281 Ops/s $\color{#d91a1a}-0.03\%$
test_cql_speed[True-None] 13.1591ms 12.5068ms 79.9565 Ops/s 79.7460 Ops/s $\color{#35bf28}+0.26\%$
test_cql_speed[True-backward] 18.6574ms 18.1882ms 54.9808 Ops/s 55.9211 Ops/s $\color{#d91a1a}-1.68\%$
test_cql_speed[reduce-overhead-None] 13.2432ms 12.6188ms 79.2469 Ops/s 82.8766 Ops/s $\color{#d91a1a}-4.38\%$
test_a2c_speed[False-None] 5.9436ms 5.5287ms 180.8740 Ops/s 193.9526 Ops/s $\textbf{\color{#d91a1a}-6.74\%}$
test_a2c_speed[False-backward] 12.5321ms 11.9907ms 83.3981 Ops/s 84.4642 Ops/s $\color{#d91a1a}-1.26\%$
test_a2c_speed[True-None] 4.1999ms 3.8078ms 262.6178 Ops/s 274.7376 Ops/s $\color{#d91a1a}-4.41\%$
test_a2c_speed[True-backward] 9.0222ms 8.7701ms 114.0239 Ops/s 119.8734 Ops/s $\color{#d91a1a}-4.88\%$
test_a2c_speed[reduce-overhead-None] 4.1632ms 3.7930ms 263.6413 Ops/s 285.6503 Ops/s $\textbf{\color{#d91a1a}-7.70\%}$
test_ppo_speed[False-None] 6.5868ms 6.0025ms 166.5986 Ops/s 178.1674 Ops/s $\textbf{\color{#d91a1a}-6.49\%}$
test_ppo_speed[False-backward] 13.0192ms 12.6925ms 78.7869 Ops/s 80.2323 Ops/s $\color{#d91a1a}-1.80\%$
test_ppo_speed[True-None] 4.3051ms 3.7437ms 267.1150 Ops/s 262.2594 Ops/s $\color{#35bf28}+1.85\%$
test_ppo_speed[True-backward] 8.8738ms 8.5970ms 116.3200 Ops/s 116.0146 Ops/s $\color{#35bf28}+0.26\%$
test_ppo_speed[reduce-overhead-None] 3.8492ms 3.6466ms 274.2316 Ops/s 273.9807 Ops/s $\color{#35bf28}+0.09\%$
test_reinforce_speed[False-None] 4.8305ms 4.6461ms 215.2341 Ops/s 220.3183 Ops/s $\color{#d91a1a}-2.31\%$
test_reinforce_speed[False-backward] 7.6482ms 7.4331ms 134.5338 Ops/s 132.2621 Ops/s $\color{#35bf28}+1.72\%$
test_reinforce_speed[True-None] 3.1134ms 2.9153ms 343.0158 Ops/s 341.7236 Ops/s $\color{#35bf28}+0.38\%$
test_reinforce_speed[True-backward] 8.1964ms 7.9283ms 126.1300 Ops/s 102.0558 Ops/s $\textbf{\color{#35bf28}+23.59\%}$
test_reinforce_speed[reduce-overhead-None] 3.0659ms 2.8968ms 345.2137 Ops/s 365.2993 Ops/s $\textbf{\color{#d91a1a}-5.50\%}$
test_iql_speed[False-None] 26.4919ms 21.1110ms 47.3686 Ops/s 51.7222 Ops/s $\textbf{\color{#d91a1a}-8.42\%}$
test_iql_speed[False-backward] 37.0630ms 30.8404ms 32.4250 Ops/s 33.3846 Ops/s $\color{#d91a1a}-2.87\%$
test_iql_speed[True-None] 8.9837ms 8.7134ms 114.7658 Ops/s 124.5360 Ops/s $\textbf{\color{#d91a1a}-7.85\%}$
test_iql_speed[True-backward] 17.2705ms 16.9887ms 58.8626 Ops/s 58.9822 Ops/s $\color{#d91a1a}-0.20\%$
test_iql_speed[reduce-overhead-None] 9.0062ms 8.7563ms 114.2039 Ops/s 114.3985 Ops/s $\color{#d91a1a}-0.17\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2991ms 6.0429ms 165.4824 Ops/s 164.0443 Ops/s $\color{#35bf28}+0.88\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.2373ms 0.3285ms 3.0438 KOps/s 3.0151 KOps/s $\color{#35bf28}+0.95\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6535ms 0.3185ms 3.1402 KOps/s 3.7371 KOps/s $\textbf{\color{#d91a1a}-15.97\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0118ms 5.7631ms 173.5192 Ops/s 173.1741 Ops/s $\color{#35bf28}+0.20\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.1046ms 0.3550ms 2.8171 KOps/s 3.4226 KOps/s $\textbf{\color{#d91a1a}-17.69\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6154ms 0.3517ms 2.8436 KOps/s 3.6595 KOps/s $\textbf{\color{#d91a1a}-22.30\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6625ms 1.4247ms 701.8811 Ops/s 746.3421 Ops/s $\textbf{\color{#d91a1a}-5.96\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6496ms 1.3374ms 747.7359 Ops/s 838.9105 Ops/s $\textbf{\color{#d91a1a}-10.87\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1328ms 5.9168ms 169.0097 Ops/s 166.2147 Ops/s $\color{#35bf28}+1.68\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.3411ms 0.4389ms 2.2783 KOps/s 2.1761 KOps/s $\color{#35bf28}+4.69\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7064ms 0.4557ms 2.1946 KOps/s 2.3530 KOps/s $\textbf{\color{#d91a1a}-6.73\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1627ms 5.8145ms 171.9838 Ops/s 168.8545 Ops/s $\color{#35bf28}+1.85\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.7909ms 0.3413ms 2.9295 KOps/s 2.6435 KOps/s $\textbf{\color{#35bf28}+10.82\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5329ms 0.3312ms 3.0196 KOps/s 2.9405 KOps/s $\color{#35bf28}+2.69\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0772ms 5.7379ms 174.2797 Ops/s 172.4328 Ops/s $\color{#35bf28}+1.07\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.3460ms 0.3606ms 2.7732 KOps/s 2.7791 KOps/s $\color{#d91a1a}-0.21\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5748ms 0.3538ms 2.8264 KOps/s 2.8386 KOps/s $\color{#d91a1a}-0.43\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0554ms 5.9358ms 168.4689 Ops/s 167.4980 Ops/s $\color{#35bf28}+0.58\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9647ms 0.4482ms 2.2313 KOps/s 1.9381 KOps/s $\textbf{\color{#35bf28}+15.13\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7564ms 0.5227ms 1.9132 KOps/s 2.3848 KOps/s $\textbf{\color{#d91a1a}-19.78\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5628s 16.1470ms 61.9311 Ops/s 57.6606 Ops/s $\textbf{\color{#35bf28}+7.41\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 5.6545ms 1.8488ms 540.9055 Ops/s 504.2255 Ops/s $\textbf{\color{#35bf28}+7.27\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.2047ms 1.1382ms 878.5435 Ops/s 765.8354 Ops/s $\textbf{\color{#35bf28}+14.72\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 9.9545ms 5.1344ms 194.7631 Ops/s 198.1218 Ops/s $\color{#d91a1a}-1.70\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.0129ms 1.7938ms 557.4717 Ops/s 518.1793 Ops/s $\textbf{\color{#35bf28}+7.58\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 12.5281ms 1.3628ms 733.7590 Ops/s 1.1309 KOps/s $\textbf{\color{#d91a1a}-35.12\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.9333ms 5.2268ms 191.3215 Ops/s 57.6663 Ops/s $\textbf{\color{#35bf28}+231.77\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.3129ms 2.1125ms 473.3763 Ops/s 486.9372 Ops/s $\color{#d91a1a}-2.78\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.4316ms 1.0584ms 944.8291 Ops/s 923.9256 Ops/s $\color{#35bf28}+2.26\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.2075ms 35.9700ms 27.8010 Ops/s 27.1575 Ops/s $\color{#35bf28}+2.37\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.1121ms 18.5010ms 54.0511 Ops/s 54.6782 Ops/s $\color{#d91a1a}-1.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.1498ms 37.5125ms 26.6578 Ops/s 26.5363 Ops/s $\color{#35bf28}+0.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.6738ms 18.8711ms 52.9912 Ops/s 52.7265 Ops/s $\color{#35bf28}+0.50\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.3302ms 38.9220ms 25.6924 Ops/s 25.1563 Ops/s $\color{#35bf28}+2.13\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.2929ms 19.7744ms 50.5705 Ops/s 49.1580 Ops/s $\color{#35bf28}+2.87\%$
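
The Change column in the rows above is consistent with the relative difference between Ops and Ops on Repo HEAD, after converting both to the same unit. The sketch below reproduces that arithmetic. The function names are hypothetical, and the 5% cutoff for counting a benchmark as improved or worsened is an assumption inferred from which rows are bolded, not something the bot states.

```python
def parse_ops(text):
    """Convert a throughput string such as '1.0034 KOps/s' to operations per second."""
    value, unit = text.split()
    scale = {"Ops/s": 1.0, "KOps/s": 1e3}[unit]  # only the units seen in the table
    return float(value) * scale


def relative_change(ops_pr, ops_head):
    """Percent change of the PR throughput relative to repo HEAD."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


def classify(change_pct, threshold=5.0):
    """Label a change; the 5% threshold is an inferred assumption."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# test_dqn_speed[True-backward] from the CPU table mixes Ops/s and KOps/s.
change = relative_change("899.9883 Ops/s", "1.0034 KOps/s")
print(round(change, 2), classify(change))  # -10.31 worsened
```

Normalizing units first matters for rows like this one, where the PR value is reported in Ops/s and the HEAD value in KOps/s.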

github-actions bot (Contributor) commented Feb 2, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 148. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}5$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_tensor_to_bytestream_speed[pickle] 80.1278μs 79.0420μs 12.6515 KOps/s 12.7606 KOps/s $\color{#d91a1a}-0.85\%$
test_tensor_to_bytestream_speed[torch.save] 0.1356ms 0.1351ms 7.4002 KOps/s 7.2242 KOps/s $\color{#35bf28}+2.44\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1050s 0.1045s 9.5671 Ops/s 9.2490 Ops/s $\color{#35bf28}+3.44\%$
test_tensor_to_bytestream_speed[numpy] 2.4787μs 2.4724μs 404.4638 KOps/s 379.8849 KOps/s $\textbf{\color{#35bf28}+6.47\%}$
test_tensor_to_bytestream_speed[safetensors] 37.4129μs 36.7738μs 27.1933 KOps/s 25.1173 KOps/s $\textbf{\color{#35bf28}+8.27\%}$
test_simple 0.8926s 0.8026s 1.2459 Ops/s 1.2262 Ops/s $\color{#35bf28}+1.61\%$
test_transformed 1.5147s 1.4210s 0.7037 Ops/s 0.6899 Ops/s $\color{#35bf28}+2.00\%$
test_serial 2.3633s 2.2680s 0.4409 Ops/s 0.4263 Ops/s $\color{#35bf28}+3.43\%$
test_parallel 1.9954s 1.9144s 0.5224 Ops/s 0.5018 Ops/s $\color{#35bf28}+4.09\%$
test_step_mdp_speed[True-True-True-True-True] 0.2311ms 44.0644μs 22.6940 KOps/s 23.3160 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[True-True-True-True-False] 82.4510μs 24.1964μs 41.3284 KOps/s 40.9657 KOps/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[True-True-True-False-True] 72.3610μs 24.3313μs 41.0993 KOps/s 41.0824 KOps/s $\color{#35bf28}+0.04\%$
test_step_mdp_speed[True-True-True-False-False] 40.2010μs 13.4309μs 74.4552 KOps/s 74.2379 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-True-False-True-True] 88.3520μs 46.2861μs 21.6048 KOps/s 21.7082 KOps/s $\color{#d91a1a}-0.48\%$
test_step_mdp_speed[True-True-False-True-False] 75.1010μs 26.8308μs 37.2706 KOps/s 37.2827 KOps/s $\color{#d91a1a}-0.03\%$
test_step_mdp_speed[True-True-False-False-True] 85.5020μs 27.0297μs 36.9963 KOps/s 37.0713 KOps/s $\color{#d91a1a}-0.20\%$
test_step_mdp_speed[True-True-False-False-False] 61.2210μs 16.0111μs 62.4565 KOps/s 62.1622 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[True-False-True-True-True] 0.1257ms 48.6750μs 20.5444 KOps/s 20.5545 KOps/s $\color{#d91a1a}-0.05\%$
test_step_mdp_speed[True-False-True-True-False] 89.6020μs 30.0204μs 33.3107 KOps/s 34.5661 KOps/s $\color{#d91a1a}-3.63\%$
test_step_mdp_speed[True-False-True-False-True] 63.1210μs 26.7490μs 37.3846 KOps/s 37.2711 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-False-True-False-False] 52.8310μs 16.2070μs 61.7017 KOps/s 61.8229 KOps/s $\color{#d91a1a}-0.20\%$
test_step_mdp_speed[True-False-False-True-True] 0.1016ms 51.1525μs 19.5494 KOps/s 19.5302 KOps/s $\color{#35bf28}+0.10\%$
test_step_mdp_speed[True-False-False-True-False] 75.0910μs 32.1915μs 31.0641 KOps/s 31.1554 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[True-False-False-False-True] 67.4410μs 29.1214μs 34.3391 KOps/s 34.0117 KOps/s $\color{#35bf28}+0.96\%$
test_step_mdp_speed[True-False-False-False-False] 96.4820μs 18.8389μs 53.0816 KOps/s 53.6103 KOps/s $\color{#d91a1a}-0.99\%$
test_step_mdp_speed[False-True-True-True-True] 96.1320μs 48.6257μs 20.5653 KOps/s 20.4784 KOps/s $\color{#35bf28}+0.42\%$
test_step_mdp_speed[False-True-True-True-False] 95.3920μs 29.4784μs 33.9232 KOps/s 34.1150 KOps/s $\color{#d91a1a}-0.56\%$
test_step_mdp_speed[False-True-True-False-True] 91.2020μs 29.9185μs 33.4242 KOps/s 32.9420 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[False-True-True-False-False] 68.0610μs 17.4215μs 57.4005 KOps/s 56.2086 KOps/s $\color{#35bf28}+2.12\%$
test_step_mdp_speed[False-True-False-True-True] 2.8303ms 51.1628μs 19.5455 KOps/s 19.6621 KOps/s $\color{#d91a1a}-0.59\%$
test_step_mdp_speed[False-True-False-True-False] 83.5010μs 32.4802μs 30.7880 KOps/s 30.8796 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[False-True-False-False-True] 75.7110μs 32.4497μs 30.8169 KOps/s 30.3731 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[False-True-False-False-False] 90.4620μs 19.8760μs 50.3119 KOps/s 49.0296 KOps/s $\color{#35bf28}+2.62\%$
test_step_mdp_speed[False-False-True-True-True] 88.8210μs 53.6108μs 18.6530 KOps/s 18.5503 KOps/s $\color{#35bf28}+0.55\%$
test_step_mdp_speed[False-False-True-True-False] 74.3720μs 34.4403μs 29.0358 KOps/s 28.4819 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[False-False-True-False-True] 0.1166ms 32.8718μs 30.4212 KOps/s 30.6237 KOps/s $\color{#d91a1a}-0.66\%$
test_step_mdp_speed[False-False-True-False-False] 53.7510μs 20.1163μs 49.7109 KOps/s 49.0673 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[False-False-False-True-True] 0.1112ms 54.6697μs 18.2917 KOps/s 17.6058 KOps/s $\color{#35bf28}+3.90\%$
test_step_mdp_speed[False-False-False-True-False] 72.5910μs 36.9072μs 27.0949 KOps/s 26.7997 KOps/s $\color{#35bf28}+1.10\%$
test_step_mdp_speed[False-False-False-False-True] 66.0310μs 34.5497μs 28.9438 KOps/s 28.3437 KOps/s $\color{#35bf28}+2.12\%$
test_step_mdp_speed[False-False-False-False-False] 57.3110μs 22.5946μs 44.2584 KOps/s 43.8223 KOps/s $\color{#35bf28}+1.00\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7450s 0.7388s 1.3535 Ops/s 1.3167 Ops/s $\color{#35bf28}+2.80\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7103s 0.6159s 1.6236 Ops/s 1.6123 Ops/s $\color{#35bf28}+0.70\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7171s 1.6387s 0.6102 Ops/s 0.6052 Ops/s $\color{#35bf28}+0.84\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4968s 1.4145s 0.7070 Ops/s 0.7017 Ops/s $\color{#35bf28}+0.75\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0001s 1.9014s 0.5259 Ops/s 0.5252 Ops/s $\color{#35bf28}+0.13\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7500s 1.6665s 0.6001 Ops/s 0.5852 Ops/s $\color{#35bf28}+2.53\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6829s 4.5863s 0.2180 Ops/s 0.2186 Ops/s $\color{#d91a1a}-0.26\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4990s 4.4330s 0.2256 Ops/s 0.2262 Ops/s $\color{#d91a1a}-0.26\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0489s 1.9378s 0.5161 Ops/s 0.5082 Ops/s $\color{#35bf28}+1.55\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7340s 1.6307s 0.6132 Ops/s 0.6004 Ops/s $\color{#35bf28}+2.14\%$
test_values[generalized_advantage_estimate-True-True] 21.6795ms 20.7175ms 48.2684 Ops/s 48.8252 Ops/s $\color{#d91a1a}-1.14\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1325s 3.5739ms 279.8085 Ops/s 252.6900 Ops/s $\textbf{\color{#35bf28}+10.73\%}$
test_values[td0_return_estimate-False-False] 0.1069ms 82.2230μs 12.1620 KOps/s 12.0311 KOps/s $\color{#35bf28}+1.09\%$
test_values[td1_return_estimate-False-False] 50.8352ms 49.3685ms 20.2558 Ops/s 20.3299 Ops/s $\color{#d91a1a}-0.36\%$
test_values[vec_td1_return_estimate-False-False] 1.3848ms 1.0833ms 923.0805 Ops/s 913.5529 Ops/s $\color{#35bf28}+1.04\%$
test_values[td_lambda_return_estimate-True-False] 83.3025ms 80.8968ms 12.3614 Ops/s 12.4424 Ops/s $\color{#d91a1a}-0.65\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3712ms 1.0831ms 923.2998 Ops/s 913.1765 Ops/s $\color{#35bf28}+1.11\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.0423ms 21.6496ms 46.1902 Ops/s 48.4470 Ops/s $\color{#d91a1a}-4.66\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0206ms 0.7493ms 1.3345 KOps/s 1.3114 KOps/s $\color{#35bf28}+1.77\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7839ms 0.7067ms 1.4150 KOps/s 1.4069 KOps/s $\color{#35bf28}+0.58\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5554ms 1.4869ms 672.5332 Ops/s 667.6267 Ops/s $\color{#35bf28}+0.73\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7601ms 0.6995ms 1.4295 KOps/s 1.4291 KOps/s $\color{#35bf28}+0.03\%$
test_dqn_speed[False-None] 1.6422ms 1.5085ms 662.9294 Ops/s 659.0387 Ops/s $\color{#35bf28}+0.59\%$
test_dqn_speed[False-backward] 2.2904ms 2.1797ms 458.7863 Ops/s 459.0448 Ops/s $\color{#d91a1a}-0.06\%$
test_dqn_speed[True-None] 0.6451ms 0.5508ms 1.8154 KOps/s 1.8062 KOps/s $\color{#35bf28}+0.51\%$
test_dqn_speed[True-backward] 1.2158ms 1.1667ms 857.1076 Ops/s 929.0377 Ops/s $\textbf{\color{#d91a1a}-7.74\%}$
test_dqn_speed[reduce-overhead-None] 0.7612ms 0.5726ms 1.7464 KOps/s 1.7121 KOps/s $\color{#35bf28}+2.00\%$
test_ddpg_speed[False-None] 3.3254ms 2.9197ms 342.5045 Ops/s 349.1023 Ops/s $\color{#d91a1a}-1.89\%$
test_ddpg_speed[False-backward] 4.8038ms 4.2881ms 233.2050 Ops/s 239.5352 Ops/s $\color{#d91a1a}-2.64\%$
test_ddpg_speed[True-None] 1.4073ms 1.2863ms 777.4252 Ops/s 779.2128 Ops/s $\color{#d91a1a}-0.23\%$
test_ddpg_speed[True-backward] 2.4602ms 2.4082ms 415.2469 Ops/s 424.6631 Ops/s $\color{#d91a1a}-2.22\%$
test_ddpg_speed[reduce-overhead-None] 1.3955ms 1.2895ms 775.5027 Ops/s 761.8795 Ops/s $\color{#35bf28}+1.79\%$
test_sac_speed[False-None] 8.7830ms 8.2536ms 121.1593 Ops/s 119.4459 Ops/s $\color{#35bf28}+1.43\%$
test_sac_speed[False-backward] 12.1060ms 11.5878ms 86.2975 Ops/s 87.4550 Ops/s $\color{#d91a1a}-1.32\%$
test_sac_speed[True-None] 1.7966ms 1.7450ms 573.0579 Ops/s 559.0119 Ops/s $\color{#35bf28}+2.51\%$
test_sac_speed[True-backward] 3.5944ms 3.4850ms 286.9470 Ops/s 291.8155 Ops/s $\color{#d91a1a}-1.67\%$
test_sac_speed[reduce-overhead-None] 18.5475ms 10.4762ms 95.4544 Ops/s 95.1910 Ops/s $\color{#35bf28}+0.28\%$
test_redq_deprec_speed[False-None] 9.8800ms 9.2123ms 108.5504 Ops/s 106.4030 Ops/s $\color{#35bf28}+2.02\%$
test_redq_deprec_speed[False-backward] 13.0574ms 12.6013ms 79.3571 Ops/s 80.5474 Ops/s $\color{#d91a1a}-1.48\%$
test_redq_deprec_speed[True-None] 2.5780ms 2.4549ms 407.3436 Ops/s 400.2127 Ops/s $\color{#35bf28}+1.78\%$
test_redq_deprec_speed[True-backward] 4.5856ms 4.1920ms 238.5470 Ops/s 232.2969 Ops/s $\color{#35bf28}+2.69\%$
test_redq_deprec_speed[reduce-overhead-None] 15.0808ms 9.3443ms 107.0173 Ops/s 90.1002 Ops/s $\textbf{\color{#35bf28}+18.78\%}$
test_td3_speed[False-None] 8.3845ms 8.0848ms 123.6885 Ops/s 121.8208 Ops/s $\color{#35bf28}+1.53\%$
test_td3_speed[False-backward] 11.4486ms 10.7506ms 93.0178 Ops/s 91.4022 Ops/s $\color{#35bf28}+1.77\%$
test_td3_speed[True-None] 1.7440ms 1.6088ms 621.5846 Ops/s 626.1117 Ops/s $\color{#d91a1a}-0.72\%$
test_td3_speed[True-backward] 3.2008ms 3.1486ms 317.6019 Ops/s 310.9501 Ops/s $\color{#35bf28}+2.14\%$
test_td3_speed[reduce-overhead-None] 66.0130ms 23.0236ms 43.4336 Ops/s 43.5159 Ops/s $\color{#d91a1a}-0.19\%$
test_cql_speed[False-None] 17.4840ms 16.9932ms 58.8469 Ops/s 57.8452 Ops/s $\color{#35bf28}+1.73\%$
test_cql_speed[False-backward] 23.2625ms 22.6308ms 44.1877 Ops/s 43.5945 Ops/s $\color{#35bf28}+1.36\%$
test_cql_speed[True-None] 3.4481ms 3.1509ms 317.3732 Ops/s 303.1951 Ops/s $\color{#35bf28}+4.68\%$
test_cql_speed[True-backward] 5.5302ms 5.1537ms 194.0351 Ops/s 189.4039 Ops/s $\color{#35bf28}+2.45\%$
test_cql_speed[reduce-overhead-None] 18.6360ms 11.6770ms 85.6381 Ops/s 87.9122 Ops/s $\color{#d91a1a}-2.59\%$
test_a2c_speed[False-None] 4.4244ms 3.1889ms 313.5926 Ops/s 310.1032 Ops/s $\color{#35bf28}+1.13\%$
test_a2c_speed[False-backward] 6.5878ms 6.1360ms 162.9732 Ops/s 161.8368 Ops/s $\color{#35bf28}+0.70\%$
test_a2c_speed[True-None] 1.4035ms 1.2993ms 769.6746 Ops/s 763.0380 Ops/s $\color{#35bf28}+0.87\%$
test_a2c_speed[True-backward] 2.9385ms 2.8820ms 346.9870 Ops/s 323.4567 Ops/s $\textbf{\color{#35bf28}+7.27\%}$
test_a2c_speed[reduce-overhead-None] 1.0588ms 0.9498ms 1.0529 KOps/s 1.0463 KOps/s $\color{#35bf28}+0.63\%$
test_ppo_speed[False-None] 3.8575ms 3.7640ms 265.6757 Ops/s 260.7287 Ops/s $\color{#35bf28}+1.90\%$
test_ppo_speed[False-backward] 7.3070ms 6.8752ms 145.4494 Ops/s 139.1523 Ops/s $\color{#35bf28}+4.53\%$
test_ppo_speed[True-None] 1.5128ms 1.3846ms 722.2191 Ops/s 719.7250 Ops/s $\color{#35bf28}+0.35\%$
test_ppo_speed[True-backward] 3.0141ms 2.9710ms 336.5846 Ops/s 311.6719 Ops/s $\textbf{\color{#35bf28}+7.99\%}$
test_ppo_speed[reduce-overhead-None] 1.1082ms 1.0128ms 987.3275 Ops/s 958.1402 Ops/s $\color{#35bf28}+3.05\%$
test_reinforce_speed[False-None] 2.3189ms 2.2261ms 449.2079 Ops/s 438.4340 Ops/s $\color{#35bf28}+2.46\%$
test_reinforce_speed[False-backward] 3.4838ms 3.3644ms 297.2273 Ops/s 290.6693 Ops/s $\color{#35bf28}+2.26\%$
test_reinforce_speed[True-None] 1.3054ms 1.2188ms 820.5084 Ops/s 801.2850 Ops/s $\color{#35bf28}+2.40\%$
test_reinforce_speed[True-backward] 3.0622ms 2.9764ms 335.9709 Ops/s 326.3568 Ops/s $\color{#35bf28}+2.95\%$
test_reinforce_speed[reduce-overhead-None] 16.5168ms 9.0986ms 109.9064 Ops/s 98.8032 Ops/s $\textbf{\color{#35bf28}+11.24\%}$
test_iql_speed[False-None] 10.1534ms 9.3034ms 107.4877 Ops/s 105.0786 Ops/s $\color{#35bf28}+2.29\%$
test_iql_speed[False-backward] 13.5816ms 13.1979ms 75.7697 Ops/s 73.3951 Ops/s $\color{#35bf28}+3.24\%$
test_iql_speed[True-None] 2.1553ms 2.0916ms 478.0999 Ops/s 465.0218 Ops/s $\color{#35bf28}+2.81\%$
test_iql_speed[True-backward] 4.8438ms 4.6529ms 214.9190 Ops/s 205.0133 Ops/s $\color{#35bf28}+4.83\%$
test_iql_speed[reduce-overhead-None] 17.2464ms 10.0536ms 99.4666 Ops/s 77.6089 Ops/s $\textbf{\color{#35bf28}+28.16\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.8553ms 5.7597ms 173.6198 Ops/s 169.1829 Ops/s $\color{#35bf28}+2.62\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.1654ms 0.2934ms 3.4080 KOps/s 2.8584 KOps/s $\textbf{\color{#35bf28}+19.23\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4848ms 0.2707ms 3.6944 KOps/s 2.8912 KOps/s $\textbf{\color{#35bf28}+27.78\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9061ms 5.5768ms 179.3138 Ops/s 174.7076 Ops/s $\color{#35bf28}+2.64\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.3998ms 0.3581ms 2.7927 KOps/s 2.6354 KOps/s $\textbf{\color{#35bf28}+5.97\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5906ms 0.3216ms 3.1099 KOps/s 2.9281 KOps/s $\textbf{\color{#35bf28}+6.21\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6395ms 1.4073ms 710.5611 Ops/s 683.0729 Ops/s $\color{#35bf28}+4.02\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5463ms 1.3335ms 749.9154 Ops/s 720.8739 Ops/s $\color{#35bf28}+4.03\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8504ms 5.7454ms 174.0537 Ops/s 168.0441 Ops/s $\color{#35bf28}+3.58\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.5740ms 0.5012ms 1.9951 KOps/s 1.9716 KOps/s $\color{#35bf28}+1.19\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7072ms 0.4879ms 2.0496 KOps/s 2.1193 KOps/s $\color{#d91a1a}-3.29\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.7035ms 5.6034ms 178.4630 Ops/s 172.2745 Ops/s $\color{#35bf28}+3.59\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0409ms 0.2847ms 3.5120 KOps/s 3.0147 KOps/s $\textbf{\color{#35bf28}+16.50\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4933ms 0.2751ms 3.6346 KOps/s 3.0702 KOps/s $\textbf{\color{#35bf28}+18.38\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.7465ms 5.5583ms 179.9102 Ops/s 174.5635 Ops/s $\color{#35bf28}+3.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.4965ms 0.3817ms 2.6197 KOps/s 2.7384 KOps/s $\color{#d91a1a}-4.33\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6141ms 0.3603ms 2.7755 KOps/s 3.6066 KOps/s $\textbf{\color{#d91a1a}-23.05\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.9138ms 5.7910ms 172.6830 Ops/s 168.8166 Ops/s $\color{#35bf28}+2.29\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2614ms 0.5078ms 1.9693 KOps/s 2.1079 KOps/s $\textbf{\color{#d91a1a}-6.58\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6982ms 0.4552ms 2.1969 KOps/s 2.2952 KOps/s $\color{#d91a1a}-4.29\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3596ms 4.9231ms 203.1254 Ops/s 49.4127 Ops/s $\textbf{\color{#35bf28}+311.08\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.1411ms 2.0614ms 485.1188 Ops/s 501.5907 Ops/s $\color{#d91a1a}-3.28\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.8071ms 0.9742ms 1.0265 KOps/s 758.7839 Ops/s $\textbf{\color{#35bf28}+35.28\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5852s 16.8827ms 59.2323 Ops/s 196.4295 Ops/s $\textbf{\color{#d91a1a}-69.85\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.3568ms 1.9229ms 520.0599 Ops/s 559.5848 Ops/s $\textbf{\color{#d91a1a}-7.06\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.4300ms 0.9860ms 1.0142 KOps/s 1.0255 KOps/s $\color{#d91a1a}-1.09\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.1279ms 5.2500ms 190.4763 Ops/s 188.8554 Ops/s $\color{#35bf28}+0.86\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.2097ms 1.9681ms 508.1086 Ops/s 478.5310 Ops/s $\textbf{\color{#35bf28}+6.18\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.7798ms 1.1417ms 875.8786 Ops/s 864.8474 Ops/s $\color{#35bf28}+1.28\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.6034ms 35.2246ms 28.3893 Ops/s 27.7931 Ops/s $\color{#35bf28}+2.14\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.4610ms 17.9869ms 55.5959 Ops/s 52.0742 Ops/s $\textbf{\color{#35bf28}+6.76\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.6418ms 37.4220ms 26.7223 Ops/s 26.5656 Ops/s $\color{#35bf28}+0.59\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.0898ms 18.2786ms 54.7088 Ops/s 53.4948 Ops/s $\color{#35bf28}+2.27\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.6585ms 38.3734ms 26.0598 Ops/s 25.5406 Ops/s $\color{#35bf28}+2.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.1403ms 19.7451ms 50.6454 Ops/s 49.0256 Ops/s $\color{#35bf28}+3.30\%$
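
The Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. A minimal sketch, assuming that definition, which matches the rows it was checked against; `relative_change` is a hypothetical helper, not part of the benchmark tooling:

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR run relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from two rows of the table above.
rows = {
    "test_redq_deprec_speed[reduce-overhead-None]": (107.0173, 90.1002),
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]": (203.1254, 49.4127),
}

for name, (ops_pr, ops_head) in rows.items():
    # Prints +18.78% and +311.08%, the values reported in the Change column.
    print(f"{name}: {relative_change(ops_pr, ops_head):+.2f}%")
```

Rows where Max is far above Mean (for example the `reduce-overhead` and `rb_populate` cases) are likely dominated by outliers, so large swings there may be noise rather than an effect of this PR.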

vmoens added 11 commits February 2, 2026 11:11
[ghstack-poisoned]
vmoens added 14 commits February 3, 2026 08:29
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 3, 2026
- Add llm-vllm, llm-sglang, llm-all extras for backend selection
- Base llm extra no longer includes inference backend
- Update sglang_nccl.py to use SGLang's native NCCL utilities
- Remove vLLM dependency from SGLang weight sync code

Users can now:
- pip install torchrl[llm-vllm] for vLLM backend
- pip install torchrl[llm-sglang] for SGLang backend
- pip install torchrl[llm-all] for both backends

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 5975a7d
Pull-Request: #3436
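
Since the base `llm` extra no longer installs an inference backend, downstream code may want to check which backend is present before using it. A minimal sketch, assuming the `llm-vllm` and `llm-sglang` extras make the `vllm` and `sglang` modules importable; `BACKEND_MODULES`, `available_llm_backends` and `install_hint` are hypothetical names for illustration, not part of torchrl:

```python
from importlib.util import find_spec

# Assumed mapping from the extras named in this PR to the top-level module
# each one is expected to provide. Not taken from torchrl itself.
BACKEND_MODULES = {"llm-vllm": "vllm", "llm-sglang": "sglang"}


def available_llm_backends(modules=BACKEND_MODULES):
    """Return the extras whose backend module can be located, without importing it."""
    return sorted(
        extra for extra, module in modules.items() if find_spec(module) is not None
    )


def install_hint(modules=BACKEND_MODULES):
    """Build a short message listing found backends or suggesting an install command."""
    found = available_llm_backends(modules)
    if found:
        return "backends found: " + ", ".join(found)
    return (
        "no inference backend found; try: "
        'pip install "torchrl[llm-vllm]" or pip install "torchrl[llm-sglang]"'
    )


if __name__ == "__main__":
    print(install_hint())
```

The quotes around the requirement in the hint matter in shells such as zsh, where unquoted square brackets are treated as glob patterns.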
@vmoens vmoens merged commit 7131f5b into gh/vmoens/216/base Feb 3, 2026
115 of 117 checks passed
@vmoens vmoens deleted the gh/vmoens/216/head branch February 3, 2026 16:17

Labels

- CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
- Feature: New feature
- llm/: LLM-related PR, triggers LLM CI tests
- WeightUpdate
