
[Feature] Auto-batching inference server: Ray transport #3495

Merged
vmoens merged 5 commits into gh/vmoens/237/base from gh/vmoens/237/head
Feb 21, 2026


[ghstack-poisoned]

pytorch-bot bot commented Feb 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3495

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit c30bf0e with merge base 266e4aa:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


github-actions bot commented Feb 11, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}5$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 82.0553μs 80.9858μs 12.3478 KOps/s 12.4285 KOps/s $\color{#d91a1a}-0.65\%$
test_tensor_to_bytestream_speed[torch.save] 0.1400ms 0.1393ms 7.1769 KOps/s 7.1654 KOps/s $\color{#35bf28}+0.16\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1119s 0.1118s 8.9460 Ops/s 9.5998 Ops/s $\textbf{\color{#d91a1a}-6.81\%}$
test_tensor_to_bytestream_speed[numpy] 2.5894μs 2.5779μs 387.9185 KOps/s 397.3414 KOps/s $\color{#d91a1a}-2.37\%$
test_tensor_to_bytestream_speed[safetensors] 36.2466μs 35.9183μs 27.8410 KOps/s 27.4408 KOps/s $\color{#35bf28}+1.46\%$
test_simple 0.5369s 0.5367s 1.8632 Ops/s 1.7560 Ops/s $\textbf{\color{#35bf28}+6.11\%}$
test_transformed 1.0708s 1.0696s 0.9349 Ops/s 0.9113 Ops/s $\color{#35bf28}+2.59\%$
test_serial 1.6478s 1.6431s 0.6086 Ops/s 0.5982 Ops/s $\color{#35bf28}+1.74\%$
test_parallel 1.0145s 1.0085s 0.9916 Ops/s 0.9704 Ops/s $\color{#35bf28}+2.18\%$
test_step_mdp_speed[True-True-True-True-True] 0.2045ms 41.7662μs 23.9428 KOps/s 24.4025 KOps/s $\color{#d91a1a}-1.88\%$
test_step_mdp_speed[True-True-True-True-False] 55.6310μs 23.4358μs 42.6697 KOps/s 42.5982 KOps/s $\color{#35bf28}+0.17\%$
test_step_mdp_speed[True-True-True-False-True] 52.4100μs 23.6573μs 42.2702 KOps/s 41.6061 KOps/s $\color{#35bf28}+1.60\%$
test_step_mdp_speed[True-True-True-False-False] 45.7610μs 12.8964μs 77.5411 KOps/s 77.0872 KOps/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[True-True-False-True-True] 92.2710μs 45.6646μs 21.8988 KOps/s 22.1850 KOps/s $\color{#d91a1a}-1.29\%$
test_step_mdp_speed[True-True-False-True-False] 68.6710μs 26.0510μs 38.3862 KOps/s 37.9825 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[True-True-False-False-True] 72.6610μs 26.1438μs 38.2500 KOps/s 38.1496 KOps/s $\color{#35bf28}+0.26\%$
test_step_mdp_speed[True-True-False-False-False] 51.8200μs 15.6360μs 63.9550 KOps/s 63.2995 KOps/s $\color{#35bf28}+1.04\%$
test_step_mdp_speed[True-False-True-True-True] 0.1228ms 48.2743μs 20.7149 KOps/s 20.8048 KOps/s $\color{#d91a1a}-0.43\%$
test_step_mdp_speed[True-False-True-True-False] 62.1000μs 29.1160μs 34.3454 KOps/s 34.6494 KOps/s $\color{#d91a1a}-0.88\%$
test_step_mdp_speed[True-False-True-False-True] 61.5600μs 26.5242μs 37.7014 KOps/s 37.8443 KOps/s $\color{#d91a1a}-0.38\%$
test_step_mdp_speed[True-False-True-False-False] 42.5810μs 15.6507μs 63.8947 KOps/s 63.3456 KOps/s $\color{#35bf28}+0.87\%$
test_step_mdp_speed[True-False-False-True-True] 0.1069ms 49.4420μs 20.2257 KOps/s 19.9321 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[True-False-False-True-False] 63.9510μs 31.3895μs 31.8578 KOps/s 32.5586 KOps/s $\color{#d91a1a}-2.15\%$
test_step_mdp_speed[True-False-False-False-True] 61.5300μs 29.3003μs 34.1293 KOps/s 34.5753 KOps/s $\color{#d91a1a}-1.29\%$
test_step_mdp_speed[True-False-False-False-False] 43.5110μs 18.2027μs 54.9368 KOps/s 54.0926 KOps/s $\color{#35bf28}+1.56\%$
test_step_mdp_speed[False-True-True-True-True] 72.9310μs 47.4017μs 21.0963 KOps/s 20.7842 KOps/s $\color{#35bf28}+1.50\%$
test_step_mdp_speed[False-True-True-True-False] 62.3900μs 28.6393μs 34.9170 KOps/s 34.8437 KOps/s $\color{#35bf28}+0.21\%$
test_step_mdp_speed[False-True-True-False-True] 2.4566ms 30.5988μs 32.6811 KOps/s 32.7820 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[False-True-True-False-False] 49.6510μs 17.2314μs 58.0334 KOps/s 56.6741 KOps/s $\color{#35bf28}+2.40\%$
test_step_mdp_speed[False-True-False-True-True] 0.1399ms 48.0149μs 20.8269 KOps/s 20.2069 KOps/s $\color{#35bf28}+3.07\%$
test_step_mdp_speed[False-True-False-True-False] 0.4435ms 31.0756μs 32.1796 KOps/s 31.6463 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[False-True-False-False-True] 0.4573ms 33.2152μs 30.1067 KOps/s 30.3694 KOps/s $\color{#d91a1a}-0.86\%$
test_step_mdp_speed[False-True-False-False-False] 42.2000μs 19.6979μs 50.7669 KOps/s 49.9801 KOps/s $\color{#35bf28}+1.57\%$
test_step_mdp_speed[False-False-True-True-True] 0.4757ms 52.6590μs 18.9901 KOps/s 18.7542 KOps/s $\color{#35bf28}+1.26\%$
test_step_mdp_speed[False-False-True-True-False] 0.4441ms 33.9450μs 29.4594 KOps/s 29.3698 KOps/s $\color{#35bf28}+0.31\%$
test_step_mdp_speed[False-False-True-False-True] 0.4464ms 32.6498μs 30.6281 KOps/s 30.5184 KOps/s $\color{#35bf28}+0.36\%$
test_step_mdp_speed[False-False-True-False-False] 0.4358ms 20.0491μs 49.8775 KOps/s 50.2130 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[False-False-False-True-True] 95.7620μs 54.9058μs 18.2130 KOps/s 18.3440 KOps/s $\color{#d91a1a}-0.71\%$
test_step_mdp_speed[False-False-False-True-False] 0.4526ms 36.4828μs 27.4102 KOps/s 27.2920 KOps/s $\color{#35bf28}+0.43\%$
test_step_mdp_speed[False-False-False-False-True] 0.4501ms 34.9661μs 28.5991 KOps/s 28.6810 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[False-False-False-False-False] 0.4338ms 22.2162μs 45.0122 KOps/s 45.5700 KOps/s $\color{#d91a1a}-1.22\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8368s 0.7367s 1.3574 Ops/s 1.3588 Ops/s $\color{#d91a1a}-0.10\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7007s 0.6040s 1.6557 Ops/s 1.6544 Ops/s $\color{#35bf28}+0.08\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7024s 1.6234s 0.6160 Ops/s 0.6121 Ops/s $\color{#35bf28}+0.64\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4845s 1.4042s 0.7122 Ops/s 0.7108 Ops/s $\color{#35bf28}+0.19\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9495s 1.8695s 0.5349 Ops/s 0.5327 Ops/s $\color{#35bf28}+0.41\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7264s 1.6431s 0.6086 Ops/s 0.6032 Ops/s $\color{#35bf28}+0.90\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7286s 4.6508s 0.2150 Ops/s 0.2171 Ops/s $\color{#d91a1a}-0.94\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5991s 4.4179s 0.2264 Ops/s 0.2287 Ops/s $\color{#d91a1a}-1.03\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9245s 1.8533s 0.5396 Ops/s 0.5356 Ops/s $\color{#35bf28}+0.74\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7019s 1.5841s 0.6313 Ops/s 0.6372 Ops/s $\color{#d91a1a}-0.93\%$
test_values[generalized_advantage_estimate-True-True] 10.4365ms 10.1234ms 98.7806 Ops/s 100.1945 Ops/s $\color{#d91a1a}-1.41\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.4620ms 17.7686ms 56.2789 Ops/s 56.8257 Ops/s $\color{#d91a1a}-0.96\%$
test_values[td0_return_estimate-False-False] 0.2262ms 0.1239ms 8.0739 KOps/s 8.3645 KOps/s $\color{#d91a1a}-3.47\%$
test_values[td1_return_estimate-False-False] 27.2401ms 26.7247ms 37.4186 Ops/s 37.8561 Ops/s $\color{#d91a1a}-1.16\%$
test_values[vec_td1_return_estimate-False-False] 18.6010ms 17.8662ms 55.9715 Ops/s 56.7875 Ops/s $\color{#d91a1a}-1.44\%$
test_values[td_lambda_return_estimate-True-False] 40.6269ms 39.2558ms 25.4740 Ops/s 25.6638 Ops/s $\color{#d91a1a}-0.74\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.0948ms 17.7897ms 56.2123 Ops/s 56.7673 Ops/s $\color{#d91a1a}-0.98\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.0349ms 8.9666ms 111.5254 Ops/s 114.2635 Ops/s $\color{#d91a1a}-2.40\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.9765ms 1.4933ms 669.6567 Ops/s 649.4592 Ops/s $\color{#35bf28}+3.11\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4391ms 0.4051ms 2.4683 KOps/s 2.4818 KOps/s $\color{#d91a1a}-0.54\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.1963ms 33.2537ms 30.0719 Ops/s 28.6363 Ops/s $\textbf{\color{#35bf28}+5.01\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.0684ms 1.6918ms 591.0897 Ops/s 590.7333 Ops/s $\color{#35bf28}+0.06\%$
test_dqn_speed[False-None] 1.4551ms 1.3713ms 729.2358 Ops/s 729.2215 Ops/s $+0.00\%$
test_dqn_speed[False-backward] 1.9985ms 1.8745ms 533.4645 Ops/s 527.1310 Ops/s $\color{#35bf28}+1.20\%$
test_dqn_speed[True-None] 1.0136ms 0.5454ms 1.8335 KOps/s 1.8281 KOps/s $\color{#35bf28}+0.30\%$
test_dqn_speed[True-backward] 1.0527ms 0.9979ms 1.0022 KOps/s 821.3987 Ops/s $\textbf{\color{#35bf28}+22.01\%}$
test_dqn_speed[reduce-overhead-None] 0.9904ms 0.5387ms 1.8564 KOps/s 1.8146 KOps/s $\color{#35bf28}+2.31\%$
test_ddpg_speed[False-None] 3.2199ms 2.8158ms 355.1354 Ops/s 355.6271 Ops/s $\color{#d91a1a}-0.14\%$
test_ddpg_speed[False-backward] 4.5837ms 4.0735ms 245.4913 Ops/s 248.7786 Ops/s $\color{#d91a1a}-1.32\%$
test_ddpg_speed[True-None] 1.8693ms 1.3894ms 719.7358 Ops/s 702.4931 Ops/s $\color{#35bf28}+2.45\%$
test_ddpg_speed[True-backward] 2.4725ms 2.4007ms 416.5425 Ops/s 398.9998 Ops/s $\color{#35bf28}+4.40\%$
test_ddpg_speed[reduce-overhead-None] 1.4821ms 1.3774ms 726.0276 Ops/s 696.1962 Ops/s $\color{#35bf28}+4.28\%$
test_sac_speed[False-None] 8.3892ms 7.7938ms 128.3076 Ops/s 129.0186 Ops/s $\color{#d91a1a}-0.55\%$
test_sac_speed[False-backward] 15.2737ms 11.5482ms 86.5933 Ops/s 91.6316 Ops/s $\textbf{\color{#d91a1a}-5.50\%}$
test_sac_speed[True-None] 2.6056ms 2.1484ms 465.4727 Ops/s 448.7944 Ops/s $\color{#35bf28}+3.72\%$
test_sac_speed[True-backward] 4.1793ms 4.0477ms 247.0529 Ops/s 204.6669 Ops/s $\textbf{\color{#35bf28}+20.71\%}$
test_sac_speed[reduce-overhead-None] 2.6286ms 2.1423ms 466.7871 Ops/s 459.1872 Ops/s $\color{#35bf28}+1.66\%$
test_redq_speed[False-None] 10.8768ms 10.3848ms 96.2950 Ops/s 94.5918 Ops/s $\color{#35bf28}+1.80\%$
test_redq_speed[False-backward] 19.1976ms 18.1723ms 55.0288 Ops/s 55.7739 Ops/s $\color{#d91a1a}-1.34\%$
test_redq_speed[True-None] 4.6671ms 4.4031ms 227.1122 Ops/s 220.1325 Ops/s $\color{#35bf28}+3.17\%$
test_redq_speed[True-backward] 10.2510ms 9.9387ms 100.6171 Ops/s 101.2119 Ops/s $\color{#d91a1a}-0.59\%$
test_redq_speed[reduce-overhead-None] 4.8132ms 4.3416ms 230.3277 Ops/s 224.4852 Ops/s $\color{#35bf28}+2.60\%$
test_redq_deprec_speed[False-None] 11.5139ms 11.0001ms 90.9079 Ops/s 90.2915 Ops/s $\color{#35bf28}+0.68\%$
test_redq_deprec_speed[False-backward] 16.2353ms 15.8309ms 63.1675 Ops/s 62.3824 Ops/s $\color{#35bf28}+1.26\%$
test_redq_deprec_speed[True-None] 4.0949ms 3.6504ms 273.9443 Ops/s 266.5442 Ops/s $\color{#35bf28}+2.78\%$
test_redq_deprec_speed[True-backward] 8.0364ms 7.7244ms 129.4607 Ops/s 128.8810 Ops/s $\color{#35bf28}+0.45\%$
test_redq_deprec_speed[reduce-overhead-None] 4.2036ms 3.6030ms 277.5453 Ops/s 268.8421 Ops/s $\color{#35bf28}+3.24\%$
test_td3_speed[False-None] 8.2041ms 7.9239ms 126.2009 Ops/s 126.1137 Ops/s $\color{#35bf28}+0.07\%$
test_td3_speed[False-backward] 11.2070ms 10.7319ms 93.1798 Ops/s 93.1865 Ops/s $-0.01\%$
test_td3_speed[True-None] 2.0204ms 1.8280ms 547.0406 Ops/s 545.3305 Ops/s $\color{#35bf28}+0.31\%$
test_td3_speed[True-backward] 3.7395ms 3.6130ms 276.7801 Ops/s 244.3802 Ops/s $\textbf{\color{#35bf28}+13.26\%}$
test_td3_speed[reduce-overhead-None] 1.8390ms 1.7973ms 556.3861 Ops/s 551.8527 Ops/s $\color{#35bf28}+0.82\%$
test_cql_speed[False-None] 28.5065ms 26.0414ms 38.4005 Ops/s 37.1887 Ops/s $\color{#35bf28}+3.26\%$
test_cql_speed[False-backward] 38.8992ms 35.9985ms 27.7789 Ops/s 28.0821 Ops/s $\color{#d91a1a}-1.08\%$
test_cql_speed[True-None] 15.5620ms 12.7222ms 78.6030 Ops/s 80.1705 Ops/s $\color{#d91a1a}-1.96\%$
test_cql_speed[True-backward] 19.4570ms 18.8919ms 52.9327 Ops/s 53.7144 Ops/s $\color{#d91a1a}-1.46\%$
test_cql_speed[reduce-overhead-None] 12.9660ms 12.6454ms 79.0800 Ops/s 78.7723 Ops/s $\color{#35bf28}+0.39\%$
test_a2c_speed[False-None] 5.5404ms 5.3643ms 186.4192 Ops/s 185.7087 Ops/s $\color{#35bf28}+0.38\%$
test_a2c_speed[False-backward] 12.3985ms 11.9125ms 83.9454 Ops/s 83.3785 Ops/s $\color{#35bf28}+0.68\%$
test_a2c_speed[True-None] 3.9159ms 3.7677ms 265.4144 Ops/s 261.1995 Ops/s $\color{#35bf28}+1.61\%$
test_a2c_speed[True-backward] 9.2650ms 8.7617ms 114.1336 Ops/s 115.2311 Ops/s $\color{#d91a1a}-0.95\%$
test_a2c_speed[reduce-overhead-None] 3.9226ms 3.7272ms 268.2953 Ops/s 267.0842 Ops/s $\color{#35bf28}+0.45\%$
test_ppo_speed[False-None] 6.2316ms 5.9899ms 166.9472 Ops/s 170.7843 Ops/s $\color{#d91a1a}-2.25\%$
test_ppo_speed[False-backward] 13.1765ms 12.7047ms 78.7110 Ops/s 78.8836 Ops/s $\color{#d91a1a}-0.22\%$
test_ppo_speed[True-None] 4.4848ms 3.6511ms 273.8925 Ops/s 271.2770 Ops/s $\color{#35bf28}+0.96\%$
test_ppo_speed[True-backward] 8.9224ms 8.5612ms 116.8064 Ops/s 116.5803 Ops/s $\color{#35bf28}+0.19\%$
test_ppo_speed[reduce-overhead-None] 3.8598ms 3.6527ms 273.7672 Ops/s 273.0713 Ops/s $\color{#35bf28}+0.25\%$
test_reinforce_speed[False-None] 4.9377ms 4.6431ms 215.3737 Ops/s 222.5747 Ops/s $\color{#d91a1a}-3.24\%$
test_reinforce_speed[False-backward] 7.8374ms 7.4841ms 133.6161 Ops/s 138.0915 Ops/s $\color{#d91a1a}-3.24\%$
test_reinforce_speed[True-None] 3.4109ms 2.9139ms 343.1827 Ops/s 343.5337 Ops/s $\color{#d91a1a}-0.10\%$
test_reinforce_speed[True-backward] 8.1804ms 7.8991ms 126.5964 Ops/s 126.2722 Ops/s $\color{#35bf28}+0.26\%$
test_reinforce_speed[reduce-overhead-None] 3.2942ms 2.8773ms 347.5484 Ops/s 347.2635 Ops/s $\color{#35bf28}+0.08\%$
test_iql_speed[False-None] 25.4266ms 20.9194ms 47.8024 Ops/s 48.2163 Ops/s $\color{#d91a1a}-0.86\%$
test_iql_speed[False-backward] 35.9689ms 30.9509ms 32.3093 Ops/s 32.0576 Ops/s $\color{#35bf28}+0.79\%$
test_iql_speed[True-None] 8.9446ms 8.5664ms 116.7350 Ops/s 114.6799 Ops/s $\color{#35bf28}+1.79\%$
test_iql_speed[True-backward] 17.5769ms 17.0689ms 58.5861 Ops/s 58.5330 Ops/s $\color{#35bf28}+0.09\%$
test_iql_speed[reduce-overhead-None] 8.9498ms 8.6392ms 115.7509 Ops/s 115.9295 Ops/s $\color{#d91a1a}-0.15\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4917ms 6.0510ms 165.2625 Ops/s 166.1215 Ops/s $\color{#d91a1a}-0.52\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.7409ms 0.3109ms 3.2163 KOps/s 3.1200 KOps/s $\color{#35bf28}+3.09\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5478ms 0.2726ms 3.6686 KOps/s 3.2596 KOps/s $\textbf{\color{#35bf28}+12.55\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1372ms 5.8305ms 171.5108 Ops/s 171.5732 Ops/s $\color{#d91a1a}-0.04\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0728ms 0.3526ms 2.8363 KOps/s 2.4713 KOps/s $\textbf{\color{#35bf28}+14.77\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5512ms 0.3100ms 3.2254 KOps/s 2.7291 KOps/s $\textbf{\color{#35bf28}+18.18\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7139ms 1.2841ms 778.7722 Ops/s 710.4509 Ops/s $\textbf{\color{#35bf28}+9.62\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4199ms 1.1702ms 854.5693 Ops/s 746.4757 Ops/s $\textbf{\color{#35bf28}+14.48\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.4240ms 6.0873ms 164.2759 Ops/s 168.5771 Ops/s $\color{#d91a1a}-2.55\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.7026ms 0.4924ms 2.0310 KOps/s 2.0871 KOps/s $\color{#d91a1a}-2.69\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9176ms 0.4839ms 2.0664 KOps/s 2.1109 KOps/s $\color{#d91a1a}-2.11\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9826ms 5.8527ms 170.8627 Ops/s 172.1226 Ops/s $\color{#d91a1a}-0.73\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9834ms 0.3482ms 2.8722 KOps/s 3.2588 KOps/s $\textbf{\color{#d91a1a}-11.86\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5168ms 0.3041ms 3.2883 KOps/s 3.4452 KOps/s $\color{#d91a1a}-4.55\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1365ms 5.8163ms 171.9301 Ops/s 172.6644 Ops/s $\color{#d91a1a}-0.43\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7484ms 0.3414ms 2.9289 KOps/s 2.9071 KOps/s $\color{#35bf28}+0.75\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6163ms 0.3527ms 2.8351 KOps/s 2.9430 KOps/s $\color{#d91a1a}-3.67\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1175ms 5.9951ms 166.8039 Ops/s 168.1545 Ops/s $\color{#d91a1a}-0.80\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9761ms 0.4557ms 2.1945 KOps/s 2.1635 KOps/s $\color{#35bf28}+1.43\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6785ms 0.4378ms 2.2840 KOps/s 2.2663 KOps/s $\color{#35bf28}+0.78\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3414ms 5.0149ms 199.4058 Ops/s 199.8227 Ops/s $\color{#d91a1a}-0.21\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 11.4376ms 2.2207ms 450.3175 Ops/s 483.2606 Ops/s $\textbf{\color{#d91a1a}-6.82\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.3697ms 0.9330ms 1.0718 KOps/s 1.1273 KOps/s $\color{#d91a1a}-4.92\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5576s 16.1087ms 62.0782 Ops/s 59.6713 Ops/s $\color{#35bf28}+4.03\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.5067ms 1.8348ms 545.0071 Ops/s 539.2296 Ops/s $\color{#35bf28}+1.07\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.0518ms 1.0873ms 919.7387 Ops/s 937.8269 Ops/s $\color{#d91a1a}-1.93\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.0017ms 5.1951ms 192.4885 Ops/s 190.0678 Ops/s $\color{#35bf28}+1.27\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.0533ms 1.9249ms 519.4977 Ops/s 516.4839 Ops/s $\color{#35bf28}+0.58\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.2121ms 1.0086ms 991.4935 Ops/s 969.5450 Ops/s $\color{#35bf28}+2.26\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.4855ms 35.6962ms 28.0142 Ops/s 27.6473 Ops/s $\color{#35bf28}+1.33\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.3381ms 18.3664ms 54.4473 Ops/s 55.6055 Ops/s $\color{#d91a1a}-2.08\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.6549ms 36.7700ms 27.1961 Ops/s 26.8043 Ops/s $\color{#35bf28}+1.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.5323ms 18.0663ms 55.3516 Ops/s 54.4357 Ops/s $\color{#35bf28}+1.68\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.2504ms 38.2371ms 26.1526 Ops/s 25.5018 Ops/s $\color{#35bf28}+2.55\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.4186ms 19.6510ms 50.8881 Ops/s 51.3601 Ops/s $\color{#d91a1a}-0.92\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8874ms 0.2255ms 4.4348 KOps/s 4.6067 KOps/s $\color{#d91a1a}-3.73\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.9110ms 1.4542ms 687.6409 Ops/s 687.2748 Ops/s $\color{#35bf28}+0.05\%$
test_storage_write_lazystack[100-img_shape2-large_img] 3.0570ms 2.4265ms 412.1177 Ops/s 412.5393 Ops/s $\color{#d91a1a}-0.10\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.6960ms 3.0479ms 328.0954 Ops/s 323.5098 Ops/s $\color{#35bf28}+1.42\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2114ms 0.1374ms 7.2789 KOps/s 7.3063 KOps/s $\color{#d91a1a}-0.37\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3638ms 0.2083ms 4.8016 KOps/s 5.2949 KOps/s $\textbf{\color{#d91a1a}-9.32\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.2140ms 1.9129ms 522.7538 Ops/s 537.8718 Ops/s $\color{#d91a1a}-2.81\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6818ms 1.4114ms 708.5169 Ops/s 723.3182 Ops/s $\color{#d91a1a}-2.05\%$
test_collector_stack_then_write[50-img_shape0-small] 1.5952ms 1.1167ms 895.5158 Ops/s 897.4117 Ops/s $\color{#d91a1a}-0.21\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.6343ms 3.7179ms 268.9708 Ops/s 275.4175 Ops/s $\color{#d91a1a}-2.34\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.4509ms 5.9593ms 167.8046 Ops/s 168.7582 Ops/s $\color{#d91a1a}-0.57\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 12.6869ms 7.2433ms 138.0582 Ops/s 137.8967 Ops/s $\color{#35bf28}+0.12\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.7155ms 0.2778ms 3.5993 KOps/s 3.6953 KOps/s $\color{#d91a1a}-2.60\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 2.0459ms 1.5868ms 630.1858 Ops/s 635.3672 Ops/s $\color{#d91a1a}-0.82\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 3.0252ms 2.5549ms 391.4019 Ops/s 396.2468 Ops/s $\color{#d91a1a}-1.22\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.8452ms 3.2695ms 305.8584 Ops/s 304.8362 Ops/s $\color{#35bf28}+0.34\%$
test_collector_without_rb[100-img_shape0-atari] 33.6339ms 32.7146ms 30.5674 Ops/s 30.5424 Ops/s $\color{#35bf28}+0.08\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.8136ms 65.3109ms 15.3114 Ops/s 15.5287 Ops/s $\color{#d91a1a}-1.40\%$
test_collector_with_rb[100-img_shape0-atari] 38.4504ms 37.5123ms 26.6579 Ops/s 26.9850 Ops/s $\color{#d91a1a}-1.21\%$
test_collector_with_rb[200-img_shape1-large_batch] 95.0935ms 74.5338ms 13.4167 Ops/s 13.4988 Ops/s $\color{#d91a1a}-0.61\%$
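
The Change column above is consistent with the relative difference between the "Ops" and "Ops on Repo HEAD" columns, and the bold entries match the Improved/Worsened counts if a cutoff of about 5% is assumed. The sketch below reproduces that arithmetic for three rows of the CPU table. The function names and the 5% threshold are assumptions made for illustration; the bot's actual rule is not stated in this thread.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD.

    Both arguments must be in the same unit (e.g. both KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not the bot's documented rule."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from the CPU table above, same unit per row.
rows = {
    "test_tensor_to_bytestream_speed[pickle]": (12.3478, 12.4285),
    "test_tensor_to_bytestream_speed[untyped_storage]": (8.9460, 9.5998),
    "test_simple": (1.8632, 1.7560),
}

for name, (ops, ops_head) in rows.items():
    pct = relative_change(ops, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Because the table reports rounded throughput values, recomputed percentages can differ from the Change column in the last digit. Rows that mix units (for example KOps/s against Ops/s) need converting to a common unit first.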


github-actions bot commented Feb 11, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.4283μs 80.3214μs 12.4500 KOps/s 11.7812 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_tensor_to_bytestream_speed[torch.save] 0.1441ms 0.1405ms 7.1179 KOps/s 7.1951 KOps/s $\color{#d91a1a}-1.07\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1082s 0.1078s 9.2734 Ops/s 8.9969 Ops/s $\color{#35bf28}+3.07\%$
test_tensor_to_bytestream_speed[numpy] 2.6145μs 2.6105μs 383.0625 KOps/s 384.4756 KOps/s $\color{#d91a1a}-0.37\%$
test_tensor_to_bytestream_speed[safetensors] 39.4034μs 38.4974μs 25.9758 KOps/s 26.6253 KOps/s $\color{#d91a1a}-2.44\%$
test_simple 0.8166s 0.8068s 1.2395 Ops/s 1.2298 Ops/s $\color{#35bf28}+0.79\%$
test_transformed 1.5080s 1.4107s 0.7089 Ops/s 0.7142 Ops/s $\color{#d91a1a}-0.75\%$
test_serial 2.3546s 2.3204s 0.4310 Ops/s 0.4393 Ops/s $\color{#d91a1a}-1.90\%$
test_parallel 1.9042s 1.8193s 0.5497 Ops/s 0.5538 Ops/s $\color{#d91a1a}-0.74\%$
test_step_mdp_speed[True-True-True-True-True] 0.1875ms 41.1671μs 24.2913 KOps/s 23.4497 KOps/s $\color{#35bf28}+3.59\%$
test_step_mdp_speed[True-True-True-True-False] 52.3010μs 23.7926μs 42.0299 KOps/s 41.2621 KOps/s $\color{#35bf28}+1.86\%$
test_step_mdp_speed[True-True-True-False-True] 48.1310μs 24.0309μs 41.6131 KOps/s 40.9741 KOps/s $\color{#35bf28}+1.56\%$
test_step_mdp_speed[True-True-True-False-False] 84.9110μs 12.9706μs 77.0976 KOps/s 73.2955 KOps/s $\textbf{\color{#35bf28}+5.19\%}$
test_step_mdp_speed[True-True-False-True-True] 73.8020μs 45.1811μs 22.1332 KOps/s 21.4554 KOps/s $\color{#35bf28}+3.16\%$
test_step_mdp_speed[True-True-False-True-False] 57.4100μs 26.1606μs 38.2254 KOps/s 37.2645 KOps/s $\color{#35bf28}+2.58\%$
test_step_mdp_speed[True-True-False-False-True] 57.9910μs 26.2596μs 38.0813 KOps/s 37.2860 KOps/s $\color{#35bf28}+2.13\%$
test_step_mdp_speed[True-True-False-False-False] 42.7800μs 15.8323μs 63.1620 KOps/s 62.5739 KOps/s $\color{#35bf28}+0.94\%$
test_step_mdp_speed[True-False-True-True-True] 89.9310μs 48.6533μs 20.5536 KOps/s 20.5832 KOps/s $\color{#d91a1a}-0.14\%$
test_step_mdp_speed[True-False-True-True-False] 89.6610μs 29.1608μs 34.2926 KOps/s 34.3342 KOps/s $\color{#d91a1a}-0.12\%$
test_step_mdp_speed[True-False-True-False-True] 62.4010μs 26.6690μs 37.4968 KOps/s 38.2592 KOps/s $\color{#d91a1a}-1.99\%$
test_step_mdp_speed[True-False-True-False-False] 45.1210μs 15.9866μs 62.5526 KOps/s 62.4203 KOps/s $\color{#35bf28}+0.21\%$
test_step_mdp_speed[True-False-False-True-True] 89.1320μs 50.7564μs 19.7019 KOps/s 19.3867 KOps/s $\color{#35bf28}+1.63\%$
test_step_mdp_speed[True-False-False-True-False] 61.4610μs 31.5389μs 31.7069 KOps/s 31.9123 KOps/s $\color{#d91a1a}-0.64\%$
test_step_mdp_speed[True-False-False-False-True] 61.1310μs 28.9780μs 34.5089 KOps/s 34.8205 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[True-False-False-False-False] 57.1810μs 18.4307μs 54.2572 KOps/s 53.7256 KOps/s $\color{#35bf28}+0.99\%$
test_step_mdp_speed[False-True-True-True-True] 92.5510μs 47.7511μs 20.9419 KOps/s 20.5900 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[False-True-True-True-False] 76.2410μs 28.8066μs 34.7143 KOps/s 34.7049 KOps/s $\color{#35bf28}+0.03\%$
test_step_mdp_speed[False-True-True-False-True] 2.5147ms 30.3268μs 32.9741 KOps/s 32.4693 KOps/s $\color{#35bf28}+1.55\%$
test_step_mdp_speed[False-True-True-False-False] 0.4211ms 17.3522μs 57.6297 KOps/s 56.5660 KOps/s $\color{#35bf28}+1.88\%$
test_step_mdp_speed[False-True-False-True-True] 0.4549ms 50.1185μs 19.9527 KOps/s 19.6099 KOps/s $\color{#35bf28}+1.75\%$
test_step_mdp_speed[False-True-False-True-False] 76.1320μs 31.5666μs 31.6790 KOps/s 31.9905 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[False-True-False-False-True] 0.4437ms 32.7139μs 30.5681 KOps/s 30.0832 KOps/s $\color{#35bf28}+1.61\%$
test_step_mdp_speed[False-True-False-False-False] 0.4349ms 20.0657μs 49.8362 KOps/s 49.2058 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[False-False-True-True-True] 0.4749ms 52.2607μs 19.1348 KOps/s 18.5180 KOps/s $\color{#35bf28}+3.33\%$
test_step_mdp_speed[False-False-True-True-False] 88.1920μs 34.0618μs 29.3584 KOps/s 28.9608 KOps/s $\color{#35bf28}+1.37\%$
test_step_mdp_speed[False-False-True-False-True] 0.4508ms 33.3134μs 30.0179 KOps/s 30.5573 KOps/s $\color{#d91a1a}-1.77\%$
test_step_mdp_speed[False-False-True-False-False] 0.4322ms 20.0872μs 49.7829 KOps/s 50.1982 KOps/s $\color{#d91a1a}-0.83\%$
test_step_mdp_speed[False-False-False-True-True] 0.4670ms 55.3230μs 18.0757 KOps/s 18.4371 KOps/s $\color{#d91a1a}-1.96\%$
test_step_mdp_speed[False-False-False-True-False] 0.4487ms 36.8838μs 27.1122 KOps/s 27.8665 KOps/s $\color{#d91a1a}-2.71\%$
test_step_mdp_speed[False-False-False-False-True] 0.1018ms 34.7826μs 28.7500 KOps/s 28.4735 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[False-False-False-False-False] 0.4378ms 22.5639μs 44.3186 KOps/s 44.1331 KOps/s $\color{#35bf28}+0.42\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8372s 0.7344s 1.3617 Ops/s 1.3487 Ops/s $\color{#35bf28}+0.96\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7043s 0.6044s 1.6544 Ops/s 1.6437 Ops/s $\color{#35bf28}+0.65\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7122s 1.6335s 0.6122 Ops/s 0.6095 Ops/s $\color{#35bf28}+0.45\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4902s 1.4120s 0.7082 Ops/s 0.7070 Ops/s $\color{#35bf28}+0.17\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9548s 1.8744s 0.5335 Ops/s 0.5301 Ops/s $\color{#35bf28}+0.65\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7265s 1.6481s 0.6068 Ops/s 0.6025 Ops/s $\color{#35bf28}+0.71\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.8713s 4.6998s 0.2128 Ops/s 0.2152 Ops/s $\color{#d91a1a}-1.11\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5192s 4.4519s 0.2246 Ops/s 0.2247 Ops/s $\color{#d91a1a}-0.04\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0651s 1.9245s 0.5196 Ops/s 0.5323 Ops/s $\color{#d91a1a}-2.38\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6737s 1.5705s 0.6368 Ops/s 0.6294 Ops/s $\color{#35bf28}+1.17\%$
test_values[generalized_advantage_estimate-True-True] 21.2848ms 20.6839ms 48.3469 Ops/s 48.2370 Ops/s $\color{#35bf28}+0.23\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1408s 3.7450ms 267.0255 Ops/s 258.7286 Ops/s $\color{#35bf28}+3.21\%$
test_values[td0_return_estimate-False-False] 0.1127ms 84.5621μs 11.8256 KOps/s 11.9043 KOps/s $\color{#d91a1a}-0.66\%$
test_values[td1_return_estimate-False-False] 49.4638ms 48.6325ms 20.5624 Ops/s 20.3631 Ops/s $\color{#35bf28}+0.98\%$
test_values[vec_td1_return_estimate-False-False] 1.2868ms 1.0930ms 914.9455 Ops/s 909.3368 Ops/s $\color{#35bf28}+0.62\%$
test_values[td_lambda_return_estimate-True-False] 83.4762ms 81.2668ms 12.3051 Ops/s 12.4170 Ops/s $\color{#d91a1a}-0.90\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3823ms 1.0974ms 911.2496 Ops/s 911.0087 Ops/s $\color{#35bf28}+0.03\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.1241ms 20.8983ms 47.8508 Ops/s 47.7535 Ops/s $\color{#35bf28}+0.20\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0409ms 0.7682ms 1.3017 KOps/s 1.2965 KOps/s $\color{#35bf28}+0.40\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7393ms 0.6966ms 1.4356 KOps/s 1.4606 KOps/s $\color{#d91a1a}-1.71\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5906ms 1.5022ms 665.6707 Ops/s 666.8378 Ops/s $\color{#d91a1a}-0.18\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7594ms 0.7217ms 1.3855 KOps/s 1.4152 KOps/s $\color{#d91a1a}-2.09\%$
test_dqn_speed[False-None] 1.7631ms 1.5606ms 640.7817 Ops/s 654.4045 Ops/s $\color{#d91a1a}-2.08\%$
test_dqn_speed[False-backward] 2.3493ms 2.1899ms 456.6419 Ops/s 457.2877 Ops/s $\color{#d91a1a}-0.14\%$
test_dqn_speed[True-None] 0.7193ms 0.5521ms 1.8112 KOps/s 1.7739 KOps/s $\color{#35bf28}+2.11\%$
test_dqn_speed[True-backward] 1.1727ms 1.1019ms 907.5145 Ops/s 854.1531 Ops/s $\textbf{\color{#35bf28}+6.25\%}$
test_dqn_speed[reduce-overhead-None] 1.0049ms 0.5879ms 1.7009 KOps/s 1.6942 KOps/s $\color{#35bf28}+0.39\%$
test_ddpg_speed[False-None] 3.2858ms 2.8981ms 345.0480 Ops/s 346.3112 Ops/s $\color{#d91a1a}-0.36\%$
test_ddpg_speed[False-backward] 4.5640ms 4.1801ms 239.2298 Ops/s 237.5607 Ops/s $\color{#35bf28}+0.70\%$
test_ddpg_speed[True-None] 1.8304ms 1.3136ms 761.2851 Ops/s 754.8201 Ops/s $\color{#35bf28}+0.86\%$
test_ddpg_speed[True-backward] 2.8669ms 2.3790ms 420.3453 Ops/s 416.1153 Ops/s $\color{#35bf28}+1.02\%$
test_ddpg_speed[reduce-overhead-None] 1.5002ms 1.3385ms 747.0789 Ops/s 744.4824 Ops/s $\color{#35bf28}+0.35\%$
test_sac_speed[False-None] 8.9631ms 8.3914ms 119.1693 Ops/s 118.9855 Ops/s $\color{#35bf28}+0.15\%$
test_sac_speed[False-backward] 11.8983ms 11.3379ms 88.2001 Ops/s 86.8654 Ops/s $\color{#35bf28}+1.54\%$
test_sac_speed[True-None] 2.2862ms 1.7943ms 557.3049 Ops/s 552.5744 Ops/s $\color{#35bf28}+0.86\%$
test_sac_speed[True-backward] 3.5729ms 3.4225ms 292.1866 Ops/s 291.0310 Ops/s $\color{#35bf28}+0.40\%$
test_sac_speed[reduce-overhead-None] 19.0778ms 10.8706ms 91.9914 Ops/s 81.8666 Ops/s $\textbf{\color{#35bf28}+12.37\%}$
test_redq_deprec_speed[False-None] 9.9670ms 9.3869ms 106.5312 Ops/s 106.2963 Ops/s $\color{#35bf28}+0.22\%$
test_redq_deprec_speed[False-backward] 12.8306ms 12.4955ms 80.0289 Ops/s 79.2912 Ops/s $\color{#35bf28}+0.93\%$
test_redq_deprec_speed[True-None] 2.6033ms 2.5090ms 398.5575 Ops/s 396.4923 Ops/s $\color{#35bf28}+0.52\%$
test_redq_deprec_speed[True-backward] 4.1651ms 4.0796ms 245.1217 Ops/s 230.5340 Ops/s $\textbf{\color{#35bf28}+6.33\%}$
test_redq_deprec_speed[reduce-overhead-None] 16.0053ms 9.8054ms 101.9841 Ops/s 102.3404 Ops/s $\color{#d91a1a}-0.35\%$
test_td3_speed[False-None] 8.3698ms 8.2276ms 121.5420 Ops/s 121.8549 Ops/s $\color{#d91a1a}-0.26\%$
test_td3_speed[False-backward] 10.8906ms 10.6187ms 94.1739 Ops/s 91.6886 Ops/s $\color{#35bf28}+2.71\%$
test_td3_speed[True-None] 1.6706ms 1.6091ms 621.4712 Ops/s 625.8609 Ops/s $\color{#d91a1a}-0.70\%$
test_td3_speed[True-backward] 3.0749ms 3.0390ms 329.0516 Ops/s 305.5755 Ops/s $\textbf{\color{#35bf28}+7.68\%}$
test_td3_speed[reduce-overhead-None] 47.5406ms 24.3810ms 41.0155 Ops/s 40.5807 Ops/s $\color{#35bf28}+1.07\%$
test_cql_speed[False-None] 17.9928ms 17.4386ms 57.3441 Ops/s 57.5223 Ops/s $\color{#d91a1a}-0.31\%$
test_cql_speed[False-backward] 23.2666ms 22.6806ms 44.0906 Ops/s 43.3329 Ops/s $\color{#35bf28}+1.75\%$
test_cql_speed[True-None] 3.4413ms 3.2386ms 308.7746 Ops/s 305.8771 Ops/s $\color{#35bf28}+0.95\%$
test_cql_speed[True-backward] 5.4529ms 5.3129ms 188.2209 Ops/s 185.5529 Ops/s $\color{#35bf28}+1.44\%$
test_cql_speed[reduce-overhead-None] 18.8195ms 11.9188ms 83.9010 Ops/s 84.3272 Ops/s $\color{#d91a1a}-0.51\%$
test_a2c_speed[False-None] 3.9564ms 3.2501ms 307.6782 Ops/s 304.5374 Ops/s $\color{#35bf28}+1.03\%$
test_a2c_speed[False-backward] 6.6751ms 6.1840ms 161.7065 Ops/s 160.6083 Ops/s $\color{#35bf28}+0.68\%$
test_a2c_speed[True-None] 1.4134ms 1.3187ms 758.3211 Ops/s 756.6437 Ops/s $\color{#35bf28}+0.22\%$
test_a2c_speed[True-backward] 2.9981ms 2.9440ms 339.6716 Ops/s 337.6759 Ops/s $\color{#35bf28}+0.59\%$
test_a2c_speed[reduce-overhead-None] 1.0405ms 0.9785ms 1.0220 KOps/s 1.0272 KOps/s $\color{#d91a1a}-0.51\%$
test_ppo_speed[False-None] 3.9654ms 3.8712ms 258.3154 Ops/s 257.7776 Ops/s $\color{#35bf28}+0.21\%$
test_ppo_speed[False-backward] 7.4672ms 7.0314ms 142.2189 Ops/s 141.4781 Ops/s $\color{#35bf28}+0.52\%$
test_ppo_speed[True-None] 1.9643ms 1.4199ms 704.2836 Ops/s 714.2495 Ops/s $\color{#d91a1a}-1.40\%$
test_ppo_speed[True-backward] 3.1077ms 3.0713ms 325.5998 Ops/s 301.9891 Ops/s $\textbf{\color{#35bf28}+7.82\%}$
test_ppo_speed[reduce-overhead-None] 1.4706ms 1.0273ms 973.4258 Ops/s 937.6240 Ops/s $\color{#35bf28}+3.82\%$
test_reinforce_speed[False-None] 2.7317ms 2.2928ms 436.1549 Ops/s 433.5075 Ops/s $\color{#35bf28}+0.61\%$
test_reinforce_speed[False-backward] 3.7413ms 3.3055ms 302.5267 Ops/s 300.6447 Ops/s $\color{#35bf28}+0.63\%$
test_reinforce_speed[True-None] 1.4320ms 1.2531ms 797.9935 Ops/s 792.3447 Ops/s $\color{#35bf28}+0.71\%$
test_reinforce_speed[True-backward] 2.9848ms 2.9267ms 341.6829 Ops/s 339.9986 Ops/s $\color{#35bf28}+0.50\%$
test_reinforce_speed[reduce-overhead-None] 17.3177ms 9.5277ms 104.9571 Ops/s 103.9472 Ops/s $\color{#35bf28}+0.97\%$
test_iql_speed[False-None] 10.0727ms 9.4624ms 105.6814 Ops/s 105.1527 Ops/s $\color{#35bf28}+0.50\%$
test_iql_speed[False-backward] 14.1196ms 13.2315ms 75.5770 Ops/s 75.7893 Ops/s $\color{#d91a1a}-0.28\%$
test_iql_speed[True-None] 2.3323ms 2.1619ms 462.5490 Ops/s 460.7992 Ops/s $\color{#35bf28}+0.38\%$
test_iql_speed[True-backward] 5.1784ms 4.6912ms 213.1634 Ops/s 205.3415 Ops/s $\color{#35bf28}+3.81\%$
test_iql_speed[reduce-overhead-None] 17.7785ms 10.5109ms 95.1392 Ops/s 94.8231 Ops/s $\color{#35bf28}+0.33\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4744ms 6.0378ms 165.6240 Ops/s 167.3798 Ops/s $\color{#d91a1a}-1.05\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.1114ms 0.3575ms 2.7971 KOps/s 3.4477 KOps/s $\textbf{\color{#d91a1a}-18.87\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6330ms 0.3401ms 2.9403 KOps/s 3.6951 KOps/s $\textbf{\color{#d91a1a}-20.43\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4483ms 5.8630ms 170.5619 Ops/s 171.3758 Ops/s $\color{#d91a1a}-0.47\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9088ms 0.3392ms 2.9485 KOps/s 3.0238 KOps/s $\color{#d91a1a}-2.49\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.9423ms 0.3140ms 3.1844 KOps/s 3.7233 KOps/s $\textbf{\color{#d91a1a}-14.47\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.8069ms 1.4653ms 682.4702 Ops/s 653.6836 Ops/s $\color{#35bf28}+4.40\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6566ms 1.3704ms 729.7220 Ops/s 690.4853 Ops/s $\textbf{\color{#35bf28}+5.68\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.3361ms 6.1193ms 163.4161 Ops/s 167.9852 Ops/s $\color{#d91a1a}-2.72\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9263ms 0.4829ms 2.0709 KOps/s 1.8820 KOps/s $\textbf{\color{#35bf28}+10.04\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9686ms 0.4574ms 2.1862 KOps/s 1.9068 KOps/s $\textbf{\color{#35bf28}+14.65\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1296ms 5.9086ms 169.2450 Ops/s 170.2543 Ops/s $\color{#d91a1a}-0.59\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9117ms 0.3867ms 2.5857 KOps/s 2.9992 KOps/s $\textbf{\color{#d91a1a}-13.79\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5740ms 0.3702ms 2.7012 KOps/s 3.3256 KOps/s $\textbf{\color{#d91a1a}-18.78\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4819ms 5.8542ms 170.8188 Ops/s 171.9290 Ops/s $\color{#d91a1a}-0.65\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8801ms 0.3327ms 3.0055 KOps/s 2.8472 KOps/s $\textbf{\color{#35bf28}+5.56\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7207ms 0.3218ms 3.1078 KOps/s 3.4596 KOps/s $\textbf{\color{#d91a1a}-10.17\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.3018ms 6.0568ms 165.1039 Ops/s 167.5238 Ops/s $\color{#d91a1a}-1.44\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9936ms 0.5613ms 1.7817 KOps/s 2.2615 KOps/s $\textbf{\color{#d91a1a}-21.22\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7575ms 0.5019ms 1.9923 KOps/s 2.2379 KOps/s $\textbf{\color{#d91a1a}-10.97\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5861s 16.6494ms 60.0622 Ops/s 50.6639 Ops/s $\textbf{\color{#35bf28}+18.55\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.9621ms 1.9340ms 517.0599 Ops/s 537.9834 Ops/s $\color{#d91a1a}-3.89\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.8158ms 0.9765ms 1.0241 KOps/s 1.0439 KOps/s $\color{#d91a1a}-1.90\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 8.8657ms 5.0675ms 197.3354 Ops/s 195.9204 Ops/s $\color{#35bf28}+0.72\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.4083ms 1.9276ms 518.7933 Ops/s 495.5264 Ops/s $\color{#35bf28}+4.70\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.1323ms 0.9727ms 1.0281 KOps/s 1.0232 KOps/s $\color{#35bf28}+0.48\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5373s 15.8832ms 62.9597 Ops/s 187.9283 Ops/s $\textbf{\color{#d91a1a}-66.50\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.5606ms 2.1575ms 463.4985 Ops/s 506.2430 Ops/s $\textbf{\color{#d91a1a}-8.44\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.1088ms 1.1294ms 885.4243 Ops/s 823.4414 Ops/s $\textbf{\color{#35bf28}+7.53\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.9563ms 36.3532ms 27.5079 Ops/s 27.2485 Ops/s $\color{#35bf28}+0.95\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.5312ms 18.7219ms 53.4135 Ops/s 54.5609 Ops/s $\color{#d91a1a}-2.10\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.9220ms 37.7269ms 26.5063 Ops/s 26.5456 Ops/s $\color{#d91a1a}-0.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.3323ms 18.6708ms 53.5597 Ops/s 53.4554 Ops/s $\color{#35bf28}+0.20\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 42.2951ms 39.4271ms 25.3633 Ops/s 25.1535 Ops/s $\color{#35bf28}+0.83\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 22.1898ms 20.4873ms 48.8108 Ops/s 49.8728 Ops/s $\color{#d91a1a}-2.13\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9144ms 0.2308ms 4.3333 KOps/s 4.5336 KOps/s $\color{#d91a1a}-4.42\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7225ms 1.4350ms 696.8444 Ops/s 684.5979 Ops/s $\color{#35bf28}+1.79\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.6257ms 2.3788ms 420.3780 Ops/s 423.3351 Ops/s $\color{#d91a1a}-0.70\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.3618ms 3.0112ms 332.0919 Ops/s 329.8100 Ops/s $\color{#35bf28}+0.69\%$
test_storage_write_contiguous[50-img_shape0-small] 0.6192ms 0.1638ms 6.1046 KOps/s 6.1294 KOps/s $\color{#d91a1a}-0.40\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3829ms 0.2284ms 4.3774 KOps/s 4.3376 KOps/s $\color{#35bf28}+0.92\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1576ms 1.8886ms 529.4947 Ops/s 523.5576 Ops/s $\color{#35bf28}+1.13\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.8583ms 1.3863ms 721.3668 Ops/s 721.4904 Ops/s $\color{#d91a1a}-0.02\%$
test_collector_stack_then_write[50-img_shape0-small] 1.5960ms 1.1710ms 853.9665 Ops/s 855.1088 Ops/s $\color{#d91a1a}-0.13\%$
test_collector_stack_then_write[100-img_shape1-atari] 4.1410ms 3.6933ms 270.7600 Ops/s 276.4463 Ops/s $\color{#d91a1a}-2.06\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.4005ms 6.0004ms 166.6548 Ops/s 168.6078 Ops/s $\color{#d91a1a}-1.16\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 8.9991ms 8.1842ms 122.1862 Ops/s 137.3156 Ops/s $\textbf{\color{#d91a1a}-11.02\%}$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4350ms 0.2801ms 3.5707 KOps/s 3.5788 KOps/s $\color{#d91a1a}-0.23\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7351ms 1.5738ms 635.4141 Ops/s 642.6897 Ops/s $\color{#d91a1a}-1.13\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8071ms 2.4795ms 403.3152 Ops/s 393.8202 Ops/s $\color{#35bf28}+2.41\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.6386ms 3.2553ms 307.1928 Ops/s 307.4017 Ops/s $\color{#d91a1a}-0.07\%$
test_collector_without_rb[100-img_shape0-atari] 33.4777ms 32.8992ms 30.3959 Ops/s 29.9721 Ops/s $\color{#35bf28}+1.41\%$
test_collector_without_rb[200-img_shape1-large_batch] 65.0309ms 64.5092ms 15.5017 Ops/s 15.3446 Ops/s $\color{#35bf28}+1.02\%$
test_collector_with_rb[100-img_shape0-atari] 38.1746ms 37.4208ms 26.7231 Ops/s 26.4511 Ops/s $\color{#35bf28}+1.03\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.7173ms 73.7917ms 13.5517 Ops/s 13.5301 Ops/s $\color{#35bf28}+0.16\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 55.9423ms 55.2657ms 18.0944 Ops/s 17.8712 Ops/s $\color{#35bf28}+1.25\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1104s 0.1100s 9.0869 Ops/s 8.9518 Ops/s $\color{#35bf28}+1.51\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 57.8019ms 57.2549ms 17.4658 Ops/s 17.2691 Ops/s $\color{#35bf28}+1.14\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.7732s 0.1871s 5.3457 Ops/s 8.6436 Ops/s $\textbf{\color{#d91a1a}-38.15\%}$

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 21, 2026
Adds RayTransport using ray.util.queue.Queue for distributed inference
across Ray actors. Ray is imported lazily at instantiation time.

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 660ee98
Pull-Request: #3495
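
The commit message above mentions two ideas: a queue-backed transport and importing Ray lazily at instantiation time. The following is a minimal sketch of that lazy-import pattern only, not the code merged in this PR. The class name `LazyQueueTransport`, its `submit`/`receive` methods, and the `queue_module` parameter are hypothetical names introduced for illustration. The one assumption about Ray is that `ray.util.queue.Queue` exposes `put` and `get` like the standard-library `queue.Queue`, which is why the demo can substitute the stdlib module and run without Ray installed.

```python
import importlib


class LazyQueueTransport:
    """Hypothetical queue-backed transport (illustrative, not the TorchRL API)."""

    def __init__(self, queue_module: str = "ray.util.queue", maxsize: int = 0):
        # The import happens here, at instantiation time, so importing the file
        # that defines this class does not require the backend to be installed.
        try:
            module = importlib.import_module(queue_module)
        except ImportError as err:
            raise ImportError(
                f"{queue_module!r} is required to instantiate this transport"
            ) from err
        # Both ray.util.queue.Queue and queue.Queue accept a maxsize argument.
        self._requests = module.Queue(maxsize=maxsize)

    def submit(self, item):
        # Enqueue one request for whoever consumes the queue.
        self._requests.put(item)

    def receive(self, timeout=None):
        # Block until an item is available (or the timeout expires).
        return self._requests.get(timeout=timeout)


# Demo with the standard-library queue standing in for Ray's queue.
transport = LazyQueueTransport(queue_module="queue")
transport.submit({"obs": [0.0, 1.0]})
result = transport.receive(timeout=1.0)
```

With the default `queue_module`, the same class would build a `ray.util.queue.Queue` instead, and a missing Ray installation would surface as an `ImportError` only when the transport is constructed.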
@vmoens vmoens merged commit c30bf0e into gh/vmoens/237/base Feb 21, 2026
114 of 116 checks passed
@vmoens vmoens deleted the gh/vmoens/237/head branch February 21, 2026 21:12

Labels

CLA Signed, Feature, Modules
