[Feature] Auto-batching inference server: Monarch transport #3496

Merged
vmoens merged 5 commits into gh/vmoens/238/base from gh/vmoens/238/head
Feb 21, 2026

[ghstack-poisoned]

pytorch-bot bot commented Feb 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3496

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit e65ee9c with merge base 266e4aa:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


github-actions bot commented Feb 11, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: 19. Worsened: 4.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 85.9192μs 84.5298μs 11.8302 KOps/s 11.6699 KOps/s $\color{#35bf28}+1.37\%$
test_tensor_to_bytestream_speed[torch.save] 0.1463ms 0.1434ms 6.9719 KOps/s 7.1484 KOps/s $\color{#d91a1a}-2.47\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1121s 0.1113s 8.9861 Ops/s 9.1610 Ops/s $\color{#d91a1a}-1.91\%$
test_tensor_to_bytestream_speed[numpy] 2.6531μs 2.6445μs 378.1373 KOps/s 387.0803 KOps/s $\color{#d91a1a}-2.31\%$
test_tensor_to_bytestream_speed[safetensors] 39.9198μs 39.3128μs 25.4370 KOps/s 26.3128 KOps/s $\color{#d91a1a}-3.33\%$
test_simple 0.5459s 0.5453s 1.8340 Ops/s 1.7576 Ops/s $\color{#35bf28}+4.34\%$
test_transformed 1.0844s 1.0828s 0.9235 Ops/s 0.8982 Ops/s $\color{#35bf28}+2.82\%$
test_serial 1.6777s 1.6672s 0.5998 Ops/s 0.5947 Ops/s $\color{#35bf28}+0.85\%$
test_parallel 1.0137s 1.0102s 0.9899 Ops/s 0.9777 Ops/s $\color{#35bf28}+1.24\%$
test_step_mdp_speed[True-True-True-True-True] 0.3484ms 41.6167μs 24.0288 KOps/s 24.1729 KOps/s $\color{#d91a1a}-0.60\%$
test_step_mdp_speed[True-True-True-True-False] 51.8510μs 23.4895μs 42.5722 KOps/s 42.9071 KOps/s $\color{#d91a1a}-0.78\%$
test_step_mdp_speed[True-True-True-False-True] 57.1210μs 23.4165μs 42.7050 KOps/s 42.8035 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[True-True-True-False-False] 45.2000μs 13.0767μs 76.4717 KOps/s 76.9373 KOps/s $\color{#d91a1a}-0.61\%$
test_step_mdp_speed[True-True-False-True-True] 78.2420μs 44.7273μs 22.3577 KOps/s 22.2711 KOps/s $\color{#35bf28}+0.39\%$
test_step_mdp_speed[True-True-False-True-False] 55.8910μs 25.8596μs 38.6703 KOps/s 39.3689 KOps/s $\color{#d91a1a}-1.77\%$
test_step_mdp_speed[True-True-False-False-True] 59.7510μs 26.4480μs 37.8100 KOps/s 38.4477 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[True-True-False-False-False] 51.6810μs 15.5216μs 64.4263 KOps/s 65.1372 KOps/s $\color{#d91a1a}-1.09\%$
test_step_mdp_speed[True-False-True-True-True] 85.4420μs 47.1538μs 21.2072 KOps/s 21.1883 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[True-False-True-True-False] 55.7910μs 28.4029μs 35.2076 KOps/s 35.4377 KOps/s $\color{#d91a1a}-0.65\%$
test_step_mdp_speed[True-False-True-False-True] 57.2810μs 26.4461μs 37.8128 KOps/s 38.0270 KOps/s $\color{#d91a1a}-0.56\%$
test_step_mdp_speed[True-False-True-False-False] 41.0610μs 15.7488μs 63.4968 KOps/s 64.2616 KOps/s $\color{#d91a1a}-1.19\%$
test_step_mdp_speed[True-False-False-True-True] 84.0420μs 50.2980μs 19.8815 KOps/s 20.0582 KOps/s $\color{#d91a1a}-0.88\%$
test_step_mdp_speed[True-False-False-True-False] 64.5310μs 31.1087μs 32.1454 KOps/s 32.3287 KOps/s $\color{#d91a1a}-0.57\%$
test_step_mdp_speed[True-False-False-False-True] 60.4610μs 28.5782μs 34.9917 KOps/s 34.3276 KOps/s $\color{#35bf28}+1.93\%$
test_step_mdp_speed[True-False-False-False-False] 49.8210μs 17.9099μs 55.8351 KOps/s 54.4610 KOps/s $\color{#35bf28}+2.52\%$
test_step_mdp_speed[False-True-True-True-True] 76.5120μs 47.4451μs 21.0770 KOps/s 20.7847 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[False-True-True-True-False] 56.7010μs 28.6139μs 34.9480 KOps/s 34.4537 KOps/s $\color{#35bf28}+1.43\%$
test_step_mdp_speed[False-True-True-False-True] 2.4796ms 30.3578μs 32.9405 KOps/s 33.5676 KOps/s $\color{#d91a1a}-1.87\%$
test_step_mdp_speed[False-True-True-False-False] 43.9210μs 17.2473μs 57.9801 KOps/s 57.3199 KOps/s $\color{#35bf28}+1.15\%$
test_step_mdp_speed[False-True-False-True-True] 81.9420μs 49.3205μs 20.2755 KOps/s 20.1124 KOps/s $\color{#35bf28}+0.81\%$
test_step_mdp_speed[False-True-False-True-False] 63.4710μs 31.2015μs 32.0497 KOps/s 32.0056 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[False-True-False-False-True] 61.5710μs 32.6154μs 30.6604 KOps/s 30.8837 KOps/s $\color{#d91a1a}-0.72\%$
test_step_mdp_speed[False-True-False-False-False] 51.0710μs 19.8378μs 50.4088 KOps/s 49.6585 KOps/s $\color{#35bf28}+1.51\%$
test_step_mdp_speed[False-False-True-True-True] 88.2410μs 52.3655μs 19.0965 KOps/s 18.7538 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[False-False-True-True-False] 63.0110μs 33.6193μs 29.7449 KOps/s 29.7698 KOps/s $\color{#d91a1a}-0.08\%$
test_step_mdp_speed[False-False-True-False-True] 63.8810μs 31.9322μs 31.3163 KOps/s 30.8919 KOps/s $\color{#35bf28}+1.37\%$
test_step_mdp_speed[False-False-True-False-False] 51.2610μs 19.9235μs 50.1919 KOps/s 50.1676 KOps/s $\color{#35bf28}+0.05\%$
test_step_mdp_speed[False-False-False-True-True] 84.1610μs 54.3507μs 18.3990 KOps/s 18.2447 KOps/s $\color{#35bf28}+0.85\%$
test_step_mdp_speed[False-False-False-True-False] 78.5920μs 36.1216μs 27.6842 KOps/s 27.6451 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[False-False-False-False-True] 62.2010μs 34.1345μs 29.2959 KOps/s 28.9143 KOps/s $\color{#35bf28}+1.32\%$
test_step_mdp_speed[False-False-False-False-False] 53.2410μs 22.1013μs 45.2463 KOps/s 45.0885 KOps/s $\color{#35bf28}+0.35\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8384s 0.7337s 1.3629 Ops/s 1.3553 Ops/s $\color{#35bf28}+0.57\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7018s 0.6040s 1.6555 Ops/s 1.6527 Ops/s $\color{#35bf28}+0.17\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7120s 1.6297s 0.6136 Ops/s 0.6114 Ops/s $\color{#35bf28}+0.35\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4881s 1.4080s 0.7102 Ops/s 0.7081 Ops/s $\color{#35bf28}+0.29\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9513s 1.8718s 0.5342 Ops/s 0.5312 Ops/s $\color{#35bf28}+0.58\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7256s 1.6466s 0.6073 Ops/s 0.6022 Ops/s $\color{#35bf28}+0.85\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7514s 4.6614s 0.2145 Ops/s 0.2170 Ops/s $\color{#d91a1a}-1.15\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5506s 4.4280s 0.2258 Ops/s 0.2260 Ops/s $\color{#d91a1a}-0.07\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0300s 1.8917s 0.5286 Ops/s 0.5388 Ops/s $\color{#d91a1a}-1.90\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7245s 1.6019s 0.6243 Ops/s 0.6146 Ops/s $\color{#35bf28}+1.58\%$
test_values[generalized_advantage_estimate-True-True] 10.8561ms 10.5722ms 94.5879 Ops/s 94.4414 Ops/s $\color{#35bf28}+0.16\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.4747ms 17.6721ms 56.5864 Ops/s 56.8107 Ops/s $\color{#d91a1a}-0.39\%$
test_values[td0_return_estimate-False-False] 0.1963ms 0.1243ms 8.0421 KOps/s 7.7283 KOps/s $\color{#35bf28}+4.06\%$
test_values[td1_return_estimate-False-False] 28.8997ms 28.5273ms 35.0541 Ops/s 33.8599 Ops/s $\color{#35bf28}+3.53\%$
test_values[vec_td1_return_estimate-False-False] 17.9116ms 17.6139ms 56.7733 Ops/s 56.7444 Ops/s $\color{#35bf28}+0.05\%$
test_values[td_lambda_return_estimate-True-False] 60.2066ms 42.5666ms 23.4926 Ops/s 22.8605 Ops/s $\color{#35bf28}+2.76\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.0197ms 16.9656ms 58.9429 Ops/s 56.6599 Ops/s $\color{#35bf28}+4.03\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.4462ms 9.3523ms 106.9253 Ops/s 107.5653 Ops/s $\color{#d91a1a}-0.60\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7748ms 1.4560ms 686.8239 Ops/s 651.9224 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5637ms 0.4180ms 2.3925 KOps/s 2.2844 KOps/s $\color{#35bf28}+4.74\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.1193ms 30.4230ms 32.8698 Ops/s 28.7411 Ops/s $\textbf{\color{#35bf28}+14.37\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.0956ms 1.7000ms 588.2474 Ops/s 582.4159 Ops/s $\color{#35bf28}+1.00\%$
test_dqn_speed[False-None] 1.4991ms 1.4015ms 713.5173 Ops/s 717.6558 Ops/s $\color{#d91a1a}-0.58\%$
test_dqn_speed[False-backward] 2.0251ms 1.9396ms 515.5622 Ops/s 522.0164 Ops/s $\color{#d91a1a}-1.24\%$
test_dqn_speed[True-None] 0.8171ms 0.5563ms 1.7975 KOps/s 1.7670 KOps/s $\color{#35bf28}+1.73\%$
test_dqn_speed[True-backward] 1.0445ms 1.0075ms 992.5274 Ops/s 899.5357 Ops/s $\textbf{\color{#35bf28}+10.34\%}$
test_dqn_speed[reduce-overhead-None] 0.7118ms 0.5347ms 1.8703 KOps/s 1.8187 KOps/s $\color{#35bf28}+2.84\%$
test_ddpg_speed[False-None] 3.2449ms 2.8452ms 351.4676 Ops/s 352.9853 Ops/s $\color{#d91a1a}-0.43\%$
test_ddpg_speed[False-backward] 4.1372ms 4.0554ms 246.5838 Ops/s 247.3685 Ops/s $\color{#d91a1a}-0.32\%$
test_ddpg_speed[True-None] 1.8100ms 1.3965ms 716.0920 Ops/s 694.6822 Ops/s $\color{#35bf28}+3.08\%$
test_ddpg_speed[True-backward] 2.5037ms 2.3867ms 418.9919 Ops/s 418.0236 Ops/s $\color{#35bf28}+0.23\%$
test_ddpg_speed[reduce-overhead-None] 1.7947ms 1.3919ms 718.4628 Ops/s 716.6745 Ops/s $\color{#35bf28}+0.25\%$
test_sac_speed[False-None] 8.4678ms 7.9823ms 125.2772 Ops/s 126.2811 Ops/s $\color{#d91a1a}-0.80\%$
test_sac_speed[False-backward] 11.7121ms 11.2263ms 89.0769 Ops/s 89.6567 Ops/s $\color{#d91a1a}-0.65\%$
test_sac_speed[True-None] 3.6378ms 2.1642ms 462.0551 Ops/s 468.1579 Ops/s $\color{#d91a1a}-1.30\%$
test_sac_speed[True-backward] 4.3691ms 3.9420ms 253.6811 Ops/s 247.9259 Ops/s $\color{#35bf28}+2.32\%$
test_sac_speed[reduce-overhead-None] 2.4570ms 2.0798ms 480.8176 Ops/s 472.9821 Ops/s $\color{#35bf28}+1.66\%$
test_redq_speed[False-None] 10.9357ms 10.2785ms 97.2903 Ops/s 95.3823 Ops/s $\color{#35bf28}+2.00\%$
test_redq_speed[False-backward] 18.2598ms 17.5420ms 57.0061 Ops/s 56.4228 Ops/s $\color{#35bf28}+1.03\%$
test_redq_speed[True-None] 4.5953ms 4.3114ms 231.9412 Ops/s 232.3683 Ops/s $\color{#d91a1a}-0.18\%$
test_redq_speed[True-backward] 9.7833ms 9.4901ms 105.3727 Ops/s 106.1829 Ops/s $\color{#d91a1a}-0.76\%$
test_redq_speed[reduce-overhead-None] 4.6998ms 4.2440ms 235.6282 Ops/s 237.7935 Ops/s $\color{#d91a1a}-0.91\%$
test_redq_deprec_speed[False-None] 11.7699ms 11.1939ms 89.3347 Ops/s 92.4257 Ops/s $\color{#d91a1a}-3.34\%$
test_redq_deprec_speed[False-backward] 16.5346ms 16.1262ms 62.0109 Ops/s 64.2241 Ops/s $\color{#d91a1a}-3.45\%$
test_redq_deprec_speed[True-None] 3.7010ms 3.5565ms 281.1762 Ops/s 270.1052 Ops/s $\color{#35bf28}+4.10\%$
test_redq_deprec_speed[True-backward] 7.7293ms 7.4764ms 133.7550 Ops/s 127.4773 Ops/s $\color{#35bf28}+4.92\%$
test_redq_deprec_speed[reduce-overhead-None] 3.9142ms 3.5323ms 283.0995 Ops/s 276.3457 Ops/s $\color{#35bf28}+2.44\%$
test_td3_speed[False-None] 8.1864ms 7.9966ms 125.0524 Ops/s 123.9259 Ops/s $\color{#35bf28}+0.91\%$
test_td3_speed[False-backward] 11.2861ms 10.8434ms 92.2219 Ops/s 91.4140 Ops/s $\color{#35bf28}+0.88\%$
test_td3_speed[True-None] 1.8229ms 1.7831ms 560.8326 Ops/s 561.8733 Ops/s $\color{#d91a1a}-0.19\%$
test_td3_speed[True-backward] 3.6737ms 3.5267ms 283.5503 Ops/s 230.2010 Ops/s $\textbf{\color{#35bf28}+23.18\%}$
test_td3_speed[reduce-overhead-None] 1.7984ms 1.7636ms 567.0069 Ops/s 562.1387 Ops/s $\color{#35bf28}+0.87\%$
test_cql_speed[False-None] 28.9729ms 25.9420ms 38.5475 Ops/s 38.8083 Ops/s $\color{#d91a1a}-0.67\%$
test_cql_speed[False-backward] 39.4071ms 35.6445ms 28.0548 Ops/s 28.2001 Ops/s $\color{#d91a1a}-0.52\%$
test_cql_speed[True-None] 15.0780ms 12.3212ms 81.1606 Ops/s 81.5976 Ops/s $\color{#d91a1a}-0.54\%$
test_cql_speed[True-backward] 18.7747ms 17.8817ms 55.9229 Ops/s 58.1860 Ops/s $\color{#d91a1a}-3.89\%$
test_cql_speed[reduce-overhead-None] 12.7932ms 12.3173ms 81.1865 Ops/s 80.1558 Ops/s $\color{#35bf28}+1.29\%$
test_a2c_speed[False-None] 5.6491ms 5.4421ms 183.7510 Ops/s 185.9478 Ops/s $\color{#d91a1a}-1.18\%$
test_a2c_speed[False-backward] 12.1477ms 11.8475ms 84.4057 Ops/s 84.3695 Ops/s $\color{#35bf28}+0.04\%$
test_a2c_speed[True-None] 3.8654ms 3.6459ms 274.2841 Ops/s 268.4086 Ops/s $\color{#35bf28}+2.19\%$
test_a2c_speed[True-backward] 8.7168ms 8.5207ms 117.3614 Ops/s 115.9202 Ops/s $\color{#35bf28}+1.24\%$
test_a2c_speed[reduce-overhead-None] 4.1253ms 3.6474ms 274.1651 Ops/s 270.8065 Ops/s $\color{#35bf28}+1.24\%$
test_ppo_speed[False-None] 6.3975ms 5.9431ms 168.2625 Ops/s 166.5817 Ops/s $\color{#35bf28}+1.01\%$
test_ppo_speed[False-backward] 12.8174ms 12.5324ms 79.7930 Ops/s 80.4581 Ops/s $\color{#d91a1a}-0.83\%$
test_ppo_speed[True-None] 3.9013ms 3.5429ms 282.2581 Ops/s 275.6042 Ops/s $\color{#35bf28}+2.41\%$
test_ppo_speed[True-backward] 8.6096ms 8.3321ms 120.0182 Ops/s 115.6739 Ops/s $\color{#35bf28}+3.76\%$
test_ppo_speed[reduce-overhead-None] 3.9413ms 3.5624ms 280.7136 Ops/s 279.6453 Ops/s $\color{#35bf28}+0.38\%$
test_reinforce_speed[False-None] 4.9598ms 4.5126ms 221.5993 Ops/s 226.2139 Ops/s $\color{#d91a1a}-2.04\%$
test_reinforce_speed[False-backward] 7.6096ms 7.3784ms 135.5301 Ops/s 137.4036 Ops/s $\color{#d91a1a}-1.36\%$
test_reinforce_speed[True-None] 3.2150ms 2.8214ms 354.4300 Ops/s 351.4879 Ops/s $\color{#35bf28}+0.84\%$
test_reinforce_speed[True-backward] 7.8255ms 7.6277ms 131.1019 Ops/s 119.6783 Ops/s $\textbf{\color{#35bf28}+9.55\%}$
test_reinforce_speed[reduce-overhead-None] 3.0251ms 2.8009ms 357.0282 Ops/s 350.4119 Ops/s $\color{#35bf28}+1.89\%$
test_iql_speed[False-None] 24.7536ms 20.1416ms 49.6485 Ops/s 50.3590 Ops/s $\color{#d91a1a}-1.41\%$
test_iql_speed[False-backward] 30.8681ms 30.2926ms 33.0114 Ops/s 32.8110 Ops/s $\color{#35bf28}+0.61\%$
test_iql_speed[True-None] 8.7983ms 8.2714ms 120.8987 Ops/s 117.8014 Ops/s $\color{#35bf28}+2.63\%$
test_iql_speed[True-backward] 21.2103ms 16.9509ms 58.9940 Ops/s 60.3999 Ops/s $\color{#d91a1a}-2.33\%$
test_iql_speed[reduce-overhead-None] 8.5587ms 8.3212ms 120.1748 Ops/s 116.6686 Ops/s $\color{#35bf28}+3.01\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2953ms 6.0075ms 166.4592 Ops/s 165.5688 Ops/s $\color{#35bf28}+0.54\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.6663ms 0.2829ms 3.5351 KOps/s 2.8785 KOps/s $\textbf{\color{#35bf28}+22.81\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4938ms 0.2641ms 3.7865 KOps/s 3.0289 KOps/s $\textbf{\color{#35bf28}+25.01\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0091ms 5.7709ms 173.2837 Ops/s 172.1631 Ops/s $\color{#35bf28}+0.65\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.4034ms 0.3157ms 3.1679 KOps/s 2.9663 KOps/s $\textbf{\color{#35bf28}+6.80\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5564ms 0.3061ms 3.2666 KOps/s 3.1109 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7039ms 1.4307ms 698.9425 Ops/s 733.6467 Ops/s $\color{#d91a1a}-4.73\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6050ms 1.3522ms 739.5502 Ops/s 781.0927 Ops/s $\textbf{\color{#d91a1a}-5.32\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.4248ms 6.1051ms 163.7967 Ops/s 168.6049 Ops/s $\color{#d91a1a}-2.85\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8893ms 0.4495ms 2.2249 KOps/s 2.0791 KOps/s $\textbf{\color{#35bf28}+7.01\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6734ms 0.4181ms 2.3919 KOps/s 2.1691 KOps/s $\textbf{\color{#35bf28}+10.27\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9145ms 5.8115ms 172.0727 Ops/s 172.2369 Ops/s $\color{#d91a1a}-0.10\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6727ms 0.3896ms 2.5666 KOps/s 2.9331 KOps/s $\textbf{\color{#d91a1a}-12.50\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5646ms 0.2946ms 3.3941 KOps/s 3.0752 KOps/s $\textbf{\color{#35bf28}+10.37\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0394ms 5.8161ms 171.9365 Ops/s 172.0848 Ops/s $\color{#d91a1a}-0.09\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.5396ms 0.2887ms 3.4634 KOps/s 2.9866 KOps/s $\textbf{\color{#35bf28}+15.96\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5371ms 0.2667ms 3.7501 KOps/s 3.1025 KOps/s $\textbf{\color{#35bf28}+20.87\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0612ms 5.9766ms 167.3199 Ops/s 167.7049 Ops/s $\color{#d91a1a}-0.23\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1996ms 0.4334ms 2.3073 KOps/s 1.9229 KOps/s $\textbf{\color{#35bf28}+19.99\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6130ms 0.4111ms 2.4324 KOps/s 2.4148 KOps/s $\color{#35bf28}+0.73\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4184ms 4.9560ms 201.7768 Ops/s 197.7119 Ops/s $\color{#35bf28}+2.06\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.5092ms 2.0115ms 497.1533 Ops/s 465.4518 Ops/s $\textbf{\color{#35bf28}+6.81\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.6034ms 0.9049ms 1.1052 KOps/s 773.8936 Ops/s $\textbf{\color{#35bf28}+42.80\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5219s 15.4205ms 64.8489 Ops/s 59.3237 Ops/s $\textbf{\color{#35bf28}+9.31\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.9698ms 1.8649ms 536.2158 Ops/s 572.3970 Ops/s $\textbf{\color{#d91a1a}-6.32\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 9.7731ms 1.2009ms 832.7263 Ops/s 1.1609 KOps/s $\textbf{\color{#d91a1a}-28.27\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.0972ms 5.2233ms 191.4489 Ops/s 188.0567 Ops/s $\color{#35bf28}+1.80\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.2260ms 2.0121ms 496.9916 Ops/s 480.8718 Ops/s $\color{#35bf28}+3.35\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.2156ms 1.1114ms 899.7913 Ops/s 929.5675 Ops/s $\color{#d91a1a}-3.20\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.8171ms 36.0102ms 27.7699 Ops/s 27.7868 Ops/s $\color{#d91a1a}-0.06\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.4821ms 18.1534ms 55.0860 Ops/s 34.5149 Ops/s $\textbf{\color{#35bf28}+59.60\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.1356ms 37.2428ms 26.8509 Ops/s 26.6232 Ops/s $\color{#35bf28}+0.86\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.0422ms 18.4639ms 54.1599 Ops/s 52.7770 Ops/s $\color{#35bf28}+2.62\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.2419ms 38.6444ms 25.8770 Ops/s 24.9251 Ops/s $\color{#35bf28}+3.82\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.3357ms 20.0295ms 49.9262 Ops/s 48.9863 Ops/s $\color{#35bf28}+1.92\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8324ms 0.2232ms 4.4807 KOps/s 4.4060 KOps/s $\color{#35bf28}+1.70\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.8451ms 1.4128ms 707.8052 Ops/s 717.3935 Ops/s $\color{#d91a1a}-1.34\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.8151ms 2.3999ms 416.6852 Ops/s 417.2330 Ops/s $\color{#d91a1a}-0.13\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0611ms 2.9168ms 342.8372 Ops/s 342.3309 Ops/s $\color{#35bf28}+0.15\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2104ms 0.1357ms 7.3709 KOps/s 7.2326 KOps/s $\color{#35bf28}+1.91\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3867ms 0.1914ms 5.2259 KOps/s 5.1169 KOps/s $\color{#35bf28}+2.13\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8608ms 1.7317ms 577.4634 Ops/s 570.5248 Ops/s $\color{#35bf28}+1.22\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5000ms 1.3098ms 763.4917 Ops/s 783.8456 Ops/s $\color{#d91a1a}-2.60\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3238ms 1.1321ms 883.2818 Ops/s 891.4953 Ops/s $\color{#d91a1a}-0.92\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7834ms 3.5574ms 281.1077 Ops/s 279.4649 Ops/s $\color{#35bf28}+0.59\%$
test_collector_stack_then_write[100-img_shape2-large_img] 9.2847ms 5.5213ms 181.1164 Ops/s 180.9270 Ops/s $\color{#35bf28}+0.10\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.9313ms 6.9172ms 144.5677 Ops/s 144.9592 Ops/s $\color{#d91a1a}-0.27\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4414ms 0.2754ms 3.6305 KOps/s 3.5584 KOps/s $\color{#35bf28}+2.03\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7005ms 1.5231ms 656.5521 Ops/s 667.3559 Ops/s $\color{#d91a1a}-1.62\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8857ms 2.5039ms 399.3696 Ops/s 396.6474 Ops/s $\color{#35bf28}+0.69\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.2281ms 3.1168ms 320.8424 Ops/s 318.2492 Ops/s $\color{#35bf28}+0.81\%$
test_collector_without_rb[100-img_shape0-atari] 33.1703ms 32.6993ms 30.5817 Ops/s 30.3589 Ops/s $\color{#35bf28}+0.73\%$
test_collector_without_rb[200-img_shape1-large_batch] 64.9409ms 64.4988ms 15.5042 Ops/s 15.3838 Ops/s $\color{#35bf28}+0.78\%$
test_collector_with_rb[100-img_shape0-atari] 38.2838ms 37.4843ms 26.6778 Ops/s 26.5899 Ops/s $\color{#35bf28}+0.33\%$
test_collector_with_rb[200-img_shape1-large_batch] 73.1142ms 72.6927ms 13.7565 Ops/s 13.6367 Ops/s $\color{#35bf28}+0.88\%$
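
The Change column above appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The bot's implementation is not shown in this thread, so the snippet below is only a sketch under that assumption; `parse_ops` and `relative_change` are hypothetical helper names, and only the `Ops/s` and `KOps/s` units seen in the table are handled. Because the table prints rounded throughput values, a recomputed figure can differ from the reported one in the last digit.

```python
def parse_ops(text: str) -> float:
    """Convert a throughput string such as '11.8302 KOps/s' to operations per second."""
    value, unit = text.split()
    # Only the two units that occur in the table are supported.
    scale = {"Ops/s": 1.0, "KOps/s": 1e3}[unit]
    return float(value) * scale


def relative_change(pr_ops: str, head_ops: str) -> float:
    """Percentage change of the PR throughput relative to repo HEAD (assumed formula)."""
    return (parse_ops(pr_ops) / parse_ops(head_ops) - 1.0) * 100.0


# Row test_tensor_to_bytestream_speed[pickle]: reported change is +1.37%.
print(f"{relative_change('11.8302 KOps/s', '11.6699 KOps/s'):+.2f}%")
```

A positive value means the PR branch completed more operations per second than HEAD, which matches the rows reported as improved.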


github-actions bot commented Feb 11, 2026

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: 16. Worsened: 10.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 80.9962μs 79.7988μs 12.5315 KOps/s 12.8628 KOps/s $\color{#d91a1a}-2.58\%$
test_tensor_to_bytestream_speed[torch.save] 0.1379ms 0.1376ms 7.2690 KOps/s 7.3371 KOps/s $\color{#d91a1a}-0.93\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1087s 0.1083s 9.2319 Ops/s 9.5672 Ops/s $\color{#d91a1a}-3.50\%$
test_tensor_to_bytestream_speed[numpy] 2.6222μs 2.6113μs 382.9455 KOps/s 405.9524 KOps/s $\textbf{\color{#d91a1a}-5.67\%}$
test_tensor_to_bytestream_speed[safetensors] 38.6229μs 38.1324μs 26.2244 KOps/s 27.3521 KOps/s $\color{#d91a1a}-4.12\%$
test_simple 0.7776s 0.7771s 1.2869 Ops/s 1.2563 Ops/s $\color{#35bf28}+2.44\%$
test_transformed 1.3632s 1.3572s 0.7368 Ops/s 0.7246 Ops/s $\color{#35bf28}+1.68\%$
test_serial 2.3000s 2.2795s 0.4387 Ops/s 0.4408 Ops/s $\color{#d91a1a}-0.48\%$
test_parallel 1.8876s 1.7987s 0.5560 Ops/s 0.5458 Ops/s $\color{#35bf28}+1.87\%$
test_step_mdp_speed[True-True-True-True-True] 0.2654ms 41.2745μs 24.2280 KOps/s 24.1350 KOps/s $\color{#35bf28}+0.39\%$
test_step_mdp_speed[True-True-True-True-False] 51.6110μs 23.1642μs 43.1701 KOps/s 42.9315 KOps/s $\color{#35bf28}+0.56\%$
test_step_mdp_speed[True-True-True-False-True] 49.1710μs 23.2394μs 43.0303 KOps/s 43.0257 KOps/s $\color{#35bf28}+0.01\%$
test_step_mdp_speed[True-True-True-False-False] 41.8210μs 12.6194μs 79.2432 KOps/s 78.3085 KOps/s $\color{#35bf28}+1.19\%$
test_step_mdp_speed[True-True-False-True-True] 72.5420μs 44.0838μs 22.6841 KOps/s 22.7850 KOps/s $\color{#d91a1a}-0.44\%$
test_step_mdp_speed[True-True-False-True-False] 68.1320μs 25.0416μs 39.9336 KOps/s 39.6600 KOps/s $\color{#35bf28}+0.69\%$
test_step_mdp_speed[True-True-False-False-True] 63.3910μs 26.1351μs 38.2627 KOps/s 40.2109 KOps/s $\color{#d91a1a}-4.84\%$
test_step_mdp_speed[True-True-False-False-False] 50.5310μs 15.1514μs 66.0007 KOps/s 66.4466 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[True-False-True-True-True] 76.7410μs 46.6649μs 21.4294 KOps/s 21.6513 KOps/s $\color{#d91a1a}-1.02\%$
test_step_mdp_speed[True-False-True-True-False] 55.8810μs 28.1215μs 35.5600 KOps/s 35.6833 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[True-False-True-False-True] 53.5710μs 25.4003μs 39.3697 KOps/s 39.2566 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-False-True-False-False] 46.5310μs 15.4520μs 64.7164 KOps/s 65.6443 KOps/s $\color{#d91a1a}-1.41\%$
test_step_mdp_speed[True-False-False-True-True] 83.3210μs 49.3939μs 20.2454 KOps/s 20.6586 KOps/s $\color{#d91a1a}-2.00\%$
test_step_mdp_speed[True-False-False-True-False] 69.6110μs 30.6924μs 32.5814 KOps/s 32.6067 KOps/s $\color{#d91a1a}-0.08\%$
test_step_mdp_speed[True-False-False-False-True] 61.6520μs 28.2650μs 35.3794 KOps/s 36.2785 KOps/s $\color{#d91a1a}-2.48\%$
test_step_mdp_speed[True-False-False-False-False] 46.9710μs 17.7468μs 56.3481 KOps/s 56.9817 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[False-True-True-True-True] 75.9610μs 47.0345μs 21.2610 KOps/s 21.8441 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[False-True-True-True-False] 80.4320μs 28.1440μs 35.5315 KOps/s 36.0217 KOps/s $\color{#d91a1a}-1.36\%$
test_step_mdp_speed[False-True-True-False-True] 2.6270ms 29.8376μs 33.5147 KOps/s 33.8274 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[False-True-True-False-False] 46.4110μs 16.9926μs 58.8492 KOps/s 58.0578 KOps/s $\color{#35bf28}+1.36\%$
test_step_mdp_speed[False-True-False-True-True] 94.8720μs 48.6827μs 20.5412 KOps/s 20.7575 KOps/s $\color{#d91a1a}-1.04\%$
test_step_mdp_speed[False-True-False-True-False] 80.5020μs 31.0546μs 32.2013 KOps/s 32.6941 KOps/s $\color{#d91a1a}-1.51\%$
test_step_mdp_speed[False-True-False-False-True] 59.7710μs 31.7601μs 31.4860 KOps/s 31.9275 KOps/s $\color{#d91a1a}-1.38\%$
test_step_mdp_speed[False-True-False-False-False] 48.9000μs 19.2176μs 52.0356 KOps/s 51.5094 KOps/s $\color{#35bf28}+1.02\%$
test_step_mdp_speed[False-False-True-True-True] 0.1002ms 50.6811μs 19.7312 KOps/s 19.4444 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[False-False-True-True-False] 65.5610μs 33.8314μs 29.5584 KOps/s 29.6473 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[False-False-True-False-True] 59.5710μs 31.5777μs 31.6680 KOps/s 30.9797 KOps/s $\color{#35bf28}+2.22\%$
test_step_mdp_speed[False-False-True-False-False] 50.8910μs 19.2515μs 51.9440 KOps/s 50.5701 KOps/s $\color{#35bf28}+2.72\%$
test_step_mdp_speed[False-False-False-True-True] 86.0220μs 53.3186μs 18.7552 KOps/s 18.4346 KOps/s $\color{#35bf28}+1.74\%$
test_step_mdp_speed[False-False-False-True-False] 63.2220μs 35.7651μs 27.9602 KOps/s 28.1961 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[False-False-False-False-True] 65.9420μs 32.9061μs 30.3895 KOps/s 29.7377 KOps/s $\color{#35bf28}+2.19\%$
test_step_mdp_speed[False-False-False-False-False] 47.0610μs 21.8111μs 45.8482 KOps/s 45.8438 KOps/s $+0.01\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7131s 0.7060s 1.4165 Ops/s 1.3706 Ops/s $\color{#35bf28}+3.35\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6878s 0.5950s 1.6806 Ops/s 1.6689 Ops/s $\color{#35bf28}+0.70\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6718s 1.5960s 0.6266 Ops/s 0.6216 Ops/s $\color{#35bf28}+0.80\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4590s 1.3776s 0.7259 Ops/s 0.7121 Ops/s $\color{#35bf28}+1.93\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9124s 1.8318s 0.5459 Ops/s 0.5382 Ops/s $\color{#35bf28}+1.43\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.6931s 1.6179s 0.6181 Ops/s 0.6129 Ops/s $\color{#35bf28}+0.85\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6985s 4.5568s 0.2195 Ops/s 0.2179 Ops/s $\color{#35bf28}+0.72\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5568s 4.4213s 0.2262 Ops/s 0.2283 Ops/s $\color{#d91a1a}-0.92\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.8846s 1.8058s 0.5538 Ops/s 0.5475 Ops/s $\color{#35bf28}+1.14\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6168s 1.5353s 0.6513 Ops/s 0.6449 Ops/s $\color{#35bf28}+1.00\%$
test_values[generalized_advantage_estimate-True-True] 20.8797ms 20.5369ms 48.6929 Ops/s 50.5958 Ops/s $\color{#d91a1a}-3.76\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1292s 3.5026ms 285.4988 Ops/s 264.8948 Ops/s $\textbf{\color{#35bf28}+7.78\%}$
test_values[td0_return_estimate-False-False] 0.1046ms 83.7355μs 11.9424 KOps/s 12.3216 KOps/s $\color{#d91a1a}-3.08\%$
test_values[td1_return_estimate-False-False] 48.5609ms 48.2539ms 20.7237 Ops/s 21.3266 Ops/s $\color{#d91a1a}-2.83\%$
test_values[vec_td1_return_estimate-False-False] 1.3919ms 1.0892ms 918.0969 Ops/s 932.0045 Ops/s $\color{#d91a1a}-1.49\%$
test_values[td_lambda_return_estimate-True-False] 79.5094ms 79.0015ms 12.6580 Ops/s 13.0594 Ops/s $\color{#d91a1a}-3.07\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2893ms 1.0839ms 922.5741 Ops/s 937.1451 Ops/s $\color{#d91a1a}-1.55\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.6724ms 20.4391ms 48.9259 Ops/s 48.8468 Ops/s $\color{#35bf28}+0.16\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0427ms 0.7570ms 1.3210 KOps/s 1.3391 KOps/s $\color{#d91a1a}-1.35\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7281ms 0.6755ms 1.4804 KOps/s 1.5090 KOps/s $\color{#d91a1a}-1.89\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5518ms 1.4900ms 671.1225 Ops/s 678.3419 Ops/s $\color{#d91a1a}-1.06\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7458ms 0.6916ms 1.4459 KOps/s 1.4716 KOps/s $\color{#d91a1a}-1.74\%$
test_dqn_speed[False-None] 1.6067ms 1.5086ms 662.8784 Ops/s 667.0216 Ops/s $\color{#d91a1a}-0.62\%$
test_dqn_speed[False-backward] 2.3043ms 2.1599ms 462.9767 Ops/s 468.3936 Ops/s $\color{#d91a1a}-1.16\%$
test_dqn_speed[True-None] 1.0334ms 0.5422ms 1.8443 KOps/s 1.8238 KOps/s $\color{#35bf28}+1.12\%$
test_dqn_speed[True-backward] 1.1599ms 1.0864ms 920.5027 Ops/s 851.2332 Ops/s $\textbf{\color{#35bf28}+8.14\%}$
test_dqn_speed[reduce-overhead-None] 0.7166ms 0.5958ms 1.6785 KOps/s 1.7360 KOps/s $\color{#d91a1a}-3.31\%$
test_ddpg_speed[False-None] 3.3292ms 2.8916ms 345.8311 Ops/s 356.4725 Ops/s $\color{#d91a1a}-2.99\%$
test_ddpg_speed[False-backward] 4.6051ms 4.1578ms 240.5112 Ops/s 239.2287 Ops/s $\color{#35bf28}+0.54\%$
test_ddpg_speed[True-None] 1.3595ms 1.2725ms 785.8741 Ops/s 747.7032 Ops/s $\textbf{\color{#35bf28}+5.11\%}$
test_ddpg_speed[True-backward] 2.4059ms 2.3103ms 432.8519 Ops/s 407.0121 Ops/s $\textbf{\color{#35bf28}+6.35\%}$
test_ddpg_speed[reduce-overhead-None] 1.4371ms 1.3082ms 764.4183 Ops/s 740.8437 Ops/s $\color{#35bf28}+3.18\%$
test_sac_speed[False-None] 8.7506ms 8.2207ms 121.6438 Ops/s 120.1068 Ops/s $\color{#35bf28}+1.28\%$
test_sac_speed[False-backward] 11.9240ms 11.4500ms 87.3362 Ops/s 87.3086 Ops/s $\color{#35bf28}+0.03\%$
test_sac_speed[True-None] 1.8594ms 1.7593ms 568.4135 Ops/s 565.7086 Ops/s $\color{#35bf28}+0.48\%$
test_sac_speed[True-backward] 3.5728ms 3.4917ms 286.3913 Ops/s 301.1933 Ops/s $\color{#d91a1a}-4.91\%$
test_sac_speed[reduce-overhead-None] 19.1020ms 10.8722ms 91.9781 Ops/s 82.6668 Ops/s $\textbf{\color{#35bf28}+11.26\%}$
test_redq_deprec_speed[False-None] 9.9300ms 9.1930ms 108.7788 Ops/s 108.1757 Ops/s $\color{#35bf28}+0.56\%$
test_redq_deprec_speed[False-backward] 13.0982ms 12.6018ms 79.3534 Ops/s 81.3155 Ops/s $\color{#d91a1a}-2.41\%$
test_redq_deprec_speed[True-None] 2.5689ms 2.4671ms 405.3297 Ops/s 392.5544 Ops/s $\color{#35bf28}+3.25\%$
test_redq_deprec_speed[True-backward] 4.6567ms 4.2579ms 234.8596 Ops/s 237.8893 Ops/s $\color{#d91a1a}-1.27\%$
test_redq_deprec_speed[reduce-overhead-None] 15.8792ms 9.6706ms 103.4067 Ops/s 102.6303 Ops/s $\color{#35bf28}+0.76\%$
test_td3_speed[False-None] 8.1415ms 8.0648ms 123.9949 Ops/s 124.2165 Ops/s $\color{#d91a1a}-0.18\%$
test_td3_speed[False-backward] 11.1713ms 10.6992ms 93.4646 Ops/s 93.8118 Ops/s $\color{#d91a1a}-0.37\%$
test_td3_speed[True-None] 1.6029ms 1.5760ms 634.5157 Ops/s 632.6186 Ops/s $\color{#35bf28}+0.30\%$
test_td3_speed[True-backward] 3.7754ms 3.1949ms 312.9961 Ops/s 332.5043 Ops/s $\textbf{\color{#d91a1a}-5.87\%}$
test_td3_speed[reduce-overhead-None] 45.9354ms 23.6880ms 42.2156 Ops/s 41.5103 Ops/s $\color{#35bf28}+1.70\%$
test_cql_speed[False-None] 17.3222ms 17.0291ms 58.7231 Ops/s 58.7275 Ops/s $-0.01\%$
test_cql_speed[False-backward] 23.0364ms 22.5948ms 44.2580 Ops/s 44.9955 Ops/s $\color{#d91a1a}-1.64\%$
test_cql_speed[True-None] 3.2626ms 3.1431ms 318.1567 Ops/s 317.3105 Ops/s $\color{#35bf28}+0.27\%$
test_cql_speed[True-backward] 6.2535ms 5.4069ms 184.9483 Ops/s 193.8192 Ops/s $\color{#d91a1a}-4.58\%$
test_cql_speed[reduce-overhead-None] 18.7856ms 11.7962ms 84.7731 Ops/s 85.4663 Ops/s $\color{#d91a1a}-0.81\%$
test_a2c_speed[False-None] 4.0152ms 3.1987ms 312.6240 Ops/s 314.8445 Ops/s $\color{#d91a1a}-0.71\%$
test_a2c_speed[False-backward] 6.9812ms 6.4711ms 154.5326 Ops/s 158.8281 Ops/s $\color{#d91a1a}-2.70\%$
test_a2c_speed[True-None] 1.3922ms 1.2932ms 773.2598 Ops/s 763.4503 Ops/s $\color{#35bf28}+1.28\%$
test_a2c_speed[True-backward] 3.1917ms 3.0632ms 326.4545 Ops/s 348.9079 Ops/s $\textbf{\color{#d91a1a}-6.44\%}$
test_a2c_speed[reduce-overhead-None] 1.0465ms 0.9580ms 1.0438 KOps/s 1.0622 KOps/s $\color{#d91a1a}-1.73\%$
test_ppo_speed[False-None] 3.9453ms 3.8262ms 261.3535 Ops/s 265.7968 Ops/s $\color{#d91a1a}-1.67\%$
test_ppo_speed[False-backward] 7.6870ms 7.2231ms 138.4442 Ops/s 145.9145 Ops/s $\textbf{\color{#d91a1a}-5.12\%}$
test_ppo_speed[True-None] 1.5046ms 1.3861ms 721.4517 Ops/s 720.8689 Ops/s $\color{#35bf28}+0.08\%$
test_ppo_speed[True-backward] 3.2972ms 3.1871ms 313.7680 Ops/s 316.0550 Ops/s $\color{#d91a1a}-0.72\%$
test_ppo_speed[reduce-overhead-None] 1.0948ms 1.0241ms 976.4241 Ops/s 967.1321 Ops/s $\color{#35bf28}+0.96\%$
test_reinforce_speed[False-None] 2.4053ms 2.2573ms 442.9991 Ops/s 444.9707 Ops/s $\color{#d91a1a}-0.44\%$
test_reinforce_speed[False-backward] 3.6168ms 3.4276ms 291.7470 Ops/s 305.4343 Ops/s $\color{#d91a1a}-4.48\%$
test_reinforce_speed[True-None] 1.3475ms 1.2265ms 815.3242 Ops/s 813.2097 Ops/s $\color{#35bf28}+0.26\%$
test_reinforce_speed[True-backward] 3.0420ms 2.9984ms 333.5156 Ops/s 347.0860 Ops/s $\color{#d91a1a}-3.91\%$
test_reinforce_speed[reduce-overhead-None] 17.0110ms 9.3682ms 106.7442 Ops/s 105.2420 Ops/s $\color{#35bf28}+1.43\%$
test_iql_speed[False-None] 9.9608ms 9.3236ms 107.2553 Ops/s 107.5384 Ops/s $\color{#d91a1a}-0.26\%$
test_iql_speed[False-backward] 13.8110ms 13.3145ms 75.1060 Ops/s 77.1118 Ops/s $\color{#d91a1a}-2.60\%$
test_iql_speed[True-None] 2.2295ms 2.1175ms 472.2558 Ops/s 456.2000 Ops/s $\color{#35bf28}+3.52\%$
test_iql_speed[True-backward] 5.1698ms 4.7577ms 210.1874 Ops/s 214.7198 Ops/s $\color{#d91a1a}-2.11\%$
test_iql_speed[reduce-overhead-None] 17.5852ms 10.3602ms 96.5231 Ops/s 75.8871 Ops/s $\textbf{\color{#35bf28}+27.19\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3127ms 5.9191ms 168.9443 Ops/s 168.9360 Ops/s $+0.00\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.1185ms 0.2838ms 3.5242 KOps/s 3.5095 KOps/s $\color{#35bf28}+0.42\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6068ms 0.2664ms 3.7539 KOps/s 3.7098 KOps/s $\color{#35bf28}+1.19\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0680ms 5.7412ms 174.1797 Ops/s 173.1501 Ops/s $\color{#35bf28}+0.59\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8547ms 0.2749ms 3.6373 KOps/s 2.9688 KOps/s $\textbf{\color{#35bf28}+22.52\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4776ms 0.2640ms 3.7877 KOps/s 3.4228 KOps/s $\textbf{\color{#35bf28}+10.66\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4874ms 1.2506ms 799.5890 Ops/s 711.6310 Ops/s $\textbf{\color{#35bf28}+12.36\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6196ms 1.1756ms 850.6469 Ops/s 745.2718 Ops/s $\textbf{\color{#35bf28}+14.14\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2433ms 6.0210ms 166.0843 Ops/s 167.0920 Ops/s $\color{#d91a1a}-0.60\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8906ms 0.4958ms 2.0171 KOps/s 2.2062 KOps/s $\textbf{\color{#d91a1a}-8.57\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7591ms 0.4672ms 2.1403 KOps/s 2.1037 KOps/s $\color{#35bf28}+1.74\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0132ms 5.8529ms 170.8558 Ops/s 171.2502 Ops/s $\color{#d91a1a}-0.23\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7064ms 0.2801ms 3.5705 KOps/s 2.7979 KOps/s $\textbf{\color{#35bf28}+27.61\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4555ms 0.2609ms 3.8330 KOps/s 2.7435 KOps/s $\textbf{\color{#35bf28}+39.71\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0824ms 5.8436ms 171.1261 Ops/s 172.0095 Ops/s $\color{#d91a1a}-0.51\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7799ms 0.2781ms 3.5963 KOps/s 3.4582 KOps/s $\color{#35bf28}+3.99\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5870ms 0.3402ms 2.9396 KOps/s 3.6793 KOps/s $\textbf{\color{#d91a1a}-20.10\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2435ms 6.0451ms 165.4223 Ops/s 168.2628 Ops/s $\color{#d91a1a}-1.69\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.8371ms 0.4661ms 2.1456 KOps/s 2.2891 KOps/s $\textbf{\color{#d91a1a}-6.27\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6696ms 0.4621ms 2.1640 KOps/s 2.3832 KOps/s $\textbf{\color{#d91a1a}-9.20\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.6004s 16.9612ms 58.9583 Ops/s 194.4398 Ops/s $\textbf{\color{#d91a1a}-69.68\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.2910ms 1.9720ms 507.1078 Ops/s 491.2396 Ops/s $\color{#35bf28}+3.23\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 11.4617ms 1.3177ms 758.8810 Ops/s 1.0421 KOps/s $\textbf{\color{#d91a1a}-27.18\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.5239ms 5.0525ms 197.9235 Ops/s 189.8898 Ops/s $\color{#35bf28}+4.23\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.3924ms 1.9247ms 519.5596 Ops/s 506.2616 Ops/s $\color{#35bf28}+2.63\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.0574ms 0.9235ms 1.0828 KOps/s 719.6312 Ops/s $\textbf{\color{#35bf28}+50.46\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.1950ms 5.2368ms 190.9563 Ops/s 50.3611 Ops/s $\textbf{\color{#35bf28}+279.17\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.3469ms 2.0927ms 477.8529 Ops/s 497.3521 Ops/s $\color{#d91a1a}-3.92\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.7254ms 1.1123ms 899.0740 Ops/s 880.1495 Ops/s $\color{#35bf28}+2.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.7350ms 35.7918ms 27.9393 Ops/s 27.3413 Ops/s $\color{#35bf28}+2.19\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9297ms 17.9628ms 55.6707 Ops/s 55.8123 Ops/s $\color{#d91a1a}-0.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.1236ms 37.1415ms 26.9241 Ops/s 26.3661 Ops/s $\color{#35bf28}+2.12\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.9656ms 18.3315ms 54.5509 Ops/s 54.0886 Ops/s $\color{#35bf28}+0.85\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.3842ms 38.5477ms 25.9419 Ops/s 25.4079 Ops/s $\color{#35bf28}+2.10\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.0373ms 19.7685ms 50.5855 Ops/s 49.9908 Ops/s $\color{#35bf28}+1.19\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8355ms 0.2155ms 4.6396 KOps/s 4.4669 KOps/s $\color{#35bf28}+3.87\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6962ms 1.3543ms 738.4145 Ops/s 717.4504 Ops/s $\color{#35bf28}+2.92\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5285ms 2.3027ms 434.2785 Ops/s 427.6426 Ops/s $\color{#35bf28}+1.55\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0707ms 2.8837ms 346.7789 Ops/s 343.9897 Ops/s $\color{#35bf28}+0.81\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2444ms 0.1601ms 6.2450 KOps/s 6.0901 KOps/s $\color{#35bf28}+2.54\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3887ms 0.2105ms 4.7507 KOps/s 4.4013 KOps/s $\textbf{\color{#35bf28}+7.94\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8819ms 1.7745ms 563.5330 Ops/s 554.1026 Ops/s $\color{#35bf28}+1.70\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4143ms 1.2962ms 771.5086 Ops/s 717.2851 Ops/s $\textbf{\color{#35bf28}+7.56\%}$
test_collector_stack_then_write[50-img_shape0-small] 1.3100ms 1.1467ms 872.0398 Ops/s 887.8851 Ops/s $\color{#d91a1a}-1.78\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.9023ms 3.7581ms 266.0918 Ops/s 276.9431 Ops/s $\color{#d91a1a}-3.92\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.8890ms 5.7650ms 173.4619 Ops/s 171.4946 Ops/s $\color{#35bf28}+1.15\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4696ms 7.0230ms 142.3893 Ops/s 138.9422 Ops/s $\color{#35bf28}+2.48\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4503ms 0.2766ms 3.6156 KOps/s 3.5368 KOps/s $\color{#35bf28}+2.23\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6061ms 1.4728ms 678.9834 Ops/s 651.2050 Ops/s $\color{#35bf28}+4.27\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6114ms 2.4040ms 415.9810 Ops/s 405.4746 Ops/s $\color{#35bf28}+2.59\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.2087ms 3.0863ms 324.0141 Ops/s 315.8687 Ops/s $\color{#35bf28}+2.58\%$
test_collector_without_rb[100-img_shape0-atari] 33.2608ms 32.5819ms 30.6919 Ops/s 30.4235 Ops/s $\color{#35bf28}+0.88\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.0642ms 64.3300ms 15.5448 Ops/s 15.5533 Ops/s $\color{#d91a1a}-0.05\%$
test_collector_with_rb[100-img_shape0-atari] 38.1524ms 37.1866ms 26.8914 Ops/s 26.7455 Ops/s $\color{#35bf28}+0.55\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.4536ms 72.5292ms 13.7876 Ops/s 13.4625 Ops/s $\color{#35bf28}+2.41\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 57.2041ms 54.8514ms 18.2311 Ops/s 18.0642 Ops/s $\color{#35bf28}+0.92\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1085s 0.1083s 9.2311 Ops/s 9.0467 Ops/s $\color{#35bf28}+2.04\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 56.7732ms 56.5007ms 17.6989 Ops/s 17.3229 Ops/s $\color{#35bf28}+2.17\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1175s 0.1150s 8.6968 Ops/s 8.7615 Ops/s $\color{#d91a1a}-0.74\%$

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 21, 2026
Adds MonarchTransport for distributed inference on GPU clusters using
Monarch's actor model and RDMA channels. Monarch is imported lazily
at instantiation time.

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 8ea2d20
Pull-Request: #3496
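For readers unfamiliar with the "imported lazily at instantiation time" pattern mentioned in the commit message, here is a minimal sketch of how such a deferred import is commonly written. It is not the code from this PR: the class name `LazyBackendTransport`, the `backend_module` parameter, and the default module name `"monarch"` are all assumptions made for illustration.

```python
import importlib


class LazyBackendTransport:
    """Hypothetical transport that defers its heavy dependency.

    Illustration only; this is NOT the MonarchTransport added in this PR.
    """

    def __init__(self, backend_module: str = "monarch"):
        # The optional dependency is resolved here, not at module import time,
        # so importing this file never fails when the backend is missing.
        # The default module name "monarch" is an assumption.
        try:
            self.backend = importlib.import_module(backend_module)
        except ImportError as err:
            raise ImportError(
                f"{backend_module!r} must be installed to instantiate "
                f"{type(self).__name__}"
            ) from err


if __name__ == "__main__":
    # Stand-in demo: use a stdlib module in place of the real backend.
    transport = LazyBackendTransport(backend_module="json")
    print(transport.backend.__name__)
```

With this shape, users who never construct the transport do not need the backend installed, and those who do get a clear error at the point of use.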
vmoens merged commit e65ee9c into gh/vmoens/238/base Feb 21, 2026
114 of 116 checks passed
vmoens deleted the gh/vmoens/238/head branch February 21, 2026 21:12

Labels

CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
Feature (New feature)
Modules
