[Feature] Auto-batching inference server: core server and transport protocol #3492

Merged
vmoens merged 2 commits into gh/vmoens/234/base from gh/vmoens/234/head
Feb 21, 2026

vmoens (Collaborator) commented Feb 11, 2026

[ghstack-poisoned]

pytorch-bot bot commented Feb 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3492

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 579feae with merge base 266e4aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


github-actions bot commented Feb 11, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}26$. Worsened: $\large\color{#d91a1a}5$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.8924μs 80.4380μs 12.4319 KOps/s 12.2831 KOps/s $\color{#35bf28}+1.21\%$
test_tensor_to_bytestream_speed[torch.save] 0.1392ms 0.1387ms 7.2121 KOps/s 7.0965 KOps/s $\color{#35bf28}+1.63\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1086s 0.1079s 9.2639 Ops/s 8.9062 Ops/s $\color{#35bf28}+4.02\%$
test_tensor_to_bytestream_speed[numpy] 2.5989μs 2.5914μs 385.8910 KOps/s 357.8120 KOps/s $\textbf{\color{#35bf28}+7.85\%}$
test_tensor_to_bytestream_speed[safetensors] 40.3321μs 39.3630μs 25.4046 KOps/s 27.2855 KOps/s $\textbf{\color{#d91a1a}-6.89\%}$
test_simple 0.5500s 0.5464s 1.8300 Ops/s 1.7205 Ops/s $\textbf{\color{#35bf28}+6.37\%}$
test_transformed 1.0849s 1.0806s 0.9254 Ops/s 0.8900 Ops/s $\color{#35bf28}+3.98\%$
test_serial 1.6570s 1.6507s 0.6058 Ops/s 0.5889 Ops/s $\color{#35bf28}+2.87\%$
test_parallel 1.1364s 1.0359s 0.9653 Ops/s 0.9532 Ops/s $\color{#35bf28}+1.27\%$
test_step_mdp_speed[True-True-True-True-True] 0.2101ms 41.6021μs 24.0373 KOps/s 23.7613 KOps/s $\color{#35bf28}+1.16\%$
test_step_mdp_speed[True-True-True-True-False] 47.7610μs 23.4113μs 42.7144 KOps/s 42.2510 KOps/s $\color{#35bf28}+1.10\%$
test_step_mdp_speed[True-True-True-False-True] 55.1710μs 23.2444μs 43.0211 KOps/s 41.7862 KOps/s $\color{#35bf28}+2.96\%$
test_step_mdp_speed[True-True-True-False-False] 43.9410μs 12.8680μs 77.7123 KOps/s 75.5689 KOps/s $\color{#35bf28}+2.84\%$
test_step_mdp_speed[True-True-False-True-True] 72.5510μs 44.4300μs 22.5073 KOps/s 22.2516 KOps/s $\color{#35bf28}+1.15\%$
test_step_mdp_speed[True-True-False-True-False] 0.1147ms 25.3247μs 39.4872 KOps/s 38.3410 KOps/s $\color{#35bf28}+2.99\%$
test_step_mdp_speed[True-True-False-False-True] 52.9210μs 25.7775μs 38.7936 KOps/s 37.7001 KOps/s $\color{#35bf28}+2.90\%$
test_step_mdp_speed[True-True-False-False-False] 48.9110μs 15.3109μs 65.3128 KOps/s 62.9313 KOps/s $\color{#35bf28}+3.78\%$
test_step_mdp_speed[True-False-True-True-True] 74.3720μs 48.0153μs 20.8267 KOps/s 20.6905 KOps/s $\color{#35bf28}+0.66\%$
test_step_mdp_speed[True-False-True-True-False] 55.7410μs 28.6157μs 34.9459 KOps/s 34.8017 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[True-False-True-False-True] 51.2610μs 26.1956μs 38.1744 KOps/s 37.6235 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[True-False-True-False-False] 42.1410μs 15.6118μs 64.0542 KOps/s 63.0739 KOps/s $\color{#35bf28}+1.55\%$
test_step_mdp_speed[True-False-False-True-True] 76.0320μs 49.2785μs 20.2928 KOps/s 19.8054 KOps/s $\color{#35bf28}+2.46\%$
test_step_mdp_speed[True-False-False-True-False] 60.3910μs 30.7428μs 32.5280 KOps/s 31.6628 KOps/s $\color{#35bf28}+2.73\%$
test_step_mdp_speed[True-False-False-False-True] 57.3710μs 28.7861μs 34.7390 KOps/s 34.4807 KOps/s $\color{#35bf28}+0.75\%$
test_step_mdp_speed[True-False-False-False-False] 51.8810μs 18.0246μs 55.4798 KOps/s 54.8598 KOps/s $\color{#35bf28}+1.13\%$
test_step_mdp_speed[False-True-True-True-True] 77.8420μs 47.7706μs 20.9334 KOps/s 20.9198 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[False-True-True-True-False] 61.8510μs 28.6130μs 34.9492 KOps/s 34.1694 KOps/s $\color{#35bf28}+2.28\%$
test_step_mdp_speed[False-True-True-False-True] 2.4866ms 30.3657μs 32.9319 KOps/s 32.2445 KOps/s $\color{#35bf28}+2.13\%$
test_step_mdp_speed[False-True-True-False-False] 51.5210μs 17.3189μs 57.7404 KOps/s 56.8693 KOps/s $\color{#35bf28}+1.53\%$
test_step_mdp_speed[False-True-False-True-True] 78.0310μs 50.2322μs 19.9075 KOps/s 20.1057 KOps/s $\color{#d91a1a}-0.99\%$
test_step_mdp_speed[False-True-False-True-False] 54.6420μs 31.0877μs 32.1671 KOps/s 31.9909 KOps/s $\color{#35bf28}+0.55\%$
test_step_mdp_speed[False-True-False-False-True] 68.8210μs 32.2484μs 31.0093 KOps/s 30.6305 KOps/s $\color{#35bf28}+1.24\%$
test_step_mdp_speed[False-True-False-False-False] 48.6310μs 19.8822μs 50.2962 KOps/s 48.7618 KOps/s $\color{#35bf28}+3.15\%$
test_step_mdp_speed[False-False-True-True-True] 89.0120μs 52.3599μs 19.0986 KOps/s 18.7251 KOps/s $\color{#35bf28}+1.99\%$
test_step_mdp_speed[False-False-True-True-False] 62.5110μs 33.6936μs 29.6793 KOps/s 29.3437 KOps/s $\color{#35bf28}+1.14\%$
test_step_mdp_speed[False-False-True-False-True] 69.3710μs 32.2475μs 31.0102 KOps/s 30.6226 KOps/s $\color{#35bf28}+1.27\%$
test_step_mdp_speed[False-False-True-False-False] 48.1110μs 19.5931μs 51.0384 KOps/s 49.2031 KOps/s $\color{#35bf28}+3.73\%$
test_step_mdp_speed[False-False-False-True-True] 0.1112ms 54.3068μs 18.4139 KOps/s 18.0640 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[False-False-False-True-False] 65.6920μs 35.8603μs 27.8860 KOps/s 27.3679 KOps/s $\color{#35bf28}+1.89\%$
test_step_mdp_speed[False-False-False-False-True] 64.1420μs 34.3117μs 29.1446 KOps/s 28.8537 KOps/s $\color{#35bf28}+1.01\%$
test_step_mdp_speed[False-False-False-False-False] 47.8010μs 21.9875μs 45.4804 KOps/s 44.2210 KOps/s $\color{#35bf28}+2.85\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7186s 0.7135s 1.4015 Ops/s 1.3427 Ops/s $\color{#35bf28}+4.38\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6990s 0.6033s 1.6575 Ops/s 1.6328 Ops/s $\color{#35bf28}+1.51\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7156s 1.6383s 0.6104 Ops/s 0.6071 Ops/s $\color{#35bf28}+0.53\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4913s 1.4066s 0.7109 Ops/s 0.7070 Ops/s $\color{#35bf28}+0.55\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9539s 1.8757s 0.5331 Ops/s 0.5260 Ops/s $\color{#35bf28}+1.36\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7330s 1.6508s 0.6058 Ops/s 0.6016 Ops/s $\color{#35bf28}+0.69\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6917s 4.6116s 0.2168 Ops/s 0.2126 Ops/s $\color{#35bf28}+1.97\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5246s 4.4301s 0.2257 Ops/s 0.2252 Ops/s $\color{#35bf28}+0.23\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9529s 1.8845s 0.5306 Ops/s 0.5354 Ops/s $\color{#d91a1a}-0.89\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6424s 1.5652s 0.6389 Ops/s 0.6357 Ops/s $\color{#35bf28}+0.50\%$
test_values[generalized_advantage_estimate-True-True] 10.5742ms 10.4292ms 95.8844 Ops/s 91.6412 Ops/s $\color{#35bf28}+4.63\%$
test_values[vec_generalized_advantage_estimate-True-True] 20.6376ms 15.0963ms 66.2415 Ops/s 56.9148 Ops/s $\textbf{\color{#35bf28}+16.39\%}$
test_values[td0_return_estimate-False-False] 0.2210ms 0.1334ms 7.4958 KOps/s 7.4548 KOps/s $\color{#35bf28}+0.55\%$
test_values[td1_return_estimate-False-False] 29.3446ms 28.7984ms 34.7242 Ops/s 33.7220 Ops/s $\color{#35bf28}+2.97\%$
test_values[vec_td1_return_estimate-False-False] 18.0053ms 15.9160ms 62.8297 Ops/s 57.0503 Ops/s $\textbf{\color{#35bf28}+10.13\%}$
test_values[td_lambda_return_estimate-True-False] 43.2570ms 42.2563ms 23.6651 Ops/s 22.6678 Ops/s $\color{#35bf28}+4.40\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.5307ms 16.2071ms 61.7012 Ops/s 56.9596 Ops/s $\textbf{\color{#35bf28}+8.32\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.5254ms 9.2329ms 108.3086 Ops/s 103.6674 Ops/s $\color{#35bf28}+4.48\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7061ms 1.5392ms 649.6805 Ops/s 658.2270 Ops/s $\color{#d91a1a}-1.30\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5227ms 0.4332ms 2.3086 KOps/s 2.3440 KOps/s $\color{#d91a1a}-1.51\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 29.8249ms 29.3740ms 34.0437 Ops/s 29.0303 Ops/s $\textbf{\color{#35bf28}+17.27\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8879ms 1.7154ms 582.9441 Ops/s 580.7818 Ops/s $\color{#35bf28}+0.37\%$
test_dqn_speed[False-None] 1.5714ms 1.3911ms 718.8685 Ops/s 708.7807 Ops/s $\color{#35bf28}+1.42\%$
test_dqn_speed[False-backward] 2.4013ms 1.9245ms 519.6283 Ops/s 524.6353 Ops/s $\color{#d91a1a}-0.95\%$
test_dqn_speed[True-None] 0.7789ms 0.5441ms 1.8380 KOps/s 1.7399 KOps/s $\textbf{\color{#35bf28}+5.64\%}$
test_dqn_speed[True-backward] 1.0363ms 0.9936ms 1.0065 KOps/s 916.2394 Ops/s $\textbf{\color{#35bf28}+9.85\%}$
test_dqn_speed[reduce-overhead-None] 0.6688ms 0.5223ms 1.9148 KOps/s 1.7666 KOps/s $\textbf{\color{#35bf28}+8.38\%}$
test_ddpg_speed[False-None] 3.1458ms 2.8123ms 355.5820 Ops/s 352.4268 Ops/s $\color{#35bf28}+0.90\%$
test_ddpg_speed[False-backward] 4.0620ms 3.9885ms 250.7196 Ops/s 248.7865 Ops/s $\color{#35bf28}+0.78\%$
test_ddpg_speed[True-None] 1.4942ms 1.3825ms 723.3080 Ops/s 712.9007 Ops/s $\color{#35bf28}+1.46\%$
test_ddpg_speed[True-backward] 2.4102ms 2.3537ms 424.8598 Ops/s 411.3847 Ops/s $\color{#35bf28}+3.28\%$
test_ddpg_speed[reduce-overhead-None] 1.5342ms 1.3628ms 733.7638 Ops/s 711.5093 Ops/s $\color{#35bf28}+3.13\%$
test_sac_speed[False-None] 8.4712ms 7.9035ms 126.5269 Ops/s 126.3227 Ops/s $\color{#35bf28}+0.16\%$
test_sac_speed[False-backward] 11.6342ms 11.0808ms 90.2465 Ops/s 89.8587 Ops/s $\color{#35bf28}+0.43\%$
test_sac_speed[True-None] 2.1825ms 2.0817ms 480.3716 Ops/s 456.4786 Ops/s $\textbf{\color{#35bf28}+5.23\%}$
test_sac_speed[True-backward] 4.1207ms 3.9042ms 256.1322 Ops/s 218.4304 Ops/s $\textbf{\color{#35bf28}+17.26\%}$
test_sac_speed[reduce-overhead-None] 2.2678ms 2.0698ms 483.1482 Ops/s 472.4771 Ops/s $\color{#35bf28}+2.26\%$
test_redq_speed[False-None] 10.7864ms 10.2434ms 97.6237 Ops/s 96.8932 Ops/s $\color{#35bf28}+0.75\%$
test_redq_speed[False-backward] 18.1398ms 17.5770ms 56.8926 Ops/s 57.3530 Ops/s $\color{#d91a1a}-0.80\%$
test_redq_speed[True-None] 4.3973ms 4.1910ms 238.6081 Ops/s 237.0749 Ops/s $\color{#35bf28}+0.65\%$
test_redq_speed[True-backward] 9.7470ms 9.3663ms 106.7653 Ops/s 107.8835 Ops/s $\color{#d91a1a}-1.04\%$
test_redq_speed[reduce-overhead-None] 4.5120ms 4.1684ms 239.9011 Ops/s 232.7917 Ops/s $\color{#35bf28}+3.05\%$
test_redq_deprec_speed[False-None] 11.9131ms 10.8946ms 91.7888 Ops/s 90.0798 Ops/s $\color{#35bf28}+1.90\%$
test_redq_deprec_speed[False-backward] 16.2153ms 15.7020ms 63.6863 Ops/s 62.7681 Ops/s $\color{#35bf28}+1.46\%$
test_redq_deprec_speed[True-None] 3.6670ms 3.5101ms 284.8937 Ops/s 283.5242 Ops/s $\color{#35bf28}+0.48\%$
test_redq_deprec_speed[True-backward] 7.5461ms 7.3597ms 135.8749 Ops/s 128.8660 Ops/s $\textbf{\color{#35bf28}+5.44\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.2422ms 3.4548ms 289.4563 Ops/s 271.2660 Ops/s $\textbf{\color{#35bf28}+6.71\%}$
test_td3_speed[False-None] 8.1799ms 7.9587ms 125.6481 Ops/s 125.1196 Ops/s $\color{#35bf28}+0.42\%$
test_td3_speed[False-backward] 11.2620ms 10.8151ms 92.4635 Ops/s 92.4846 Ops/s $\color{#d91a1a}-0.02\%$
test_td3_speed[True-None] 1.8099ms 1.7611ms 567.8196 Ops/s 560.3721 Ops/s $\color{#35bf28}+1.33\%$
test_td3_speed[True-backward] 3.7337ms 3.5842ms 279.0017 Ops/s 259.6573 Ops/s $\textbf{\color{#35bf28}+7.45\%}$
test_td3_speed[reduce-overhead-None] 1.7707ms 1.7330ms 577.0277 Ops/s 570.3907 Ops/s $\color{#35bf28}+1.16\%$
test_cql_speed[False-None] 28.2023ms 25.7339ms 38.8593 Ops/s 38.4212 Ops/s $\color{#35bf28}+1.14\%$
test_cql_speed[False-backward] 35.3844ms 34.6487ms 28.8611 Ops/s 28.0776 Ops/s $\color{#35bf28}+2.79\%$
test_cql_speed[True-None] 12.4538ms 12.0508ms 82.9823 Ops/s 81.6417 Ops/s $\color{#35bf28}+1.64\%$
test_cql_speed[True-backward] 18.9318ms 18.1767ms 55.0154 Ops/s 55.7492 Ops/s $\color{#d91a1a}-1.32\%$
test_cql_speed[reduce-overhead-None] 12.7420ms 12.0877ms 82.7287 Ops/s 82.7094 Ops/s $\color{#35bf28}+0.02\%$
test_a2c_speed[False-None] 5.5265ms 5.3440ms 187.1241 Ops/s 186.1938 Ops/s $\color{#35bf28}+0.50\%$
test_a2c_speed[False-backward] 11.9469ms 11.4848ms 87.0713 Ops/s 87.2751 Ops/s $\color{#d91a1a}-0.23\%$
test_a2c_speed[True-None] 3.8246ms 3.6705ms 272.4390 Ops/s 271.7419 Ops/s $\color{#35bf28}+0.26\%$
test_a2c_speed[True-backward] 8.7708ms 8.4673ms 118.1013 Ops/s 117.5237 Ops/s $\color{#35bf28}+0.49\%$
test_a2c_speed[reduce-overhead-None] 3.8298ms 3.6605ms 273.1856 Ops/s 272.9658 Ops/s $\color{#35bf28}+0.08\%$
test_ppo_speed[False-None] 6.0211ms 5.8306ms 171.5083 Ops/s 172.8693 Ops/s $\color{#d91a1a}-0.79\%$
test_ppo_speed[False-backward] 12.6292ms 12.2856ms 81.3963 Ops/s 81.5324 Ops/s $\color{#d91a1a}-0.17\%$
test_ppo_speed[True-None] 3.8161ms 3.5724ms 279.9240 Ops/s 267.8662 Ops/s $\color{#35bf28}+4.50\%$
test_ppo_speed[True-backward] 8.5217ms 8.2320ms 121.4767 Ops/s 120.7046 Ops/s $\color{#35bf28}+0.64\%$
test_ppo_speed[reduce-overhead-None] 3.7326ms 3.5667ms 280.3713 Ops/s 274.4267 Ops/s $\color{#35bf28}+2.17\%$
test_reinforce_speed[False-None] 4.7725ms 4.5261ms 220.9420 Ops/s 218.3911 Ops/s $\color{#35bf28}+1.17\%$
test_reinforce_speed[False-backward] 7.6578ms 7.3052ms 136.8893 Ops/s 134.2636 Ops/s $\color{#35bf28}+1.96\%$
test_reinforce_speed[True-None] 3.0123ms 2.8239ms 354.1197 Ops/s 338.4513 Ops/s $\color{#35bf28}+4.63\%$
test_reinforce_speed[True-backward] 7.8335ms 7.6049ms 131.4947 Ops/s 129.5302 Ops/s $\color{#35bf28}+1.52\%$
test_reinforce_speed[reduce-overhead-None] 3.0007ms 2.8111ms 355.7372 Ops/s 356.3996 Ops/s $\color{#d91a1a}-0.19\%$
test_iql_speed[False-None] 25.1799ms 20.1891ms 49.5316 Ops/s 50.2796 Ops/s $\color{#d91a1a}-1.49\%$
test_iql_speed[False-backward] 37.0312ms 30.2312ms 33.0784 Ops/s 33.5068 Ops/s $\color{#d91a1a}-1.28\%$
test_iql_speed[True-None] 8.4816ms 8.2429ms 121.3172 Ops/s 120.5648 Ops/s $\color{#35bf28}+0.62\%$
test_iql_speed[True-backward] 16.7635ms 16.2956ms 61.3661 Ops/s 59.7419 Ops/s $\color{#35bf28}+2.72\%$
test_iql_speed[reduce-overhead-None] 8.5641ms 8.2767ms 120.8214 Ops/s 115.7763 Ops/s $\color{#35bf28}+4.36\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1752ms 6.0124ms 166.3228 Ops/s 165.0700 Ops/s $\color{#35bf28}+0.76\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.9323ms 0.3182ms 3.1431 KOps/s 2.7539 KOps/s $\textbf{\color{#35bf28}+14.13\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5470ms 0.2606ms 3.8375 KOps/s 2.9853 KOps/s $\textbf{\color{#35bf28}+28.55\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0266ms 5.7706ms 173.2918 Ops/s 172.3154 Ops/s $\color{#35bf28}+0.57\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.4745ms 0.2731ms 3.6613 KOps/s 3.2757 KOps/s $\textbf{\color{#35bf28}+11.77\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4968ms 0.2563ms 3.9013 KOps/s 3.4580 KOps/s $\textbf{\color{#35bf28}+12.82\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7892ms 1.4079ms 710.2977 Ops/s 746.8805 Ops/s $\color{#d91a1a}-4.90\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5866ms 1.3290ms 752.4625 Ops/s 801.6034 Ops/s $\textbf{\color{#d91a1a}-6.13\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.9086ms 6.0643ms 164.8995 Ops/s 170.0899 Ops/s $\color{#d91a1a}-3.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0495ms 0.5428ms 1.8424 KOps/s 1.9145 KOps/s $\color{#d91a1a}-3.76\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7171ms 0.4586ms 2.1803 KOps/s 2.0872 KOps/s $\color{#35bf28}+4.46\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0481ms 5.7456ms 174.0451 Ops/s 172.7924 Ops/s $\color{#35bf28}+0.72\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9546ms 0.3397ms 2.9436 KOps/s 2.6688 KOps/s $\textbf{\color{#35bf28}+10.30\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5938ms 0.3240ms 3.0865 KOps/s 2.8113 KOps/s $\textbf{\color{#35bf28}+9.79\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0439ms 5.8060ms 172.2369 Ops/s 173.1834 Ops/s $\color{#d91a1a}-0.55\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6489ms 0.3170ms 3.1550 KOps/s 2.7091 KOps/s $\textbf{\color{#35bf28}+16.46\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5959ms 0.3386ms 2.9531 KOps/s 2.8270 KOps/s $\color{#35bf28}+4.46\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0275ms 5.9210ms 168.8906 Ops/s 167.8461 Ops/s $\color{#35bf28}+0.62\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2145ms 0.5179ms 1.9310 KOps/s 2.0595 KOps/s $\textbf{\color{#d91a1a}-6.24\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8002ms 0.5040ms 1.9840 KOps/s 2.1673 KOps/s $\textbf{\color{#d91a1a}-8.46\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3878ms 4.9548ms 201.8251 Ops/s 199.6947 Ops/s $\color{#35bf28}+1.07\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.3009ms 1.9775ms 505.6944 Ops/s 463.2745 Ops/s $\textbf{\color{#35bf28}+9.16\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.8644ms 0.8923ms 1.1206 KOps/s 788.6276 Ops/s $\textbf{\color{#35bf28}+42.10\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5299s 15.5935ms 64.1295 Ops/s 59.0222 Ops/s $\textbf{\color{#35bf28}+8.65\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.5524ms 1.9268ms 518.9966 Ops/s 578.2450 Ops/s $\textbf{\color{#d91a1a}-10.25\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.1304ms 0.8548ms 1.1699 KOps/s 788.4363 Ops/s $\textbf{\color{#35bf28}+48.39\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.0687ms 5.2379ms 190.9146 Ops/s 188.9543 Ops/s $\color{#35bf28}+1.04\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.1386ms 1.9270ms 518.9291 Ops/s 515.8797 Ops/s $\color{#35bf28}+0.59\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.5057ms 1.0680ms 936.3730 Ops/s 934.2139 Ops/s $\color{#35bf28}+0.23\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.9887ms 35.8954ms 27.8587 Ops/s 27.6208 Ops/s $\color{#35bf28}+0.86\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.6454ms 18.2572ms 54.7729 Ops/s 54.4978 Ops/s $\color{#35bf28}+0.50\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.9635ms 37.2512ms 26.8448 Ops/s 26.5892 Ops/s $\color{#35bf28}+0.96\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.8855ms 18.4738ms 54.1306 Ops/s 34.5159 Ops/s $\textbf{\color{#35bf28}+56.83\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 45.9209ms 38.9930ms 25.6457 Ops/s 25.1774 Ops/s $\color{#35bf28}+1.86\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.8900ms 20.0626ms 49.8440 Ops/s 49.3313 Ops/s $\color{#35bf28}+1.04\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8426ms 0.2216ms 4.5132 KOps/s 4.5597 KOps/s $\color{#d91a1a}-1.02\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7176ms 1.4075ms 710.5023 Ops/s 719.8982 Ops/s $\color{#d91a1a}-1.31\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7976ms 2.3124ms 432.4501 Ops/s 416.7876 Ops/s $\color{#35bf28}+3.76\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0730ms 2.9232ms 342.0942 Ops/s 339.9276 Ops/s $\color{#35bf28}+0.64\%$
test_storage_write_contiguous[50-img_shape0-small] 0.4913ms 0.1319ms 7.5804 KOps/s 7.2769 KOps/s $\color{#35bf28}+4.17\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3375ms 0.1830ms 5.4641 KOps/s 5.2045 KOps/s $\color{#35bf28}+4.99\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.0500ms 1.8012ms 555.1920 Ops/s 574.4589 Ops/s $\color{#d91a1a}-3.35\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4542ms 1.2902ms 775.0835 Ops/s 766.6136 Ops/s $\color{#35bf28}+1.10\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3252ms 1.1211ms 892.0057 Ops/s 889.5621 Ops/s $\color{#35bf28}+0.27\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.6303ms 3.5141ms 284.5690 Ops/s 275.6347 Ops/s $\color{#35bf28}+3.24\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.4343ms 5.7722ms 173.2430 Ops/s 178.7722 Ops/s $\color{#d91a1a}-3.09\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4294ms 7.2887ms 137.1978 Ops/s 139.8234 Ops/s $\color{#d91a1a}-1.88\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4339ms 0.2736ms 3.6549 KOps/s 3.5343 KOps/s $\color{#35bf28}+3.41\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6754ms 1.5255ms 655.5052 Ops/s 657.0596 Ops/s $\color{#d91a1a}-0.24\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6906ms 2.4308ms 411.3792 Ops/s 397.2096 Ops/s $\color{#35bf28}+3.57\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.2628ms 3.1377ms 318.7061 Ops/s 316.7406 Ops/s $\color{#35bf28}+0.62\%$
test_collector_without_rb[100-img_shape0-atari] 32.9421ms 32.4277ms 30.8378 Ops/s 30.0741 Ops/s $\color{#35bf28}+2.54\%$
test_collector_without_rb[200-img_shape1-large_batch] 64.3341ms 63.9249ms 15.6433 Ops/s 15.3765 Ops/s $\color{#35bf28}+1.74\%$
test_collector_with_rb[100-img_shape0-atari] 37.9177ms 37.2034ms 26.8793 Ops/s 26.4477 Ops/s $\color{#35bf28}+1.63\%$
test_collector_with_rb[200-img_shape1-large_batch] 75.0437ms 73.3414ms 13.6349 Ops/s 13.4857 Ops/s $\color{#35bf28}+1.11\%$
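
The Change column in the table above is consistent with the relative difference between Ops and Ops on Repo HEAD, once both values are converted to the same unit. The benchmark bot's own script is not shown in this thread, so the following is only a minimal sketch under that assumption; the names `UNIT_SCALE`, `to_ops_per_second`, and `percent_change` are hypothetical and chosen for illustration.

```python
# Hypothetical helpers: a sketch of how the "Change" column could be derived.
# The unit table only covers the two units that appear in the results above.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_second(value: float, unit: str) -> float:
    """Convert a reported throughput to plain operations per second."""
    return value * UNIT_SCALE[unit]


def percent_change(ops: float, ops_unit: str, head_ops: float, head_unit: str) -> float:
    """Relative change of the PR throughput versus the repo HEAD, in percent."""
    new = to_ops_per_second(ops, ops_unit)
    base = to_ops_per_second(head_ops, head_unit)
    return (new - base) / base * 100.0


# test_tensor_to_bytestream_speed[pickle]: both columns in KOps/s
print(f"{percent_change(12.4319, 'KOps/s', 12.2831, 'KOps/s'):+.2f}%")  # +1.21%

# test_dqn_speed[True-backward]: mixed units (KOps/s versus Ops/s)
print(f"{percent_change(1.0065, 'KOps/s', 916.2394, 'Ops/s'):+.2f}%")  # +9.85%
```

Both printed values match the Change entries reported for those rows, which supports the assumed formula. Positive values mean higher throughput than the repo HEAD.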


github-actions bot commented Feb 11, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}15$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.1227μs 79.8140μs 12.5291 KOps/s 11.7279 KOps/s $\textbf{\color{#35bf28}+6.83\%}$
test_tensor_to_bytestream_speed[torch.save] 0.1399ms 0.1395ms 7.1689 KOps/s 6.9033 KOps/s $\color{#35bf28}+3.85\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1150s 0.1145s 8.7327 Ops/s 8.9442 Ops/s $\color{#d91a1a}-2.36\%$
test_tensor_to_bytestream_speed[numpy] 2.5534μs 2.5517μs 391.8916 KOps/s 365.6540 KOps/s $\textbf{\color{#35bf28}+7.18\%}$
test_tensor_to_bytestream_speed[safetensors] 37.2691μs 37.1122μs 26.9453 KOps/s 25.1199 KOps/s $\textbf{\color{#35bf28}+7.27\%}$
test_simple 0.8061s 0.7997s 1.2505 Ops/s 1.2270 Ops/s $\color{#35bf28}+1.91\%$
test_transformed 1.4993s 1.4105s 0.7090 Ops/s 0.7097 Ops/s $\color{#d91a1a}-0.11\%$
test_serial 2.3685s 2.3481s 0.4259 Ops/s 0.4309 Ops/s $\color{#d91a1a}-1.16\%$
test_parallel 1.9128s 1.8149s 0.5510 Ops/s 0.5514 Ops/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[True-True-True-True-True] 0.3291ms 42.3043μs 23.6382 KOps/s 24.0877 KOps/s $\color{#d91a1a}-1.87\%$
test_step_mdp_speed[True-True-True-True-False] 58.9210μs 23.4293μs 42.6815 KOps/s 42.4398 KOps/s $\color{#35bf28}+0.57\%$
test_step_mdp_speed[True-True-True-False-True] 89.7510μs 23.7491μs 42.1068 KOps/s 42.4213 KOps/s $\color{#d91a1a}-0.74\%$
test_step_mdp_speed[True-True-True-False-False] 44.6810μs 12.8469μs 77.8397 KOps/s 77.0227 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[True-True-False-True-True] 85.7020μs 44.9704μs 22.2368 KOps/s 22.0432 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[True-True-False-True-False] 75.7710μs 25.8952μs 38.6172 KOps/s 38.3822 KOps/s $\color{#35bf28}+0.61\%$
test_step_mdp_speed[True-True-False-False-True] 69.7810μs 26.2979μs 38.0258 KOps/s 37.9417 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[True-True-False-False-False] 47.9410μs 15.7046μs 63.6754 KOps/s 64.5388 KOps/s $\color{#d91a1a}-1.34\%$
test_step_mdp_speed[True-False-True-True-True] 84.4410μs 48.2951μs 20.7061 KOps/s 21.1852 KOps/s $\color{#d91a1a}-2.26\%$
test_step_mdp_speed[True-False-True-True-False] 72.8020μs 28.6766μs 34.8716 KOps/s 34.0682 KOps/s $\color{#35bf28}+2.36\%$
test_step_mdp_speed[True-False-True-False-True] 60.1810μs 26.2614μs 38.0787 KOps/s 38.2702 KOps/s $\color{#d91a1a}-0.50\%$
test_step_mdp_speed[True-False-True-False-False] 52.8110μs 15.7851μs 63.3510 KOps/s 64.1698 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[True-False-False-True-True] 94.7110μs 50.0889μs 19.9645 KOps/s 20.4905 KOps/s $\color{#d91a1a}-2.57\%$
test_step_mdp_speed[True-False-False-True-False] 66.9120μs 31.7130μs 31.5328 KOps/s 32.7597 KOps/s $\color{#d91a1a}-3.75\%$
test_step_mdp_speed[True-False-False-False-True] 63.3410μs 29.1319μs 34.3266 KOps/s 35.4602 KOps/s $\color{#d91a1a}-3.20\%$
test_step_mdp_speed[True-False-False-False-False] 47.5010μs 18.5020μs 54.0481 KOps/s 56.8002 KOps/s $\color{#d91a1a}-4.85\%$
test_step_mdp_speed[False-True-True-True-True] 95.9410μs 48.0805μs 20.7985 KOps/s 21.2201 KOps/s $\color{#d91a1a}-1.99\%$
test_step_mdp_speed[False-True-True-True-False] 57.4710μs 28.7557μs 34.7757 KOps/s 34.6457 KOps/s $\color{#35bf28}+0.38\%$
test_step_mdp_speed[False-True-True-False-True] 2.5376ms 30.3976μs 32.8974 KOps/s 33.2777 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[False-True-True-False-False] 75.6510μs 17.3405μs 57.6685 KOps/s 57.4373 KOps/s $\color{#35bf28}+0.40\%$
test_step_mdp_speed[False-True-False-True-True] 97.2820μs 50.9335μs 19.6334 KOps/s 19.9481 KOps/s $\color{#d91a1a}-1.58\%$
test_step_mdp_speed[False-True-False-True-False] 0.1219ms 31.4175μs 31.8294 KOps/s 31.7601 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[False-True-False-False-True] 67.2610μs 32.6098μs 30.6657 KOps/s 30.8321 KOps/s $\color{#d91a1a}-0.54\%$
test_step_mdp_speed[False-True-False-False-False] 53.8810μs 19.8161μs 50.4640 KOps/s 50.4105 KOps/s $\color{#35bf28}+0.11\%$
test_step_mdp_speed[False-False-True-True-True] 97.6520μs 53.2290μs 18.7868 KOps/s 19.1239 KOps/s $\color{#d91a1a}-1.76\%$
test_step_mdp_speed[False-False-True-True-False] 66.3210μs 34.1767μs 29.2597 KOps/s 30.0715 KOps/s $\color{#d91a1a}-2.70\%$
test_step_mdp_speed[False-False-True-False-True] 68.0710μs 32.3971μs 30.8670 KOps/s 32.0334 KOps/s $\color{#d91a1a}-3.64\%$
test_step_mdp_speed[False-False-True-False-False] 54.8410μs 19.7904μs 50.5295 KOps/s 51.9181 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[False-False-False-True-True] 91.6620μs 55.1545μs 18.1309 KOps/s 18.7326 KOps/s $\color{#d91a1a}-3.21\%$
test_step_mdp_speed[False-False-False-True-False] 71.3120μs 36.4069μs 27.4673 KOps/s 28.1377 KOps/s $\color{#d91a1a}-2.38\%$
test_step_mdp_speed[False-False-False-False-True] 79.7420μs 34.1499μs 29.2827 KOps/s 29.4617 KOps/s $\color{#d91a1a}-0.61\%$
test_step_mdp_speed[False-False-False-False-False] 50.0010μs 22.2342μs 44.9758 KOps/s 44.5020 KOps/s $\color{#35bf28}+1.06\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7148s 0.7131s 1.4023 Ops/s 1.3415 Ops/s $\color{#35bf28}+4.53\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7004s 0.6038s 1.6561 Ops/s 1.6261 Ops/s $\color{#35bf28}+1.85\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6993s 1.6231s 0.6161 Ops/s 0.6087 Ops/s $\color{#35bf28}+1.22\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4777s 1.3981s 0.7153 Ops/s 0.7077 Ops/s $\color{#35bf28}+1.08\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9456s 1.8614s 0.5372 Ops/s 0.5249 Ops/s $\color{#35bf28}+2.35\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7239s 1.6443s 0.6082 Ops/s 0.6008 Ops/s $\color{#35bf28}+1.23\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7644s 4.6236s 0.2163 Ops/s 0.2182 Ops/s $\color{#d91a1a}-0.87\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5068s 4.4201s 0.2262 Ops/s 0.2212 Ops/s $\color{#35bf28}+2.29\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9502s 1.8731s 0.5339 Ops/s 0.5376 Ops/s $\color{#d91a1a}-0.70\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7241s 1.5855s 0.6307 Ops/s 0.6302 Ops/s $\color{#35bf28}+0.07\%$
test_values[generalized_advantage_estimate-True-True] 21.5885ms 21.1426ms 47.2978 Ops/s 48.2566 Ops/s $\color{#d91a1a}-1.99\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1251s 3.4352ms 291.1017 Ops/s 267.8459 Ops/s $\textbf{\color{#35bf28}+8.68\%}$
test_values[td0_return_estimate-False-False] 0.1095ms 85.6244μs 11.6789 KOps/s 11.9066 KOps/s $\color{#d91a1a}-1.91\%$
test_values[td1_return_estimate-False-False] 50.4005ms 50.1112ms 19.9556 Ops/s 20.2786 Ops/s $\color{#d91a1a}-1.59\%$
test_values[vec_td1_return_estimate-False-False] 1.3615ms 1.1074ms 902.9784 Ops/s 911.9916 Ops/s $\color{#d91a1a}-0.99\%$
test_values[td_lambda_return_estimate-True-False] 88.2255ms 83.5687ms 11.9662 Ops/s 12.3763 Ops/s $\color{#d91a1a}-3.31\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3246ms 1.1025ms 907.0654 Ops/s 916.2394 Ops/s $\color{#d91a1a}-1.00\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.4522ms 21.3215ms 46.9010 Ops/s 48.4388 Ops/s $\color{#d91a1a}-3.17\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0445ms 0.7761ms 1.2884 KOps/s 1.3023 KOps/s $\color{#d91a1a}-1.07\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7390ms 0.6954ms 1.4379 KOps/s 1.4614 KOps/s $\color{#d91a1a}-1.61\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5518ms 1.5070ms 663.5571 Ops/s 665.5899 Ops/s $\color{#d91a1a}-0.31\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7833ms 0.7126ms 1.4032 KOps/s 1.4279 KOps/s $\color{#d91a1a}-1.73\%$
test_dqn_speed[False-None] 1.6323ms 1.5434ms 647.9382 Ops/s 651.3795 Ops/s $\color{#d91a1a}-0.53\%$
test_dqn_speed[False-backward] 2.2465ms 2.1888ms 456.8670 Ops/s 460.0156 Ops/s $\color{#d91a1a}-0.68\%$
test_dqn_speed[True-None] 0.6662ms 0.5603ms 1.7849 KOps/s 1.7788 KOps/s $\color{#35bf28}+0.35\%$
test_dqn_speed[True-backward] 1.2360ms 1.2067ms 828.6956 Ops/s 812.1684 Ops/s $\color{#35bf28}+2.03\%$
test_dqn_speed[reduce-overhead-None] 0.6444ms 0.5796ms 1.7253 KOps/s 1.6907 KOps/s $\color{#35bf28}+2.05\%$
test_ddpg_speed[False-None] 3.2589ms 2.8904ms 345.9704 Ops/s 348.1619 Ops/s $\color{#d91a1a}-0.63\%$
test_ddpg_speed[False-backward] 4.7459ms 4.2963ms 232.7565 Ops/s 234.2297 Ops/s $\color{#d91a1a}-0.63\%$
test_ddpg_speed[True-None] 1.3883ms 1.3128ms 761.7567 Ops/s 757.9672 Ops/s $\color{#35bf28}+0.50\%$
test_ddpg_speed[True-backward] 2.6103ms 2.5292ms 395.3764 Ops/s 394.8066 Ops/s $\color{#35bf28}+0.14\%$
test_ddpg_speed[reduce-overhead-None] 1.4809ms 1.3514ms 739.9843 Ops/s 744.3052 Ops/s $\color{#d91a1a}-0.58\%$
test_sac_speed[False-None] 8.9138ms 8.3797ms 119.3365 Ops/s 119.6016 Ops/s $\color{#d91a1a}-0.22\%$
test_sac_speed[False-backward] 12.1579ms 11.6768ms 85.6402 Ops/s 86.0374 Ops/s $\color{#d91a1a}-0.46\%$
test_sac_speed[True-None] 1.8792ms 1.8071ms 553.3831 Ops/s 555.6173 Ops/s $\color{#d91a1a}-0.40\%$
test_sac_speed[True-backward] 3.6592ms 3.5905ms 278.5117 Ops/s 278.2921 Ops/s $\color{#35bf28}+0.08\%$
test_sac_speed[reduce-overhead-None] 19.9960ms 11.1398ms 89.7679 Ops/s 82.5056 Ops/s $\textbf{\color{#35bf28}+8.80\%}$
test_redq_deprec_speed[False-None] 10.0003ms 9.3666ms 106.7618 Ops/s 106.8381 Ops/s $\color{#d91a1a}-0.07\%$
test_redq_deprec_speed[False-backward] 13.3003ms 12.8194ms 78.0068 Ops/s 78.0977 Ops/s $\color{#d91a1a}-0.12\%$
test_redq_deprec_speed[True-None] 2.6295ms 2.5323ms 394.8958 Ops/s 394.2834 Ops/s $\color{#35bf28}+0.16\%$
test_redq_deprec_speed[True-backward] 4.7149ms 4.3131ms 231.8544 Ops/s 239.5218 Ops/s $\color{#d91a1a}-3.20\%$
test_redq_deprec_speed[reduce-overhead-None] 16.3996ms 9.9269ms 100.7363 Ops/s 102.6503 Ops/s $\color{#d91a1a}-1.86\%$
test_td3_speed[False-None] 8.4190ms 8.2324ms 121.4712 Ops/s 121.8932 Ops/s $\color{#d91a1a}-0.35\%$
test_td3_speed[False-backward] 11.3783ms 10.9227ms 91.5522 Ops/s 94.4635 Ops/s $\color{#d91a1a}-3.08\%$
test_td3_speed[True-None] 1.6417ms 1.6221ms 616.4812 Ops/s 592.9582 Ops/s $\color{#35bf28}+3.97\%$
test_td3_speed[True-backward] 3.2914ms 3.2317ms 309.4379 Ops/s 323.0394 Ops/s $\color{#d91a1a}-4.21\%$
test_td3_speed[reduce-overhead-None] 47.8453ms 25.0475ms 39.9242 Ops/s 40.4182 Ops/s $\color{#d91a1a}-1.22\%$
test_cql_speed[False-None] 17.6530ms 17.3530ms 57.6269 Ops/s 57.8104 Ops/s $\color{#d91a1a}-0.32\%$
test_cql_speed[False-backward] 23.5410ms 23.0909ms 43.3070 Ops/s 44.2804 Ops/s $\color{#d91a1a}-2.20\%$
test_cql_speed[True-None] 3.4154ms 3.2430ms 308.3574 Ops/s 301.6494 Ops/s $\color{#35bf28}+2.22\%$
test_cql_speed[True-backward] 5.8500ms 5.3809ms 185.8432 Ops/s 181.2007 Ops/s $\color{#35bf28}+2.56\%$
test_cql_speed[reduce-overhead-None] 19.9015ms 12.0976ms 82.6607 Ops/s 83.7327 Ops/s $\color{#d91a1a}-1.28\%$
test_a2c_speed[False-None] 3.9296ms 3.2845ms 304.4562 Ops/s 305.1022 Ops/s $\color{#d91a1a}-0.21\%$
test_a2c_speed[False-backward] 6.8050ms 6.2874ms 159.0475 Ops/s 154.8313 Ops/s $\color{#35bf28}+2.72\%$
test_a2c_speed[True-None] 1.8051ms 1.3178ms 758.8534 Ops/s 755.6461 Ops/s $\color{#35bf28}+0.42\%$
test_a2c_speed[True-backward] 3.0586ms 2.9841ms 335.1041 Ops/s 323.1653 Ops/s $\color{#35bf28}+3.69\%$
test_a2c_speed[reduce-overhead-None] 1.2142ms 0.9708ms 1.0301 KOps/s 1.0241 KOps/s $\color{#35bf28}+0.58\%$
test_ppo_speed[False-None] 4.0479ms 3.8962ms 256.6604 Ops/s 257.3948 Ops/s $\color{#d91a1a}-0.29\%$
test_ppo_speed[False-backward] 7.4872ms 7.0424ms 141.9970 Ops/s 139.6371 Ops/s $\color{#35bf28}+1.69\%$
test_ppo_speed[True-None] 1.5723ms 1.4228ms 702.8499 Ops/s 708.7738 Ops/s $\color{#d91a1a}-0.84\%$
test_ppo_speed[True-backward] 3.2334ms 3.1178ms 320.7367 Ops/s 304.0181 Ops/s $\textbf{\color{#35bf28}+5.50\%}$
test_ppo_speed[reduce-overhead-None] 1.0654ms 1.0183ms 982.0163 Ops/s 943.4916 Ops/s $\color{#35bf28}+4.08\%$
test_reinforce_speed[False-None] 2.6503ms 2.2861ms 437.4256 Ops/s 433.5642 Ops/s $\color{#35bf28}+0.89\%$
test_reinforce_speed[False-backward] 3.5301ms 3.4256ms 291.9201 Ops/s 290.0852 Ops/s $\color{#35bf28}+0.63\%$
test_reinforce_speed[True-None] 1.3335ms 1.2533ms 797.8978 Ops/s 789.8434 Ops/s $\color{#35bf28}+1.02\%$
test_reinforce_speed[True-backward] 3.1485ms 3.0710ms 325.6259 Ops/s 325.1523 Ops/s $\color{#35bf28}+0.15\%$
test_reinforce_speed[reduce-overhead-None] 17.5751ms 9.6389ms 103.7461 Ops/s 105.7121 Ops/s $\color{#d91a1a}-1.86\%$
test_iql_speed[False-None] 10.0291ms 9.4312ms 106.0312 Ops/s 105.9541 Ops/s $\color{#35bf28}+0.07\%$
test_iql_speed[False-backward] 13.5604ms 13.1880ms 75.8263 Ops/s 74.3853 Ops/s $\color{#35bf28}+1.94\%$
test_iql_speed[True-None] 2.2427ms 2.1579ms 463.4220 Ops/s 460.7333 Ops/s $\color{#35bf28}+0.58\%$
test_iql_speed[True-backward] 4.7525ms 4.6710ms 214.0855 Ops/s 204.8329 Ops/s $\color{#35bf28}+4.52\%$
test_iql_speed[reduce-overhead-None] 18.5080ms 10.6965ms 93.4888 Ops/s 95.7582 Ops/s $\color{#d91a1a}-2.37\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2151ms 6.0194ms 166.1286 Ops/s 167.6437 Ops/s $\color{#d91a1a}-0.90\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9212ms 0.2796ms 3.5761 KOps/s 3.5248 KOps/s $\color{#35bf28}+1.45\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5335ms 0.2633ms 3.7978 KOps/s 2.8302 KOps/s $\textbf{\color{#35bf28}+34.19\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1193ms 5.8557ms 170.7725 Ops/s 173.5312 Ops/s $\color{#d91a1a}-1.59\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6416ms 0.3522ms 2.8392 KOps/s 3.1363 KOps/s $\textbf{\color{#d91a1a}-9.47\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6615ms 0.3388ms 2.9519 KOps/s 3.4840 KOps/s $\textbf{\color{#d91a1a}-15.27\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5946ms 1.4142ms 707.1130 Ops/s 723.7863 Ops/s $\color{#d91a1a}-2.30\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5263ms 1.3431ms 744.5573 Ops/s 792.5300 Ops/s $\textbf{\color{#d91a1a}-6.05\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.6369ms 6.0201ms 166.1111 Ops/s 168.1687 Ops/s $\color{#d91a1a}-1.22\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9434ms 0.4270ms 2.3419 KOps/s 2.0524 KOps/s $\textbf{\color{#35bf28}+14.11\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5969ms 0.4087ms 2.4470 KOps/s 2.1496 KOps/s $\textbf{\color{#35bf28}+13.83\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0270ms 5.8931ms 169.6906 Ops/s 171.5286 Ops/s $\color{#d91a1a}-1.07\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7845ms 0.2779ms 3.5982 KOps/s 2.8715 KOps/s $\textbf{\color{#35bf28}+25.30\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4604ms 0.2605ms 3.8385 KOps/s 3.1833 KOps/s $\textbf{\color{#35bf28}+20.58\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1179ms 5.8526ms 170.8645 Ops/s 174.0605 Ops/s $\color{#d91a1a}-1.84\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7876ms 0.3152ms 3.1724 KOps/s 2.7829 KOps/s $\textbf{\color{#35bf28}+14.00\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5509ms 0.3186ms 3.1388 KOps/s 2.9444 KOps/s $\textbf{\color{#35bf28}+6.60\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1946ms 6.0173ms 166.1872 Ops/s 168.4058 Ops/s $\color{#d91a1a}-1.32\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0706ms 0.4910ms 2.0365 KOps/s 2.1936 KOps/s $\textbf{\color{#d91a1a}-7.16\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8070ms 0.4921ms 2.0323 KOps/s 2.4270 KOps/s $\textbf{\color{#d91a1a}-16.27\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5776s 16.4107ms 60.9358 Ops/s 51.1736 Ops/s $\textbf{\color{#35bf28}+19.08\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 12.6514ms 2.0314ms 492.2758 Ops/s 488.8309 Ops/s $\color{#35bf28}+0.70\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.2085ms 0.9359ms 1.0684 KOps/s 1.0841 KOps/s $\color{#d91a1a}-1.45\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 11.0696ms 5.4124ms 184.7623 Ops/s 196.0544 Ops/s $\textbf{\color{#d91a1a}-5.76\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 7.5187ms 2.1851ms 457.6521 Ops/s 507.8454 Ops/s $\textbf{\color{#d91a1a}-9.88\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 12.2641ms 1.3825ms 723.3229 Ops/s 750.2838 Ops/s $\color{#d91a1a}-3.59\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5302s 15.7051ms 63.6737 Ops/s 191.4288 Ops/s $\textbf{\color{#d91a1a}-66.74\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.0803ms 1.9896ms 502.6131 Ops/s 474.3913 Ops/s $\textbf{\color{#35bf28}+5.95\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.0735ms 1.1484ms 870.7970 Ops/s 927.0951 Ops/s $\textbf{\color{#d91a1a}-6.07\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.9054ms 36.1056ms 27.6965 Ops/s 27.3438 Ops/s $\color{#35bf28}+1.29\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.1266ms 18.4183ms 54.2939 Ops/s 53.9914 Ops/s $\color{#35bf28}+0.56\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.3085ms 36.9238ms 27.0828 Ops/s 26.4605 Ops/s $\color{#35bf28}+2.35\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2708ms 18.5867ms 53.8018 Ops/s 53.6460 Ops/s $\color{#35bf28}+0.29\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.8745ms 38.7253ms 25.8229 Ops/s 25.3256 Ops/s $\color{#35bf28}+1.96\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.5923ms 20.1472ms 49.6346 Ops/s 49.3739 Ops/s $\color{#35bf28}+0.53\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9160ms 0.2226ms 4.4931 KOps/s 4.5309 KOps/s $\color{#d91a1a}-0.84\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6384ms 1.4216ms 703.4259 Ops/s 713.0830 Ops/s $\color{#d91a1a}-1.35\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7389ms 2.3105ms 432.8012 Ops/s 431.9566 Ops/s $\color{#35bf28}+0.20\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0536ms 2.8899ms 346.0299 Ops/s 341.0459 Ops/s $\color{#35bf28}+1.46\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2365ms 0.1632ms 6.1267 KOps/s 6.0887 KOps/s $\color{#35bf28}+0.62\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3934ms 0.2359ms 4.2395 KOps/s 4.3638 KOps/s $\color{#d91a1a}-2.85\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9870ms 1.8810ms 531.6378 Ops/s 542.4369 Ops/s $\color{#d91a1a}-1.99\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.7905ms 1.3747ms 727.4434 Ops/s 707.5563 Ops/s $\color{#35bf28}+2.81\%$
test_collector_stack_then_write[50-img_shape0-small] 1.5093ms 1.1725ms 852.9047 Ops/s 855.0203 Ops/s $\color{#d91a1a}-0.25\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8489ms 3.6503ms 273.9494 Ops/s 278.1676 Ops/s $\color{#d91a1a}-1.52\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.0969ms 5.8103ms 172.1088 Ops/s 174.1326 Ops/s $\color{#d91a1a}-1.16\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.5854ms 7.4071ms 135.0061 Ops/s 139.4296 Ops/s $\color{#d91a1a}-3.17\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4418ms 0.2772ms 3.6074 KOps/s 3.6493 KOps/s $\color{#d91a1a}-1.15\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7002ms 1.5392ms 649.6758 Ops/s 660.3436 Ops/s $\color{#d91a1a}-1.62\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8665ms 2.4549ms 407.3503 Ops/s 416.9292 Ops/s $\color{#d91a1a}-2.30\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.2830ms 3.1100ms 321.5484 Ops/s 318.5518 Ops/s $\color{#35bf28}+0.94\%$
test_collector_without_rb[100-img_shape0-atari] 33.5125ms 32.9230ms 30.3739 Ops/s 30.2211 Ops/s $\color{#35bf28}+0.51\%$
test_collector_without_rb[200-img_shape1-large_batch] 65.1003ms 64.4526ms 15.5153 Ops/s 15.5124 Ops/s $\color{#35bf28}+0.02\%$
test_collector_with_rb[100-img_shape0-atari] 37.7477ms 37.1487ms 26.9188 Ops/s 26.6650 Ops/s $\color{#35bf28}+0.95\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.1761ms 73.4820ms 13.6088 Ops/s 13.6526 Ops/s $\color{#d91a1a}-0.32\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 57.5985ms 56.2729ms 17.7705 Ops/s 17.9346 Ops/s $\color{#d91a1a}-0.91\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1145s 0.1115s 8.9653 Ops/s 8.9806 Ops/s $\color{#d91a1a}-0.17\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 59.7626ms 58.4001ms 17.1233 Ops/s 17.2588 Ops/s $\color{#d91a1a}-0.79\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.7881s 0.1912s 5.2302 Ops/s 8.6870 Ops/s $\textbf{\color{#d91a1a}-39.79\%}$

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 21, 2026
…rotocol (#3492)

Adds the `torchrl.modules.inference_server` subpackage with:
- `InferenceTransport` ABC defining the submit/drain/resolve protocol
- `InferenceServer` with background worker loop (collate -> forward -> unbind)
- `InferenceClient` wrapping submit+result for synchronous policy-like API
- Tests and Sphinx docs
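
For readers skimming the commit message, the submit/drain/resolve flow and the collate -> forward -> unbind worker loop can be sketched roughly as below. This is a minimal, standard-library-only illustration and not the code added in this PR: `QueueTransport`, `TinyInferenceServer`, `TinyInferenceClient`, `double_batch`, and every method signature and parameter (`max_batch_size`, `timeout`) are hypothetical names chosen for the sketch. The real classes in `torchrl.modules.inference_server` may differ in names, arguments, and behavior.

```python
import queue
import threading
from concurrent.futures import Future


class QueueTransport:
    """Hypothetical in-process transport exposing submit / drain / resolve."""

    def __init__(self):
        self._pending = queue.Queue()

    def submit(self, item):
        # Client side: enqueue a request and hand back a future for its result.
        fut = Future()
        self._pending.put((item, fut))
        return fut

    def drain(self, max_items, timeout=0.01):
        # Server side: wait briefly for one request, then grab whatever else
        # is already queued, up to max_items. Returns [] if nothing arrived.
        batch = []
        try:
            batch.append(self._pending.get(timeout=timeout))
            while len(batch) < max_items:
                batch.append(self._pending.get_nowait())
        except queue.Empty:
            pass
        return batch

    def resolve(self, fut, result):
        # Server side: deliver one result to the client that is waiting on it.
        fut.set_result(result)


class TinyInferenceServer:
    """Hypothetical server: a background thread that batches pending requests."""

    def __init__(self, model, transport, max_batch_size=8):
        self.model = model
        self.transport = transport
        self.max_batch_size = max_batch_size
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def shutdown(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            batch = self.transport.drain(self.max_batch_size)
            if not batch:
                continue
            items, futures = zip(*batch)
            try:
                # collate -> forward: the model sees one list per batch.
                outputs = self.model(list(items))
            except Exception as exc:
                # Propagate a model failure to every waiting client.
                for fut in futures:
                    fut.set_exception(exc)
                continue
            # unbind: route each output back to the request it belongs to.
            for fut, out in zip(futures, outputs):
                self.transport.resolve(fut, out)


class TinyInferenceClient:
    """Hypothetical client: submit + result behind a blocking call."""

    def __init__(self, transport):
        self.transport = transport

    def __call__(self, item, timeout=5.0):
        return self.transport.submit(item).result(timeout=timeout)


def double_batch(batch):
    # Stand-in for a batched policy forward pass.
    return [2 * x for x in batch]
```

In this sketch, several actor threads can share one `TinyInferenceClient` and call it like a policy while the server thread decides how many pending requests to group per forward pass. Using a `Future` per request is one simple way to route results back to callers; the transport in the PR may use a different mechanism.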

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 8e4c4a5
Pull-Request: #3492
@vmoens merged commit 579feae into gh/vmoens/234/base Feb 21, 2026
116 checks passed
@vmoens deleted the gh/vmoens/234/head branch February 21, 2026 21:12

Labels: CLA Signed, Documentation, Feature, Modules