[Feature] Auto-batching inference server: weight sync integration #3497

Merged
vmoens merged 5 commits into gh/vmoens/239/base from gh/vmoens/239/head
Feb 21, 2026

vmoens (Collaborator) commented Feb 11, 2026

[ghstack-poisoned]

pytorch-bot bot commented Feb 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3497

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit 575fa0b with merge base 266e4aa:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Feb 11, 2026
Wires WeightSyncScheme into the server loop:
- init_on_receiver + connect at startup
- Non-blocking receive() poll between inference batches
- threading.Lock protects model during weight updates
- End-to-end tests and updated Sphinx docs with usage tutorial

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 50480f5
Pull-Request: #3497
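
The server-loop changes listed in this commit can be illustrated with a minimal, framework-free sketch. It assumes a toy model (`ToyModel`) and a toy receiver (`ToyWeightReceiver`) backed by a `queue.Queue`; these names, and `BatchingServer` itself, are hypothetical stand-ins, not TorchRL's `WeightSyncScheme` API, and the `init_on_receiver` + `connect` startup handshake is left out. What the sketch shows is the ordering described above: collect a batch, run inference while holding a `threading.Lock`, then poll for new weights without blocking before the next batch.

```python
import queue
import threading


class ToyModel:
    """Hypothetical stand-in for a policy network: y = w * x + b."""

    def __init__(self, w=1.0, b=0.0):
        self.w, self.b = w, b

    def load_state_dict(self, state):
        self.w, self.b = state["w"], state["b"]

    def __call__(self, batch):
        return [self.w * x + self.b for x in batch]


class ToyWeightReceiver:
    """Hypothetical receiver side of a weight-sync channel (not TorchRL's API)."""

    def __init__(self):
        self._channel = queue.Queue()

    def send(self, state):
        # Called by the trainer side to publish new weights.
        self._channel.put(state)

    def receive(self):
        # Non-blocking poll: drain pending updates, return the newest or None.
        latest = None
        while True:
            try:
                latest = self._channel.get_nowait()
            except queue.Empty:
                return latest


class BatchingServer:
    """Hypothetical auto-batching loop that applies weight updates between batches."""

    def __init__(self, model, receiver, max_batch_size=8, wait_s=0.01):
        self.model = model
        self.receiver = receiver
        self.max_batch_size = max_batch_size
        self.wait_s = wait_s
        self._requests = queue.Queue()
        self._lock = threading.Lock()  # guards self.model
        self._stop = threading.Event()

    def submit(self, x):
        # Each request carries its own one-slot reply queue.
        reply = queue.Queue(maxsize=1)
        self._requests.put((x, reply))
        return reply

    def load_weights(self, state):
        # Safe to call from any thread: never overlaps a forward pass.
        with self._lock:
            self.model.load_state_dict(state)

    def _collect_batch(self):
        # Wait briefly for a first request, then take whatever else is queued.
        try:
            batch = [self._requests.get(timeout=self.wait_s)]
        except queue.Empty:
            return []
        while len(batch) < self.max_batch_size:
            try:
                batch.append(self._requests.get_nowait())
            except queue.Empty:
                break
        return batch

    def step(self):
        """One loop iteration: run at most one batch, then poll for new weights."""
        batch = self._collect_batch()
        if batch:
            with self._lock:
                outputs = self.model([x for x, _ in batch])
            for (_, reply), y in zip(batch, outputs):
                reply.put(y)
        new_state = self.receiver.receive()
        if new_state is not None:
            self.load_weights(new_state)

    def serve_forever(self):
        while not self._stop.is_set():
            self.step()

    def stop(self):
        self._stop.set()
```

Because `receive()` drains the channel, several weight versions arriving during one batch collapse into a single load of the newest one; whether the real scheme behaves the same way is not stated in this PR. The lock only matters when weights can also be pushed from outside the loop thread, which `load_weights` allows in this sketch.
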
github-actions bot added the Documentation and Modules labels Feb 11, 2026
github-actions bot added the Feature label Feb 11, 2026
meta-cla bot added the CLA Signed label Feb 11, 2026

github-actions bot (Contributor) commented Feb 11, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: 13. Worsened: 7.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 81.6947μs | 79.0885μs | 12.6441 KOps/s | 12.7268 KOps/s | -0.65% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1419ms | 0.1411ms | 7.0873 KOps/s | 7.3601 KOps/s | -3.71% |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1081s | 0.1079s | 9.2718 Ops/s | 9.8569 Ops/s | **-5.94%** |
| test_tensor_to_bytestream_speed[numpy] | 2.5727μs | 2.5438μs | 393.1159 KOps/s | 405.1509 KOps/s | -2.97% |
| test_tensor_to_bytestream_speed[safetensors] | 35.1735μs | 34.6844μs | 28.8314 KOps/s | 28.5659 KOps/s | +0.93% |
| test_simple | 0.5311s | 0.5302s | 1.8862 Ops/s | 1.8154 Ops/s | +3.90% |
| test_transformed | 1.0548s | 1.0536s | 0.9491 Ops/s | 0.9292 Ops/s | +2.15% |
| test_serial | 1.6463s | 1.6238s | 0.6158 Ops/s | 0.6087 Ops/s | +1.17% |
| test_parallel | 0.9995s | 0.9906s | 1.0095 Ops/s | 0.9848 Ops/s | +2.51% |
| test_step_mdp_speed[True-True-True-True-True] | 0.2160ms | 40.6612μs | 24.5935 KOps/s | 24.3417 KOps/s | +1.03% |
| test_step_mdp_speed[True-True-True-True-False] | 54.9300μs | 23.2913μs | 42.9344 KOps/s | 43.5787 KOps/s | -1.48% |
| test_step_mdp_speed[True-True-True-False-True] | 66.2210μs | 22.8622μs | 43.7403 KOps/s | 42.7861 KOps/s | +2.23% |
| test_step_mdp_speed[True-True-True-False-False] | 35.6710μs | 12.6793μs | 78.8686 KOps/s | 77.5014 KOps/s | +1.76% |
| test_step_mdp_speed[True-True-False-True-True] | 78.2720μs | 43.5547μs | 22.9597 KOps/s | 22.2294 KOps/s | +3.29% |
| test_step_mdp_speed[True-True-False-True-False] | 73.9320μs | 25.1347μs | 39.7857 KOps/s | 39.0147 KOps/s | +1.98% |
| test_step_mdp_speed[True-True-False-False-True] | 62.4910μs | 25.3823μs | 39.3975 KOps/s | 38.0211 KOps/s | +3.62% |
| test_step_mdp_speed[True-True-False-False-False] | 41.3210μs | 15.2858μs | 65.4204 KOps/s | 63.8453 KOps/s | +2.47% |
| test_step_mdp_speed[True-False-True-True-True] | 94.5820μs | 46.9523μs | 21.2982 KOps/s | 20.8644 KOps/s | +2.08% |
| test_step_mdp_speed[True-False-True-True-False] | 90.4020μs | 28.4923μs | 35.0972 KOps/s | 35.0417 KOps/s | +0.16% |
| test_step_mdp_speed[True-False-True-False-True] | 55.3110μs | 25.1852μs | 39.7058 KOps/s | 37.0144 KOps/s | **+7.27%** |
| test_step_mdp_speed[True-False-True-False-False] | 44.9010μs | 15.2593μs | 65.5337 KOps/s | 63.9845 KOps/s | +2.42% |
| test_step_mdp_speed[True-False-False-True-True] | 80.7910μs | 47.8103μs | 20.9160 KOps/s | 20.1728 KOps/s | +3.68% |
| test_step_mdp_speed[True-False-False-True-False] | 62.2910μs | 29.9440μs | 33.3957 KOps/s | 32.5532 KOps/s | +2.59% |
| test_step_mdp_speed[True-False-False-False-True] | 59.3210μs | 27.2416μs | 36.7086 KOps/s | 34.8949 KOps/s | **+5.20%** |
| test_step_mdp_speed[True-False-False-False-False] | 51.0110μs | 18.0277μs | 55.4701 KOps/s | 56.2994 KOps/s | -1.47% |
| test_step_mdp_speed[False-True-True-True-True] | 81.2910μs | 45.4315μs | 22.0112 KOps/s | 21.1762 KOps/s | +3.94% |
| test_step_mdp_speed[False-True-True-True-False] | 57.8910μs | 28.2312μs | 35.4219 KOps/s | 35.7860 KOps/s | -1.02% |
| test_step_mdp_speed[False-True-True-False-True] | 2.4843ms | 29.3479μs | 34.0739 KOps/s | 32.7947 KOps/s | +3.90% |
| test_step_mdp_speed[False-True-True-False-False] | 49.2610μs | 16.8668μs | 59.2881 KOps/s | 58.5841 KOps/s | +1.20% |
| test_step_mdp_speed[False-True-False-True-True] | 79.6920μs | 48.7833μs | 20.4988 KOps/s | 20.0967 KOps/s | +2.00% |
| test_step_mdp_speed[False-True-False-True-False] | 66.3410μs | 30.5954μs | 32.6847 KOps/s | 32.9614 KOps/s | -0.84% |
| test_step_mdp_speed[False-True-False-False-True] | 74.6310μs | 30.9132μs | 32.3486 KOps/s | 31.0841 KOps/s | +4.07% |
| test_step_mdp_speed[False-True-False-False-False] | 58.2110μs | 19.4918μs | 51.3037 KOps/s | 51.6949 KOps/s | -0.76% |
| test_step_mdp_speed[False-False-True-True-True] | 87.3220μs | 50.8924μs | 19.6493 KOps/s | 19.3558 KOps/s | +1.52% |
| test_step_mdp_speed[False-False-True-True-False] | 64.0110μs | 33.0485μs | 30.2585 KOps/s | 30.2463 KOps/s | +0.04% |
| test_step_mdp_speed[False-False-True-False-True] | 69.6310μs | 31.0116μs | 32.2460 KOps/s | 31.1538 KOps/s | +3.51% |
| test_step_mdp_speed[False-False-True-False-False] | 51.0010μs | 19.3460μs | 51.6902 KOps/s | 50.8567 KOps/s | +1.64% |
| test_step_mdp_speed[False-False-False-True-True] | 96.9920μs | 51.9502μs | 19.2492 KOps/s | 18.4772 KOps/s | +4.18% |
| test_step_mdp_speed[False-False-False-True-False] | 70.9710μs | 34.4301μs | 29.0443 KOps/s | 27.9071 KOps/s | +4.07% |
| test_step_mdp_speed[False-False-False-False-True] | 67.3510μs | 32.5674μs | 30.7056 KOps/s | 28.8595 KOps/s | **+6.40%** |
| test_step_mdp_speed[False-False-False-False-False] | 55.0610μs | 21.4690μs | 46.5788 KOps/s | 46.0458 KOps/s | +1.16% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8140s | 0.7187s | 1.3913 Ops/s | 1.3867 Ops/s | +0.34% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.6852s | 0.5875s | 1.7021 Ops/s | 1.6933 Ops/s | +0.52% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.6605s | 1.5866s | 0.6303 Ops/s | 0.6286 Ops/s | +0.26% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4510s | 1.3738s | 0.7279 Ops/s | 0.7274 Ops/s | +0.08% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9015s | 1.8240s | 0.5483 Ops/s | 0.5479 Ops/s | +0.07% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.6839s | 1.6061s | 0.6226 Ops/s | 0.6193 Ops/s | +0.53% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6811s | 4.5295s | 0.2208 Ops/s | 0.2226 Ops/s | -0.84% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5145s | 4.4063s | 0.2269 Ops/s | 0.2278 Ops/s | -0.36% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.8989s | 1.8392s | 0.5437 Ops/s | 0.5479 Ops/s | -0.77% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6266s | 1.5489s | 0.6456 Ops/s | 0.6433 Ops/s | +0.36% |
| test_values[generalized_advantage_estimate-True-True] | 10.7658ms | 10.2083ms | 97.9599 Ops/s | 97.7879 Ops/s | +0.18% |
| test_values[vec_generalized_advantage_estimate-True-True] | 22.5253ms | 17.8556ms | 56.0047 Ops/s | 57.1143 Ops/s | -1.94% |
| test_values[td0_return_estimate-False-False] | 0.2252ms | 0.1317ms | 7.5910 KOps/s | 7.7021 KOps/s | -1.44% |
| test_values[td1_return_estimate-False-False] | 29.5777ms | 28.6310ms | 34.9272 Ops/s | 36.7345 Ops/s | -4.92% |
| test_values[vec_td1_return_estimate-False-False] | 21.2555ms | 17.6850ms | 56.5450 Ops/s | 56.9801 Ops/s | -0.76% |
| test_values[td_lambda_return_estimate-True-False] | 45.1866ms | 42.6088ms | 23.4693 Ops/s | 25.1896 Ops/s | **-6.83%** |
| test_values[vec_td_lambda_return_estimate-True-False] | 21.2533ms | 17.7438ms | 56.3576 Ops/s | 56.8875 Ops/s | -0.93% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.1827ms | 8.9191ms | 112.1192 Ops/s | 113.7270 Ops/s | -1.41% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.6952ms | 1.5200ms | 657.8912 Ops/s | 663.3060 Ops/s | -0.82% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5105ms | 0.4250ms | 2.3530 KOps/s | 2.3876 KOps/s | -1.45% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 35.1452ms | 34.6708ms | 28.8428 Ops/s | 28.6313 Ops/s | +0.74% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.8508ms | 1.7144ms | 583.2868 Ops/s | 582.0408 Ops/s | +0.21% |
| test_dqn_speed[False-None] | 1.5212ms | 1.3683ms | 730.8257 Ops/s | 726.3428 Ops/s | +0.62% |
| test_dqn_speed[False-backward] | 2.0254ms | 1.9184ms | 521.2733 Ops/s | 516.2682 Ops/s | +0.97% |
| test_dqn_speed[True-None] | 0.7292ms | 0.5453ms | 1.8338 KOps/s | 1.8294 KOps/s | +0.24% |
| test_dqn_speed[True-backward] | 1.0663ms | 0.9968ms | 1.0032 KOps/s | 1.0122 KOps/s | -0.89% |
| test_dqn_speed[reduce-overhead-None] | 0.6493ms | 0.5284ms | 1.8925 KOps/s | 1.8849 KOps/s | +0.40% |
| test_ddpg_speed[False-None] | 3.1517ms | 2.8196ms | 354.6655 Ops/s | 359.1842 Ops/s | -1.26% |
| test_ddpg_speed[False-backward] | 4.2368ms | 4.0596ms | 246.3274 Ops/s | 250.5850 Ops/s | -1.70% |
| test_ddpg_speed[True-None] | 1.5357ms | 1.3870ms | 720.9554 Ops/s | 730.0991 Ops/s | -1.25% |
| test_ddpg_speed[True-backward] | 2.4511ms | 2.3967ms | 417.2490 Ops/s | 404.5285 Ops/s | +3.14% |
| test_ddpg_speed[reduce-overhead-None] | 1.4344ms | 1.3773ms | 726.0436 Ops/s | 700.5757 Ops/s | +3.64% |
| test_sac_speed[False-None] | 8.3787ms | 7.8259ms | 127.7814 Ops/s | 127.2252 Ops/s | +0.44% |
| test_sac_speed[False-backward] | 11.4833ms | 11.0103ms | 90.8239 Ops/s | 90.7898 Ops/s | +0.04% |
| test_sac_speed[True-None] | 2.5115ms | 2.1100ms | 473.9404 Ops/s | 468.2052 Ops/s | +1.22% |
| test_sac_speed[True-backward] | 4.1190ms | 3.9943ms | 250.3542 Ops/s | 255.3355 Ops/s | -1.95% |
| test_sac_speed[reduce-overhead-None] | 2.4890ms | 2.1036ms | 475.3721 Ops/s | 468.1829 Ops/s | +1.54% |
| test_redq_speed[False-None] | 14.9551ms | 10.4820ms | 95.4015 Ops/s | 97.8064 Ops/s | -2.46% |
| test_redq_speed[False-backward] | 19.3041ms | 17.8452ms | 56.0376 Ops/s | 57.8568 Ops/s | -3.14% |
| test_redq_speed[True-None] | 4.7267ms | 4.2619ms | 234.6348 Ops/s | 242.4526 Ops/s | -3.22% |
| test_redq_speed[True-backward] | 9.9757ms | 9.5402ms | 104.8200 Ops/s | 108.7723 Ops/s | -3.63% |
| test_redq_speed[reduce-overhead-None] | 4.5062ms | 4.1765ms | 239.4327 Ops/s | 246.0445 Ops/s | -2.69% |
| test_redq_deprec_speed[False-None] | 11.5868ms | 11.0295ms | 90.6663 Ops/s | 89.9017 Ops/s | +0.85% |
| test_redq_deprec_speed[False-backward] | 16.2973ms | 15.9643ms | 62.6396 Ops/s | 62.6472 Ops/s | -0.01% |
| test_redq_deprec_speed[True-None] | 3.9608ms | 3.5650ms | 280.5083 Ops/s | 272.5234 Ops/s | +2.93% |
| test_redq_deprec_speed[True-backward] | 7.8483ms | 7.5744ms | 132.0244 Ops/s | 130.1795 Ops/s | +1.42% |
| test_redq_deprec_speed[reduce-overhead-None] | 3.7714ms | 3.5086ms | 285.0139 Ops/s | 268.3246 Ops/s | **+6.22%** |
| test_td3_speed[False-None] | 8.0878ms | 7.8753ms | 126.9793 Ops/s | 125.2033 Ops/s | +1.42% |
| test_td3_speed[False-backward] | 11.4439ms | 10.7525ms | 93.0020 Ops/s | 93.1620 Ops/s | -0.17% |
| test_td3_speed[True-None] | 1.8203ms | 1.7772ms | 562.6679 Ops/s | 565.7414 Ops/s | -0.54% |
| test_td3_speed[True-backward] | 3.6692ms | 3.5546ms | 281.3266 Ops/s | 258.9591 Ops/s | **+8.64%** |
| test_td3_speed[reduce-overhead-None] | 1.8055ms | 1.7496ms | 571.5595 Ops/s | 573.3679 Ops/s | -0.32% |
| test_cql_speed[False-None] | 28.4633ms | 25.5552ms | 39.1310 Ops/s | 38.0022 Ops/s | +2.97% |
| test_cql_speed[False-backward] | 38.4954ms | 35.3017ms | 28.3273 Ops/s | 27.6287 Ops/s | +2.53% |
| test_cql_speed[True-None] | 12.5455ms | 12.1634ms | 82.2140 Ops/s | 83.3532 Ops/s | -1.37% |
| test_cql_speed[True-backward] | 18.9491ms | 18.1648ms | 55.0516 Ops/s | 55.6123 Ops/s | -1.01% |
| test_cql_speed[reduce-overhead-None] | 12.7896ms | 12.3875ms | 80.7263 Ops/s | 82.2593 Ops/s | -1.86% |
| test_a2c_speed[False-None] | 5.5921ms | 5.4069ms | 184.9501 Ops/s | 184.6684 Ops/s | +0.15% |
| test_a2c_speed[False-backward] | 12.3043ms | 11.8626ms | 84.2983 Ops/s | 84.3125 Ops/s | -0.02% |
| test_a2c_speed[True-None] | 3.9643ms | 3.6920ms | 270.8557 Ops/s | 264.6226 Ops/s | +2.36% |
| test_a2c_speed[True-backward] | 8.8217ms | 8.5763ms | 116.6005 Ops/s | 119.6106 Ops/s | -2.52% |
| test_a2c_speed[reduce-overhead-None] | 4.0610ms | 3.6657ms | 272.8028 Ops/s | 274.1690 Ops/s | -0.50% |
| test_ppo_speed[False-None] | 6.5560ms | 5.8920ms | 169.7205 Ops/s | 173.0584 Ops/s | -1.93% |
| test_ppo_speed[False-backward] | 13.0158ms | 12.5468ms | 79.7019 Ops/s | 81.3172 Ops/s | -1.99% |
| test_ppo_speed[True-None] | 3.6751ms | 3.5736ms | 279.8298 Ops/s | 266.7194 Ops/s | +4.92% |
| test_ppo_speed[True-backward] | 8.7951ms | 8.3620ms | 119.5887 Ops/s | 120.7678 Ops/s | -0.98% |
| test_ppo_speed[reduce-overhead-None] | 3.9520ms | 3.5596ms | 280.9288 Ops/s | 280.5872 Ops/s | +0.12% |
| test_reinforce_speed[False-None] | 4.6144ms | 4.4397ms | 225.2398 Ops/s | 223.4180 Ops/s | +0.82% |
| test_reinforce_speed[False-backward] | 7.5996ms | 7.3520ms | 136.0183 Ops/s | 137.8588 Ops/s | -1.34% |
| test_reinforce_speed[True-None] | 3.0362ms | 2.8424ms | 351.8188 Ops/s | 353.0828 Ops/s | -0.36% |
| test_reinforce_speed[True-backward] | 8.0901ms | 7.7673ms | 128.7449 Ops/s | 122.9706 Ops/s | +4.70% |
| test_reinforce_speed[reduce-overhead-None] | 3.2243ms | 2.8400ms | 352.1077 Ops/s | 339.9731 Ops/s | +3.57% |
| test_iql_speed[False-None] | 25.8974ms | 20.0837ms | 49.7917 Ops/s | 48.7743 Ops/s | +2.09% |
| test_iql_speed[False-backward] | 36.2971ms | 30.5087ms | 32.7775 Ops/s | 32.3498 Ops/s | +1.32% |
| test_iql_speed[True-None] | 8.9936ms | 8.3758ms | 119.3912 Ops/s | 120.1457 Ops/s | -0.63% |
| test_iql_speed[True-backward] | 17.1195ms | 16.6533ms | 60.0480 Ops/s | 57.5508 Ops/s | +4.34% |
| test_iql_speed[reduce-overhead-None] | 8.7702ms | 8.3817ms | 119.3074 Ops/s | 119.4122 Ops/s | -0.09% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9688ms | 5.8443ms | 171.1055 Ops/s | 170.9364 Ops/s | +0.10% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.8214ms | 0.3758ms | 2.6607 KOps/s | 2.9555 KOps/s | **-9.97%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6839ms | 0.3530ms | 2.8329 KOps/s | 2.8588 KOps/s | -0.91% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.7792ms | 5.5750ms | 179.3721 Ops/s | 177.8493 Ops/s | +0.86% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8520ms | 0.3636ms | 2.7506 KOps/s | 2.8043 KOps/s | -1.91% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5792ms | 0.3395ms | 2.9451 KOps/s | 2.9354 KOps/s | +0.33% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7136ms | 1.3826ms | 723.2771 Ops/s | 722.5288 Ops/s | +0.10% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5723ms | 1.3147ms | 760.6169 Ops/s | 759.3229 Ops/s | +0.17% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.4097ms | 5.8646ms | 170.5144 Ops/s | 176.8324 Ops/s | -3.57% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8891ms | 0.4855ms | 2.0598 KOps/s | 1.9803 KOps/s | +4.01% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8290ms | 0.4779ms | 2.0926 KOps/s | 2.0809 KOps/s | +0.57% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.8369ms | 5.5899ms | 178.8927 Ops/s | 178.1934 Ops/s | +0.39% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8125ms | 0.3497ms | 2.8594 KOps/s | 3.5533 KOps/s | **-19.53%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.4796ms | 0.3085ms | 3.2410 KOps/s | 3.6583 KOps/s | **-11.41%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.7689ms | 5.5716ms | 179.4826 Ops/s | 178.3065 Ops/s | +0.66% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8953ms | 0.3209ms | 3.1163 KOps/s | 2.8123 KOps/s | **+10.81%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6743ms | 0.2589ms | 3.8623 KOps/s | 3.0038 KOps/s | **+28.58%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.9987ms | 5.7365ms | 174.3237 Ops/s | 173.1419 Ops/s | +0.68% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7816ms | 0.4646ms | 2.1523 KOps/s | 2.0344 KOps/s | **+5.80%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7216ms | 0.4632ms | 2.1590 KOps/s | 2.1056 KOps/s | +2.54% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.2263ms | 4.8429ms | 206.4875 Ops/s | 203.1713 Ops/s | +1.63% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 4.8174ms | 2.1008ms | 476.0133 Ops/s | 478.9948 Ops/s | -0.62% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.0470ms | 1.0900ms | 917.4004 Ops/s | 787.6100 Ops/s | **+16.48%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5238s | 15.3568ms | 65.1179 Ops/s | 60.4920 Ops/s | **+7.65%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.8104ms | 1.7405ms | 574.5352 Ops/s | 585.8140 Ops/s | -1.93% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.9733ms | 0.8490ms | 1.1778 KOps/s | 783.8037 Ops/s | **+50.27%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 9.4551ms | 5.1322ms | 194.8463 Ops/s | 192.8493 Ops/s | +1.04% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 9.1896ms | 2.0260ms | 493.5829 Ops/s | 522.7464 Ops/s | **-5.58%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.1657ms | 1.1563ms | 864.8388 Ops/s | 914.4937 Ops/s | **-5.43%** |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.4488ms | 35.4512ms | 28.2078 Ops/s | 28.0049 Ops/s | +0.72% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.6358ms | 18.0301ms | 55.4629 Ops/s | 55.5474 Ops/s | -0.15% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.4339ms | 36.9128ms | 27.0909 Ops/s | 27.2100 Ops/s | -0.44% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.3243ms | 18.4055ms | 54.3315 Ops/s | 54.7451 Ops/s | -0.76% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.8355ms | 38.2518ms | 26.1426 Ops/s | 25.9517 Ops/s | +0.74% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.3795ms | 19.9766ms | 50.0585 Ops/s | 50.3821 Ops/s | -0.64% |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8181ms | 0.2202ms | 4.5421 KOps/s | 4.4479 KOps/s | +2.12% |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.7222ms | 1.4166ms | 705.8961 Ops/s | 709.3287 Ops/s | -0.48% |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.5131ms | 2.3315ms | 428.9042 Ops/s | 431.3287 Ops/s | -0.56% |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.0526ms | 2.9106ms | 343.5699 Ops/s | 339.3822 Ops/s | +1.23% |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2206ms | 0.1352ms | 7.3952 KOps/s | 7.4278 KOps/s | -0.44% |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.4180ms | 0.2009ms | 4.9776 KOps/s | 5.1393 KOps/s | -3.15% |
| test_storage_write_contiguous[100-img_shape2-large_img] | 2.1616ms | 1.7599ms | 568.2226 Ops/s | 567.1324 Ops/s | +0.19% |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.4614ms | 1.2963ms | 771.4118 Ops/s | 777.6909 Ops/s | -0.81% |
| test_collector_stack_then_write[50-img_shape0-small] | 1.8086ms | 1.1232ms | 890.3116 Ops/s | 916.3264 Ops/s | -2.84% |
| test_collector_stack_then_write[100-img_shape1-atari] | 7.3897ms | 3.4687ms | 288.2951 Ops/s | 281.9993 Ops/s | +2.23% |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.0012ms | 5.5858ms | 179.0257 Ops/s | 174.5965 Ops/s | +2.54% |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 14.8807ms | 7.2109ms | 138.6788 Ops/s | 143.5728 Ops/s | -3.41% |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4196ms | 0.2666ms | 3.7516 KOps/s | 3.5439 KOps/s | **+5.86%** |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6432ms | 1.5130ms | 660.9327 Ops/s | 647.8856 Ops/s | +2.01% |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.8590ms | 2.4567ms | 407.0557 Ops/s | 409.0772 Ops/s | -0.49% |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.2721ms | 3.1024ms | 322.3287 Ops/s | 313.8539 Ops/s | +2.70% |
| test_collector_without_rb[100-img_shape0-atari] | 32.6390ms | 32.1357ms | 31.1180 Ops/s | 30.8951 Ops/s | +0.72% |
| test_collector_without_rb[200-img_shape1-large_batch] | 63.6997ms | 63.3362ms | 15.7888 Ops/s | 15.6227 Ops/s | +1.06% |
| test_collector_with_rb[100-img_shape0-atari] | 37.2297ms | 36.6437ms | 27.2898 Ops/s | 26.8581 Ops/s | +1.61% |
| test_collector_with_rb[200-img_shape1-large_batch] | 71.5556ms | 71.0817ms | 14.0683 Ops/s | 13.3805 Ops/s | **+5.14%** |
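
The Change column in the CPU report appears to be the relative difference between the two throughput columns, (Ops - Ops on Repo HEAD) / Ops on Repo HEAD. The sketch below reproduces three reported values, including one row whose columns use different units (KOps/s vs. Ops/s). It also counts changes beyond ±5%; that threshold is an assumption about how the bot tallies "Improved" and "Worsened", chosen because it is consistent with the 13 improved / 7 worsened totals reported for the CPU run. The helper names (`parse_ops`, `change_pct`, `summarize`) are hypothetical and not part of the benchmark tooling.

```python
# Throughput units as they appear in the report.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(text):
    """Convert a cell such as '1.1778 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def change_pct(ops, ops_head):
    """Percent throughput change vs. the repo HEAD run (positive = faster)."""
    new, old = parse_ops(ops), parse_ops(ops_head)
    return 100.0 * (new - old) / old


def summarize(rows, threshold_pct=5.0):
    """Per-benchmark changes plus counts of changes beyond +/- threshold_pct."""
    changes = {name: change_pct(ops, head) for name, ops, head in rows}
    improved = sum(c > threshold_pct for c in changes.values())
    worsened = sum(c < -threshold_pct for c in changes.values())
    return changes, improved, worsened


rows = [
    ("test_tensor_to_bytestream_speed[pickle]", "12.6441 KOps/s", "12.7268 KOps/s"),
    ("test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]",
     "1.1778 KOps/s", "783.8037 Ops/s"),
    ("test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]",
     "2.8594 KOps/s", "3.5533 KOps/s"),
]
changes, improved, worsened = summarize(rows)
for name, pct in changes.items():
    # Prints -0.65%, +50.27% and -19.53%, matching the Change column.
    print(f"{pct:+.2f}%  {name}")
```

Normalizing units before dividing matters: comparing the raw numbers 1.1778 and 783.8037 for the `test_rb_populate` row would report a regression instead of the +50.27% speedup.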

github-actions bot (Contributor) commented Feb 11, 2026

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: 15. Worsened: 9.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 84.7610μs | 83.1606μs | 12.0249 KOps/s | 12.3730 KOps/s | -2.81% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1451ms | 0.1446ms | 6.9173 KOps/s | 7.1471 KOps/s | -3.21% |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1128s | 0.1122s | 8.9137 Ops/s | 9.0184 Ops/s | -1.16% |
| test_tensor_to_bytestream_speed[numpy] | 2.5994μs | 2.5922μs | 385.7770 KOps/s | 368.3385 KOps/s | +4.73% |
| test_tensor_to_bytestream_speed[safetensors] | 37.0149μs | 36.8843μs | 27.1118 KOps/s | 27.7306 KOps/s | -2.23% |
| test_simple | 0.8192s | 0.8109s | 1.2332 Ops/s | 1.2193 Ops/s | +1.14% |
| test_transformed | 1.3882s | 1.3870s | 0.7210 Ops/s | 0.7065 Ops/s | +2.05% |
| test_serial | 2.3055s | 2.3031s | 0.4342 Ops/s | 0.4300 Ops/s | +0.97% |
| test_parallel | 1.9073s | 1.8542s | 0.5393 Ops/s | 0.5438 Ops/s | -0.83% |
| test_step_mdp_speed[True-True-True-True-True] | 0.4826ms | 42.6115μs | 23.4678 KOps/s | 23.8751 KOps/s | -1.71% |
| test_step_mdp_speed[True-True-True-True-False] | 47.5510μs | 23.6400μs | 42.3012 KOps/s | 41.8761 KOps/s | +1.01% |
| test_step_mdp_speed[True-True-True-False-True] | 0.4641ms | 23.5961μs | 42.3798 KOps/s | 41.6018 KOps/s | +1.87% |
| test_step_mdp_speed[True-True-True-False-False] | 0.4326ms | 13.0004μs | 76.9208 KOps/s | 75.6175 KOps/s | +1.72% |
| test_step_mdp_speed[True-True-False-True-True] | 0.4727ms | 44.8640μs | 22.2896 KOps/s | 22.1963 KOps/s | +0.42% |
| test_step_mdp_speed[True-True-False-True-False] | 93.8610μs | 25.4853μs | 39.2383 KOps/s | 37.5540 KOps/s | +4.48% |
| test_step_mdp_speed[True-True-False-False-True] | 52.3810μs | 26.5133μs | 37.7169 KOps/s | 37.4627 KOps/s | +0.68% |
| test_step_mdp_speed[True-True-False-False-False] | 42.7900μs | 15.8391μs | 63.1348 KOps/s | 62.5094 KOps/s | +1.00% |
| test_step_mdp_speed[True-False-True-True-True] | 75.1410μs | 47.9330μs | 20.8625 KOps/s | 20.5248 KOps/s | +1.65% |
| test_step_mdp_speed[True-False-True-True-False] | 63.9110μs | 29.3311μs | 34.0935 KOps/s | 34.0087 KOps/s | +0.25% |
| test_step_mdp_speed[True-False-True-False-True] | 61.4310μs | 26.6483μs | 37.5258 KOps/s | 37.0238 KOps/s | +1.36% |
| test_step_mdp_speed[True-False-True-False-False] | 49.8410μs | 15.8367μs | 63.1444 KOps/s | 61.7160 KOps/s | +2.31% |
| test_step_mdp_speed[True-False-False-True-True] | 87.7510μs | 49.8942μs | 20.0424 KOps/s | 19.7406 KOps/s | +1.53% |
| test_step_mdp_speed[True-False-False-True-False] | 67.4610μs | 31.7735μs | 31.4728 KOps/s | 31.0895 KOps/s | +1.23% |
| test_step_mdp_speed[True-False-False-False-True] | 57.0110μs | 28.9465μs | 34.5465 KOps/s | 34.5452 KOps/s | +0.00% |
| test_step_mdp_speed[True-False-False-False-False] | 52.1610μs | 18.1730μs | 55.0266 KOps/s | 53.6012 KOps/s | +2.66% |
| test_step_mdp_speed[False-True-True-True-True] | 97.8820μs | 47.8949μs | 20.8791 KOps/s | 20.5935 KOps/s | +1.39% |
| test_step_mdp_speed[False-True-True-True-False] | 60.9410μs | 28.8810μs | 34.6248 KOps/s | 34.4869 KOps/s | +0.40% |
| test_step_mdp_speed[False-True-True-False-True] | 2.4719ms | 30.5160μs | 32.7697 KOps/s | 31.9825 KOps/s | +2.46% |
| test_step_mdp_speed[False-True-True-False-False] | 46.6310μs | 17.5261μs | 57.0576 KOps/s | 55.9630 KOps/s | +1.96% |
| test_step_mdp_speed[False-True-False-True-True] | 81.9210μs | 50.0490μs | 19.9804 KOps/s | 19.4033 KOps/s | +2.97% |
| test_step_mdp_speed[False-True-False-True-False] | 64.0110μs | 31.5288μs | 31.7171 KOps/s | 30.8757 KOps/s | +2.72% |
| test_step_mdp_speed[False-True-False-False-True] | 72.6710μs | 32.5371μs | 30.7342 KOps/s | 30.1425 KOps/s | +1.96% |
| test_step_mdp_speed[False-True-False-False-False] | 55.4000μs | 20.2874μs | 49.2916 KOps/s | 48.8750 KOps/s | +0.85% |
| test_step_mdp_speed[False-False-True-True-True] | 0.1001ms | 53.1291μs | 18.8221 KOps/s | 18.4358 KOps/s | +2.10% |
| test_step_mdp_speed[False-False-True-True-False] | 67.1910μs | 34.8531μs | 28.6918 KOps/s | 28.2946 KOps/s | +1.40% |
| test_step_mdp_speed[False-False-True-False-True] | 92.3810μs | 32.6766μs | 30.6030 KOps/s | 30.1424 KOps/s | +1.53% |
| test_step_mdp_speed[False-False-True-False-False] | 52.1110μs | 20.2374μs | 49.4134 KOps/s | 48.4531 KOps/s | +1.98% |
| test_step_mdp_speed[False-False-False-True-True] | 0.1086ms | 55.0268μs | 18.1729 KOps/s | 17.6118 KOps/s | +3.19% |
| test_step_mdp_speed[False-False-False-True-False] | 70.8810μs | 36.8797μs | 27.1152 KOps/s | 26.9548 KOps/s | +0.60% |
| test_step_mdp_speed[False-False-False-False-True] | 66.3210μs | 34.9424μs | 28.6186 KOps/s | 28.3245 KOps/s | +1.04% |
| test_step_mdp_speed[False-False-False-False-False] | 54.1110μs | 22.3630μs | 44.7168 KOps/s | 43.8168 KOps/s | +2.05% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8481s | 0.7493s | 1.3346 Ops/s | 1.3449 Ops/s | -0.77% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7052s | 0.6106s | 1.6376 Ops/s | 1.6361 Ops/s | +0.09% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7374s | 1.6571s | 0.6035 Ops/s | 0.6123 Ops/s | -1.45% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5062s | 1.4281s | 0.7002 Ops/s | 0.7076 Ops/s | -1.04% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9721s | 1.8945s | 0.5279 Ops/s | 0.5298 Ops/s | -0.36% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7525s | 1.6733s | 0.5976 Ops/s | 0.6001 Ops/s | -0.41% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7922s | 4.6671s | 0.2143 Ops/s | 0.2156 Ops/s | -0.63% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5636s | 4.4712s | 0.2237 Ops/s | 0.2217 Ops/s | +0.88% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9431s | 1.8698s | 0.5348 Ops/s | 0.5399 Ops/s | -0.94% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6604s | 1.5677s | 0.6379 Ops/s | 0.6081 Ops/s | +4.89% |
| test_values[generalized_advantage_estimate-True-True] | 22.1273ms | 21.3333ms | 46.8750 Ops/s | 45.6914 Ops/s | +2.59% |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1289s | 3.5083ms | 285.0346 Ops/s | 289.4282 Ops/s | -1.52% |
| test_values[td0_return_estimate-False-False] | 0.1090ms | 85.0834μs | 11.7532 KOps/s | 11.6995 KOps/s | +0.46% |
| test_values[td1_return_estimate-False-False] | 50.7610ms | 49.8217ms | 20.0716 Ops/s | 18.9489 Ops/s | **+5.92%** |
| test_values[vec_td1_return_estimate-False-False] | 1.2977ms | 1.1015ms | 907.8234 Ops/s | 905.1059 Ops/s | +0.30% |
| test_values[td_lambda_return_estimate-True-False] | 85.2269ms | 81.2219ms | 12.3120 Ops/s | 11.9115 Ops/s | +3.36% |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2745ms | 1.0996ms | 909.4095 Ops/s | 906.4631 Ops/s | +0.33% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 21.4207ms | 21.2611ms | 47.0343 Ops/s | 44.1078 Ops/s | **+6.63%** |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0380ms | 0.7722ms | 1.2949 KOps/s | 1.2869 KOps/s | +0.63% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7711ms | 0.6989ms | 1.4307 KOps/s | 1.4108 KOps/s | +1.41% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5674ms | 1.5046ms | 664.6261 Ops/s | 662.0883 Ops/s | +0.38% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7644ms | 0.7060ms | 1.4163 KOps/s | 1.3937 KOps/s | +1.63% |
| test_dqn_speed[False-None] | 1.6813ms | 1.5513ms | 644.6394 Ops/s | 619.7769 Ops/s | +4.01% |
| test_dqn_speed[False-backward] | 2.2568ms | 2.2045ms | 453.6160 Ops/s | 450.4654 Ops/s | +0.70% |
| test_dqn_speed[True-None] | 0.6332ms | 0.5579ms | 1.7924 KOps/s | 1.7566 KOps/s | +2.03% |
| test_dqn_speed[True-backward] | 1.2477ms | 1.2156ms | 822.6312 Ops/s | 900.3989 Ops/s | **-8.64%** |
| test_dqn_speed[reduce-overhead-None] | 0.6383ms | 0.5791ms | 1.7267 KOps/s | 1.6813 KOps/s | +2.70% |
| test_ddpg_speed[False-None] | 3.2830ms | 2.9079ms | 343.8873 Ops/s | 343.6017 Ops/s | +0.08% |
| test_ddpg_speed[False-backward] | 4.8930ms | 4.3509ms | 229.8374 Ops/s | 237.9246 Ops/s | -3.40% |
| test_ddpg_speed[True-None] | 1.5330ms | 1.3161ms | 759.8429 Ops/s | 749.2552 Ops/s | +1.41% |
| test_ddpg_speed[True-backward] | 2.5977ms | 2.5295ms | 395.3412 Ops/s | 414.0479 Ops/s | -4.52% |
| test_ddpg_speed[reduce-overhead-None] | 1.4408ms | 1.3400ms | 746.2528 Ops/s | 740.9660 Ops/s | +0.71% |
| test_sac_speed[False-None] | 9.0608ms | 8.4283ms | 118.6485 Ops/s | 118.3420 Ops/s | +0.26% |
| test_sac_speed[False-backward] | 12.1695ms | 11.7084ms | 85.4086 Ops/s | 87.1876 Ops/s | -2.04% |
| test_sac_speed[True-None] | 1.9481ms | 1.8000ms | 555.5650 Ops/s | 550.1410 Ops/s | +0.99% |
| test_sac_speed[True-backward] | 3.6915ms | 3.5944ms | 278.2128 Ops/s | 287.9167 Ops/s | -3.37% |
| test_sac_speed[reduce-overhead-None] | 19.3798ms | 10.9514ms | 91.3122 Ops/s | 82.7672 Ops/s | **+10.32%** |
| test_redq_deprec_speed[False-None] | 10.0965ms | 9.4157ms | 106.2060 Ops/s | 105.8973 Ops/s | +0.29% |
| test_redq_deprec_speed[False-backward] | 15.2172ms | 13.0487ms | 76.6357 Ops/s | 79.2583 Ops/s | -3.31% |
| test_redq_deprec_speed[True-None] | 2.7997ms | 2.5313ms | 395.0571 Ops/s | 391.2829 Ops/s | +0.96% |
| test_redq_deprec_speed[True-backward] | 4.3584ms | 4.3061ms | 232.2307 Ops/s | 228.3082 Ops/s | +1.72% |
| test_redq_deprec_speed[reduce-overhead-None] | 15.8655ms | 9.7171ms | 102.9113 Ops/s | 101.4821 Ops/s | +1.41% |
| test_td3_speed[False-None] | 8.3979ms | 8.2677ms | 120.9528 Ops/s | 120.9012 Ops/s | +0.04% |
| test_td3_speed[False-backward] | 11.2443ms | 10.8949ms | 91.7864 Ops/s | 91.4020 Ops/s | +0.42% |
| test_td3_speed[True-None] | 1.6412ms | 1.6109ms | 620.7737 Ops/s | 613.2629 Ops/s | +1.22% |
test_td3_speed[True-backward] 3.3060ms 3.2465ms 308.0238 Ops/s 301.8835 Ops/s $\color{#35bf28}+2.03\%$
test_td3_speed[reduce-overhead-None] 47.3168ms 24.3153ms 41.1264 Ops/s 40.3773 Ops/s $\color{#35bf28}+1.86\%$
test_cql_speed[False-None] 17.8728ms 17.4611ms 57.2702 Ops/s 57.1856 Ops/s $\color{#35bf28}+0.15\%$
test_cql_speed[False-backward] 23.9197ms 23.2026ms 43.0987 Ops/s 43.1899 Ops/s $\color{#d91a1a}-0.21\%$
test_cql_speed[True-None] 3.8947ms 3.2453ms 308.1421 Ops/s 303.1374 Ops/s $\color{#35bf28}+1.65\%$
test_cql_speed[True-backward] 5.4279ms 5.3095ms 188.3412 Ops/s 179.4357 Ops/s $\color{#35bf28}+4.96\%$
test_cql_speed[reduce-overhead-None] 18.9141ms 11.7922ms 84.8015 Ops/s 84.3956 Ops/s $\color{#35bf28}+0.48\%$
test_a2c_speed[False-None] 3.8298ms 3.2625ms 306.5134 Ops/s 304.1415 Ops/s $\color{#35bf28}+0.78\%$
test_a2c_speed[False-backward] 6.6161ms 6.2291ms 160.5358 Ops/s 153.8849 Ops/s $\color{#35bf28}+4.32\%$
test_a2c_speed[True-None] 1.4130ms 1.3110ms 762.7876 Ops/s 749.4505 Ops/s $\color{#35bf28}+1.78\%$
test_a2c_speed[True-backward] 2.9927ms 2.9534ms 338.5922 Ops/s 319.7593 Ops/s $\textbf{\color{#35bf28}+5.89\%}$
test_a2c_speed[reduce-overhead-None] 1.0467ms 0.9701ms 1.0309 KOps/s 1.0312 KOps/s $\color{#d91a1a}-0.04\%$
test_ppo_speed[False-None] 4.1081ms 3.9166ms 255.3249 Ops/s 255.6596 Ops/s $\color{#d91a1a}-0.13\%$
test_ppo_speed[False-backward] 7.4798ms 7.0460ms 141.9236 Ops/s 135.2346 Ops/s $\color{#35bf28}+4.95\%$
test_ppo_speed[True-None] 1.4943ms 1.4097ms 709.3819 Ops/s 706.4761 Ops/s $\color{#35bf28}+0.41\%$
test_ppo_speed[True-backward] 3.2651ms 3.0938ms 323.2243 Ops/s 306.9252 Ops/s $\textbf{\color{#35bf28}+5.31\%}$
test_ppo_speed[reduce-overhead-None] 1.1199ms 1.0130ms 987.1373 Ops/s 940.2372 Ops/s $\color{#35bf28}+4.99\%$
test_reinforce_speed[False-None] 2.6445ms 2.2913ms 436.4312 Ops/s 424.5032 Ops/s $\color{#35bf28}+2.81\%$
test_reinforce_speed[False-backward] 3.7772ms 3.3225ms 300.9773 Ops/s 285.8334 Ops/s $\textbf{\color{#35bf28}+5.30\%}$
test_reinforce_speed[True-None] 1.3519ms 1.2501ms 799.9516 Ops/s 789.5504 Ops/s $\color{#35bf28}+1.32\%$
test_reinforce_speed[True-backward] 2.9581ms 2.9177ms 342.7371 Ops/s 321.6441 Ops/s $\textbf{\color{#35bf28}+6.56\%}$
test_reinforce_speed[reduce-overhead-None] 17.3270ms 9.5757ms 104.4305 Ops/s 105.2319 Ops/s $\color{#d91a1a}-0.76\%$
test_iql_speed[False-None] 10.1139ms 9.4923ms 105.3481 Ops/s 104.2122 Ops/s $\color{#35bf28}+1.09\%$
test_iql_speed[False-backward] 13.9042ms 13.2595ms 75.4178 Ops/s 73.4012 Ops/s $\color{#35bf28}+2.75\%$
test_iql_speed[True-None] 2.3111ms 2.1476ms 465.6321 Ops/s 458.8146 Ops/s $\color{#35bf28}+1.49\%$
test_iql_speed[True-backward] 5.3509ms 4.8148ms 207.6936 Ops/s 202.9988 Ops/s $\color{#35bf28}+2.31\%$
test_iql_speed[reduce-overhead-None] 17.9580ms 10.5514ms 94.7741 Ops/s 95.6738 Ops/s $\color{#d91a1a}-0.94\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4228ms 6.0810ms 164.4475 Ops/s 166.8136 Ops/s $\color{#d91a1a}-1.42\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0691ms 0.3489ms 2.8663 KOps/s 2.8310 KOps/s $\color{#35bf28}+1.25\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6707ms 0.3282ms 3.0471 KOps/s 3.0161 KOps/s $\color{#35bf28}+1.03\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2071ms 5.9108ms 169.1809 Ops/s 172.1487 Ops/s $\color{#d91a1a}-1.72\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0038ms 0.2762ms 3.6201 KOps/s 2.9857 KOps/s $\textbf{\color{#35bf28}+21.25\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4755ms 0.2621ms 3.8151 KOps/s 3.8191 KOps/s $\color{#d91a1a}-0.10\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6560ms 1.2783ms 782.2831 Ops/s 699.2697 Ops/s $\textbf{\color{#35bf28}+11.87\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4627ms 1.1943ms 837.2778 Ops/s 738.5401 Ops/s $\textbf{\color{#35bf28}+13.37\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 7.4153ms 6.1206ms 163.3815 Ops/s 168.4102 Ops/s $\color{#d91a1a}-2.99\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.8673ms 0.4347ms 2.3005 KOps/s 2.0635 KOps/s $\textbf{\color{#35bf28}+11.48\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6047ms 0.4171ms 2.3975 KOps/s 2.1268 KOps/s $\textbf{\color{#35bf28}+12.72\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0992ms 5.9717ms 167.4563 Ops/s 171.6300 Ops/s $\color{#d91a1a}-2.43\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.8804ms 0.3640ms 2.7475 KOps/s 2.7310 KOps/s $\color{#35bf28}+0.60\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5397ms 0.3484ms 2.8703 KOps/s 2.9675 KOps/s $\color{#d91a1a}-3.27\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2774ms 5.9584ms 167.8300 Ops/s 172.0864 Ops/s $\color{#d91a1a}-2.47\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0440ms 0.3455ms 2.8943 KOps/s 3.0306 KOps/s $\color{#d91a1a}-4.50\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6039ms 0.3464ms 2.8872 KOps/s 3.2035 KOps/s $\textbf{\color{#d91a1a}-9.87\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.3823ms 6.1317ms 163.0866 Ops/s 166.2483 Ops/s $\color{#d91a1a}-1.90\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.3465ms 0.4904ms 2.0392 KOps/s 2.0013 KOps/s $\color{#35bf28}+1.90\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7108ms 0.4612ms 2.1680 KOps/s 2.0894 KOps/s $\color{#35bf28}+3.76\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5929s 16.8038ms 59.5102 Ops/s 51.5266 Ops/s $\textbf{\color{#35bf28}+15.49\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 11.4936ms 2.0868ms 479.2056 Ops/s 519.1640 Ops/s $\textbf{\color{#d91a1a}-7.70\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.2175ms 1.2855ms 777.9186 Ops/s 1.0652 KOps/s $\textbf{\color{#d91a1a}-26.97\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.5168ms 5.0348ms 198.6169 Ops/s 193.1683 Ops/s $\color{#35bf28}+2.82\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 13.0044ms 2.0231ms 494.2839 Ops/s 523.4076 Ops/s $\textbf{\color{#d91a1a}-5.56\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.9233ms 1.1741ms 851.7336 Ops/s 1.0337 KOps/s $\textbf{\color{#d91a1a}-17.61\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.4364ms 5.2644ms 189.9546 Ops/s 189.2740 Ops/s $\color{#35bf28}+0.36\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.0622ms 1.9964ms 500.9114 Ops/s 495.2700 Ops/s $\color{#35bf28}+1.14\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.2913ms 1.1025ms 907.0668 Ops/s 861.0159 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.6189ms 36.3188ms 27.5339 Ops/s 27.1940 Ops/s $\color{#35bf28}+1.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.7054ms 18.2422ms 54.8180 Ops/s 54.0395 Ops/s $\color{#35bf28}+1.44\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.6868ms 37.5510ms 26.6305 Ops/s 26.2210 Ops/s $\color{#35bf28}+1.56\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.5515ms 18.6268ms 53.6860 Ops/s 52.9896 Ops/s $\color{#35bf28}+1.31\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.9708ms 39.2782ms 25.4594 Ops/s 24.7286 Ops/s $\color{#35bf28}+2.96\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.6274ms 20.1589ms 49.6059 Ops/s 48.3221 Ops/s $\color{#35bf28}+2.66\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9252ms 0.2313ms 4.3241 KOps/s 4.3931 KOps/s $\color{#d91a1a}-1.57\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.5857ms 1.3967ms 715.9899 Ops/s 727.9587 Ops/s $\color{#d91a1a}-1.64\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.6955ms 2.3175ms 431.4980 Ops/s 442.5485 Ops/s $\color{#d91a1a}-2.50\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0653ms 2.9036ms 344.4017 Ops/s 342.3438 Ops/s $\color{#35bf28}+0.60\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2534ms 0.1666ms 6.0026 KOps/s 5.8698 KOps/s $\color{#35bf28}+2.26\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.5921ms 0.2550ms 3.9220 KOps/s 4.3614 KOps/s $\textbf{\color{#d91a1a}-10.07\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9707ms 1.8275ms 547.1834 Ops/s 605.9984 Ops/s $\textbf{\color{#d91a1a}-9.71\%}$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5910ms 1.3193ms 757.9509 Ops/s 700.5931 Ops/s $\textbf{\color{#35bf28}+8.19\%}$
test_collector_stack_then_write[50-img_shape0-small] 1.4337ms 1.1639ms 859.1615 Ops/s 861.0279 Ops/s $\color{#d91a1a}-0.22\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8095ms 3.6947ms 270.6598 Ops/s 262.7613 Ops/s $\color{#35bf28}+3.01\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.2843ms 5.8796ms 170.0802 Ops/s 172.1729 Ops/s $\color{#d91a1a}-1.22\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.2808ms 7.0390ms 142.0646 Ops/s 141.2682 Ops/s $\color{#35bf28}+0.56\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4505ms 0.2871ms 3.4831 KOps/s 3.5486 KOps/s $\color{#d91a1a}-1.85\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6688ms 1.5136ms 660.6603 Ops/s 687.0513 Ops/s $\color{#d91a1a}-3.84\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8025ms 2.4221ms 412.8642 Ops/s 416.5414 Ops/s $\color{#d91a1a}-0.88\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4591ms 3.0971ms 322.8866 Ops/s 320.1158 Ops/s $\color{#35bf28}+0.87\%$
test_collector_without_rb[100-img_shape0-atari] 33.6628ms 33.1929ms 30.1269 Ops/s 29.8799 Ops/s $\color{#35bf28}+0.83\%$
test_collector_without_rb[200-img_shape1-large_batch] 65.7942ms 65.2296ms 15.3305 Ops/s 15.3989 Ops/s $\color{#d91a1a}-0.44\%$
test_collector_with_rb[100-img_shape0-atari] 38.2984ms 37.7465ms 26.4925 Ops/s 26.3787 Ops/s $\color{#35bf28}+0.43\%$
test_collector_with_rb[200-img_shape1-large_batch] 73.9023ms 73.4608ms 13.6127 Ops/s 13.5574 Ops/s $\color{#35bf28}+0.41\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 58.2483ms 56.5058ms 17.6973 Ops/s 17.3607 Ops/s $\color{#35bf28}+1.94\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1156s 0.1124s 8.8991 Ops/s 8.6919 Ops/s $\color{#35bf28}+2.38\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 58.3766ms 57.9459ms 17.2575 Ops/s 16.6591 Ops/s $\color{#35bf28}+3.59\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.7884s 0.1908s 5.2412 Ops/s 8.4102 Ops/s $\textbf{\color{#d91a1a}-37.68\%}$
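For readers checking the numbers: the Diff column appears to be the relative change in throughput between the two Ops/s columns. This sketch assumes the first Ops/s column is this PR and the second is the reference run, which is an inference from the table, not something the bot states. The Ops/s figures themselves appear to be the reciprocal of the Mean time. For example, 1 / 21.3333 ms ≈ 46.875 Ops/s for `test_values[generalized_advantage_estimate-True-True]`. The sketch below is in Python, and `parse_ops` and `relative_change` are hypothetical helper names:

```python
# Hypothetical helpers that reproduce the "Diff" column of the benchmark table.
# Assumption: first Ops/s column = this PR, second Ops/s column = reference run.

# Only the two units that appear in the table are handled.
_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(text: str) -> float:
    """Convert e.g. '1.0652 KOps/s' or '777.9186 Ops/s' into operations per second."""
    value, unit = text.split()
    return float(value) * _UNITS[unit]


def relative_change(pr: str, ref: str) -> float:
    """Percentage change in throughput of the PR run relative to the reference run."""
    pr_ops, ref_ops = parse_ops(pr), parse_ops(ref)
    return (pr_ops - ref_ops) / ref_ops * 100.0


if __name__ == "__main__":
    # Row: test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]
    print(f"{relative_change('777.9186 Ops/s', '1.0652 KOps/s'):+.2f}%")  # -26.97%
```

A positive value means higher throughput, which matches the green entries in the table. Units have to be normalized first because some rows mix Ops/s and KOps/s.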

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 21, 2026

Wires WeightSyncScheme into the server loop:
- init_on_receiver + connect at startup
- Non-blocking receive() poll between inference batches
- threading.Lock protects model during weight updates
- End-to-end tests and updated Sphinx docs with usage tutorial

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: bbb55bd
Pull-Request: #3497
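The server-loop changes listed in this commit can be illustrated with a small, self-contained Python sketch. It is a minimal model under stated assumptions, not the torchrl implementation. `StubWeightReceiver` and `BatchingServer` are hypothetical names, and the "model" is a dict holding a single scale factor. Only the method names `connect()` and `receive()` are borrowed from the commit message; their signatures and behavior here are invented. The sketch shows the ordering: connect once at startup, poll `receive()` without blocking between batches, and hold a `threading.Lock` whenever the weights are read or replaced.

```python
import queue
import threading
from typing import Dict, List, Optional


class StubWeightReceiver:
    """Hypothetical stand-in for a weight-sync receiver (NOT torchrl's WeightSyncScheme)."""

    def __init__(self) -> None:
        self._inbox: "queue.Queue[Dict[str, float]]" = queue.Queue()
        self.connected = False

    def connect(self) -> None:
        # Startup handshake, reduced here to setting a flag.
        self.connected = True

    def send(self, weights: Dict[str, float]) -> None:
        # Trainer side of the demo: enqueue a new set of weights.
        self._inbox.put(weights)

    def receive(self) -> Optional[Dict[str, float]]:
        # Non-blocking poll: drain the inbox and return the newest weights, or None.
        latest = None
        while True:
            try:
                latest = self._inbox.get_nowait()
            except queue.Empty:
                return latest


class BatchingServer:
    """Hypothetical batching loop that applies weight updates between batches."""

    def __init__(self, receiver: StubWeightReceiver, max_batch: int = 8) -> None:
        self.weights: Dict[str, float] = {"scale": 1.0}
        self.lock = threading.Lock()  # guards self.weights
        self.requests: "queue.Queue[float]" = queue.Queue()
        self.receiver = receiver
        self.max_batch = max_batch
        self.receiver.connect()  # connect once, at startup

    def _next_batch(self, timeout: float = 0.01) -> List[float]:
        # Wait briefly for a first request, then take whatever else is already queued.
        try:
            batch = [self.requests.get(timeout=timeout)]
        except queue.Empty:
            return []
        while len(batch) < self.max_batch:
            try:
                batch.append(self.requests.get_nowait())
            except queue.Empty:
                break
        return batch

    def step(self) -> List[float]:
        # 1) Between batches: poll for new weights without blocking.
        new_weights = self.receiver.receive()
        if new_weights is not None:
            with self.lock:
                self.weights = dict(new_weights)
        # 2) Run one batch while holding the lock, so weights cannot be swapped mid-batch.
        batch = self._next_batch()
        with self.lock:
            scale = self.weights["scale"]
            return [x * scale for x in batch]


if __name__ == "__main__":
    receiver = StubWeightReceiver()
    server = BatchingServer(receiver)
    for x in (1.0, 2.0, 3.0):
        server.requests.put(x)
    print(server.step())  # [1.0, 2.0, 3.0]
    receiver.send({"scale": 10.0})
    server.requests.put(4.0)
    print(server.step())  # [40.0]
```

In this single-threaded sketch the lock is not strictly required. It matters once another thread can replace the weights while a batch is running. Draining the inbox in `receive()` means a burst of updates costs one swap, not several.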
@vmoens vmoens merged commit 575fa0b into gh/vmoens/239/base Feb 21, 2026
113 of 116 checks passed
@vmoens vmoens deleted the gh/vmoens/239/head branch February 21, 2026 21:12

Labels

CLA Signed, Documentation, Feature, Modules