Skip to content

[Feature] Add SGLangWrapper policy module#3430

Merged
vmoens merged 8 commits intogh/vmoens/210/basefrom
gh/vmoens/210/head
Feb 3, 2026
Merged

[Feature] Add SGLangWrapper policy module#3430
vmoens merged 8 commits intogh/vmoens/210/basefrom
gh/vmoens/210/head

Conversation

@vmoens
Copy link
Collaborator

@vmoens vmoens commented Jan 31, 2026

Stack from ghstack (oldest at bottom):

Implement SGLangWrapper extending LLMWrapperBase for drop-in compatibility
with vLLMWrapper:

Input modes:

  • history: Conversation history with chat templates
  • text: Raw text prompts
  • tokens: Pre-tokenized input

Output structures:

  • Tokens: Generated token IDs with prompt/response/full
  • Text: Generated text with prompt/response/full
  • LogProbs: Per-token log probabilities (when available)
  • Masks: Attention and completion masks
  • ChatHistory: Updated conversation history

Features:

  • Batching support via async HTTP requests
  • Standardized parameter mapping to SGLang format
  • Policy version tracking for weight sync coordination
  • Compatible with ChatEnv and LLMCollector

Co-authored-by: Cursor cursoragent@cursor.com

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 31, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3430

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 2 Pending

As of commit 82c9ad9 with merge base 01413ca (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

  • Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
    WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 31, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 148. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}14$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.5086μs 80.8986μs 12.3612 KOps/s 12.4683 KOps/s $\color{#d91a1a}-0.86\%$
test_tensor_to_bytestream_speed[torch.save] 0.1400ms 0.1393ms 7.1798 KOps/s 7.0887 KOps/s $\color{#35bf28}+1.28\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1113s 0.1113s 8.9883 Ops/s 7.9595 Ops/s $\textbf{\color{#35bf28}+12.93\%}$
test_tensor_to_bytestream_speed[numpy] 2.5756μs 2.5507μs 392.0432 KOps/s 380.0530 KOps/s $\color{#35bf28}+3.15\%$
test_tensor_to_bytestream_speed[safetensors] 37.2894μs 37.1035μs 26.9516 KOps/s 27.1203 KOps/s $\color{#d91a1a}-0.62\%$
test_simple 0.9168s 0.8263s 1.2102 Ops/s 1.2104 Ops/s $\color{#d91a1a}-0.02\%$
test_transformed 1.5574s 1.4661s 0.6821 Ops/s 0.6950 Ops/s $\color{#d91a1a}-1.86\%$
test_serial 2.4571s 2.3647s 0.4229 Ops/s 0.4318 Ops/s $\color{#d91a1a}-2.06\%$
test_parallel 2.0637s 1.9630s 0.5094 Ops/s 0.5021 Ops/s $\color{#35bf28}+1.45\%$
test_step_mdp_speed[True-True-True-True-True] 0.2143ms 45.2247μs 22.1118 KOps/s 21.8767 KOps/s $\color{#35bf28}+1.07\%$
test_step_mdp_speed[True-True-True-True-False] 61.8710μs 25.8304μs 38.7140 KOps/s 39.3066 KOps/s $\color{#d91a1a}-1.51\%$
test_step_mdp_speed[True-True-True-False-True] 54.4110μs 26.0839μs 38.3378 KOps/s 39.8518 KOps/s $\color{#d91a1a}-3.80\%$
test_step_mdp_speed[True-True-True-False-False] 45.0010μs 14.1985μs 70.4300 KOps/s 71.9191 KOps/s $\color{#d91a1a}-2.07\%$
test_step_mdp_speed[True-True-False-True-True] 83.5120μs 48.7486μs 20.5134 KOps/s 21.0099 KOps/s $\color{#d91a1a}-2.36\%$
test_step_mdp_speed[True-True-False-True-False] 65.4910μs 28.3630μs 35.2572 KOps/s 35.8197 KOps/s $\color{#d91a1a}-1.57\%$
test_step_mdp_speed[True-True-False-False-True] 67.1610μs 28.8526μs 34.6589 KOps/s 34.8248 KOps/s $\color{#d91a1a}-0.48\%$
test_step_mdp_speed[True-True-False-False-False] 54.7310μs 17.0382μs 58.6917 KOps/s 59.8067 KOps/s $\color{#d91a1a}-1.86\%$
test_step_mdp_speed[True-False-True-True-True] 89.1910μs 51.7659μs 19.3177 KOps/s 19.8477 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[True-False-True-True-False] 70.9010μs 31.2241μs 32.0266 KOps/s 32.3486 KOps/s $\color{#d91a1a}-1.00\%$
test_step_mdp_speed[True-False-True-False-True] 65.9010μs 28.8769μs 34.6298 KOps/s 35.7974 KOps/s $\color{#d91a1a}-3.26\%$
test_step_mdp_speed[True-False-True-False-False] 52.1600μs 17.0432μs 58.6744 KOps/s 59.9001 KOps/s $\color{#d91a1a}-2.05\%$
test_step_mdp_speed[True-False-False-True-True] 92.4920μs 54.5452μs 18.3334 KOps/s 19.1855 KOps/s $\color{#d91a1a}-4.44\%$
test_step_mdp_speed[True-False-False-True-False] 69.5710μs 33.9375μs 29.4659 KOps/s 30.4231 KOps/s $\color{#d91a1a}-3.15\%$
test_step_mdp_speed[True-False-False-False-True] 67.4310μs 30.9058μs 32.3564 KOps/s 32.4758 KOps/s $\color{#d91a1a}-0.37\%$
test_step_mdp_speed[True-False-False-False-False] 62.0210μs 19.5146μs 51.2438 KOps/s 51.6169 KOps/s $\color{#d91a1a}-0.72\%$
test_step_mdp_speed[False-True-True-True-True] 0.1754ms 51.9640μs 19.2441 KOps/s 19.4921 KOps/s $\color{#d91a1a}-1.27\%$
test_step_mdp_speed[False-True-True-True-False] 0.1242ms 31.1737μs 32.0783 KOps/s 32.1659 KOps/s $\color{#d91a1a}-0.27\%$
test_step_mdp_speed[False-True-True-False-True] 63.0310μs 32.0623μs 31.1893 KOps/s 30.8607 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[False-True-True-False-False] 63.1210μs 18.6453μs 53.6329 KOps/s 53.8605 KOps/s $\color{#d91a1a}-0.42\%$
test_step_mdp_speed[False-True-False-True-True] 2.7358ms 54.6801μs 18.2882 KOps/s 18.5510 KOps/s $\color{#d91a1a}-1.42\%$
test_step_mdp_speed[False-True-False-True-False] 75.2210μs 33.9450μs 29.4595 KOps/s 29.9569 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[False-True-False-False-True] 77.0120μs 35.0680μs 28.5160 KOps/s 28.7449 KOps/s $\color{#d91a1a}-0.80\%$
test_step_mdp_speed[False-True-False-False-False] 51.8110μs 21.3192μs 46.9062 KOps/s 46.4457 KOps/s $\color{#35bf28}+0.99\%$
test_step_mdp_speed[False-False-True-True-True] 0.1004ms 57.1624μs 17.4940 KOps/s 17.7280 KOps/s $\color{#d91a1a}-1.32\%$
test_step_mdp_speed[False-False-True-True-False] 69.3610μs 37.1528μs 26.9159 KOps/s 27.6164 KOps/s $\color{#d91a1a}-2.54\%$
test_step_mdp_speed[False-False-True-False-True] 68.5910μs 35.4194μs 28.2332 KOps/s 28.3652 KOps/s $\color{#d91a1a}-0.47\%$
test_step_mdp_speed[False-False-True-False-False] 46.9910μs 21.5500μs 46.4037 KOps/s 46.9955 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[False-False-False-True-True] 0.1053ms 60.0276μs 16.6590 KOps/s 17.1990 KOps/s $\color{#d91a1a}-3.14\%$
test_step_mdp_speed[False-False-False-True-False] 92.7720μs 39.9354μs 25.0404 KOps/s 25.9136 KOps/s $\color{#d91a1a}-3.37\%$
test_step_mdp_speed[False-False-False-False-True] 84.3110μs 37.9968μs 26.3180 KOps/s 27.4655 KOps/s $\color{#d91a1a}-4.18\%$
test_step_mdp_speed[False-False-False-False-False] 55.0110μs 24.1759μs 41.3634 KOps/s 42.1468 KOps/s $\color{#d91a1a}-1.86\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7943s 0.7770s 1.2870 Ops/s 1.2990 Ops/s $\color{#d91a1a}-0.93\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7421s 0.6537s 1.5298 Ops/s 1.5818 Ops/s $\color{#d91a1a}-3.29\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.8151s 1.7343s 0.5766 Ops/s 0.5952 Ops/s $\color{#d91a1a}-3.12\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5684s 1.4922s 0.6702 Ops/s 0.6902 Ops/s $\color{#d91a1a}-2.91\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0660s 1.9789s 0.5053 Ops/s 0.5172 Ops/s $\color{#d91a1a}-2.30\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.8535s 1.7749s 0.5634 Ops/s 0.5857 Ops/s $\color{#d91a1a}-3.81\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7759s 4.7160s 0.2120 Ops/s 0.2157 Ops/s $\color{#d91a1a}-1.70\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5854s 4.5083s 0.2218 Ops/s 0.2245 Ops/s $\color{#d91a1a}-1.18\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.1103s 2.0283s 0.4930 Ops/s 0.5099 Ops/s $\color{#d91a1a}-3.30\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7684s 1.7008s 0.5879 Ops/s 0.6015 Ops/s $\color{#d91a1a}-2.26\%$
test_values[generalized_advantage_estimate-True-True] 20.7196ms 20.3959ms 49.0294 Ops/s 47.3879 Ops/s $\color{#35bf28}+3.46\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1397s 3.7217ms 268.6913 Ops/s 272.5878 Ops/s $\color{#d91a1a}-1.43\%$
test_values[td0_return_estimate-False-False] 0.1077ms 84.4136μs 11.8464 KOps/s 11.6937 KOps/s $\color{#35bf28}+1.31\%$
test_values[td1_return_estimate-False-False] 49.3988ms 48.8846ms 20.4563 Ops/s 19.4840 Ops/s $\color{#35bf28}+4.99\%$
test_values[vec_td1_return_estimate-False-False] 1.3299ms 1.0987ms 910.1337 Ops/s 902.4722 Ops/s $\color{#35bf28}+0.85\%$
test_values[td_lambda_return_estimate-True-False] 80.9842ms 80.3484ms 12.4458 Ops/s 11.9284 Ops/s $\color{#35bf28}+4.34\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2988ms 1.0939ms 914.1765 Ops/s 903.5199 Ops/s $\color{#35bf28}+1.18\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.7524ms 21.0047ms 47.6084 Ops/s 45.0405 Ops/s $\textbf{\color{#35bf28}+5.70\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0392ms 0.7635ms 1.3098 KOps/s 1.2786 KOps/s $\color{#35bf28}+2.43\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7596ms 0.6992ms 1.4303 KOps/s 1.3864 KOps/s $\color{#35bf28}+3.16\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5552ms 1.5020ms 665.7685 Ops/s 656.0582 Ops/s $\color{#35bf28}+1.48\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7828ms 0.7259ms 1.3776 KOps/s 1.3521 KOps/s $\color{#35bf28}+1.88\%$
test_dqn_speed[False-None] 1.6472ms 1.5516ms 644.4886 Ops/s 629.7958 Ops/s $\color{#35bf28}+2.33\%$
test_dqn_speed[False-backward] 2.3880ms 2.2133ms 451.8143 Ops/s 452.4972 Ops/s $\color{#d91a1a}-0.15\%$
test_dqn_speed[True-None] 1.0321ms 0.5446ms 1.8361 KOps/s 1.8180 KOps/s $\color{#35bf28}+0.99\%$
test_dqn_speed[True-backward] 1.2407ms 1.1959ms 836.1952 Ops/s 925.6138 Ops/s $\textbf{\color{#d91a1a}-9.66\%}$
test_dqn_speed[reduce-overhead-None] 0.6473ms 0.5712ms 1.7509 KOps/s 1.7123 KOps/s $\color{#35bf28}+2.25\%$
test_ddpg_speed[False-None] 3.4205ms 2.9473ms 339.2893 Ops/s 343.5349 Ops/s $\color{#d91a1a}-1.24\%$
test_ddpg_speed[False-backward] 4.8142ms 4.3477ms 230.0057 Ops/s 235.7027 Ops/s $\color{#d91a1a}-2.42\%$
test_ddpg_speed[True-None] 1.3962ms 1.2993ms 769.6647 Ops/s 745.9186 Ops/s $\color{#35bf28}+3.18\%$
test_ddpg_speed[True-backward] 2.5291ms 2.4971ms 400.4624 Ops/s 421.0746 Ops/s $\color{#d91a1a}-4.90\%$
test_ddpg_speed[reduce-overhead-None] 1.4310ms 1.3302ms 751.7854 Ops/s 747.4901 Ops/s $\color{#35bf28}+0.57\%$
test_sac_speed[False-None] 9.0447ms 8.5387ms 117.1133 Ops/s 117.7926 Ops/s $\color{#d91a1a}-0.58\%$
test_sac_speed[False-backward] 12.3694ms 11.8039ms 84.7180 Ops/s 86.3359 Ops/s $\color{#d91a1a}-1.87\%$
test_sac_speed[True-None] 1.8513ms 1.7878ms 559.3608 Ops/s 555.3547 Ops/s $\color{#35bf28}+0.72\%$
test_sac_speed[True-backward] 4.0212ms 3.5794ms 279.3754 Ops/s 283.2144 Ops/s $\color{#d91a1a}-1.36\%$
test_sac_speed[reduce-overhead-None] 18.8029ms 10.5726ms 94.5843 Ops/s 95.1620 Ops/s $\color{#d91a1a}-0.61\%$
test_redq_deprec_speed[False-None] 10.2548ms 9.5332ms 104.8968 Ops/s 105.2450 Ops/s $\color{#d91a1a}-0.33\%$
test_redq_deprec_speed[False-backward] 13.4314ms 12.9755ms 77.0681 Ops/s 78.3901 Ops/s $\color{#d91a1a}-1.69\%$
test_redq_deprec_speed[True-None] 2.8690ms 2.5376ms 394.0795 Ops/s 395.8487 Ops/s $\color{#d91a1a}-0.45\%$
test_redq_deprec_speed[True-backward] 4.6716ms 4.2758ms 233.8735 Ops/s 230.3264 Ops/s $\color{#35bf28}+1.54\%$
test_redq_deprec_speed[reduce-overhead-None] 15.1398ms 9.4391ms 105.9423 Ops/s 89.1664 Ops/s $\textbf{\color{#35bf28}+18.81\%}$
test_td3_speed[False-None] 8.4384ms 8.3132ms 120.2904 Ops/s 117.6938 Ops/s $\color{#35bf28}+2.21\%$
test_td3_speed[False-backward] 11.5583ms 11.0078ms 90.8449 Ops/s 88.9679 Ops/s $\color{#35bf28}+2.11\%$
test_td3_speed[True-None] 1.6799ms 1.6031ms 623.8022 Ops/s 620.4402 Ops/s $\color{#35bf28}+0.54\%$
test_td3_speed[True-backward] 3.6302ms 3.2377ms 308.8600 Ops/s 307.3587 Ops/s $\color{#35bf28}+0.49\%$
test_td3_speed[reduce-overhead-None] 57.0057ms 23.4216ms 42.6957 Ops/s 42.6671 Ops/s $\color{#35bf28}+0.07\%$
test_cql_speed[False-None] 17.8226ms 17.5754ms 56.8976 Ops/s 56.6921 Ops/s $\color{#35bf28}+0.36\%$
test_cql_speed[False-backward] 23.7340ms 23.3045ms 42.9101 Ops/s 42.7395 Ops/s $\color{#35bf28}+0.40\%$
test_cql_speed[True-None] 3.3170ms 3.2262ms 309.9611 Ops/s 307.9627 Ops/s $\color{#35bf28}+0.65\%$
test_cql_speed[True-backward] 5.8742ms 5.4658ms 182.9575 Ops/s 188.0276 Ops/s $\color{#d91a1a}-2.70\%$
test_cql_speed[reduce-overhead-None] 0.7094s 15.0785ms 66.3194 Ops/s 86.3298 Ops/s $\textbf{\color{#d91a1a}-23.18\%}$
test_a2c_speed[False-None] 3.9700ms 3.2976ms 303.2499 Ops/s 302.4060 Ops/s $\color{#35bf28}+0.28\%$
test_a2c_speed[False-backward] 6.9518ms 6.4953ms 153.9580 Ops/s 158.8506 Ops/s $\color{#d91a1a}-3.08\%$
test_a2c_speed[True-None] 1.4320ms 1.3249ms 754.7697 Ops/s 750.5232 Ops/s $\color{#35bf28}+0.57\%$
test_a2c_speed[True-backward] 3.1444ms 3.0839ms 324.2656 Ops/s 322.5847 Ops/s $\color{#35bf28}+0.52\%$
test_a2c_speed[reduce-overhead-None] 1.0650ms 0.9911ms 1.0089 KOps/s 1.0285 KOps/s $\color{#d91a1a}-1.90\%$
test_ppo_speed[False-None] 4.0143ms 3.9096ms 255.7818 Ops/s 255.4812 Ops/s $\color{#35bf28}+0.12\%$
test_ppo_speed[False-backward] 7.7753ms 7.3789ms 135.5212 Ops/s 135.0534 Ops/s $\color{#35bf28}+0.35\%$
test_ppo_speed[True-None] 1.6117ms 1.4028ms 712.8665 Ops/s 703.4818 Ops/s $\color{#35bf28}+1.33\%$
test_ppo_speed[True-backward] 3.3185ms 3.2496ms 307.7348 Ops/s 310.7963 Ops/s $\color{#d91a1a}-0.99\%$
test_ppo_speed[reduce-overhead-None] 1.2076ms 1.0530ms 949.6394 Ops/s 939.8222 Ops/s $\color{#35bf28}+1.04\%$
test_reinforce_speed[False-None] 2.4395ms 2.3201ms 431.0233 Ops/s 431.1666 Ops/s $\color{#d91a1a}-0.03\%$
test_reinforce_speed[False-backward] 3.9749ms 3.5107ms 284.8408 Ops/s 290.2212 Ops/s $\color{#d91a1a}-1.85\%$
test_reinforce_speed[True-None] 1.4287ms 1.2873ms 776.8365 Ops/s 790.0870 Ops/s $\color{#d91a1a}-1.68\%$
test_reinforce_speed[True-backward] 3.1048ms 3.0364ms 329.3421 Ops/s 340.3075 Ops/s $\color{#d91a1a}-3.22\%$
test_reinforce_speed[reduce-overhead-None] 16.7630ms 9.1099ms 109.7708 Ops/s 99.0223 Ops/s $\textbf{\color{#35bf28}+10.85\%}$
test_iql_speed[False-None] 10.2536ms 9.6331ms 103.8085 Ops/s 103.6891 Ops/s $\color{#35bf28}+0.12\%$
test_iql_speed[False-backward] 14.1519ms 13.6726ms 73.1391 Ops/s 74.1213 Ops/s $\color{#d91a1a}-1.33\%$
test_iql_speed[True-None] 2.2464ms 2.1549ms 464.0592 Ops/s 460.3994 Ops/s $\color{#35bf28}+0.79\%$
test_iql_speed[True-backward] 5.1895ms 4.7978ms 208.4288 Ops/s 211.0216 Ops/s $\color{#d91a1a}-1.23\%$
test_iql_speed[reduce-overhead-None] 17.9231ms 10.2651ms 97.4175 Ops/s 76.5306 Ops/s $\textbf{\color{#35bf28}+27.29\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1806ms 6.0028ms 166.5885 Ops/s 163.5871 Ops/s $\color{#35bf28}+1.83\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.3928ms 0.3847ms 2.5996 KOps/s 3.4271 KOps/s $\textbf{\color{#d91a1a}-24.14\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5440ms 0.3480ms 2.8736 KOps/s 3.6608 KOps/s $\textbf{\color{#d91a1a}-21.50\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1666ms 5.7877ms 172.7809 Ops/s 168.0156 Ops/s $\color{#35bf28}+2.84\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0137ms 0.3558ms 2.8109 KOps/s 3.3514 KOps/s $\textbf{\color{#d91a1a}-16.13\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6265ms 0.3442ms 2.9052 KOps/s 3.4643 KOps/s $\textbf{\color{#d91a1a}-16.14\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6797ms 1.4268ms 700.8595 Ops/s 769.3893 Ops/s $\textbf{\color{#d91a1a}-8.91\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.9008ms 1.3708ms 729.5036 Ops/s 813.9690 Ops/s $\textbf{\color{#d91a1a}-10.38\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1452ms 5.9439ms 168.2402 Ops/s 163.8588 Ops/s $\color{#35bf28}+2.67\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0429ms 0.4931ms 2.0282 KOps/s 2.0101 KOps/s $\color{#35bf28}+0.90\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6947ms 0.4699ms 2.1283 KOps/s 2.0908 KOps/s $\color{#35bf28}+1.79\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0845ms 5.9267ms 168.7277 Ops/s 166.7099 Ops/s $\color{#35bf28}+1.21\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5639ms 0.2861ms 3.4959 KOps/s 3.4465 KOps/s $\color{#35bf28}+1.43\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4999ms 0.2684ms 3.7260 KOps/s 3.6036 KOps/s $\color{#35bf28}+3.40\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1776ms 5.9096ms 169.2168 Ops/s 167.3369 Ops/s $\color{#35bf28}+1.12\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1047ms 0.3034ms 3.2960 KOps/s 3.4418 KOps/s $\color{#d91a1a}-4.24\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6051ms 0.3471ms 2.8814 KOps/s 3.1264 KOps/s $\textbf{\color{#d91a1a}-7.84\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 8.4253ms 6.0638ms 164.9133 Ops/s 163.4196 Ops/s $\color{#35bf28}+0.91\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3636ms 0.4967ms 2.0131 KOps/s 2.0848 KOps/s $\color{#d91a1a}-3.44\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6620ms 0.4805ms 2.0813 KOps/s 2.3455 KOps/s $\textbf{\color{#d91a1a}-11.26\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5987s 17.0993ms 58.4821 Ops/s 50.0475 Ops/s $\textbf{\color{#35bf28}+16.85\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.9875ms 1.8871ms 529.9226 Ops/s 524.9969 Ops/s $\color{#35bf28}+0.94\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 9.5259ms 1.2973ms 770.8395 Ops/s 1.0630 KOps/s $\textbf{\color{#d91a1a}-27.48\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.3289ms 5.3063ms 188.4544 Ops/s 193.3830 Ops/s $\color{#d91a1a}-2.55\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9918ms 1.7828ms 560.9061 Ops/s 526.0373 Ops/s $\textbf{\color{#35bf28}+6.63\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 9.0974ms 1.2721ms 786.0956 Ops/s 1.0655 KOps/s $\textbf{\color{#d91a1a}-26.23\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.0598ms 5.4434ms 183.7080 Ops/s 186.9671 Ops/s $\color{#d91a1a}-1.74\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 11.2825ms 2.1331ms 468.7972 Ops/s 501.6247 Ops/s $\textbf{\color{#d91a1a}-6.54\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.7201ms 1.1556ms 865.3178 Ops/s 890.8468 Ops/s $\color{#d91a1a}-2.87\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.8681ms 35.7218ms 27.9941 Ops/s 27.1762 Ops/s $\color{#35bf28}+3.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 0.5817s 29.4872ms 33.9130 Ops/s 53.6430 Ops/s $\textbf{\color{#d91a1a}-36.78\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.3345ms 37.4588ms 26.6960 Ops/s 26.3982 Ops/s $\color{#35bf28}+1.13\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.5039ms 18.6997ms 53.4768 Ops/s 52.4968 Ops/s $\color{#35bf28}+1.87\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.2544ms 38.7645ms 25.7968 Ops/s 25.3249 Ops/s $\color{#35bf28}+1.86\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 22.1592ms 20.2941ms 49.2755 Ops/s 48.8205 Ops/s $\color{#35bf28}+0.93\%$

@github-actions
Copy link
Contributor

github-actions bot commented Jan 31, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 153. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}27$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 86.1990μs 84.8967μs 11.7790 KOps/s 12.4075 KOps/s $\textbf{\color{#d91a1a}-5.07\%}$
test_tensor_to_bytestream_speed[torch.save] 0.1436ms 0.1417ms 7.0558 KOps/s 7.2086 KOps/s $\color{#d91a1a}-2.12\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1399s 0.1395s 7.1675 Ops/s 8.5402 Ops/s $\textbf{\color{#d91a1a}-16.07\%}$
test_tensor_to_bytestream_speed[numpy] 2.7709μs 2.7628μs 361.9524 KOps/s 377.1787 KOps/s $\color{#d91a1a}-4.04\%$
test_tensor_to_bytestream_speed[safetensors] 37.0886μs 36.8153μs 27.1626 KOps/s 26.8950 KOps/s $\color{#35bf28}+0.99\%$
test_simple 0.6756s 0.5869s 1.7038 Ops/s 1.7194 Ops/s $\color{#d91a1a}-0.91\%$
test_transformed 1.2773s 1.1859s 0.8433 Ops/s 0.8613 Ops/s $\color{#d91a1a}-2.09\%$
test_serial 1.7240s 1.7215s 0.5809 Ops/s 0.5899 Ops/s $\color{#d91a1a}-1.52\%$
test_parallel 1.2731s 1.2000s 0.8333 Ops/s 0.8516 Ops/s $\color{#d91a1a}-2.14\%$
test_step_mdp_speed[True-True-True-True-True] 0.2458ms 46.4525μs 21.5274 KOps/s 22.2829 KOps/s $\color{#d91a1a}-3.39\%$
test_step_mdp_speed[True-True-True-True-False] 60.3510μs 26.2839μs 38.0461 KOps/s 40.1001 KOps/s $\textbf{\color{#d91a1a}-5.12\%}$
test_step_mdp_speed[True-True-True-False-True] 51.4910μs 26.1926μs 38.1787 KOps/s 40.2395 KOps/s $\textbf{\color{#d91a1a}-5.12\%}$
test_step_mdp_speed[True-True-True-False-False] 45.8100μs 14.3268μs 69.7990 KOps/s 71.9738 KOps/s $\color{#d91a1a}-3.02\%$
test_step_mdp_speed[True-True-False-True-True] 0.1287ms 47.7752μs 20.9313 KOps/s 20.9053 KOps/s $\color{#35bf28}+0.12\%$
test_step_mdp_speed[True-True-False-True-False] 53.9210μs 28.6556μs 34.8972 KOps/s 35.8315 KOps/s $\color{#d91a1a}-2.61\%$
test_step_mdp_speed[True-True-False-False-True] 60.4600μs 29.1244μs 34.3355 KOps/s 36.6987 KOps/s $\textbf{\color{#d91a1a}-6.44\%}$
test_step_mdp_speed[True-True-False-False-False] 45.7410μs 17.2987μs 57.8078 KOps/s 60.2673 KOps/s $\color{#d91a1a}-4.08\%$
test_step_mdp_speed[True-False-True-True-True] 81.8510μs 52.3245μs 19.1115 KOps/s 19.7622 KOps/s $\color{#d91a1a}-3.29\%$
test_step_mdp_speed[True-False-True-True-False] 62.6400μs 32.0257μs 31.2249 KOps/s 32.8089 KOps/s $\color{#d91a1a}-4.83\%$
test_step_mdp_speed[True-False-True-False-True] 67.5510μs 29.5703μs 33.8177 KOps/s 35.4141 KOps/s $\color{#d91a1a}-4.51\%$
test_step_mdp_speed[True-False-True-False-False] 42.9400μs 17.5877μs 56.8580 KOps/s 59.9629 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_step_mdp_speed[True-False-False-True-True] 88.0410μs 55.5158μs 18.0129 KOps/s 18.6927 KOps/s $\color{#d91a1a}-3.64\%$
test_step_mdp_speed[True-False-False-True-False] 65.4110μs 35.0149μs 28.5593 KOps/s 29.6460 KOps/s $\color{#d91a1a}-3.67\%$
test_step_mdp_speed[True-False-False-False-True] 0.1033ms 31.8159μs 31.4308 KOps/s 32.9994 KOps/s $\color{#d91a1a}-4.75\%$
test_step_mdp_speed[True-False-False-False-False] 59.6700μs 20.2841μs 49.2997 KOps/s 51.2192 KOps/s $\color{#d91a1a}-3.75\%$
test_step_mdp_speed[False-True-True-True-True] 87.2410μs 52.7794μs 18.9468 KOps/s 19.7004 KOps/s $\color{#d91a1a}-3.83\%$
test_step_mdp_speed[False-True-True-True-False] 61.0910μs 32.0358μs 31.2151 KOps/s 32.5832 KOps/s $\color{#d91a1a}-4.20\%$
test_step_mdp_speed[False-True-True-False-True] 65.2910μs 33.7339μs 29.6438 KOps/s 31.3458 KOps/s $\textbf{\color{#d91a1a}-5.43\%}$
test_step_mdp_speed[False-True-True-False-False] 51.1810μs 19.2301μs 52.0018 KOps/s 54.5238 KOps/s $\color{#d91a1a}-4.63\%$
test_step_mdp_speed[False-True-False-True-True] 2.7176ms 55.6886μs 17.9570 KOps/s 18.9575 KOps/s $\textbf{\color{#d91a1a}-5.28\%}$
test_step_mdp_speed[False-True-False-True-False] 62.8510μs 34.7153μs 28.8057 KOps/s 29.9024 KOps/s $\color{#d91a1a}-3.67\%$
test_step_mdp_speed[False-True-False-False-True] 67.6310μs 36.2901μs 27.5557 KOps/s 29.5809 KOps/s $\textbf{\color{#d91a1a}-6.85\%}$
test_step_mdp_speed[False-True-False-False-False] 50.8210μs 22.0311μs 45.3904 KOps/s 47.9593 KOps/s $\textbf{\color{#d91a1a}-5.36\%}$
test_step_mdp_speed[False-False-True-True-True] 88.8020μs 58.3669μs 17.1330 KOps/s 17.7962 KOps/s $\color{#d91a1a}-3.73\%$
test_step_mdp_speed[False-False-True-True-False] 66.8710μs 37.9796μs 26.3299 KOps/s 27.5259 KOps/s $\color{#d91a1a}-4.34\%$
test_step_mdp_speed[False-False-True-False-True] 75.1310μs 35.8773μs 27.8728 KOps/s 29.0154 KOps/s $\color{#d91a1a}-3.94\%$
test_step_mdp_speed[False-False-True-False-False] 47.4110μs 22.2231μs 44.9983 KOps/s 47.2401 KOps/s $\color{#d91a1a}-4.75\%$
test_step_mdp_speed[False-False-False-True-True] 98.6910μs 60.2897μs 16.5866 KOps/s 17.2666 KOps/s $\color{#d91a1a}-3.94\%$
test_step_mdp_speed[False-False-False-True-False] 66.8010μs 40.2484μs 24.8457 KOps/s 25.7614 KOps/s $\color{#d91a1a}-3.55\%$
test_step_mdp_speed[False-False-False-False-True] 68.0510μs 37.9582μs 26.3448 KOps/s 27.3736 KOps/s $\color{#d91a1a}-3.76\%$
test_step_mdp_speed[False-False-False-False-False] 61.8310μs 24.5836μs 40.6776 KOps/s 42.3352 KOps/s $\color{#d91a1a}-3.92\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8814s 0.7861s 1.2722 Ops/s 1.2941 Ops/s $\color{#d91a1a}-1.69\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7396s 0.6443s 1.5522 Ops/s 1.5594 Ops/s $\color{#d91a1a}-0.47\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7881s 1.7134s 0.5836 Ops/s 0.5975 Ops/s $\color{#d91a1a}-2.32\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5573s 1.4813s 0.6751 Ops/s 0.6873 Ops/s $\color{#d91a1a}-1.78\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0450s 1.9671s 0.5084 Ops/s 0.5197 Ops/s $\color{#d91a1a}-2.19\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.8196s 1.7389s 0.5751 Ops/s 0.5865 Ops/s $\color{#d91a1a}-1.95\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7577s 4.6714s 0.2141 Ops/s 0.2134 Ops/s $\color{#35bf28}+0.29\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6061s 4.4540s 0.2245 Ops/s 0.2247 Ops/s $\color{#d91a1a}-0.09\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0586s 1.9827s 0.5044 Ops/s 0.5050 Ops/s $\color{#d91a1a}-0.13\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.8160s 1.7528s 0.5705 Ops/s 0.5773 Ops/s $\color{#d91a1a}-1.18\%$
test_values[generalized_advantage_estimate-True-True] 11.4995ms 11.3656ms 87.9849 Ops/s 94.3664 Ops/s $\textbf{\color{#d91a1a}-6.76\%}$
test_values[vec_generalized_advantage_estimate-True-True] 19.9324ms 17.5178ms 57.0849 Ops/s 56.1091 Ops/s $\color{#35bf28}+1.74\%$
test_values[td0_return_estimate-False-False] 0.2132ms 0.1332ms 7.5061 KOps/s 7.7469 KOps/s $\color{#d91a1a}-3.11\%$
test_values[td1_return_estimate-False-False] 32.2809ms 31.2207ms 32.0300 Ops/s 33.9100 Ops/s $\textbf{\color{#d91a1a}-5.54\%}$
test_values[vec_td1_return_estimate-False-False] 21.2943ms 17.7671ms 56.2837 Ops/s 55.7653 Ops/s $\color{#35bf28}+0.93\%$
test_values[td_lambda_return_estimate-True-False] 47.9509ms 46.4800ms 21.5146 Ops/s 22.8981 Ops/s $\textbf{\color{#d91a1a}-6.04\%}$
test_values[vec_td_lambda_return_estimate-True-False] 18.0461ms 17.5937ms 56.8387 Ops/s 55.9120 Ops/s $\color{#35bf28}+1.66\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 10.3464ms 10.2712ms 97.3600 Ops/s 106.1280 Ops/s $\textbf{\color{#d91a1a}-8.26\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.9262ms 1.5407ms 649.0475 Ops/s 632.2171 Ops/s $\color{#35bf28}+2.66\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4764ms 0.4365ms 2.2910 KOps/s 2.3005 KOps/s $\color{#d91a1a}-0.41\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 34.6950ms 34.0493ms 29.3692 Ops/s 28.8558 Ops/s $\color{#35bf28}+1.78\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.9263ms 1.7356ms 576.1610 Ops/s 577.4219 Ops/s $\color{#d91a1a}-0.22\%$
test_dqn_speed[False-None] 1.5739ms 1.4422ms 693.4062 Ops/s 701.9476 Ops/s $\color{#d91a1a}-1.22\%$
test_dqn_speed[False-backward] 2.0206ms 1.9579ms 510.7430 Ops/s 512.4022 Ops/s $\color{#d91a1a}-0.32\%$
test_dqn_speed[True-None] 0.7096ms 0.5536ms 1.8064 KOps/s 1.7577 KOps/s $\color{#35bf28}+2.77\%$
test_dqn_speed[True-backward] 1.0597ms 1.0089ms 991.1351 Ops/s 832.5526 Ops/s $\textbf{\color{#35bf28}+19.05\%}$
test_dqn_speed[reduce-overhead-None] 1.0311ms 0.5371ms 1.8617 KOps/s 1.8043 KOps/s $\color{#35bf28}+3.18\%$
test_ddpg_speed[False-None] 3.3103ms 2.9497ms 339.0152 Ops/s 343.1103 Ops/s $\color{#d91a1a}-1.19\%$
test_ddpg_speed[False-backward] 4.2446ms 4.1722ms 239.6833 Ops/s 241.2871 Ops/s $\color{#d91a1a}-0.66\%$
test_ddpg_speed[True-None] 1.9281ms 1.4575ms 686.1076 Ops/s 671.4035 Ops/s $\color{#35bf28}+2.19\%$
test_ddpg_speed[True-backward] 2.4946ms 2.4515ms 407.9104 Ops/s 402.9131 Ops/s $\color{#35bf28}+1.24\%$
test_ddpg_speed[reduce-overhead-None] 1.5871ms 1.4418ms 693.5663 Ops/s 693.8006 Ops/s $\color{#d91a1a}-0.03\%$
test_sac_speed[False-None] 8.6453ms 8.2048ms 121.8800 Ops/s 118.8828 Ops/s $\color{#35bf28}+2.52\%$
test_sac_speed[False-backward] 11.8247ms 11.5019ms 86.9425 Ops/s 85.4141 Ops/s $\color{#35bf28}+1.79\%$
test_sac_speed[True-None] 2.6617ms 2.2225ms 449.9384 Ops/s 456.7580 Ops/s $\color{#d91a1a}-1.49\%$
test_sac_speed[True-backward] 4.2298ms 4.1225ms 242.5722 Ops/s 213.6772 Ops/s $\textbf{\color{#35bf28}+13.52\%}$
test_sac_speed[reduce-overhead-None] 2.6737ms 2.2018ms 454.1683 Ops/s 439.2401 Ops/s $\color{#35bf28}+3.40\%$
test_redq_speed[False-None] 14.9964ms 10.9202ms 91.5735 Ops/s 90.2476 Ops/s $\color{#35bf28}+1.47\%$
test_redq_speed[False-backward] 19.4334ms 18.1817ms 55.0004 Ops/s 56.1526 Ops/s $\color{#d91a1a}-2.05\%$
test_redq_speed[True-None] 4.6777ms 4.4929ms 222.5736 Ops/s 215.0822 Ops/s $\color{#35bf28}+3.48\%$
test_redq_speed[True-backward] 10.3859ms 10.0086ms 99.9140 Ops/s 101.8032 Ops/s $\color{#d91a1a}-1.86\%$
test_redq_speed[reduce-overhead-None] 4.8918ms 4.5284ms 220.8276 Ops/s 217.1389 Ops/s $\color{#35bf28}+1.70\%$
test_redq_deprec_speed[False-None] 12.0154ms 11.4458ms 87.3682 Ops/s 86.5015 Ops/s $\color{#35bf28}+1.00\%$
test_redq_deprec_speed[False-backward] 16.7452ms 16.2877ms 61.3960 Ops/s 59.9833 Ops/s $\color{#35bf28}+2.36\%$
test_redq_deprec_speed[True-None] 4.2976ms 3.8242ms 261.4906 Ops/s 265.0294 Ops/s $\color{#d91a1a}-1.34\%$
test_redq_deprec_speed[True-backward] 8.2198ms 7.8892ms 126.7563 Ops/s 130.1184 Ops/s $\color{#d91a1a}-2.58\%$
test_redq_deprec_speed[reduce-overhead-None] 4.2311ms 3.7485ms 266.7743 Ops/s 274.5291 Ops/s $\color{#d91a1a}-2.82\%$
test_td3_speed[False-None] 8.3363ms 8.2564ms 121.1174 Ops/s 123.1380 Ops/s $\color{#d91a1a}-1.64\%$
test_td3_speed[False-backward] 11.5842ms 11.1985ms 89.2978 Ops/s 90.2226 Ops/s $\color{#d91a1a}-1.02\%$
test_td3_speed[True-None] 1.9510ms 1.8910ms 528.8296 Ops/s 527.7470 Ops/s $\color{#35bf28}+0.21\%$
test_td3_speed[True-backward] 3.9601ms 3.7831ms 264.3359 Ops/s 268.7809 Ops/s $\color{#d91a1a}-1.65\%$
test_td3_speed[reduce-overhead-None] 1.9048ms 1.8453ms 541.9279 Ops/s 544.2935 Ops/s $\color{#d91a1a}-0.43\%$
test_cql_speed[False-None] 27.4053ms 26.6288ms 37.5533 Ops/s 37.6678 Ops/s $\color{#d91a1a}-0.30\%$
test_cql_speed[False-backward] 39.8187ms 36.1863ms 27.6348 Ops/s 27.1624 Ops/s $\color{#35bf28}+1.74\%$
test_cql_speed[True-None] 13.2831ms 12.6988ms 78.7476 Ops/s 79.4757 Ops/s $\color{#d91a1a}-0.92\%$
test_cql_speed[True-backward] 19.8948ms 18.9421ms 52.7926 Ops/s 53.5159 Ops/s $\color{#d91a1a}-1.35\%$
test_cql_speed[reduce-overhead-None] 15.5627ms 12.7568ms 78.3896 Ops/s 79.3640 Ops/s $\color{#d91a1a}-1.23\%$
test_a2c_speed[False-None] 5.8071ms 5.5857ms 179.0285 Ops/s 181.4131 Ops/s $\color{#d91a1a}-1.31\%$
test_a2c_speed[False-backward] 12.4748ms 12.1306ms 82.4365 Ops/s 83.5476 Ops/s $\color{#d91a1a}-1.33\%$
test_a2c_speed[True-None] 4.0240ms 3.8124ms 262.3028 Ops/s 256.9579 Ops/s $\color{#35bf28}+2.08\%$
test_a2c_speed[True-backward] 9.3346ms 8.8471ms 113.0316 Ops/s 113.4361 Ops/s $\color{#d91a1a}-0.36\%$
test_a2c_speed[reduce-overhead-None] 4.2053ms 3.8052ms 262.7967 Ops/s 265.3364 Ops/s $\color{#d91a1a}-0.96\%$
test_ppo_speed[False-None] 6.3465ms 6.0646ms 164.8915 Ops/s 167.6625 Ops/s $\color{#d91a1a}-1.65\%$
test_ppo_speed[False-backward] 13.1606ms 12.8052ms 78.0933 Ops/s 77.9797 Ops/s $\color{#35bf28}+0.15\%$
test_ppo_speed[True-None] 4.2294ms 3.7373ms 267.5749 Ops/s 265.4430 Ops/s $\color{#35bf28}+0.80\%$
test_ppo_speed[True-backward] 9.0990ms 8.6463ms 115.6558 Ops/s 104.1464 Ops/s $\textbf{\color{#35bf28}+11.05\%}$
test_ppo_speed[reduce-overhead-None] 4.3265ms 3.6941ms 270.7041 Ops/s 271.9409 Ops/s $\color{#d91a1a}-0.45\%$
test_reinforce_speed[False-None] 4.8961ms 4.6684ms 214.2058 Ops/s 215.2819 Ops/s $\color{#d91a1a}-0.50\%$
test_reinforce_speed[False-backward] 7.7143ms 7.4711ms 133.8491 Ops/s 133.7461 Ops/s $\color{#35bf28}+0.08\%$
test_reinforce_speed[True-None] 3.1867ms 2.9625ms 337.5566 Ops/s 332.6563 Ops/s $\color{#35bf28}+1.47\%$
test_reinforce_speed[True-backward] 8.1847ms 7.9412ms 125.9254 Ops/s 127.1736 Ops/s $\color{#d91a1a}-0.98\%$
test_reinforce_speed[reduce-overhead-None] 3.0673ms 2.9259ms 341.7707 Ops/s 333.3224 Ops/s $\color{#35bf28}+2.53\%$
test_iql_speed[False-None] 26.0131ms 21.0107ms 47.5947 Ops/s 48.2078 Ops/s $\color{#d91a1a}-1.27\%$
test_iql_speed[False-backward] 37.8521ms 31.3674ms 31.8802 Ops/s 32.3541 Ops/s $\color{#d91a1a}-1.46\%$
test_iql_speed[True-None] 10.2945ms 8.8883ms 112.5072 Ops/s 114.9895 Ops/s $\color{#d91a1a}-2.16\%$
test_iql_speed[True-backward] 17.8129ms 17.1037ms 58.4670 Ops/s 56.4758 Ops/s $\color{#35bf28}+3.53\%$
test_iql_speed[reduce-overhead-None] 9.2487ms 8.7731ms 113.9849 Ops/s 112.4437 Ops/s $\color{#35bf28}+1.37\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.6726ms 6.1121ms 163.6098 Ops/s 162.9206 Ops/s $\color{#35bf28}+0.42\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.2833ms 0.3712ms 2.6937 KOps/s 2.6963 KOps/s $\color{#d91a1a}-0.10\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6303ms 0.3397ms 2.9434 KOps/s 2.8471 KOps/s $\color{#35bf28}+3.38\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.3327ms 5.8024ms 172.3437 Ops/s 170.6787 Ops/s $\color{#35bf28}+0.98\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.4286ms 0.3734ms 2.6778 KOps/s 3.0318 KOps/s $\textbf{\color{#d91a1a}-11.67\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5482ms 0.3524ms 2.8379 KOps/s 3.1097 KOps/s $\textbf{\color{#d91a1a}-8.74\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7086ms 1.4710ms 679.7918 Ops/s 768.4857 Ops/s $\textbf{\color{#d91a1a}-11.54\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.8634ms 1.3875ms 720.7046 Ops/s 820.0762 Ops/s $\textbf{\color{#d91a1a}-12.12\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2957ms 5.8954ms 169.6236 Ops/s 166.3348 Ops/s $\color{#35bf28}+1.98\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2351ms 0.5350ms 1.8692 KOps/s 2.2075 KOps/s $\textbf{\color{#d91a1a}-15.33\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7481ms 0.5172ms 1.9334 KOps/s 2.3457 KOps/s $\textbf{\color{#d91a1a}-17.58\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0324ms 5.7795ms 173.0251 Ops/s 168.9355 Ops/s $\color{#35bf28}+2.42\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6713ms 0.3755ms 2.6632 KOps/s 2.9513 KOps/s $\textbf{\color{#d91a1a}-9.76\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5558ms 0.3632ms 2.7532 KOps/s 3.0456 KOps/s $\textbf{\color{#d91a1a}-9.60\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0350ms 5.7590ms 173.6413 Ops/s 170.6106 Ops/s $\color{#35bf28}+1.78\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.5371ms 0.3730ms 2.6810 KOps/s 2.9477 KOps/s $\textbf{\color{#d91a1a}-9.05\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5397ms 0.3558ms 2.8108 KOps/s 3.1580 KOps/s $\textbf{\color{#d91a1a}-11.00\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2010ms 5.9567ms 167.8785 Ops/s 166.0159 Ops/s $\color{#35bf28}+1.12\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0950ms 0.5048ms 1.9810 KOps/s 1.8500 KOps/s $\textbf{\color{#35bf28}+7.08\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7641ms 0.4987ms 2.0052 KOps/s 1.9312 KOps/s $\color{#35bf28}+3.83\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5659s 16.4919ms 60.6358 Ops/s 55.3794 Ops/s $\textbf{\color{#35bf28}+9.49\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.9447ms 1.8738ms 533.6692 Ops/s 543.6757 Ops/s $\color{#d91a1a}-1.84\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.0970ms 0.9093ms 1.0998 KOps/s 1.0893 KOps/s $\color{#35bf28}+0.96\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.7252ms 5.3352ms 187.4353 Ops/s 193.9926 Ops/s $\color{#d91a1a}-3.38\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 11.2081ms 1.9610ms 509.9384 Ops/s 540.9641 Ops/s $\textbf{\color{#d91a1a}-5.74\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.8905ms 1.2505ms 799.6925 Ops/s 1.1059 KOps/s $\textbf{\color{#d91a1a}-27.69\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.1235ms 5.4412ms 183.7815 Ops/s 59.2096 Ops/s $\textbf{\color{#35bf28}+210.39\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.1953ms 1.9718ms 507.1576 Ops/s 521.3522 Ops/s $\color{#d91a1a}-2.72\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.8613ms 1.1401ms 877.1334 Ops/s 927.2469 Ops/s $\textbf{\color{#d91a1a}-5.40\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.3358ms 36.3011ms 27.5474 Ops/s 27.7424 Ops/s $\color{#d91a1a}-0.70\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.0018ms 18.5187ms 53.9994 Ops/s 54.8008 Ops/s $\color{#d91a1a}-1.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.1309ms 37.4414ms 26.7084 Ops/s 26.9105 Ops/s $\color{#d91a1a}-0.75\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.3565ms 18.7522ms 53.3271 Ops/s 54.0020 Ops/s $\color{#d91a1a}-1.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.0115ms 39.2660ms 25.4673 Ops/s 25.6014 Ops/s $\color{#d91a1a}-0.52\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.5841ms 20.3756ms 49.0783 Ops/s 50.2967 Ops/s $\color{#d91a1a}-2.42\%$

[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 3, 2026
Implement SGLangWrapper extending LLMWrapperBase for drop-in compatibility
with vLLMWrapper:

Input modes:
- history: Conversation history with chat templates
- text: Raw text prompts
- tokens: Pre-tokenized input

Output structures:
- Tokens: Generated token IDs with prompt/response/full
- Text: Generated text with prompt/response/full
- LogProbs: Per-token log probabilities (when available)
- Masks: Attention and completion masks
- ChatHistory: Updated conversation history

Features:
- Batching support via async HTTP requests
- Standardized parameter mapping to SGLang format
- Policy version tracking for weight sync coordination
- Compatible with ChatEnv and LLMCollector

ghstack-source-id: e4da889
Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: e4da889
Pull-Request: #3430
@vmoens vmoens merged commit 82c9ad9 into gh/vmoens/210/base Feb 3, 2026
112 of 115 checks passed
@vmoens vmoens deleted the gh/vmoens/210/head branch February 3, 2026 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Feature New feature llm/ LLM-related PR, triggers LLM CI tests Modules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant