[Feature] Add log_metrics method for efficient batch logging #3452

Merged: vmoens merged 1 commit into main from feat/log-metrics-batch on Feb 6, 2026

@vmoens vmoens commented Feb 5, 2026

Summary

  • Add log_metrics() method to the base Logger class for logging multiple scalar metrics at once
  • Add optimized implementations for WandbLogger and MLFlowLogger that use their native batch logging APIs (experiment.log() and mlflow.log_metrics() respectively)
  • Add _make_metrics_safe() utility that efficiently converts CUDA tensors to Python types by batching transfers
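
As a rough illustration of the first two bullets, the sketch below shows one way a base class could offer a per-scalar fallback while a backend overrides it with a single batched call. The class names, the `calls` attribute, and the `log_metrics(metrics, step=None)` signature are assumptions made for this example and are not taken from the PR diff.

```python
from abc import ABC, abstractmethod


class SketchLogger(ABC):
    """Hypothetical stand-in for a logger base class (not the TorchRL class)."""

    @abstractmethod
    def log_scalar(self, name, value, step=None):
        """Log a single scalar value."""

    def log_metrics(self, metrics, step=None):
        # Default fallback: one log_scalar call per metric.
        for name, value in metrics.items():
            self.log_scalar(name, value, step=step)


class PerScalarLogger(SketchLogger):
    """Backend without a batch API: inherits the fallback loop."""

    def __init__(self):
        self.calls = []  # records every backend call for inspection

    def log_scalar(self, name, value, step=None):
        self.calls.append((name, value, step))


class BatchLogger(SketchLogger):
    """Backend with a batch API: overrides log_metrics with a single call."""

    def __init__(self):
        self.calls = []

    def log_scalar(self, name, value, step=None):
        self.calls.append(({name: value}, step))

    def log_metrics(self, metrics, step=None):
        # One backend call for the whole dict instead of one per metric.
        self.calls.append((dict(metrics), step))
```

With two metrics, `PerScalarLogger` records two backend calls while `BatchLogger` records one, which is the difference the native batch APIs are meant to exploit.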

Motivation

When logging multiple tensor metrics from CUDA, calling .item() on each tensor triggers an implicit CUDA synchronization. With N metrics, that's N separate syncs.

The new implementation:

  1. Queues all CUDA→CPU transfers with non_blocking=True
  2. Synchronizes once via a CUDA event (which waits only for the work queued on that stream up to the event, including our transfers, rather than for all GPU work on the device)
  3. Converts the now-CPU tensors to Python scalars/lists

This is particularly useful when logging to services running in separate processes (e.g., Ray actors for wandb/mlflow) that may not have GPU access.
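
The three steps listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's `_make_metrics_safe()` implementation: the function name `make_metrics_safe_sketch` is hypothetical, and the code assumes every CUDA tensor lives on the current device and was produced on the current stream.

```python
import torch


def make_metrics_safe_sketch(metrics):
    """Convert metric values to plain Python types with at most one CUDA sync.

    Hypothetical helper for illustration; assumes all CUDA tensors are on
    the current device and the current stream.
    """
    staged = {}
    used_cuda = False
    for key, value in metrics.items():
        if isinstance(value, torch.Tensor) and value.is_cuda:
            # Step 1: queue the device-to-host copy without blocking the host.
            staged[key] = value.detach().to("cpu", non_blocking=True)
            used_cuda = True
        else:
            staged[key] = value

    if used_cuda:
        # Step 2: record one event after all queued copies and wait on it.
        event = torch.cuda.Event()
        event.record()
        event.synchronize()

    # Step 3: the tensors are now safely on the CPU; convert them.
    safe = {}
    for key, value in staged.items():
        if isinstance(value, torch.Tensor):
            safe[key] = value.item() if value.numel() == 1 else value.tolist()
        else:
            safe[key] = value
    return safe
```

Calling `.item()` on the staged tensors costs no extra synchronization because they already live in host memory. On a machine without a GPU the function skips the event entirely and only performs the conversion.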

Test plan

  • Verify existing logger tests pass
  • Manual testing with wandb and mlflow loggers

Made with Cursor

Add log_metrics() method to Logger base class and optimized implementations
for WandbLogger and MLFlowLogger that use their native batch logging APIs.

The new _make_metrics_safe() utility batches CUDA->CPU tensor transfers using
non_blocking=True and synchronizes once via a CUDA event, avoiding the overhead
of multiple implicit synchronizations that would occur when calling .item() on
each CUDA tensor individually.

This is particularly useful when logging to services running in separate
processes (e.g., Ray actors) that may not have GPU access.

Co-authored-by: Cursor <[email protected]>

pytorch-bot bot commented Feb 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3452

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending

As of commit 4a53323 with merge base 838410c:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Feb 5, 2026
@github-actions github-actions bot added the Record and Feature labels Feb 5, 2026

github-actions bot commented Feb 5, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}16$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 82.4614μs 80.5780μs 12.4103 KOps/s 12.3770 KOps/s $\color{#35bf28}+0.27\%$
test_tensor_to_bytestream_speed[torch.save] 0.1386ms 0.1379ms 7.2498 KOps/s 7.2268 KOps/s $\color{#35bf28}+0.32\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1080s 0.1076s 9.2910 Ops/s 9.3205 Ops/s $\color{#d91a1a}-0.32\%$
test_tensor_to_bytestream_speed[numpy] 2.7847μs 2.7767μs 360.1420 KOps/s 376.9557 KOps/s $\color{#d91a1a}-4.46\%$
test_tensor_to_bytestream_speed[safetensors] 38.7969μs 38.3029μs 26.1077 KOps/s 26.6963 KOps/s $\color{#d91a1a}-2.20\%$
test_simple 0.5569s 0.5522s 1.8110 Ops/s 1.7527 Ops/s $\color{#35bf28}+3.33\%$
test_transformed 1.1480s 1.1400s 0.8772 Ops/s 0.8691 Ops/s $\color{#35bf28}+0.94\%$
test_serial 1.6856s 1.6781s 0.5959 Ops/s 0.5916 Ops/s $\color{#35bf28}+0.74\%$
test_parallel 1.2034s 1.1448s 0.8735 Ops/s 0.7993 Ops/s $\textbf{\color{#35bf28}+9.29\%}$
test_step_mdp_speed[True-True-True-True-True] 0.1967ms 44.6157μs 22.4136 KOps/s 22.6675 KOps/s $\color{#d91a1a}-1.12\%$
test_step_mdp_speed[True-True-True-True-False] 61.8640μs 25.1374μs 39.7814 KOps/s 39.5165 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[True-True-True-False-True] 58.2330μs 24.5395μs 40.7505 KOps/s 40.3453 KOps/s $\color{#35bf28}+1.00\%$
test_step_mdp_speed[True-True-True-False-False] 35.2720μs 13.6327μs 73.3529 KOps/s 71.9528 KOps/s $\color{#35bf28}+1.95\%$
test_step_mdp_speed[True-True-False-True-True] 95.5760μs 47.8565μs 20.8958 KOps/s 21.0221 KOps/s $\color{#d91a1a}-0.60\%$
test_step_mdp_speed[True-True-False-True-False] 52.1730μs 27.3707μs 36.5354 KOps/s 35.8800 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[True-True-False-False-True] 56.4340μs 27.7415μs 36.0470 KOps/s 36.4015 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[True-True-False-False-False] 50.4430μs 16.4606μs 60.7513 KOps/s 60.0271 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[True-False-True-True-True] 0.1273ms 50.5206μs 19.7939 KOps/s 19.7768 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[True-False-True-True-False] 56.3330μs 30.4688μs 32.8205 KOps/s 32.1697 KOps/s $\color{#35bf28}+2.02\%$
test_step_mdp_speed[True-False-True-False-True] 63.1640μs 27.9480μs 35.7807 KOps/s 35.8689 KOps/s $\color{#d91a1a}-0.25\%$
test_step_mdp_speed[True-False-True-False-False] 43.3330μs 16.5159μs 60.5476 KOps/s 60.5138 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[True-False-False-True-True] 0.1014ms 52.3939μs 19.0862 KOps/s 19.0308 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-False-False-True-False] 68.8540μs 33.1706μs 30.1471 KOps/s 29.7469 KOps/s $\color{#35bf28}+1.35\%$
test_step_mdp_speed[True-False-False-False-True] 65.3140μs 29.7930μs 33.5649 KOps/s 33.5257 KOps/s $\color{#35bf28}+0.12\%$
test_step_mdp_speed[True-False-False-False-False] 53.6630μs 19.0774μs 52.4181 KOps/s 52.1478 KOps/s $\color{#35bf28}+0.52\%$
test_step_mdp_speed[False-True-True-True-True] 87.9250μs 50.1736μs 19.9308 KOps/s 19.9724 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-True-True-True-False] 62.3540μs 30.4328μs 32.8593 KOps/s 33.0311 KOps/s $\color{#d91a1a}-0.52\%$
test_step_mdp_speed[False-True-True-False-True] 2.3136ms 31.4851μs 31.7610 KOps/s 31.8409 KOps/s $\color{#d91a1a}-0.25\%$
test_step_mdp_speed[False-True-True-False-False] 47.6130μs 18.1649μs 55.0513 KOps/s 54.5094 KOps/s $\color{#35bf28}+0.99\%$
test_step_mdp_speed[False-True-False-True-True] 83.9550μs 53.0728μs 18.8420 KOps/s 18.8730 KOps/s $\color{#d91a1a}-0.16\%$
test_step_mdp_speed[False-True-False-True-False] 65.4140μs 33.3572μs 29.9785 KOps/s 29.7269 KOps/s $\color{#35bf28}+0.85\%$
test_step_mdp_speed[False-True-False-False-True] 56.5030μs 33.5438μs 29.8118 KOps/s 29.5509 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[False-True-False-False-False] 50.2830μs 20.8511μs 47.9592 KOps/s 47.7151 KOps/s $\color{#35bf28}+0.51\%$
test_step_mdp_speed[False-False-True-True-True] 86.5950μs 55.7126μs 17.9493 KOps/s 17.8845 KOps/s $\color{#35bf28}+0.36\%$
test_step_mdp_speed[False-False-True-True-False] 64.5440μs 36.5259μs 27.3778 KOps/s 27.8243 KOps/s $\color{#d91a1a}-1.60\%$
test_step_mdp_speed[False-False-True-False-True] 72.3440μs 34.0954μs 29.3295 KOps/s 29.5709 KOps/s $\color{#d91a1a}-0.82\%$
test_step_mdp_speed[False-False-True-False-False] 53.3730μs 21.0416μs 47.5249 KOps/s 47.5745 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[False-False-False-True-True] 87.1660μs 57.4048μs 17.4201 KOps/s 17.0730 KOps/s $\color{#35bf28}+2.03\%$
test_step_mdp_speed[False-False-False-True-False] 77.4150μs 38.3990μs 26.0423 KOps/s 25.4367 KOps/s $\color{#35bf28}+2.38\%$
test_step_mdp_speed[False-False-False-False-True] 69.8240μs 35.4924μs 28.1750 KOps/s 27.8671 KOps/s $\color{#35bf28}+1.10\%$
test_step_mdp_speed[False-False-False-False-False] 56.3130μs 23.2568μs 42.9981 KOps/s 42.2581 KOps/s $\color{#35bf28}+1.75\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7533s 0.7509s 1.3318 Ops/s 1.3002 Ops/s $\color{#35bf28}+2.43\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7343s 0.6389s 1.5651 Ops/s 1.5924 Ops/s $\color{#d91a1a}-1.72\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7699s 1.6953s 0.5899 Ops/s 0.5992 Ops/s $\color{#d91a1a}-1.56\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5477s 1.4639s 0.6831 Ops/s 0.6902 Ops/s $\color{#d91a1a}-1.03\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0159s 1.9386s 0.5158 Ops/s 0.5227 Ops/s $\color{#d91a1a}-1.32\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7947s 1.7123s 0.5840 Ops/s 0.5906 Ops/s $\color{#d91a1a}-1.13\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6890s 4.6419s 0.2154 Ops/s 0.2135 Ops/s $\color{#35bf28}+0.89\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5353s 4.4560s 0.2244 Ops/s 0.2245 Ops/s $\color{#d91a1a}-0.05\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.1580s 1.9947s 0.5013 Ops/s 0.5135 Ops/s $\color{#d91a1a}-2.38\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7294s 1.6550s 0.6042 Ops/s 0.6048 Ops/s $\color{#d91a1a}-0.09\%$
test_values[generalized_advantage_estimate-True-True] 10.4968ms 10.2505ms 97.5567 Ops/s 98.7468 Ops/s $\color{#d91a1a}-1.21\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.3543ms 17.7197ms 56.4345 Ops/s 91.3509 Ops/s $\textbf{\color{#d91a1a}-38.22\%}$
test_values[td0_return_estimate-False-False] 0.2393ms 0.1290ms 7.7518 KOps/s 8.4044 KOps/s $\textbf{\color{#d91a1a}-7.76\%}$
test_values[td1_return_estimate-False-False] 27.6980ms 27.3865ms 36.5143 Ops/s 37.7376 Ops/s $\color{#d91a1a}-3.24\%$
test_values[vec_td1_return_estimate-False-False] 17.9285ms 17.5528ms 56.9708 Ops/s 90.5211 Ops/s $\textbf{\color{#d91a1a}-37.06\%}$
test_values[td_lambda_return_estimate-True-False] 40.5893ms 39.9635ms 25.0228 Ops/s 25.5787 Ops/s $\color{#d91a1a}-2.17\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.4887ms 17.5774ms 56.8913 Ops/s 91.2741 Ops/s $\textbf{\color{#d91a1a}-37.67\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.2870ms 9.1538ms 109.2446 Ops/s 111.3785 Ops/s $\color{#d91a1a}-1.92\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.6645ms 1.4628ms 683.6061 Ops/s 670.0724 Ops/s $\color{#35bf28}+2.02\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4863ms 0.4121ms 2.4266 KOps/s 2.4718 KOps/s $\color{#d91a1a}-1.83\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.0423ms 34.3026ms 29.1523 Ops/s 32.4467 Ops/s $\textbf{\color{#d91a1a}-10.15\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8021ms 1.6893ms 591.9521 Ops/s 589.4316 Ops/s $\color{#35bf28}+0.43\%$
test_dqn_speed[False-None] 1.7853ms 1.3888ms 720.0696 Ops/s 729.9943 Ops/s $\color{#d91a1a}-1.36\%$
test_dqn_speed[False-backward] 1.9728ms 1.8847ms 530.5887 Ops/s 523.9962 Ops/s $\color{#35bf28}+1.26\%$
test_dqn_speed[True-None] 0.9462ms 0.5473ms 1.8273 KOps/s 1.8006 KOps/s $\color{#35bf28}+1.48\%$
test_dqn_speed[True-backward] 1.0556ms 0.9972ms 1.0028 KOps/s 846.0530 Ops/s $\textbf{\color{#35bf28}+18.53\%}$
test_dqn_speed[reduce-overhead-None] 0.9255ms 0.5373ms 1.8612 KOps/s 1.8102 KOps/s $\color{#35bf28}+2.82\%$
test_ddpg_speed[False-None] 3.1209ms 2.8197ms 354.6458 Ops/s 358.9023 Ops/s $\color{#d91a1a}-1.19\%$
test_ddpg_speed[False-backward] 4.2750ms 3.9903ms 250.6108 Ops/s 251.2818 Ops/s $\color{#d91a1a}-0.27\%$
test_ddpg_speed[True-None] 1.5133ms 1.4000ms 714.2749 Ops/s 689.8635 Ops/s $\color{#35bf28}+3.54\%$
test_ddpg_speed[True-backward] 2.4876ms 2.3805ms 420.0845 Ops/s 417.6643 Ops/s $\color{#35bf28}+0.58\%$
test_ddpg_speed[reduce-overhead-None] 1.7836ms 1.3945ms 717.0934 Ops/s 676.4110 Ops/s $\textbf{\color{#35bf28}+6.01\%}$
test_sac_speed[False-None] 8.5979ms 7.9317ms 126.0760 Ops/s 125.9197 Ops/s $\color{#35bf28}+0.12\%$
test_sac_speed[False-backward] 11.5769ms 11.0922ms 90.1538 Ops/s 89.6411 Ops/s $\color{#35bf28}+0.57\%$
test_sac_speed[True-None] 2.5247ms 2.1566ms 463.6845 Ops/s 462.5521 Ops/s $\color{#35bf28}+0.24\%$
test_sac_speed[True-backward] 4.1766ms 4.0079ms 249.5062 Ops/s 216.1369 Ops/s $\textbf{\color{#35bf28}+15.44\%}$
test_sac_speed[reduce-overhead-None] 2.4300ms 2.1530ms 464.4720 Ops/s 444.4382 Ops/s $\color{#35bf28}+4.51\%$
test_redq_speed[False-None] 10.8376ms 10.2834ms 97.2439 Ops/s 98.3959 Ops/s $\color{#d91a1a}-1.17\%$
test_redq_speed[False-backward] 21.9689ms 17.7581ms 56.3124 Ops/s 57.4742 Ops/s $\color{#d91a1a}-2.02\%$
test_redq_speed[True-None] 4.8693ms 4.4806ms 223.1852 Ops/s 222.0032 Ops/s $\color{#35bf28}+0.53\%$
test_redq_speed[True-backward] 10.1359ms 9.8033ms 102.0069 Ops/s 105.6767 Ops/s $\color{#d91a1a}-3.47\%$
test_redq_speed[reduce-overhead-None] 4.6983ms 4.4492ms 224.7620 Ops/s 225.9942 Ops/s $\color{#d91a1a}-0.55\%$
test_redq_deprec_speed[False-None] 13.9314ms 11.0707ms 90.3284 Ops/s 94.0136 Ops/s $\color{#d91a1a}-3.92\%$
test_redq_deprec_speed[False-backward] 16.1514ms 15.6208ms 64.0173 Ops/s 66.1503 Ops/s $\color{#d91a1a}-3.22\%$
test_redq_deprec_speed[True-None] 3.9153ms 3.6917ms 270.8756 Ops/s 266.2453 Ops/s $\color{#35bf28}+1.74\%$
test_redq_deprec_speed[True-backward] 7.7282ms 7.5121ms 133.1187 Ops/s 131.9493 Ops/s $\color{#35bf28}+0.89\%$
test_redq_deprec_speed[reduce-overhead-None] 3.8220ms 3.6116ms 276.8826 Ops/s 270.7538 Ops/s $\color{#35bf28}+2.26\%$
test_td3_speed[False-None] 8.1069ms 7.9204ms 126.2556 Ops/s 127.7908 Ops/s $\color{#d91a1a}-1.20\%$
test_td3_speed[False-backward] 11.4746ms 10.7485ms 93.0365 Ops/s 94.2192 Ops/s $\color{#d91a1a}-1.26\%$
test_td3_speed[True-None] 1.9179ms 1.8578ms 538.2754 Ops/s 532.8931 Ops/s $\color{#35bf28}+1.01\%$
test_td3_speed[True-backward] 3.7703ms 3.6636ms 272.9555 Ops/s 271.9285 Ops/s $\color{#35bf28}+0.38\%$
test_td3_speed[reduce-overhead-None] 1.8481ms 1.8145ms 551.1104 Ops/s 549.4502 Ops/s $\color{#35bf28}+0.30\%$
test_cql_speed[False-None] 29.1989ms 25.9199ms 38.5804 Ops/s 38.2811 Ops/s $\color{#35bf28}+0.78\%$
test_cql_speed[False-backward] 38.2744ms 35.5481ms 28.1309 Ops/s 28.5560 Ops/s $\color{#d91a1a}-1.49\%$
test_cql_speed[True-None] 12.6864ms 12.3145ms 81.2053 Ops/s 81.2399 Ops/s $\color{#d91a1a}-0.04\%$
test_cql_speed[True-backward] 18.9089ms 18.2759ms 54.7169 Ops/s 54.2314 Ops/s $\color{#35bf28}+0.90\%$
test_cql_speed[reduce-overhead-None] 12.7466ms 12.4364ms 80.4094 Ops/s 79.2353 Ops/s $\color{#35bf28}+1.48\%$
test_a2c_speed[False-None] 5.6860ms 5.4162ms 184.6330 Ops/s 185.8040 Ops/s $\color{#d91a1a}-0.63\%$
test_a2c_speed[False-backward] 12.0080ms 11.6674ms 85.7086 Ops/s 86.4557 Ops/s $\color{#d91a1a}-0.86\%$
test_a2c_speed[True-None] 4.2668ms 3.7409ms 267.3175 Ops/s 259.5182 Ops/s $\color{#35bf28}+3.01\%$
test_a2c_speed[True-backward] 8.7780ms 8.5660ms 116.7407 Ops/s 117.4977 Ops/s $\color{#d91a1a}-0.64\%$
test_a2c_speed[reduce-overhead-None] 4.0043ms 3.7006ms 270.2284 Ops/s 269.0018 Ops/s $\color{#35bf28}+0.46\%$
test_ppo_speed[False-None] 6.1148ms 5.9046ms 169.3603 Ops/s 169.6431 Ops/s $\color{#d91a1a}-0.17\%$
test_ppo_speed[False-backward] 12.5641ms 12.2501ms 81.6318 Ops/s 81.7433 Ops/s $\color{#d91a1a}-0.14\%$
test_ppo_speed[True-None] 3.7966ms 3.6325ms 275.2929 Ops/s 272.8051 Ops/s $\color{#35bf28}+0.91\%$
test_ppo_speed[True-backward] 8.7423ms 8.4057ms 118.9667 Ops/s 119.5534 Ops/s $\color{#d91a1a}-0.49\%$
test_ppo_speed[reduce-overhead-None] 4.0744ms 3.5838ms 279.0346 Ops/s 275.0976 Ops/s $\color{#35bf28}+1.43\%$
test_reinforce_speed[False-None] 4.9580ms 4.5386ms 220.3303 Ops/s 220.2620 Ops/s $\color{#35bf28}+0.03\%$
test_reinforce_speed[False-backward] 8.4428ms 7.2968ms 137.0464 Ops/s 136.1560 Ops/s $\color{#35bf28}+0.65\%$
test_reinforce_speed[True-None] 3.3501ms 2.9122ms 343.3813 Ops/s 352.4960 Ops/s $\color{#d91a1a}-2.59\%$
test_reinforce_speed[True-backward] 7.9546ms 7.7058ms 129.7720 Ops/s 131.7358 Ops/s $\color{#d91a1a}-1.49\%$
test_reinforce_speed[reduce-overhead-None] 3.0449ms 2.8624ms 349.3517 Ops/s 355.0034 Ops/s $\color{#d91a1a}-1.59\%$
test_iql_speed[False-None] 24.8385ms 20.0813ms 49.7975 Ops/s 49.8244 Ops/s $\color{#d91a1a}-0.05\%$
test_iql_speed[False-backward] 35.1311ms 30.2073ms 33.1046 Ops/s 33.4993 Ops/s $\color{#d91a1a}-1.18\%$
test_iql_speed[True-None] 8.8753ms 8.5128ms 117.4704 Ops/s 117.3690 Ops/s $\color{#35bf28}+0.09\%$
test_iql_speed[True-backward] 17.1279ms 16.7536ms 59.6885 Ops/s 59.3274 Ops/s $\color{#35bf28}+0.61\%$
test_iql_speed[reduce-overhead-None] 8.8476ms 8.5351ms 117.1626 Ops/s 117.8926 Ops/s $\color{#d91a1a}-0.62\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3568ms 6.1225ms 163.3310 Ops/s 166.6480 Ops/s $\color{#d91a1a}-1.99\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0766ms 0.3427ms 2.9178 KOps/s 3.2297 KOps/s $\textbf{\color{#d91a1a}-9.66\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6012ms 0.3448ms 2.9003 KOps/s 3.4132 KOps/s $\textbf{\color{#d91a1a}-15.03\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4581ms 5.9474ms 168.1410 Ops/s 171.2648 Ops/s $\color{#d91a1a}-1.82\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9598ms 0.3211ms 3.1139 KOps/s 3.7038 KOps/s $\textbf{\color{#d91a1a}-15.93\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5921ms 0.2936ms 3.4062 KOps/s 3.8135 KOps/s $\textbf{\color{#d91a1a}-10.68\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6696ms 1.4183ms 705.0547 Ops/s 813.1752 Ops/s $\textbf{\color{#d91a1a}-13.30\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6260ms 1.3372ms 747.8057 Ops/s 867.7748 Ops/s $\textbf{\color{#d91a1a}-13.82\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.0515ms 6.2054ms 161.1509 Ops/s 166.7211 Ops/s $\color{#d91a1a}-3.34\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9403ms 0.4822ms 2.0738 KOps/s 2.1851 KOps/s $\textbf{\color{#d91a1a}-5.09\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6880ms 0.4804ms 2.0817 KOps/s 2.2810 KOps/s $\textbf{\color{#d91a1a}-8.74\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1568ms 5.9210ms 168.8904 Ops/s 170.2488 Ops/s $\color{#d91a1a}-0.80\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9476ms 0.3483ms 2.8708 KOps/s 2.7157 KOps/s $\textbf{\color{#35bf28}+5.71\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6880ms 0.3305ms 3.0260 KOps/s 2.9835 KOps/s $\color{#35bf28}+1.42\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1476ms 5.8816ms 170.0207 Ops/s 170.4947 Ops/s $\color{#d91a1a}-0.28\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0077ms 0.3359ms 2.9770 KOps/s 3.2938 KOps/s $\textbf{\color{#d91a1a}-9.62\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4771ms 0.2876ms 3.4775 KOps/s 3.6308 KOps/s $\color{#d91a1a}-4.22\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.3558ms 6.1194ms 163.4134 Ops/s 167.0681 Ops/s $\color{#d91a1a}-2.19\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2033ms 0.4657ms 2.1471 KOps/s 682.7538 Ops/s $\textbf{\color{#35bf28}+214.48\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7606ms 0.4882ms 2.0483 KOps/s 2.4131 KOps/s $\textbf{\color{#d91a1a}-15.12\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.5729ms 5.0586ms 197.6835 Ops/s 197.7497 Ops/s $\color{#d91a1a}-0.03\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.8992ms 1.7722ms 564.2756 Ops/s 528.3085 Ops/s $\textbf{\color{#35bf28}+6.81\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.5655ms 0.9457ms 1.0574 KOps/s 1.0812 KOps/s $\color{#d91a1a}-2.20\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.9508ms 5.0432ms 198.2878 Ops/s 196.0083 Ops/s $\color{#35bf28}+1.16\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9626ms 1.7923ms 557.9542 Ops/s 569.8396 Ops/s $\color{#d91a1a}-2.09\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.6279ms 1.2197ms 819.8633 Ops/s 827.8539 Ops/s $\color{#d91a1a}-0.97\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5601s 16.3602ms 61.1238 Ops/s 59.8681 Ops/s $\color{#35bf28}+2.10\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.1092ms 1.9615ms 509.8046 Ops/s 454.2062 Ops/s $\textbf{\color{#35bf28}+12.24\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.6348ms 1.1025ms 907.0698 Ops/s 797.8891 Ops/s $\textbf{\color{#35bf28}+13.68\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.7336ms 36.5293ms 27.3752 Ops/s 27.8717 Ops/s $\color{#d91a1a}-1.78\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.2696ms 18.2801ms 54.7043 Ops/s 55.7323 Ops/s $\color{#d91a1a}-1.84\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.7569ms 37.8433ms 26.4248 Ops/s 27.0789 Ops/s $\color{#d91a1a}-2.42\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2319ms 18.6709ms 53.5592 Ops/s 55.0542 Ops/s $\color{#d91a1a}-2.72\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.7783ms 39.0086ms 25.6354 Ops/s 25.8317 Ops/s $\color{#d91a1a}-0.76\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.3258ms 19.9349ms 50.1632 Ops/s 50.7751 Ops/s $\color{#d91a1a}-1.21\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8517ms 0.2209ms 4.5274 KOps/s 4.6894 KOps/s $\color{#d91a1a}-3.45\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.5620ms 1.3803ms 724.4869 Ops/s 720.7326 Ops/s $\color{#35bf28}+0.52\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5251ms 2.3502ms 425.4891 Ops/s 422.6967 Ops/s $\color{#35bf28}+0.66\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.2807ms 2.9161ms 342.9293 Ops/s 348.9971 Ops/s $\color{#d91a1a}-1.74\%$
test_storage_write_contiguous[50-img_shape0-small] 0.4188ms 0.1370ms 7.3018 KOps/s 7.6413 KOps/s $\color{#d91a1a}-4.44\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3427ms 0.1842ms 5.4281 KOps/s 5.6255 KOps/s $\color{#d91a1a}-3.51\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9294ms 1.7447ms 573.1629 Ops/s 583.4312 Ops/s $\color{#d91a1a}-1.76\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5516ms 1.2810ms 780.6629 Ops/s 800.4875 Ops/s $\color{#d91a1a}-2.48\%$
test_collector_stack_then_write[50-img_shape0-small] 1.4661ms 1.1172ms 895.0589 Ops/s 903.6684 Ops/s $\color{#d91a1a}-0.95\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7765ms 3.6707ms 272.4285 Ops/s 277.7627 Ops/s $\color{#d91a1a}-1.92\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.6804ms 5.5499ms 180.1841 Ops/s 179.9117 Ops/s $\color{#35bf28}+0.15\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.0314ms 6.8953ms 145.0257 Ops/s 138.6958 Ops/s $\color{#35bf28}+4.56\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4773ms 0.2765ms 3.6171 KOps/s 3.6053 KOps/s $\color{#35bf28}+0.33\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6326ms 1.4953ms 668.7505 Ops/s 658.9280 Ops/s $\color{#35bf28}+1.49\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8106ms 2.4008ms 416.5246 Ops/s 403.1834 Ops/s $\color{#35bf28}+3.31\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3874ms 3.1051ms 322.0558 Ops/s 324.7921 Ops/s $\color{#d91a1a}-0.84\%$
test_collector_without_rb[100-img_shape0-atari] 34.8052ms 33.9353ms 29.4679 Ops/s 29.9861 Ops/s $\color{#d91a1a}-1.73\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.9799ms 66.5235ms 15.0323 Ops/s 15.0743 Ops/s $\color{#d91a1a}-0.28\%$
test_collector_with_rb[100-img_shape0-atari] 39.7356ms 38.4777ms 25.9891 Ops/s 26.2655 Ops/s $\color{#d91a1a}-1.05\%$
test_collector_with_rb[200-img_shape1-large_batch] 0.6688s 0.1185s 8.4421 Ops/s 13.3530 Ops/s $\textbf{\color{#d91a1a}-36.78\%}$


github-actions bot commented Feb 5, 2026

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.0003μs 80.0950μs 12.4852 KOps/s 12.1906 KOps/s $\color{#35bf28}+2.42\%$
test_tensor_to_bytestream_speed[torch.save] 0.1386ms 0.1382ms 7.2379 KOps/s 7.1847 KOps/s $\color{#35bf28}+0.74\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1164s 0.1162s 8.6084 Ops/s 8.8163 Ops/s $\color{#d91a1a}-2.36\%$
test_tensor_to_bytestream_speed[numpy] 2.4695μs 2.4640μs 405.8452 KOps/s 402.2750 KOps/s $\color{#35bf28}+0.89\%$
test_tensor_to_bytestream_speed[safetensors] 40.2926μs 39.8557μs 25.0905 KOps/s 26.4724 KOps/s $\textbf{\color{#d91a1a}-5.22\%}$
test_simple 0.7982s 0.7934s 1.2604 Ops/s 1.2210 Ops/s $\color{#35bf28}+3.22\%$
test_transformed 1.5393s 1.4459s 0.6916 Ops/s 0.6879 Ops/s $\color{#35bf28}+0.54\%$
test_serial 2.3996s 2.3096s 0.4330 Ops/s 0.4308 Ops/s $\color{#35bf28}+0.50\%$
test_parallel 2.0329s 1.9770s 0.5058 Ops/s 0.5232 Ops/s $\color{#d91a1a}-3.31\%$
test_step_mdp_speed[True-True-True-True-True] 0.3661ms 44.4186μs 22.5131 KOps/s 22.1219 KOps/s $\color{#35bf28}+1.77\%$
test_step_mdp_speed[True-True-True-True-False] 59.9410μs 24.7828μs 40.3506 KOps/s 39.8349 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[True-True-True-False-True] 56.6110μs 24.2152μs 41.2964 KOps/s 40.8638 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[True-True-True-False-False] 48.8610μs 13.6721μs 73.1419 KOps/s 72.0430 KOps/s $\color{#35bf28}+1.53\%$
test_step_mdp_speed[True-True-False-True-True] 78.2410μs 47.0469μs 21.2554 KOps/s 20.8049 KOps/s $\color{#35bf28}+2.17\%$
test_step_mdp_speed[True-True-False-True-False] 60.5710μs 27.2640μs 36.6784 KOps/s 35.8026 KOps/s $\color{#35bf28}+2.45\%$
test_step_mdp_speed[True-True-False-False-True] 57.0710μs 26.5971μs 37.5981 KOps/s 36.6255 KOps/s $\color{#35bf28}+2.66\%$
test_step_mdp_speed[True-True-False-False-False] 41.0110μs 16.1079μs 62.0815 KOps/s 60.5099 KOps/s $\color{#35bf28}+2.60\%$
test_step_mdp_speed[True-False-True-True-True] 87.0220μs 49.1969μs 20.3265 KOps/s 19.5645 KOps/s $\color{#35bf28}+3.89\%$
test_step_mdp_speed[True-False-True-True-False] 60.8910μs 29.9892μs 33.3454 KOps/s 32.7669 KOps/s $\color{#35bf28}+1.77\%$
test_step_mdp_speed[True-False-True-False-True] 59.0810μs 26.9200μs 37.1472 KOps/s 36.6480 KOps/s $\color{#35bf28}+1.36\%$
test_step_mdp_speed[True-False-True-False-False] 46.8810μs 16.2820μs 61.4176 KOps/s 60.1663 KOps/s $\color{#35bf28}+2.08\%$
test_step_mdp_speed[True-False-False-True-True] 0.1286ms 51.5475μs 19.3996 KOps/s 18.9607 KOps/s $\color{#35bf28}+2.31\%$
test_step_mdp_speed[True-False-False-True-False] 68.7110μs 32.6303μs 30.6463 KOps/s 30.5226 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[True-False-False-False-True] 63.5910μs 29.7297μs 33.6364 KOps/s 33.4321 KOps/s $\color{#35bf28}+0.61\%$
test_step_mdp_speed[True-False-False-False-False] 41.3400μs 18.8129μs 53.1550 KOps/s 51.7802 KOps/s $\color{#35bf28}+2.66\%$
test_step_mdp_speed[False-True-True-True-True] 97.6920μs 49.7595μs 20.0967 KOps/s 19.6140 KOps/s $\color{#35bf28}+2.46\%$
test_step_mdp_speed[False-True-True-True-False] 61.6510μs 30.4830μs 32.8051 KOps/s 32.8134 KOps/s $\color{#d91a1a}-0.03\%$
test_step_mdp_speed[False-True-True-False-True] 2.2717ms 31.4815μs 31.7646 KOps/s 31.7717 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[False-True-True-False-False] 45.1610μs 18.5747μs 53.8366 KOps/s 56.6835 KOps/s $\textbf{\color{#d91a1a}-5.02\%}$
test_step_mdp_speed[False-True-False-True-True] 88.9810μs 52.2829μs 19.1267 KOps/s 19.0149 KOps/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[False-True-False-True-False] 59.2910μs 33.2841μs 30.0444 KOps/s 30.2462 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[False-True-False-False-True] 65.8610μs 33.5140μs 29.8383 KOps/s 29.7078 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[False-True-False-False-False] 51.7410μs 20.8239μs 48.0218 KOps/s 47.4112 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[False-False-True-True-True] 86.6720μs 55.2736μs 18.0918 KOps/s 17.7863 KOps/s $\color{#35bf28}+1.72\%$
test_step_mdp_speed[False-False-True-True-False] 66.2310μs 36.1747μs 27.6436 KOps/s 27.9689 KOps/s $\color{#d91a1a}-1.16\%$
test_step_mdp_speed[False-False-True-False-True] 61.9810μs 33.4990μs 29.8517 KOps/s 29.3568 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[False-False-True-False-False] 52.1410μs 20.8626μs 47.9328 KOps/s 47.5102 KOps/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[False-False-False-True-True] 89.2610μs 57.4984μs 17.3918 KOps/s 17.2062 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[False-False-False-True-False] 79.6910μs 38.7975μs 25.7749 KOps/s 26.0135 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[False-False-False-False-True] 69.7610μs 35.8969μs 27.8576 KOps/s 27.2672 KOps/s $\color{#35bf28}+2.17\%$
test_step_mdp_speed[False-False-False-False-False] 55.9910μs 23.2077μs 43.0891 KOps/s 42.5811 KOps/s $\color{#35bf28}+1.19\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8606s 0.7676s 1.3028 Ops/s 1.2932 Ops/s $\color{#35bf28}+0.74\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7245s 0.6325s 1.5810 Ops/s 1.5660 Ops/s $\color{#35bf28}+0.96\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7425s 1.6601s 0.6024 Ops/s 0.5947 Ops/s $\color{#35bf28}+1.29\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5179s 1.4384s 0.6952 Ops/s 0.6834 Ops/s $\color{#35bf28}+1.73\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9852s 1.9042s 0.5252 Ops/s 0.5157 Ops/s $\color{#35bf28}+1.84\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7746s 1.6933s 0.5906 Ops/s 0.5838 Ops/s $\color{#35bf28}+1.16\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.8696s 4.7057s 0.2125 Ops/s 0.2153 Ops/s $\color{#d91a1a}-1.29\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5929s 4.4456s 0.2249 Ops/s 0.2247 Ops/s $\color{#35bf28}+0.10\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0824s 1.9697s 0.5077 Ops/s 0.5062 Ops/s $\color{#35bf28}+0.30\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.8023s 1.6866s 0.5929 Ops/s 0.5958 Ops/s $\color{#d91a1a}-0.49\%$
test_values[generalized_advantage_estimate-True-True] 21.6118ms 20.2421ms 49.4020 Ops/s 48.0454 Ops/s $\color{#35bf28}+2.82\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1309s 3.5383ms 282.6224 Ops/s 282.8666 Ops/s $\color{#d91a1a}-0.09\%$
test_values[td0_return_estimate-False-False] 0.1071ms 82.7867μs 12.0792 KOps/s 11.7819 KOps/s $\color{#35bf28}+2.52\%$
test_values[td1_return_estimate-False-False] 51.5248ms 48.4629ms 20.6343 Ops/s 20.3187 Ops/s $\color{#35bf28}+1.55\%$
test_values[vec_td1_return_estimate-False-False] 1.3009ms 1.0893ms 918.0590 Ops/s 911.0753 Ops/s $\color{#35bf28}+0.77\%$
test_values[td_lambda_return_estimate-True-False] 84.6244ms 79.5216ms 12.5752 Ops/s 12.4359 Ops/s $\color{#35bf28}+1.12\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2643ms 1.0824ms 923.8556 Ops/s 913.7560 Ops/s $\color{#35bf28}+1.11\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.1461ms 21.0998ms 47.3938 Ops/s 46.5301 Ops/s $\color{#35bf28}+1.86\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0564ms 0.7541ms 1.3261 KOps/s 1.3107 KOps/s $\color{#35bf28}+1.17\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7233ms 0.6780ms 1.4749 KOps/s 1.4114 KOps/s $\color{#35bf28}+4.50\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5293ms 1.4862ms 672.8766 Ops/s 666.2191 Ops/s $\color{#35bf28}+1.00\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7742ms 0.7247ms 1.3800 KOps/s 1.4366 KOps/s $\color{#d91a1a}-3.94\%$
test_dqn_speed[False-None] 1.6247ms 1.5284ms 654.2658 Ops/s 647.9954 Ops/s $\color{#35bf28}+0.97\%$
test_dqn_speed[False-backward] 2.3048ms 2.1829ms 458.1108 Ops/s 453.0176 Ops/s $\color{#35bf28}+1.12\%$
test_dqn_speed[True-None] 0.6562ms 0.5766ms 1.7344 KOps/s 1.7628 KOps/s $\color{#d91a1a}-1.61\%$
test_dqn_speed[True-backward] 1.2844ms 1.1930ms 838.1958 Ops/s 834.1254 Ops/s $\color{#35bf28}+0.49\%$
test_dqn_speed[reduce-overhead-None] 0.6438ms 0.5773ms 1.7322 KOps/s 1.6583 KOps/s $\color{#35bf28}+4.46\%$
test_ddpg_speed[False-None] 3.2422ms 2.8801ms 347.2065 Ops/s 343.9342 Ops/s $\color{#35bf28}+0.95\%$
test_ddpg_speed[False-backward] 4.8048ms 4.3344ms 230.7108 Ops/s 233.0162 Ops/s $\color{#d91a1a}-0.99\%$
test_ddpg_speed[True-None] 1.3644ms 1.3000ms 769.2418 Ops/s 762.8725 Ops/s $\color{#35bf28}+0.83\%$
test_ddpg_speed[True-backward] 2.5501ms 2.4683ms 405.1305 Ops/s 398.5632 Ops/s $\color{#35bf28}+1.65\%$
test_ddpg_speed[reduce-overhead-None] 1.4371ms 1.3294ms 752.2143 Ops/s 746.9666 Ops/s $\color{#35bf28}+0.70\%$
test_sac_speed[False-None] 8.9097ms 8.3806ms 119.3226 Ops/s 118.2848 Ops/s $\color{#35bf28}+0.88\%$
test_sac_speed[False-backward] 12.1259ms 11.6777ms 85.6334 Ops/s 84.9555 Ops/s $\color{#35bf28}+0.80\%$
test_sac_speed[True-None] 1.8788ms 1.7885ms 559.1241 Ops/s 553.8273 Ops/s $\color{#35bf28}+0.96\%$
test_sac_speed[True-backward] 3.6318ms 3.5424ms 282.2937 Ops/s 279.6252 Ops/s $\color{#35bf28}+0.95\%$
test_sac_speed[reduce-overhead-None] 19.4597ms 10.9691ms 91.1652 Ops/s 90.0749 Ops/s $\color{#35bf28}+1.21\%$
test_redq_deprec_speed[False-None] 9.9457ms 9.3565ms 106.8774 Ops/s 106.0164 Ops/s $\color{#35bf28}+0.81\%$
test_redq_deprec_speed[False-backward] 13.1243ms 12.7396ms 78.4955 Ops/s 78.0138 Ops/s $\color{#35bf28}+0.62\%$
test_redq_deprec_speed[True-None] 2.6482ms 2.4941ms 400.9505 Ops/s 400.0199 Ops/s $\color{#35bf28}+0.23\%$
test_redq_deprec_speed[True-backward] 4.6441ms 4.2417ms 235.7570 Ops/s 239.3974 Ops/s $\color{#d91a1a}-1.52\%$
test_redq_deprec_speed[reduce-overhead-None] 16.5882ms 10.1015ms 98.9948 Ops/s 100.1244 Ops/s $\color{#d91a1a}-1.13\%$
test_td3_speed[False-None] 8.5412ms 8.2996ms 120.4871 Ops/s 113.6213 Ops/s $\textbf{\color{#35bf28}+6.04\%}$
test_td3_speed[False-backward] 11.1766ms 10.7294ms 93.2018 Ops/s 92.6403 Ops/s $\color{#35bf28}+0.61\%$
test_td3_speed[True-None] 1.7006ms 1.6453ms 607.7872 Ops/s 595.8577 Ops/s $\color{#35bf28}+2.00\%$
test_td3_speed[True-backward] 3.1270ms 3.0602ms 326.7711 Ops/s 309.3798 Ops/s $\textbf{\color{#35bf28}+5.62\%}$
test_td3_speed[reduce-overhead-None] 73.2894ms 25.2829ms 39.5524 Ops/s 39.4165 Ops/s $\color{#35bf28}+0.34\%$
test_cql_speed[False-None] 17.6278ms 17.3647ms 57.5880 Ops/s 57.1692 Ops/s $\color{#35bf28}+0.73\%$
test_cql_speed[False-backward] 23.3844ms 22.7811ms 43.8961 Ops/s 42.8423 Ops/s $\color{#35bf28}+2.46\%$
test_cql_speed[True-None] 3.3632ms 3.2102ms 311.5030 Ops/s 310.7186 Ops/s $\color{#35bf28}+0.25\%$
test_cql_speed[True-backward] 5.7424ms 5.3031ms 188.5672 Ops/s 187.8279 Ops/s $\color{#35bf28}+0.39\%$
test_cql_speed[reduce-overhead-None] 19.2929ms 11.9380ms 83.7662 Ops/s 83.1431 Ops/s $\color{#35bf28}+0.75\%$
test_a2c_speed[False-None] 4.0038ms 3.2549ms 307.2266 Ops/s 305.5579 Ops/s $\color{#35bf28}+0.55\%$
test_a2c_speed[False-backward] 6.6021ms 6.2171ms 160.8477 Ops/s 158.7648 Ops/s $\color{#35bf28}+1.31\%$
test_a2c_speed[True-None] 1.5607ms 1.3107ms 762.9356 Ops/s 742.8000 Ops/s $\color{#35bf28}+2.71\%$
test_a2c_speed[True-backward] 3.0312ms 2.9403ms 340.0966 Ops/s 321.4463 Ops/s $\textbf{\color{#35bf28}+5.80\%}$
test_a2c_speed[reduce-overhead-None] 1.0560ms 0.9846ms 1.0157 KOps/s 1.0091 KOps/s $\color{#35bf28}+0.65\%$
test_ppo_speed[False-None] 4.2073ms 3.8950ms 256.7393 Ops/s 254.7061 Ops/s $\color{#35bf28}+0.80\%$
test_ppo_speed[False-backward] 7.4370ms 7.0475ms 141.8938 Ops/s 142.8319 Ops/s $\color{#d91a1a}-0.66\%$
test_ppo_speed[True-None] 1.4811ms 1.4226ms 702.9268 Ops/s 703.2675 Ops/s $\color{#d91a1a}-0.05\%$
test_ppo_speed[True-backward] 3.3448ms 3.0714ms 325.5846 Ops/s 307.4965 Ops/s $\textbf{\color{#35bf28}+5.88\%}$
test_ppo_speed[reduce-overhead-None] 1.1143ms 1.0406ms 960.9493 Ops/s 919.8490 Ops/s $\color{#35bf28}+4.47\%$
test_reinforce_speed[False-None] 2.4513ms 2.3349ms 428.2927 Ops/s 433.8249 Ops/s $\color{#d91a1a}-1.28\%$
test_reinforce_speed[False-backward] 3.7665ms 3.3199ms 301.2165 Ops/s 299.8139 Ops/s $\color{#35bf28}+0.47\%$
test_reinforce_speed[True-None] 1.3814ms 1.2864ms 777.3552 Ops/s 789.0235 Ops/s $\color{#d91a1a}-1.48\%$
test_reinforce_speed[True-backward] 2.9270ms 2.8651ms 349.0288 Ops/s 325.1733 Ops/s $\textbf{\color{#35bf28}+7.34\%}$
test_reinforce_speed[reduce-overhead-None] 0.4435s 10.4988ms 95.2491 Ops/s 104.1610 Ops/s $\textbf{\color{#d91a1a}-8.56\%}$
test_iql_speed[False-None] 10.0806ms 9.4862ms 105.4168 Ops/s 104.3061 Ops/s $\color{#35bf28}+1.06\%$
test_iql_speed[False-backward] 13.6978ms 13.2253ms 75.6126 Ops/s 72.4944 Ops/s $\color{#35bf28}+4.30\%$
test_iql_speed[True-None] 2.2740ms 2.1528ms 464.5126 Ops/s 458.3970 Ops/s $\color{#35bf28}+1.33\%$
test_iql_speed[True-backward] 5.1298ms 4.6580ms 214.6833 Ops/s 203.2246 Ops/s $\textbf{\color{#35bf28}+5.64\%}$
test_iql_speed[reduce-overhead-None] 18.1803ms 10.6895ms 93.5497 Ops/s 95.0727 Ops/s $\color{#d91a1a}-1.60\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4731ms 5.9510ms 168.0386 Ops/s 166.3218 Ops/s $\color{#35bf28}+1.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9858ms 0.3974ms 2.5166 KOps/s 2.7337 KOps/s $\textbf{\color{#d91a1a}-7.94\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6093ms 0.3766ms 2.6553 KOps/s 2.8194 KOps/s $\textbf{\color{#d91a1a}-5.82\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0564ms 5.8086ms 172.1573 Ops/s 171.1457 Ops/s $\color{#35bf28}+0.59\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7060ms 0.3166ms 3.1589 KOps/s 3.4200 KOps/s $\textbf{\color{#d91a1a}-7.63\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6122ms 0.3395ms 2.9451 KOps/s 3.6365 KOps/s $\textbf{\color{#d91a1a}-19.01\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6136ms 1.3577ms 736.5563 Ops/s 744.4566 Ops/s $\color{#d91a1a}-1.06\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4926ms 1.2900ms 775.2063 Ops/s 789.7126 Ops/s $\color{#d91a1a}-1.84\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2523ms 5.9905ms 166.9321 Ops/s 168.1143 Ops/s $\color{#d91a1a}-0.70\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9971ms 0.4360ms 2.2938 KOps/s 2.2735 KOps/s $\color{#35bf28}+0.89\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5821ms 0.4192ms 2.3857 KOps/s 2.1234 KOps/s $\textbf{\color{#35bf28}+12.35\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0483ms 5.8517ms 170.8896 Ops/s 170.5738 Ops/s $\color{#35bf28}+0.19\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.1374ms 0.3611ms 2.7694 KOps/s 3.0089 KOps/s $\textbf{\color{#d91a1a}-7.96\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6484ms 0.3160ms 3.1645 KOps/s 3.0023 KOps/s $\textbf{\color{#35bf28}+5.40\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4354ms 5.8201ms 171.8174 Ops/s 172.6690 Ops/s $\color{#d91a1a}-0.49\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9931ms 0.3410ms 2.9328 KOps/s 2.5794 KOps/s $\textbf{\color{#35bf28}+13.70\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5825ms 0.3542ms 2.8235 KOps/s 3.4002 KOps/s $\textbf{\color{#d91a1a}-16.96\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1088ms 5.9871ms 167.0255 Ops/s 166.1542 Ops/s $\color{#35bf28}+0.52\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2104ms 0.4870ms 2.0532 KOps/s 2.0246 KOps/s $\color{#35bf28}+1.41\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7037ms 0.4704ms 2.1258 KOps/s 2.1104 KOps/s $\color{#35bf28}+0.73\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.6393s 17.8035ms 56.1689 Ops/s 193.5151 Ops/s $\textbf{\color{#d91a1a}-70.97\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.4583ms 1.9505ms 512.6825 Ops/s 518.0612 Ops/s $\color{#d91a1a}-1.04\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 11.6266ms 1.3489ms 741.3598 Ops/s 768.5983 Ops/s $\color{#d91a1a}-3.54\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 8.0121ms 5.0916ms 196.4028 Ops/s 193.6694 Ops/s $\color{#35bf28}+1.41\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.8049ms 1.9362ms 516.4829 Ops/s 535.2377 Ops/s $\color{#d91a1a}-3.50\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.1127ms 0.9419ms 1.0617 KOps/s 1.0248 KOps/s $\color{#35bf28}+3.60\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5883s 16.9742ms 58.9128 Ops/s 48.6946 Ops/s $\textbf{\color{#35bf28}+20.98\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.4247ms 2.0936ms 477.6358 Ops/s 510.9299 Ops/s $\textbf{\color{#d91a1a}-6.52\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.2112ms 1.1113ms 899.8846 Ops/s 913.2107 Ops/s $\color{#d91a1a}-1.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.0903ms 35.2401ms 28.3768 Ops/s 27.5509 Ops/s $\color{#35bf28}+3.00\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.7753ms 18.0658ms 55.3532 Ops/s 56.0790 Ops/s $\color{#d91a1a}-1.29\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.8770ms 36.7745ms 27.1928 Ops/s 26.8959 Ops/s $\color{#35bf28}+1.10\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.3231ms 18.3049ms 54.6302 Ops/s 54.9345 Ops/s $\color{#d91a1a}-0.55\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.9024ms 38.9647ms 25.6643 Ops/s 25.6018 Ops/s $\color{#35bf28}+0.24\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.9949ms 20.7415ms 48.2126 Ops/s 50.6674 Ops/s $\color{#d91a1a}-4.85\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8985ms 0.2192ms 4.5628 KOps/s 4.4941 KOps/s $\color{#35bf28}+1.53\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.5590ms 1.3888ms 720.0299 Ops/s 728.8695 Ops/s $\color{#d91a1a}-1.21\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.6233ms 2.2817ms 438.2611 Ops/s 440.9556 Ops/s $\color{#d91a1a}-0.61\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1399ms 2.9457ms 339.4743 Ops/s 341.5865 Ops/s $\color{#d91a1a}-0.62\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2203ms 0.1490ms 6.7093 KOps/s 6.7110 KOps/s $\color{#d91a1a}-0.03\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3710ms 0.2161ms 4.6272 KOps/s 4.3660 KOps/s $\textbf{\color{#35bf28}+5.98\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8031ms 1.6444ms 608.1356 Ops/s 577.5729 Ops/s $\textbf{\color{#35bf28}+5.29\%}$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5089ms 1.3671ms 731.4504 Ops/s 742.4487 Ops/s $\color{#d91a1a}-1.48\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2638ms 1.1368ms 879.6427 Ops/s 879.6855 Ops/s $-0.00\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8222ms 3.6028ms 277.5639 Ops/s 274.1564 Ops/s $\color{#35bf28}+1.24\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.0982ms 5.7299ms 174.5236 Ops/s 176.5821 Ops/s $\color{#d91a1a}-1.17\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.2304ms 6.9840ms 143.1845 Ops/s 141.1333 Ops/s $\color{#35bf28}+1.45\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4756ms 0.2721ms 3.6747 KOps/s 3.6256 KOps/s $\color{#35bf28}+1.35\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7208ms 1.5484ms 645.8263 Ops/s 697.7407 Ops/s $\textbf{\color{#d91a1a}-7.44\%}$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.5379ms 2.4136ms 414.3118 Ops/s 414.7988 Ops/s $\color{#d91a1a}-0.12\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3039ms 3.1595ms 316.5086 Ops/s 318.4765 Ops/s $\color{#d91a1a}-0.62\%$
test_collector_without_rb[100-img_shape0-atari] 35.0446ms 33.9789ms 29.4301 Ops/s 29.3365 Ops/s $\color{#35bf28}+0.32\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.4988ms 66.6880ms 14.9952 Ops/s 14.9163 Ops/s $\color{#35bf28}+0.53\%$
test_collector_with_rb[100-img_shape0-atari] 39.2256ms 38.4274ms 26.0231 Ops/s 25.9833 Ops/s $\color{#35bf28}+0.15\%$
test_collector_with_rb[200-img_shape1-large_batch] 76.7554ms 75.0184ms 13.3301 Ops/s 13.1548 Ops/s $\color{#35bf28}+1.33\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 58.1531ms 57.1462ms 17.4990 Ops/s 17.6262 Ops/s $\color{#d91a1a}-0.72\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1157s 0.1124s 8.8996 Ops/s 8.8509 Ops/s $\color{#35bf28}+0.55\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 60.0714ms 58.8954ms 16.9793 Ops/s 16.9865 Ops/s $\color{#d91a1a}-0.04\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.8050s 0.1928s 5.1865 Ops/s 8.5329 Ops/s $\textbf{\color{#d91a1a}-39.22\%}$

@vmoens merged commit 190a43d into main on Feb 6, 2026
136 of 140 checks passed

Labels: CLA Signed, Feature, Record
