[Dev] Add prof instrumentation to collector and env pipeline #3460

Closed
vmoens wants to merge 2 commits into gh/vmoens/221/base from gh/vmoens/221/head

vmoens commented Feb 6, 2026

Stack from ghstack (oldest at bottom):

Add optional, zero-cost profiling instrumentation using the prof
library (conditionally imported, no hard dependency). When prof is not
installed or not initialised, all _prof_ctx() calls return
contextlib.nullcontext() with no overhead.

Instrumented phases:

  • ParallelEnv parent: penv.write_inputs, penv.sync_m2w,
    penv.send_commands, penv.wait_for_workers, penv.read_outputs,
    penv.sync_w2m
  • ParallelEnv worker: worker.env_step, worker.write_output,
    worker.cuda_sync, worker.signal_done
  • SerialEnv: env.reset, env.step (via decorator)
  • Collector: collector.policy_call, collector.env_step,
    collector.to_device, collector.stack_results
  • Runner (async): worker.buffer_extend, worker.queue_put,
    worker.share_memory
  • BaseCollector: collector.sync_weights
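
The "via decorator" approach for `env.reset` and `env.step` could look roughly like the sketch below. It is an illustration only: `PHASE_TIMES`, `phase`, `profiled`, and `ToyEnv` are hypothetical names, and a plain in-memory timer stands in for the profiler backend.

```python
import contextlib
import functools
import time
from collections import defaultdict

# Hypothetical in-memory recorder standing in for the profiler backend.
PHASE_TIMES = defaultdict(list)


@contextlib.contextmanager
def phase(name):
    """Record the wall-clock duration of one named phase."""
    start = time.perf_counter()
    try:
        yield
    finally:
        PHASE_TIMES[name].append(time.perf_counter() - start)


def profiled(name):
    """Decorator that runs the wrapped callable inside `phase(name)`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with phase(name):
                return fn(*args, **kwargs)
        return wrapper
    return decorator


class ToyEnv:
    """Stand-in environment, not a TorchRL class."""

    @profiled("env.reset")
    def reset(self):
        return 0

    @profiled("env.step")
    def step(self, action):
        return action + 1


env = ToyEnv()
obs = env.reset()
for _ in range(3):
    obs = env.step(obs)
```

After this runs, `PHASE_TIMES` holds one duration under `env.reset` and three under `env.step`, which is the per-phase breakdown the dotted names above are meant to give.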

Workers receive prof_shm_name from the parent process and call
prof.prepare() to join the distributed profiling session.

Co-authored-by: Cursor [email protected]

[ghstack-poisoned]

pytorch-bot bot commented Feb 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3460

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit f6255e8 with merge base ab49b59:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions bot commented Feb 6, 2026

⚠️ PR Title Label Error

Unknown or invalid prefix [Dev].

Current title: [Dev] Add prof instrumentation to collector and env pipeline

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
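
For illustration, the rule above can be expressed as a small case-sensitive lookup. This is a sketch based only on the listed prefixes and labels, not the workflow's actual implementation. `SUPPORTED_PREFIXES` and `label_for_title` are hypothetical names.

```python
# Prefix -> label mapping, copied from the list of supported prefixes above.
SUPPORTED_PREFIXES = {
    "[BugFix]": "BugFix",
    "[Feature]": "Feature",
    "[Doc]": "Documentation",
    "[Docs]": "Documentation",
    "[Refactor]": "Refactoring",
    "[CI]": "CI",
    "[Test]": "Tests",
    "[Tests]": "Tests",
    "[Environment]": "Environments",
    "[Environments]": "Environments",
    "[Data]": "Data",
    "[Performance]": "Performance",
    "[Perf]": "Performance",
    "[BC-Breaking]": "bc breaking",
    "[Deprecation]": "Deprecation",
}


def label_for_title(title):
    """Return the label for a PR title, or None if the prefix is unsupported."""
    for prefix, label in SUPPORTED_PREFIXES.items():
        if title.startswith(prefix):  # str.startswith is case-sensitive
            return label
    return None


current = "[Dev] Add prof instrumentation to collector and env pipeline"
print(label_for_title(current))  # None: [Dev] is not in the list
```

Because the closing bracket is part of each key, `[Doc]` and `[Docs]` never shadow one another.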

meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Feb 6, 2026
[ghstack-poisoned]

github-actions bot commented Feb 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}16$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 84.6248μs 81.5414μs 12.2637 KOps/s 12.5764 KOps/s $\color{#d91a1a}-2.49\%$
test_tensor_to_bytestream_speed[torch.save] 0.1352ms 0.1341ms 7.4575 KOps/s 7.3637 KOps/s $\color{#35bf28}+1.27\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1017s 0.1016s 9.8473 Ops/s 9.7982 Ops/s $\color{#35bf28}+0.50\%$
test_tensor_to_bytestream_speed[numpy] 2.4875μs 2.4706μs 404.7529 KOps/s 402.9455 KOps/s $\color{#35bf28}+0.45\%$
test_tensor_to_bytestream_speed[safetensors] 36.0586μs 35.8398μs 27.9019 KOps/s 26.1796 KOps/s $\textbf{\color{#35bf28}+6.58\%}$
test_simple 0.5318s 0.5315s 1.8816 Ops/s 1.7890 Ops/s $\textbf{\color{#35bf28}+5.18\%}$
test_transformed 1.1112s 1.1057s 0.9044 Ops/s 0.8830 Ops/s $\color{#35bf28}+2.42\%$
test_serial 1.6612s 1.6457s 0.6076 Ops/s 0.6050 Ops/s $\color{#35bf28}+0.44\%$
test_parallel 1.1190s 1.0341s 0.9670 Ops/s 0.9663 Ops/s $\color{#35bf28}+0.07\%$
test_step_mdp_speed[True-True-True-True-True] 0.3230ms 43.6430μs 22.9132 KOps/s 23.6818 KOps/s $\color{#d91a1a}-3.25\%$
test_step_mdp_speed[True-True-True-True-False] 71.2510μs 24.4062μs 40.9732 KOps/s 41.2652 KOps/s $\color{#d91a1a}-0.71\%$
test_step_mdp_speed[True-True-True-False-True] 59.2110μs 24.3700μs 41.0341 KOps/s 40.7934 KOps/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[True-True-True-False-False] 57.0110μs 13.4700μs 74.2390 KOps/s 74.6363 KOps/s $\color{#d91a1a}-0.53\%$
test_step_mdp_speed[True-True-False-True-True] 90.6620μs 46.7076μs 21.4098 KOps/s 21.4293 KOps/s $\color{#d91a1a}-0.09\%$
test_step_mdp_speed[True-True-False-True-False] 0.4276ms 26.3141μs 38.0025 KOps/s 36.4726 KOps/s $\color{#35bf28}+4.19\%$
test_step_mdp_speed[True-True-False-False-True] 62.7710μs 26.6723μs 37.4921 KOps/s 36.5225 KOps/s $\color{#35bf28}+2.65\%$
test_step_mdp_speed[True-True-False-False-False] 50.2600μs 16.0048μs 62.4813 KOps/s 61.4347 KOps/s $\color{#35bf28}+1.70\%$
test_step_mdp_speed[True-False-True-True-True] 0.4565ms 49.9474μs 20.0211 KOps/s 20.6962 KOps/s $\color{#d91a1a}-3.26\%$
test_step_mdp_speed[True-False-True-True-False] 0.4495ms 29.5824μs 33.8039 KOps/s 33.2363 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[True-False-True-False-True] 60.2510μs 26.8238μs 37.2804 KOps/s 36.5359 KOps/s $\color{#35bf28}+2.04\%$
test_step_mdp_speed[True-False-True-False-False] 0.4199ms 16.0729μs 62.2164 KOps/s 61.4191 KOps/s $\color{#35bf28}+1.30\%$
test_step_mdp_speed[True-False-False-True-True] 0.4580ms 50.5517μs 19.7817 KOps/s 19.1460 KOps/s $\color{#35bf28}+3.32\%$
test_step_mdp_speed[True-False-False-True-False] 76.1710μs 32.0596μs 31.1919 KOps/s 30.7226 KOps/s $\color{#35bf28}+1.53\%$
test_step_mdp_speed[True-False-False-False-True] 0.4471ms 29.0867μs 34.3799 KOps/s 33.7895 KOps/s $\color{#35bf28}+1.75\%$
test_step_mdp_speed[True-False-False-False-False] 0.4336ms 18.4698μs 54.1423 KOps/s 52.7354 KOps/s $\color{#35bf28}+2.67\%$
test_step_mdp_speed[False-True-True-True-True] 85.6620μs 48.8218μs 20.4827 KOps/s 20.2628 KOps/s $\color{#35bf28}+1.09\%$
test_step_mdp_speed[False-True-True-True-False] 0.4463ms 28.8011μs 34.7209 KOps/s 33.0727 KOps/s $\color{#35bf28}+4.98\%$
test_step_mdp_speed[False-True-True-False-True] 2.4411ms 30.7285μs 32.5431 KOps/s 31.6969 KOps/s $\color{#35bf28}+2.67\%$
test_step_mdp_speed[False-True-True-False-False] 50.3700μs 17.6694μs 56.5950 KOps/s 55.3241 KOps/s $\color{#35bf28}+2.30\%$
test_step_mdp_speed[False-True-False-True-True] 0.4733ms 50.9016μs 19.6458 KOps/s 19.0567 KOps/s $\color{#35bf28}+3.09\%$
test_step_mdp_speed[False-True-False-True-False] 0.4501ms 31.6306μs 31.6150 KOps/s 30.3970 KOps/s $\color{#35bf28}+4.01\%$
test_step_mdp_speed[False-True-False-False-True] 0.4557ms 32.7955μs 30.4920 KOps/s 29.6992 KOps/s $\color{#35bf28}+2.67\%$
test_step_mdp_speed[False-True-False-False-False] 49.3210μs 20.0445μs 49.8889 KOps/s 48.5451 KOps/s $\color{#35bf28}+2.77\%$
test_step_mdp_speed[False-False-True-True-True] 0.1179ms 54.3610μs 18.3956 KOps/s 18.4383 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[False-False-True-True-False] 0.4645ms 34.4826μs 29.0001 KOps/s 27.8861 KOps/s $\color{#35bf28}+4.00\%$
test_step_mdp_speed[False-False-True-False-True] 0.4532ms 33.5313μs 29.8229 KOps/s 29.4718 KOps/s $\color{#35bf28}+1.19\%$
test_step_mdp_speed[False-False-True-False-False] 50.2610μs 19.9332μs 50.1675 KOps/s 48.0809 KOps/s $\color{#35bf28}+4.34\%$
test_step_mdp_speed[False-False-False-True-True] 0.4738ms 55.0181μs 18.1759 KOps/s 17.7527 KOps/s $\color{#35bf28}+2.38\%$
test_step_mdp_speed[False-False-False-True-False] 0.4528ms 37.0375μs 26.9997 KOps/s 26.3465 KOps/s $\color{#35bf28}+2.48\%$
test_step_mdp_speed[False-False-False-False-True] 0.4501ms 35.4440μs 28.2135 KOps/s 28.0175 KOps/s $\color{#35bf28}+0.70\%$
test_step_mdp_speed[False-False-False-False-False] 0.4378ms 22.2736μs 44.8961 KOps/s 43.1458 KOps/s $\color{#35bf28}+4.06\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7243s 0.7213s 1.3864 Ops/s 1.3366 Ops/s $\color{#35bf28}+3.73\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7109s 0.6140s 1.6286 Ops/s 1.6200 Ops/s $\color{#35bf28}+0.53\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7083s 1.6305s 0.6133 Ops/s 0.6102 Ops/s $\color{#35bf28}+0.51\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4956s 1.4146s 0.7069 Ops/s 0.7056 Ops/s $\color{#35bf28}+0.19\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9592s 1.8790s 0.5322 Ops/s 0.5326 Ops/s $\color{#d91a1a}-0.07\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7429s 1.6594s 0.6026 Ops/s 0.6030 Ops/s $\color{#d91a1a}-0.06\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7606s 4.6198s 0.2165 Ops/s 0.2190 Ops/s $\color{#d91a1a}-1.14\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5119s 4.3859s 0.2280 Ops/s 0.2290 Ops/s $\color{#d91a1a}-0.43\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9224s 1.8527s 0.5398 Ops/s 0.5355 Ops/s $\color{#35bf28}+0.79\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7095s 1.5995s 0.6252 Ops/s 0.6193 Ops/s $\color{#35bf28}+0.96\%$
test_values[generalized_advantage_estimate-True-True] 9.7326ms 9.5992ms 104.1757 Ops/s 103.9089 Ops/s $\color{#35bf28}+0.26\%$
test_values[vec_generalized_advantage_estimate-True-True] 20.3402ms 17.6689ms 56.5968 Ops/s 56.2144 Ops/s $\color{#35bf28}+0.68\%$
test_values[td0_return_estimate-False-False] 0.1960ms 0.1226ms 8.1566 KOps/s 7.8790 KOps/s $\color{#35bf28}+3.52\%$
test_values[td1_return_estimate-False-False] 25.7746ms 25.4236ms 39.3335 Ops/s 37.9966 Ops/s $\color{#35bf28}+3.52\%$
test_values[vec_td1_return_estimate-False-False] 18.6860ms 17.7122ms 56.4584 Ops/s 55.9354 Ops/s $\color{#35bf28}+0.94\%$
test_values[td_lambda_return_estimate-True-False] 38.3413ms 37.7221ms 26.5097 Ops/s 25.9332 Ops/s $\color{#35bf28}+2.22\%$
test_values[vec_td_lambda_return_estimate-True-False] 17.8889ms 17.6163ms 56.7656 Ops/s 55.9142 Ops/s $\color{#35bf28}+1.52\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.5279ms 8.4397ms 118.4875 Ops/s 116.9571 Ops/s $\color{#35bf28}+1.31\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.8586ms 1.4823ms 674.6249 Ops/s 681.4200 Ops/s $\color{#d91a1a}-1.00\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4895ms 0.4088ms 2.4464 KOps/s 2.4760 KOps/s $\color{#d91a1a}-1.20\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.0739ms 34.5295ms 28.9608 Ops/s 28.8732 Ops/s $\color{#35bf28}+0.30\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8146ms 1.6893ms 591.9535 Ops/s 586.3951 Ops/s $\color{#35bf28}+0.95\%$
test_dqn_speed[False-None] 1.7768ms 1.3628ms 733.7944 Ops/s 750.5548 Ops/s $\color{#d91a1a}-2.23\%$
test_dqn_speed[False-backward] 1.9295ms 1.8700ms 534.7686 Ops/s 543.6064 Ops/s $\color{#d91a1a}-1.63\%$
test_dqn_speed[True-None] 0.9549ms 0.5470ms 1.8281 KOps/s 1.8506 KOps/s $\color{#d91a1a}-1.22\%$
test_dqn_speed[True-backward] 1.1309ms 0.9915ms 1.0086 KOps/s 1.0007 KOps/s $\color{#35bf28}+0.78\%$
test_dqn_speed[reduce-overhead-None] 0.9453ms 0.5286ms 1.8917 KOps/s 1.8611 KOps/s $\color{#35bf28}+1.64\%$
test_ddpg_speed[False-None] 3.1472ms 2.7369ms 365.3708 Ops/s 359.7313 Ops/s $\color{#35bf28}+1.57\%$
test_ddpg_speed[False-backward] 3.9784ms 3.8829ms 257.5424 Ops/s 254.4025 Ops/s $\color{#35bf28}+1.23\%$
test_ddpg_speed[True-None] 1.7651ms 1.3876ms 720.6633 Ops/s 713.0627 Ops/s $\color{#35bf28}+1.07\%$
test_ddpg_speed[True-backward] 2.4188ms 2.3563ms 424.3965 Ops/s 342.2545 Ops/s $\textbf{\color{#35bf28}+24.00\%}$
test_ddpg_speed[reduce-overhead-None] 1.5071ms 1.3759ms 726.7914 Ops/s 715.0496 Ops/s $\color{#35bf28}+1.64\%$
test_sac_speed[False-None] 8.4026ms 7.7170ms 129.5845 Ops/s 129.1272 Ops/s $\color{#35bf28}+0.35\%$
test_sac_speed[False-backward] 13.7330ms 11.3633ms 88.0025 Ops/s 91.8738 Ops/s $\color{#d91a1a}-4.21\%$
test_sac_speed[True-None] 2.4864ms 2.1288ms 469.7570 Ops/s 465.0575 Ops/s $\color{#35bf28}+1.01\%$
test_sac_speed[True-backward] 4.1002ms 4.0151ms 249.0604 Ops/s 225.9870 Ops/s $\textbf{\color{#35bf28}+10.21\%}$
test_sac_speed[reduce-overhead-None] 2.5274ms 2.1224ms 471.1573 Ops/s 464.8601 Ops/s $\color{#35bf28}+1.35\%$
test_redq_speed[False-None] 14.8271ms 10.4647ms 95.5591 Ops/s 92.8562 Ops/s $\color{#35bf28}+2.91\%$
test_redq_speed[False-backward] 21.6681ms 17.8100ms 56.1482 Ops/s 57.3823 Ops/s $\color{#d91a1a}-2.15\%$
test_redq_speed[True-None] 5.4550ms 4.4541ms 224.5099 Ops/s 220.9189 Ops/s $\color{#35bf28}+1.63\%$
test_redq_speed[True-backward] 10.7969ms 9.8929ms 101.0822 Ops/s 103.1178 Ops/s $\color{#d91a1a}-1.97\%$
test_redq_speed[reduce-overhead-None] 4.8816ms 4.4926ms 222.5878 Ops/s 221.7896 Ops/s $\color{#35bf28}+0.36\%$
test_redq_deprec_speed[False-None] 11.4786ms 10.8596ms 92.0845 Ops/s 91.2367 Ops/s $\color{#35bf28}+0.93\%$
test_redq_deprec_speed[False-backward] 16.2362ms 15.5415ms 64.3438 Ops/s 63.0615 Ops/s $\color{#35bf28}+2.03\%$
test_redq_deprec_speed[True-None] 4.1240ms 3.7339ms 267.8193 Ops/s 272.1730 Ops/s $\color{#d91a1a}-1.60\%$
test_redq_deprec_speed[True-backward] 7.8693ms 7.6579ms 130.5847 Ops/s 128.0118 Ops/s $\color{#35bf28}+2.01\%$
test_redq_deprec_speed[reduce-overhead-None] 4.1236ms 3.6556ms 273.5517 Ops/s 267.7118 Ops/s $\color{#35bf28}+2.18\%$
test_td3_speed[False-None] 7.9966ms 7.7419ms 129.1671 Ops/s 128.3804 Ops/s $\color{#35bf28}+0.61\%$
test_td3_speed[False-backward] 11.2129ms 10.5902ms 94.4268 Ops/s 93.7976 Ops/s $\color{#35bf28}+0.67\%$
test_td3_speed[True-None] 1.8953ms 1.8324ms 545.7187 Ops/s 543.7179 Ops/s $\color{#35bf28}+0.37\%$
test_td3_speed[True-backward] 3.7492ms 3.6529ms 273.7559 Ops/s 261.8990 Ops/s $\color{#35bf28}+4.53\%$
test_td3_speed[reduce-overhead-None] 1.8953ms 1.8177ms 550.1487 Ops/s 555.1957 Ops/s $\color{#d91a1a}-0.91\%$
test_cql_speed[False-None] 29.0409ms 25.6144ms 39.0405 Ops/s 39.0357 Ops/s $\color{#35bf28}+0.01\%$
test_cql_speed[False-backward] 35.0728ms 34.4731ms 29.0081 Ops/s 28.9179 Ops/s $\color{#35bf28}+0.31\%$
test_cql_speed[True-None] 12.7721ms 12.4975ms 80.0158 Ops/s 78.9217 Ops/s $\color{#35bf28}+1.39\%$
test_cql_speed[True-backward] 19.0844ms 18.5666ms 53.8602 Ops/s 55.3680 Ops/s $\color{#d91a1a}-2.72\%$
test_cql_speed[reduce-overhead-None] 13.1605ms 12.5190ms 79.8786 Ops/s 80.3127 Ops/s $\color{#d91a1a}-0.54\%$
test_a2c_speed[False-None] 5.6753ms 5.3324ms 187.5327 Ops/s 188.1961 Ops/s $\color{#d91a1a}-0.35\%$
test_a2c_speed[False-backward] 11.9718ms 11.6655ms 85.7232 Ops/s 85.5932 Ops/s $\color{#35bf28}+0.15\%$
test_a2c_speed[True-None] 4.1609ms 3.7250ms 268.4566 Ops/s 264.8699 Ops/s $\color{#35bf28}+1.35\%$
test_a2c_speed[True-backward] 9.1055ms 8.5993ms 116.2892 Ops/s 111.6856 Ops/s $\color{#35bf28}+4.12\%$
test_a2c_speed[reduce-overhead-None] 4.1237ms 3.7155ms 269.1425 Ops/s 264.5837 Ops/s $\color{#35bf28}+1.72\%$
test_ppo_speed[False-None] 6.2658ms 5.8145ms 171.9824 Ops/s 169.4412 Ops/s $\color{#35bf28}+1.50\%$
test_ppo_speed[False-backward] 12.8434ms 12.2776ms 81.4494 Ops/s 79.9747 Ops/s $\color{#35bf28}+1.84\%$
test_ppo_speed[True-None] 4.0411ms 3.6285ms 275.5965 Ops/s 265.8247 Ops/s $\color{#35bf28}+3.68\%$
test_ppo_speed[True-backward] 8.6326ms 8.4239ms 118.7105 Ops/s 110.6979 Ops/s $\textbf{\color{#35bf28}+7.24\%}$
test_ppo_speed[reduce-overhead-None] 3.7437ms 3.6029ms 277.5553 Ops/s 267.1054 Ops/s $\color{#35bf28}+3.91\%$
test_reinforce_speed[False-None] 4.8726ms 4.3941ms 227.5754 Ops/s 218.2248 Ops/s $\color{#35bf28}+4.28\%$
test_reinforce_speed[False-backward] 7.7238ms 7.2798ms 137.3659 Ops/s 135.2741 Ops/s $\color{#35bf28}+1.55\%$
test_reinforce_speed[True-None] 3.2769ms 2.8702ms 348.4030 Ops/s 365.2633 Ops/s $\color{#d91a1a}-4.62\%$
test_reinforce_speed[True-backward] 8.3101ms 7.8554ms 127.3006 Ops/s 117.1101 Ops/s $\textbf{\color{#35bf28}+8.70\%}$
test_reinforce_speed[reduce-overhead-None] 3.0777ms 2.8908ms 345.9258 Ops/s 365.6211 Ops/s $\textbf{\color{#d91a1a}-5.39\%}$
test_iql_speed[False-None] 20.2183ms 19.5943ms 51.0354 Ops/s 51.0153 Ops/s $\color{#35bf28}+0.04\%$
test_iql_speed[False-backward] 30.9401ms 30.1818ms 33.1325 Ops/s 34.0072 Ops/s $\color{#d91a1a}-2.57\%$
test_iql_speed[True-None] 8.9464ms 8.5673ms 116.7223 Ops/s 122.9246 Ops/s $\textbf{\color{#d91a1a}-5.05\%}$
test_iql_speed[True-backward] 17.2520ms 16.8471ms 59.3573 Ops/s 64.4971 Ops/s $\textbf{\color{#d91a1a}-7.97\%}$
test_iql_speed[reduce-overhead-None] 11.5946ms 8.8047ms 113.5753 Ops/s 116.3881 Ops/s $\color{#d91a1a}-2.42\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0270ms 5.9045ms 169.3635 Ops/s 168.7647 Ops/s $\color{#35bf28}+0.35\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.9302ms 0.3177ms 3.1478 KOps/s 3.0917 KOps/s $\color{#35bf28}+1.81\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6954ms 0.3227ms 3.0992 KOps/s 3.4521 KOps/s $\textbf{\color{#d91a1a}-10.22\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8962ms 5.6760ms 176.1809 Ops/s 176.6498 Ops/s $\color{#d91a1a}-0.27\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1362ms 0.3190ms 3.1345 KOps/s 3.2649 KOps/s $\color{#d91a1a}-3.99\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7293ms 0.3039ms 3.2900 KOps/s 3.4304 KOps/s $\color{#d91a1a}-4.09\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6611ms 1.3401ms 746.2124 Ops/s 821.6556 Ops/s $\textbf{\color{#d91a1a}-9.18\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5117ms 1.2727ms 785.7608 Ops/s 884.8158 Ops/s $\textbf{\color{#d91a1a}-11.19\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 11.6427ms 5.9407ms 168.3300 Ops/s 173.1900 Ops/s $\color{#d91a1a}-2.81\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2187ms 0.4461ms 2.2417 KOps/s 2.3645 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8325ms 0.4289ms 2.3313 KOps/s 2.4705 KOps/s $\textbf{\color{#d91a1a}-5.63\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.8338ms 5.6727ms 176.2835 Ops/s 175.1499 Ops/s $\color{#35bf28}+0.65\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.5517ms 0.3095ms 3.2313 KOps/s 3.6141 KOps/s $\textbf{\color{#d91a1a}-10.59\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4897ms 0.3052ms 3.2769 KOps/s 3.8199 KOps/s $\textbf{\color{#d91a1a}-14.22\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9929ms 5.6619ms 176.6184 Ops/s 178.6517 Ops/s $\color{#d91a1a}-1.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7438ms 0.3045ms 3.2845 KOps/s 3.4059 KOps/s $\color{#d91a1a}-3.56\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5445ms 0.3068ms 3.2600 KOps/s 3.6117 KOps/s $\textbf{\color{#d91a1a}-9.74\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1292ms 5.8051ms 172.2616 Ops/s 173.5814 Ops/s $\color{#d91a1a}-0.76\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8656ms 0.4658ms 2.1471 KOps/s 1.8133 KOps/s $\textbf{\color{#35bf28}+18.40\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6753ms 0.4602ms 2.1728 KOps/s 2.3003 KOps/s $\textbf{\color{#d91a1a}-5.54\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3353ms 4.8498ms 206.1921 Ops/s 58.7136 Ops/s $\textbf{\color{#35bf28}+251.18\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.8688ms 2.1045ms 475.1610 Ops/s 552.5213 Ops/s $\textbf{\color{#d91a1a}-14.00\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.1006ms 1.0983ms 910.4591 Ops/s 1.1438 KOps/s $\textbf{\color{#d91a1a}-20.40\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5402s 15.7294ms 63.5754 Ops/s 198.4478 Ops/s $\textbf{\color{#d91a1a}-67.96\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.8605ms 1.7220ms 580.7229 Ops/s 556.1321 Ops/s $\color{#35bf28}+4.42\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.1339ms 0.8621ms 1.1599 KOps/s 1.1991 KOps/s $\color{#d91a1a}-3.27\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.4185ms 5.2004ms 192.2920 Ops/s 60.6735 Ops/s $\textbf{\color{#35bf28}+216.93\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 9.1307ms 2.0124ms 496.9072 Ops/s 491.9592 Ops/s $\color{#35bf28}+1.01\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.2128ms 1.1699ms 854.7587 Ops/s 971.3924 Ops/s $\textbf{\color{#d91a1a}-12.01\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.9036ms 35.2482ms 28.3703 Ops/s 28.5568 Ops/s $\color{#d91a1a}-0.65\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.4651ms 17.9853ms 55.6011 Ops/s 58.1745 Ops/s $\color{#d91a1a}-4.42\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.2742ms 36.1097ms 27.6934 Ops/s 27.1873 Ops/s $\color{#35bf28}+1.86\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2913ms 18.0820ms 55.3035 Ops/s 56.9282 Ops/s $\color{#d91a1a}-2.85\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 39.7742ms 37.8184ms 26.4422 Ops/s 26.3306 Ops/s $\color{#35bf28}+0.42\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.9895ms 19.5404ms 51.1760 Ops/s 52.8472 Ops/s $\color{#d91a1a}-3.16\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9415ms 0.2250ms 4.4441 KOps/s 4.5637 KOps/s $\color{#d91a1a}-2.62\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7068ms 1.3981ms 715.2430 Ops/s 715.1571 Ops/s $\color{#35bf28}+0.01\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.4694ms 2.3022ms 434.3747 Ops/s 412.8749 Ops/s $\textbf{\color{#35bf28}+5.21\%}$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1125ms 2.9423ms 339.8720 Ops/s 339.8553 Ops/s $+0.00\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2125ms 0.1309ms 7.6419 KOps/s 7.6688 KOps/s $\color{#d91a1a}-0.35\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.5239ms 0.1832ms 5.4591 KOps/s 5.6259 KOps/s $\color{#d91a1a}-2.96\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8678ms 1.7308ms 577.7689 Ops/s 567.1466 Ops/s $\color{#35bf28}+1.87\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4312ms 1.3058ms 765.7959 Ops/s 770.4438 Ops/s $\color{#d91a1a}-0.60\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2049ms 1.0906ms 916.9170 Ops/s 910.8727 Ops/s $\color{#35bf28}+0.66\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7239ms 3.4796ms 287.3871 Ops/s 280.4412 Ops/s $\color{#35bf28}+2.48\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.2282ms 5.6958ms 175.5666 Ops/s 176.1628 Ops/s $\color{#d91a1a}-0.34\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.5064ms 7.1816ms 139.2444 Ops/s 140.1090 Ops/s $\color{#d91a1a}-0.62\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4230ms 0.2687ms 3.7219 KOps/s 3.7183 KOps/s $\color{#35bf28}+0.10\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6914ms 1.5146ms 660.2350 Ops/s 665.7389 Ops/s $\color{#d91a1a}-0.83\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.7854ms 2.4160ms 413.9148 Ops/s 391.1416 Ops/s $\textbf{\color{#35bf28}+5.82\%}$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3336ms 3.1437ms 318.0957 Ops/s 317.4713 Ops/s $\color{#35bf28}+0.20\%$
test_collector_without_rb[100-img_shape0-atari] 34.8531ms 33.6292ms 29.7361 Ops/s 29.7201 Ops/s $\color{#35bf28}+0.05\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.7709ms 65.9578ms 15.1612 Ops/s 15.1176 Ops/s $\color{#35bf28}+0.29\%$
test_collector_with_rb[100-img_shape0-atari] 41.9326ms 41.3000ms 24.2131 Ops/s 24.1549 Ops/s $\color{#35bf28}+0.24\%$
test_collector_with_rb[200-img_shape1-large_batch] 81.3549ms 80.5838ms 12.4094 Ops/s 12.3323 Ops/s $\color{#35bf28}+0.63\%$
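
The Change column is consistent with the relative difference between Ops and Ops on Repo HEAD. The sketch below reproduces three rows from the table. The 5% cutoff for counting a benchmark as improved or worsened is an assumption inferred from which rows are bolded (11 bold gains and 16 bold losses, matching the totals above); the bot does not state it. Both throughput values must be in the same unit (Ops/s or KOps/s) before comparing.

```python
def relative_change(ops, ops_head):
    """Percentage change of this PR's throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct, threshold=5.0):
    """Bucket a change; the 5% threshold is inferred, not documented."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD), taken from rows of the CPU table above.
rows = [
    ("test_tensor_to_bytestream_speed[pickle]", 12.2637, 12.5764),       # KOps/s
    ("test_tensor_to_bytestream_speed[safetensors]", 27.9019, 26.1796),  # KOps/s
    ("test_iql_speed[True-backward]", 59.3573, 64.4971),                 # Ops/s
]

for name, ops, ops_head in rows:
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```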

github-actions bot commented Feb 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.4490μs 80.0188μs 12.4971 KOps/s 12.4697 KOps/s $\color{#35bf28}+0.22\%$
test_tensor_to_bytestream_speed[torch.save] 0.1392ms 0.1389ms 7.2019 KOps/s 7.1946 KOps/s $\color{#35bf28}+0.10\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1187s 0.1183s 8.4515 Ops/s 8.9782 Ops/s $\textbf{\color{#d91a1a}-5.87\%}$
test_tensor_to_bytestream_speed[numpy] 2.5633μs 2.5593μs 390.7250 KOps/s 379.4445 KOps/s $\color{#35bf28}+2.97\%$
test_tensor_to_bytestream_speed[safetensors] 39.0442μs 37.9473μs 26.3524 KOps/s 26.6239 KOps/s $\color{#d91a1a}-1.02\%$
test_simple 0.7956s 0.7951s 1.2578 Ops/s 1.2230 Ops/s $\color{#35bf28}+2.85\%$
test_transformed 1.5428s 1.4495s 0.6899 Ops/s 0.6851 Ops/s $\color{#35bf28}+0.70\%$
test_serial 2.4179s 2.3229s 0.4305 Ops/s 0.4294 Ops/s $\color{#35bf28}+0.24\%$
test_parallel 1.9898s 1.8565s 0.5387 Ops/s 0.5483 Ops/s $\color{#d91a1a}-1.76\%$
test_step_mdp_speed[True-True-True-True-True] 0.4685ms 44.9970μs 22.2237 KOps/s 22.3329 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[True-True-True-True-False] 0.4420ms 25.5265μs 39.1749 KOps/s 40.0200 KOps/s $\color{#d91a1a}-2.11\%$
test_step_mdp_speed[True-True-True-False-True] 69.5710μs 25.5112μs 39.1984 KOps/s 40.2275 KOps/s $\color{#d91a1a}-2.56\%$
test_step_mdp_speed[True-True-True-False-False] 41.3610μs 13.8650μs 72.1238 KOps/s 72.6020 KOps/s $\color{#d91a1a}-0.66\%$
test_step_mdp_speed[True-True-False-True-True] 0.4711ms 47.6677μs 20.9785 KOps/s 21.0194 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[True-True-False-True-False] 58.7410μs 27.7805μs 35.9964 KOps/s 36.3321 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[True-True-False-False-True] 0.4526ms 27.8667μs 35.8851 KOps/s 36.2459 KOps/s $\color{#d91a1a}-1.00\%$
test_step_mdp_speed[True-True-False-False-False] 52.2910μs 16.6355μs 60.1122 KOps/s 59.9976 KOps/s $\color{#35bf28}+0.19\%$
test_step_mdp_speed[True-False-True-True-True] 0.4700ms 50.6728μs 19.7345 KOps/s 19.6070 KOps/s $\color{#35bf28}+0.65\%$
test_step_mdp_speed[True-False-True-True-False] 78.0920μs 30.5602μs 32.7223 KOps/s 32.9291 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[True-False-True-False-True] 0.4534ms 27.6485μs 36.1683 KOps/s 36.4781 KOps/s $\color{#d91a1a}-0.85\%$
test_step_mdp_speed[True-False-True-False-False] 0.4455ms 16.7202μs 59.8079 KOps/s 59.9162 KOps/s $\color{#d91a1a}-0.18\%$
test_step_mdp_speed[True-False-False-True-True] 0.4819ms 52.7450μs 18.9592 KOps/s 18.8435 KOps/s $\color{#35bf28}+0.61\%$
test_step_mdp_speed[True-False-False-True-False] 68.8410μs 33.5701μs 29.7884 KOps/s 30.2972 KOps/s $\color{#d91a1a}-1.68\%$
test_step_mdp_speed[True-False-False-False-True] 0.4493ms 30.5343μs 32.7500 KOps/s 33.2864 KOps/s $\color{#d91a1a}-1.61\%$
test_step_mdp_speed[True-False-False-False-False] 0.4357ms 19.0800μs 52.4108 KOps/s 51.0936 KOps/s $\color{#35bf28}+2.58\%$
test_step_mdp_speed[False-True-True-True-True] 80.4510μs 50.5811μs 19.7702 KOps/s 19.7201 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[False-True-True-True-False] 0.4555ms 30.5293μs 32.7554 KOps/s 32.4485 KOps/s $\color{#35bf28}+0.95\%$
test_step_mdp_speed[False-True-True-False-True] 2.3433ms 32.1120μs 31.1410 KOps/s 31.9702 KOps/s $\color{#d91a1a}-2.59\%$
test_step_mdp_speed[False-True-True-False-False] 0.4544ms 18.3230μs 54.5763 KOps/s 55.3552 KOps/s $\color{#d91a1a}-1.41\%$
test_step_mdp_speed[False-True-False-True-True] 85.8120μs 53.3977μs 18.7274 KOps/s 19.2241 KOps/s $\color{#d91a1a}-2.58\%$
test_step_mdp_speed[False-True-False-True-False] 0.4480ms 32.8054μs 30.4828 KOps/s 30.0844 KOps/s $\color{#35bf28}+1.32\%$
test_step_mdp_speed[False-True-False-False-True] 0.4477ms 34.5519μs 28.9420 KOps/s 29.3108 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[False-True-False-False-False] 47.8310μs 20.5947μs 48.5562 KOps/s 47.3744 KOps/s $\color{#35bf28}+2.49\%$
test_step_mdp_speed[False-False-True-True-True] 91.8520μs 55.5599μs 17.9986 KOps/s 17.8513 KOps/s $\color{#35bf28}+0.83\%$
test_step_mdp_speed[False-False-True-True-False] 0.4552ms 35.7916μs 27.9395 KOps/s 27.7980 KOps/s $\color{#35bf28}+0.51\%$
test_step_mdp_speed[False-False-True-False-True] 0.4472ms 33.6663μs 29.7033 KOps/s 29.6369 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[False-False-True-False-False] 0.4363ms 20.9901μs 47.6415 KOps/s 47.8900 KOps/s $\color{#d91a1a}-0.52\%$
test_step_mdp_speed[False-False-False-True-True] 98.6110μs 58.0429μs 17.2286 KOps/s 17.3977 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[False-False-False-True-False] 0.4488ms 38.4546μs 26.0047 KOps/s 25.6806 KOps/s $\color{#35bf28}+1.26\%$
test_step_mdp_speed[False-False-False-False-True] 0.4490ms 35.8633μs 27.8837 KOps/s 27.2797 KOps/s $\color{#35bf28}+2.21\%$
test_step_mdp_speed[False-False-False-False-False] 0.4328ms 23.1503μs 43.1960 KOps/s 42.4992 KOps/s $\color{#35bf28}+1.64\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8740s 0.7749s 1.2905 Ops/s 1.2845 Ops/s $\color{#35bf28}+0.47\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7306s 0.6339s 1.5775 Ops/s 1.5666 Ops/s $\color{#35bf28}+0.69\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7722s 1.6970s 0.5893 Ops/s 0.5876 Ops/s $\color{#35bf28}+0.29\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5457s 1.4648s 0.6827 Ops/s 0.6792 Ops/s $\color{#35bf28}+0.50\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0230s 1.9451s 0.5141 Ops/s 0.5120 Ops/s $\color{#35bf28}+0.40\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.8011s 1.7213s 0.5809 Ops/s 0.5792 Ops/s $\color{#35bf28}+0.31\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7386s 4.6424s 0.2154 Ops/s 0.2124 Ops/s $\color{#35bf28}+1.42\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5849s 4.4523s 0.2246 Ops/s 0.2214 Ops/s $\color{#35bf28}+1.46\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0215s 1.9205s 0.5207 Ops/s 0.5227 Ops/s $\color{#d91a1a}-0.39\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7224s 1.6641s 0.6009 Ops/s 0.6014 Ops/s $\color{#d91a1a}-0.07\%$
test_values[generalized_advantage_estimate-True-True] 21.0885ms 20.7477ms 48.1982 Ops/s 47.7899 Ops/s $\color{#35bf28}+0.85\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1564s 4.0477ms 247.0565 Ops/s 285.3646 Ops/s $\textbf{\color{#d91a1a}-13.42\%}$
test_values[td0_return_estimate-False-False] 0.1086ms 82.6738μs 12.0957 KOps/s 12.0341 KOps/s $\color{#35bf28}+0.51\%$
test_values[td1_return_estimate-False-False] 49.5077ms 49.0265ms 20.3971 Ops/s 20.2622 Ops/s $\color{#35bf28}+0.67\%$
test_values[vec_td1_return_estimate-False-False] 1.3310ms 1.0878ms 919.3035 Ops/s 916.2214 Ops/s $\color{#35bf28}+0.34\%$
test_values[td_lambda_return_estimate-True-False] 80.6227ms 80.2832ms 12.4559 Ops/s 12.3955 Ops/s $\color{#35bf28}+0.49\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3377ms 1.0873ms 919.7033 Ops/s 920.4723 Ops/s $\color{#d91a1a}-0.08\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.2615ms 20.9485ms 47.7361 Ops/s 44.4422 Ops/s $\textbf{\color{#35bf28}+7.41\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0357ms 0.7566ms 1.3216 KOps/s 1.3160 KOps/s $\color{#35bf28}+0.42\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7347ms 0.6819ms 1.4664 KOps/s 1.4663 KOps/s $+0.01\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5314ms 1.4929ms 669.8357 Ops/s 668.7844 Ops/s $\color{#35bf28}+0.16\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7453ms 0.6974ms 1.4340 KOps/s 1.4355 KOps/s $\color{#d91a1a}-0.11\%$
test_dqn_speed[False-None] 1.7585ms 1.5313ms 653.0347 Ops/s 648.0240 Ops/s $\color{#35bf28}+0.77\%$
test_dqn_speed[False-backward] 2.4809ms 2.1934ms 455.9087 Ops/s 454.0925 Ops/s $\color{#35bf28}+0.40\%$
test_dqn_speed[True-None] 0.6419ms 0.5658ms 1.7673 KOps/s 1.7314 KOps/s $\color{#35bf28}+2.07\%$
test_dqn_speed[True-backward] 1.2755ms 1.2221ms 818.2879 Ops/s 881.6167 Ops/s $\textbf{\color{#d91a1a}-7.18\%}$
test_dqn_speed[reduce-overhead-None] 0.7262ms 0.6046ms 1.6539 KOps/s 1.5748 KOps/s $\textbf{\color{#35bf28}+5.02\%}$
test_ddpg_speed[False-None] 3.3084ms 2.9112ms 343.5053 Ops/s 342.3773 Ops/s $\color{#35bf28}+0.33\%$
test_ddpg_speed[False-backward] 4.7428ms 4.3359ms 230.6327 Ops/s 238.1091 Ops/s $\color{#d91a1a}-3.14\%$
test_ddpg_speed[True-None] 1.4163ms 1.3286ms 752.6538 Ops/s 744.5261 Ops/s $\color{#35bf28}+1.09\%$
test_ddpg_speed[True-backward] 2.6179ms 2.5402ms 393.6739 Ops/s 411.7870 Ops/s $\color{#d91a1a}-4.40\%$
test_ddpg_speed[reduce-overhead-None] 1.6049ms 1.3896ms 719.6507 Ops/s 729.7472 Ops/s $\color{#d91a1a}-1.38\%$
test_sac_speed[False-None] 9.5821ms 8.6046ms 116.2168 Ops/s 120.0762 Ops/s $\color{#d91a1a}-3.21\%$
test_sac_speed[False-backward] 12.2406ms 11.7604ms 85.0314 Ops/s 87.9814 Ops/s $\color{#d91a1a}-3.35\%$
test_sac_speed[True-None] 1.9051ms 1.8303ms 546.3476 Ops/s 536.7360 Ops/s $\color{#35bf28}+1.79\%$
test_sac_speed[True-backward] 3.7011ms 3.6128ms 276.7935 Ops/s 271.7592 Ops/s $\color{#35bf28}+1.85\%$
test_sac_speed[reduce-overhead-None] 18.9903ms 10.8648ms 92.0406 Ops/s 83.1571 Ops/s $\textbf{\color{#35bf28}+10.68\%}$
test_redq_deprec_speed[False-None] 9.8103ms 9.3432ms 107.0298 Ops/s 105.7311 Ops/s $\color{#35bf28}+1.23\%$
test_redq_deprec_speed[False-backward] 13.1944ms 12.7347ms 78.5259 Ops/s 77.3996 Ops/s $\color{#35bf28}+1.46\%$
test_redq_deprec_speed[True-None] 2.6218ms 2.5547ms 391.4349 Ops/s 389.5513 Ops/s $\color{#35bf28}+0.48\%$
test_redq_deprec_speed[True-backward] 4.3586ms 4.3136ms 231.8257 Ops/s 233.2146 Ops/s $\color{#d91a1a}-0.60\%$
test_redq_deprec_speed[reduce-overhead-None] 17.0566ms 10.0244ms 99.7568 Ops/s 101.2905 Ops/s $\color{#d91a1a}-1.51\%$
test_td3_speed[False-None] 8.4786ms 8.2344ms 121.4418 Ops/s 121.4005 Ops/s $\color{#35bf28}+0.03\%$
test_td3_speed[False-backward] 11.6273ms 10.9023ms 91.7242 Ops/s 93.6924 Ops/s $\color{#d91a1a}-2.10\%$
test_td3_speed[True-None] 1.6796ms 1.6543ms 604.4866 Ops/s 569.0958 Ops/s $\textbf{\color{#35bf28}+6.22\%}$
test_td3_speed[True-backward] 3.7177ms 3.2669ms 306.0965 Ops/s 312.5064 Ops/s $\color{#d91a1a}-2.05\%$
test_td3_speed[reduce-overhead-None] 71.3106ms 24.9176ms 40.1322 Ops/s 40.4350 Ops/s $\color{#d91a1a}-0.75\%$
test_cql_speed[False-None] 17.7372ms 17.2723ms 57.8960 Ops/s 57.5102 Ops/s $\color{#35bf28}+0.67\%$
test_cql_speed[False-backward] 23.5113ms 22.9249ms 43.6207 Ops/s 43.3351 Ops/s $\color{#35bf28}+0.66\%$
test_cql_speed[True-None] 3.4228ms 3.2735ms 305.4848 Ops/s 300.6772 Ops/s $\color{#35bf28}+1.60\%$
test_cql_speed[True-backward] 5.8227ms 5.3899ms 185.5336 Ops/s 175.8178 Ops/s $\textbf{\color{#35bf28}+5.53\%}$
test_cql_speed[reduce-overhead-None] 19.0159ms 11.8752ms 84.2093 Ops/s 83.1387 Ops/s $\color{#35bf28}+1.29\%$
test_a2c_speed[False-None] 3.9549ms 3.2510ms 307.6003 Ops/s 305.1354 Ops/s $\color{#35bf28}+0.81\%$
test_a2c_speed[False-backward] 6.6507ms 6.1829ms 161.7374 Ops/s 154.9932 Ops/s $\color{#35bf28}+4.35\%$
test_a2c_speed[True-None] 1.4207ms 1.3257ms 754.3050 Ops/s 745.2296 Ops/s $\color{#35bf28}+1.22\%$
test_a2c_speed[True-backward] 3.0370ms 2.9866ms 334.8335 Ops/s 332.0253 Ops/s $\color{#35bf28}+0.85\%$
test_a2c_speed[reduce-overhead-None] 1.2053ms 1.0044ms 995.6228 Ops/s 995.7738 Ops/s $\color{#d91a1a}-0.02\%$
test_ppo_speed[False-None] 3.9745ms 3.8597ms 259.0855 Ops/s 257.9643 Ops/s $\color{#35bf28}+0.43\%$
test_ppo_speed[False-backward] 7.5392ms 7.0455ms 141.9348 Ops/s 143.1049 Ops/s $\color{#d91a1a}-0.82\%$
test_ppo_speed[True-None] 1.6190ms 1.4440ms 692.5252 Ops/s 697.0942 Ops/s $\color{#d91a1a}-0.66\%$
test_ppo_speed[True-backward] 3.1479ms 3.0916ms 323.4615 Ops/s 314.8874 Ops/s $\color{#35bf28}+2.72\%$
test_ppo_speed[reduce-overhead-None] 1.1586ms 1.0573ms 945.7739 Ops/s 930.6005 Ops/s $\color{#35bf28}+1.63\%$
test_reinforce_speed[False-None] 2.4947ms 2.3146ms 432.0494 Ops/s 420.5286 Ops/s $\color{#35bf28}+2.74\%$
test_reinforce_speed[False-backward] 3.8896ms 3.3923ms 294.7870 Ops/s 284.4410 Ops/s $\color{#35bf28}+3.64\%$
test_reinforce_speed[True-None] 1.4155ms 1.2994ms 769.5593 Ops/s 762.9844 Ops/s $\color{#35bf28}+0.86\%$
test_reinforce_speed[True-backward] 2.9802ms 2.9248ms 341.9004 Ops/s 323.1620 Ops/s $\textbf{\color{#35bf28}+5.80\%}$
test_reinforce_speed[reduce-overhead-None] 17.6479ms 9.5677ms 104.5184 Ops/s 106.0571 Ops/s $\color{#d91a1a}-1.45\%$
test_iql_speed[False-None] 10.2711ms 9.4707ms 105.5884 Ops/s 105.5303 Ops/s $\color{#35bf28}+0.06\%$
test_iql_speed[False-backward] 13.7412ms 13.1621ms 75.9760 Ops/s 75.1504 Ops/s $\color{#35bf28}+1.10\%$
test_iql_speed[True-None] 2.5161ms 2.1973ms 455.1091 Ops/s 450.1519 Ops/s $\color{#35bf28}+1.10\%$
test_iql_speed[True-backward] 5.2463ms 4.7707ms 209.6120 Ops/s 199.2707 Ops/s $\textbf{\color{#35bf28}+5.19\%}$
test_iql_speed[reduce-overhead-None] 17.8835ms 10.6186ms 94.1747 Ops/s 95.5738 Ops/s $\color{#d91a1a}-1.46\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1182ms 5.9419ms 168.2955 Ops/s 166.0902 Ops/s $\color{#35bf28}+1.33\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9524ms 0.3871ms 2.5836 KOps/s 2.6191 KOps/s $\color{#d91a1a}-1.36\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5895ms 0.3456ms 2.8938 KOps/s 2.7407 KOps/s $\textbf{\color{#35bf28}+5.59\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9531ms 5.6669ms 176.4641 Ops/s 170.0741 Ops/s $\color{#35bf28}+3.76\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8825ms 0.3553ms 2.8147 KOps/s 3.3028 KOps/s $\textbf{\color{#d91a1a}-14.78\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5103ms 0.2654ms 3.7684 KOps/s 3.5235 KOps/s $\textbf{\color{#35bf28}+6.95\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5612ms 1.3046ms 766.4943 Ops/s 769.4766 Ops/s $\color{#d91a1a}-0.39\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4179ms 1.2135ms 824.0596 Ops/s 817.3065 Ops/s $\color{#35bf28}+0.83\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1774ms 5.9280ms 168.6909 Ops/s 166.3451 Ops/s $\color{#35bf28}+1.41\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.6888ms 0.4401ms 2.2723 KOps/s 2.2932 KOps/s $\color{#d91a1a}-0.91\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7353ms 0.4815ms 2.0770 KOps/s 1.9537 KOps/s $\textbf{\color{#35bf28}+6.31\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9320ms 5.8723ms 170.2912 Ops/s 169.4486 Ops/s $\color{#35bf28}+0.50\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0779ms 0.3592ms 2.7841 KOps/s 2.7848 KOps/s $\color{#d91a1a}-0.02\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6771ms 0.3898ms 2.5657 KOps/s 3.7184 KOps/s $\textbf{\color{#d91a1a}-31.00\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0529ms 5.8450ms 171.0850 Ops/s 171.3912 Ops/s $\color{#d91a1a}-0.18\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9427ms 0.3057ms 3.2715 KOps/s 3.0676 KOps/s $\textbf{\color{#35bf28}+6.65\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4490ms 0.2834ms 3.5288 KOps/s 3.1785 KOps/s $\textbf{\color{#35bf28}+11.02\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0330ms 5.9277ms 168.6995 Ops/s 167.5567 Ops/s $\color{#35bf28}+0.68\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.3054ms 0.5361ms 1.8653 KOps/s 1.9689 KOps/s $\textbf{\color{#d91a1a}-5.26\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6921ms 0.5004ms 1.9984 KOps/s 1.7867 KOps/s $\textbf{\color{#35bf28}+11.85\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5932s 16.8224ms 59.4446 Ops/s 49.2580 Ops/s $\textbf{\color{#35bf28}+20.68\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.3831ms 1.9815ms 504.6799 Ops/s 495.8515 Ops/s $\color{#35bf28}+1.78\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 10.5857ms 1.3159ms 759.9632 Ops/s 733.9749 Ops/s $\color{#35bf28}+3.54\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 8.1701ms 5.1387ms 194.6033 Ops/s 194.7096 Ops/s $\color{#d91a1a}-0.05\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.4740ms 1.9933ms 501.6723 Ops/s 508.3236 Ops/s $\color{#d91a1a}-1.31\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 9.2782ms 1.3244ms 755.0719 Ops/s 1.0455 KOps/s $\textbf{\color{#d91a1a}-27.78\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.0094ms 5.3388ms 187.3096 Ops/s 186.6065 Ops/s $\color{#35bf28}+0.38\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.0857ms 2.0078ms 498.0470 Ops/s 481.8271 Ops/s $\color{#35bf28}+3.37\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 4.0521ms 1.1874ms 842.1986 Ops/s 843.8022 Ops/s $\color{#d91a1a}-0.19\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.2437ms 36.0929ms 27.7063 Ops/s 27.3395 Ops/s $\color{#35bf28}+1.34\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.6715ms 18.2702ms 54.7339 Ops/s 53.2321 Ops/s $\color{#35bf28}+2.82\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.5165ms 37.4408ms 26.7089 Ops/s 26.4473 Ops/s $\color{#35bf28}+0.99\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.1564ms 18.6944ms 53.4920 Ops/s 52.8858 Ops/s $\color{#35bf28}+1.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.0666ms 39.1829ms 25.5213 Ops/s 25.1231 Ops/s $\color{#35bf28}+1.59\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.7409ms 20.2620ms 49.3535 Ops/s 48.4858 Ops/s $\color{#35bf28}+1.79\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8776ms 0.2257ms 4.4303 KOps/s 4.4138 KOps/s $\color{#35bf28}+0.37\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7511ms 1.4175ms 705.4513 Ops/s 700.6929 Ops/s $\color{#35bf28}+0.68\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7141ms 2.3445ms 426.5299 Ops/s 431.1534 Ops/s $\color{#d91a1a}-1.07\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1524ms 2.9554ms 338.3616 Ops/s 337.3370 Ops/s $\color{#35bf28}+0.30\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2499ms 0.1674ms 5.9732 KOps/s 6.0800 KOps/s $\color{#d91a1a}-1.76\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3898ms 0.2313ms 4.3238 KOps/s 4.5662 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.2411ms 1.8881ms 529.6465 Ops/s 557.1878 Ops/s $\color{#d91a1a}-4.94\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5318ms 1.3815ms 723.8481 Ops/s 711.7561 Ops/s $\color{#35bf28}+1.70\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2089ms 1.1574ms 863.9934 Ops/s 863.0531 Ops/s $\color{#35bf28}+0.11\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.6864ms 3.6767ms 271.9827 Ops/s 271.7038 Ops/s $\color{#35bf28}+0.10\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.1709ms 5.9502ms 168.0625 Ops/s 170.2686 Ops/s $\color{#d91a1a}-1.30\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 15.1455ms 7.2091ms 138.7133 Ops/s 139.1262 Ops/s $\color{#d91a1a}-0.30\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4308ms 0.2725ms 3.6697 KOps/s 3.6617 KOps/s $\color{#35bf28}+0.22\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6644ms 1.5185ms 658.5238 Ops/s 650.7790 Ops/s $\color{#35bf28}+1.19\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.9710ms 2.4775ms 403.6405 Ops/s 410.8720 Ops/s $\color{#d91a1a}-1.76\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.5717ms 3.1591ms 316.5484 Ops/s 315.9580 Ops/s $\color{#35bf28}+0.19\%$
test_collector_without_rb[100-img_shape0-atari] 35.7852ms 35.1291ms 28.4664 Ops/s 28.4315 Ops/s $\color{#35bf28}+0.12\%$
test_collector_without_rb[200-img_shape1-large_batch] 70.0662ms 69.1257ms 14.4664 Ops/s 14.5834 Ops/s $\color{#d91a1a}-0.80\%$
test_collector_with_rb[100-img_shape0-atari] 44.2394ms 43.1711ms 23.1636 Ops/s 23.4587 Ops/s $\color{#d91a1a}-1.26\%$
test_collector_with_rb[200-img_shape1-large_batch] 86.0971ms 84.3683ms 11.8528 Ops/s 11.8358 Ops/s $\color{#35bf28}+0.14\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 59.1841ms 57.4382ms 17.4100 Ops/s 17.4483 Ops/s $\color{#d91a1a}-0.22\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1179s 0.1149s 8.7058 Ops/s 8.7816 Ops/s $\color{#d91a1a}-0.86\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 64.4858ms 62.8472ms 15.9116 Ops/s 15.9826 Ops/s $\color{#d91a1a}-0.44\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1271s 0.1254s 7.9774 Ops/s 8.0232 Ops/s $\color{#d91a1a}-0.57\%$

@vmoens
Collaborator Author

vmoens commented Feb 7, 2026

Closing: prof instrumentation not needed in the final stack.

@vmoens vmoens closed this Feb 7, 2026
@vmoens vmoens deleted the gh/vmoens/221/head branch February 7, 2026 15:44
Labels: CLA Signed, Collectors
