[Example] Dreamer: SerialEnv mode and collector compile config#3459

Closed
vmoens wants to merge 3 commits into gh/vmoens/220/base from gh/vmoens/220/head

@vmoens
Collaborator

@vmoens vmoens commented Feb 6, 2026

Add two configuration knobs to the Dreamer example:

  1. env.parallel_env_mode ("parallel" | "serial"): switches the train
    environment between ParallelEnv (uses IPC) and SerialEnv (no IPC
    overhead, better for cheap envs or when GPU contention between
    collector workers degrades throughput). Parallel mode remains the default.

  2. collector.compile block (enabled, backend, cudagraphs): passes
    compilation settings to MultiCollector via compile_policy and
    cudagraph_policy kwargs, enabling torch.compile + CUDA graphs for
    the policy in collector workers.
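
As a rough illustration of how the two knobs might be consumed, the sketch below maps `env.parallel_env_mode` to a batched-env class name and turns the `collector.compile` block into `compile_policy` / `cudagraph_policy` keyword arguments. It uses only the standard library so it runs without TorchRL installed. The helper names (`CompileConfig`, `select_env_class_name`, `collector_compile_kwargs`), the default values, and the exact shape of the kwargs are assumptions made for illustration, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class CompileConfig:
    """Hypothetical mirror of the collector.compile block (defaults are assumed)."""

    enabled: bool = False
    backend: str = "inductor"
    cudagraphs: bool = False


def select_env_class_name(parallel_env_mode: str) -> str:
    """Map env.parallel_env_mode to the name of the batched-env class to build."""
    mapping = {"parallel": "ParallelEnv", "serial": "SerialEnv"}
    if parallel_env_mode not in mapping:
        raise ValueError(
            f"parallel_env_mode must be 'parallel' or 'serial', got {parallel_env_mode!r}"
        )
    return mapping[parallel_env_mode]


def collector_compile_kwargs(cfg: CompileConfig) -> Dict[str, Any]:
    """Build the extra collector kwargs; an empty dict leaves the policy uncompiled."""
    if not cfg.enabled:
        return {}
    # Assumed shape: compile_policy carries the options forwarded to torch.compile.
    kwargs: Dict[str, Any] = {"compile_policy": {"backend": cfg.backend}}
    if cfg.cudagraphs:
        kwargs["cudagraph_policy"] = True
    return kwargs


# Example: serial envs, with compilation and CUDA graphs switched on.
env_class_name = select_env_class_name("serial")
extra_kwargs = collector_compile_kwargs(CompileConfig(enabled=True, cudagraphs=True))
print(env_class_name, extra_kwargs)
```

Returning an empty dict when `enabled` is false means the collector call site can always unpack `**extra_kwargs` without branching, which keeps the parallel, uncompiled path identical to the previous behaviour.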

@pytorch-bot

pytorch-bot bot commented Feb 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3459

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 3821082 with merge base 73b853b:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

github-actions bot commented Feb 6, 2026

⚠️ PR Title Label Error

Unknown or invalid prefix [Example].

Current title: [Example] Dreamer: SerialEnv mode and collector compile config

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
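
The prefix rule above can be sketched as a small lookup. This is only an illustration of a case-sensitive "title must start with a supported prefix" check built from the table in this comment; the function name `label_for_title` is hypothetical and the real workflow may be implemented differently.

```python
from typing import Optional

# Prefix -> label, copied from the table in the bot comment.
SUPPORTED_PREFIXES = {
    "[BugFix]": "BugFix",
    "[Feature]": "Feature",
    "[Doc]": "Documentation",
    "[Docs]": "Documentation",
    "[Refactor]": "Refactoring",
    "[CI]": "CI",
    "[Test]": "Tests",
    "[Tests]": "Tests",
    "[Environment]": "Environments",
    "[Environments]": "Environments",
    "[Data]": "Data",
    "[Performance]": "Performance",
    "[Perf]": "Performance",
    "[BC-Breaking]": "bc breaking",
    "[Deprecation]": "Deprecation",
}


def label_for_title(title: str) -> Optional[str]:
    """Return the label for a PR title, or None if the prefix is unsupported."""
    for prefix, label in SUPPORTED_PREFIXES.items():
        # str.startswith is case-sensitive, matching the stated rule.
        if title.startswith(prefix):
            return label
    return None


title = "[Example] Dreamer: SerialEnv mode and collector compile config"
print(label_for_title(title))  # no supported prefix, so this title is rejected
```

Because the closing bracket is part of each key, `[Doc]` and `[Docs]` cannot shadow one another even though one is a substring of the other.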

@github-actions
Contributor

github-actions bot commented Feb 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.7410μs 80.1110μs 12.4827 KOps/s 11.8077 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_tensor_to_bytestream_speed[torch.save] 0.1470ms 0.1424ms 7.0244 KOps/s 7.1205 KOps/s $\color{#d91a1a}-1.35\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1299s 0.1293s 7.7327 Ops/s 8.5998 Ops/s $\textbf{\color{#d91a1a}-10.08\%}$
test_tensor_to_bytestream_speed[numpy] 2.7699μs 2.7612μs 362.1553 KOps/s 377.1386 KOps/s $\color{#d91a1a}-3.97\%$
test_tensor_to_bytestream_speed[safetensors] 39.5596μs 38.7450μs 25.8098 KOps/s 25.0637 KOps/s $\color{#35bf28}+2.98\%$
test_simple 0.5612s 0.5555s 1.8002 Ops/s 1.7081 Ops/s $\textbf{\color{#35bf28}+5.40\%}$
test_transformed 1.2543s 1.1635s 0.8595 Ops/s 0.8560 Ops/s $\color{#35bf28}+0.40\%$
test_serial 1.7110s 1.7048s 0.5866 Ops/s 0.5827 Ops/s $\color{#35bf28}+0.66\%$
test_parallel 1.1492s 1.0574s 0.9457 Ops/s 0.9398 Ops/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[True-True-True-True-True] 0.3413ms 44.0442μs 22.7045 KOps/s 22.8973 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[True-True-True-True-False] 48.6910μs 25.0129μs 39.9794 KOps/s 38.9386 KOps/s $\color{#35bf28}+2.67\%$
test_step_mdp_speed[True-True-True-False-True] 56.4110μs 24.6183μs 40.6201 KOps/s 39.4179 KOps/s $\color{#35bf28}+3.05\%$
test_step_mdp_speed[True-True-True-False-False] 44.1310μs 13.7164μs 72.9053 KOps/s 71.9032 KOps/s $\color{#35bf28}+1.39\%$
test_step_mdp_speed[True-True-False-True-True] 76.1420μs 47.5843μs 21.0153 KOps/s 20.9140 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[True-True-False-True-False] 58.2110μs 27.5543μs 36.2919 KOps/s 34.9772 KOps/s $\color{#35bf28}+3.76\%$
test_step_mdp_speed[True-True-False-False-True] 64.0010μs 27.6647μs 36.1472 KOps/s 35.7622 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[True-True-False-False-False] 43.2110μs 16.3884μs 61.0187 KOps/s 59.6555 KOps/s $\color{#35bf28}+2.29\%$
test_step_mdp_speed[True-False-True-True-True] 81.9920μs 50.2065μs 19.9178 KOps/s 19.3674 KOps/s $\color{#35bf28}+2.84\%$
test_step_mdp_speed[True-False-True-True-False] 55.9010μs 30.5469μs 32.7365 KOps/s 31.7125 KOps/s $\color{#35bf28}+3.23\%$
test_step_mdp_speed[True-False-True-False-True] 60.0910μs 27.8068μs 35.9625 KOps/s 36.3456 KOps/s $\color{#d91a1a}-1.05\%$
test_step_mdp_speed[True-False-True-False-False] 45.6010μs 16.5727μs 60.3402 KOps/s 59.5256 KOps/s $\color{#35bf28}+1.37\%$
test_step_mdp_speed[True-False-False-True-True] 82.2710μs 52.4528μs 19.0648 KOps/s 18.7822 KOps/s $\color{#35bf28}+1.50\%$
test_step_mdp_speed[True-False-False-True-False] 66.4320μs 32.8256μs 30.4640 KOps/s 29.6421 KOps/s $\color{#35bf28}+2.77\%$
test_step_mdp_speed[True-False-False-False-True] 64.6810μs 29.9604μs 33.3774 KOps/s 32.5733 KOps/s $\color{#35bf28}+2.47\%$
test_step_mdp_speed[True-False-False-False-False] 53.9010μs 19.1915μs 52.1063 KOps/s 52.2897 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[False-True-True-True-True] 87.1620μs 50.2927μs 19.8836 KOps/s 19.8621 KOps/s $\color{#35bf28}+0.11\%$
test_step_mdp_speed[False-True-True-True-False] 58.5120μs 30.6854μs 32.5888 KOps/s 32.0102 KOps/s $\color{#35bf28}+1.81\%$
test_step_mdp_speed[False-True-True-False-True] 2.3809ms 31.9376μs 31.3110 KOps/s 31.5699 KOps/s $\color{#d91a1a}-0.82\%$
test_step_mdp_speed[False-True-True-False-False] 46.9910μs 18.2160μs 54.8969 KOps/s 54.1606 KOps/s $\color{#35bf28}+1.36\%$
test_step_mdp_speed[False-True-False-True-True] 0.1555ms 50.4988μs 19.8025 KOps/s 18.9769 KOps/s $\color{#35bf28}+4.35\%$
test_step_mdp_speed[False-True-False-True-False] 56.7210μs 32.8086μs 30.4798 KOps/s 29.6532 KOps/s $\color{#35bf28}+2.79\%$
test_step_mdp_speed[False-True-False-False-True] 67.1310μs 34.0519μs 29.3670 KOps/s 29.3418 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[False-True-False-False-False] 46.2810μs 20.8353μs 47.9955 KOps/s 47.4971 KOps/s $\color{#35bf28}+1.05\%$
test_step_mdp_speed[False-False-True-True-True] 93.3620μs 55.1586μs 18.1295 KOps/s 17.8470 KOps/s $\color{#35bf28}+1.58\%$
test_step_mdp_speed[False-False-True-True-False] 83.2520μs 36.1576μs 27.6567 KOps/s 27.3643 KOps/s $\color{#35bf28}+1.07\%$
test_step_mdp_speed[False-False-True-False-True] 61.5020μs 33.2416μs 30.0828 KOps/s 29.3152 KOps/s $\color{#35bf28}+2.62\%$
test_step_mdp_speed[False-False-True-False-False] 0.1666ms 20.6902μs 48.3320 KOps/s 46.9658 KOps/s $\color{#35bf28}+2.91\%$
test_step_mdp_speed[False-False-False-True-True] 93.2920μs 56.6288μs 17.6589 KOps/s 17.0071 KOps/s $\color{#35bf28}+3.83\%$
test_step_mdp_speed[False-False-False-True-False] 71.9310μs 38.4141μs 26.0321 KOps/s 25.4714 KOps/s $\color{#35bf28}+2.20\%$
test_step_mdp_speed[False-False-False-False-True] 71.0110μs 35.5468μs 28.1320 KOps/s 27.0656 KOps/s $\color{#35bf28}+3.94\%$
test_step_mdp_speed[False-False-False-False-False] 48.9710μs 23.3010μs 42.9167 KOps/s 42.4548 KOps/s $\color{#35bf28}+1.09\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8647s 0.7693s 1.2998 Ops/s 1.2911 Ops/s $\color{#35bf28}+0.68\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7307s 0.6360s 1.5723 Ops/s 1.5705 Ops/s $\color{#35bf28}+0.11\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7553s 1.6842s 0.5938 Ops/s 0.5896 Ops/s $\color{#35bf28}+0.71\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5451s 1.4653s 0.6824 Ops/s 0.6816 Ops/s $\color{#35bf28}+0.12\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0221s 1.9424s 0.5148 Ops/s 0.5129 Ops/s $\color{#35bf28}+0.38\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7976s 1.7189s 0.5818 Ops/s 0.5832 Ops/s $\color{#d91a1a}-0.25\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.8158s 4.6914s 0.2132 Ops/s 0.2115 Ops/s $\color{#35bf28}+0.79\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6322s 4.5295s 0.2208 Ops/s 0.2217 Ops/s $\color{#d91a1a}-0.42\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0622s 1.9210s 0.5206 Ops/s 0.5161 Ops/s $\color{#35bf28}+0.86\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7980s 1.6369s 0.6109 Ops/s 0.6186 Ops/s $\color{#d91a1a}-1.24\%$
test_values[generalized_advantage_estimate-True-True] 11.3584ms 11.1766ms 89.4729 Ops/s 93.9097 Ops/s $\color{#d91a1a}-4.72\%$
test_values[vec_generalized_advantage_estimate-True-True] 13.5288ms 11.0435ms 90.5510 Ops/s 87.0922 Ops/s $\color{#35bf28}+3.97\%$
test_values[td0_return_estimate-False-False] 0.2251ms 0.1312ms 7.6202 KOps/s 7.6364 KOps/s $\color{#d91a1a}-0.21\%$
test_values[td1_return_estimate-False-False] 30.9869ms 30.5566ms 32.7262 Ops/s 34.4845 Ops/s $\textbf{\color{#d91a1a}-5.10\%}$
test_values[vec_td1_return_estimate-False-False] 11.5082ms 11.1257ms 89.8821 Ops/s 91.0159 Ops/s $\color{#d91a1a}-1.25\%$
test_values[td_lambda_return_estimate-True-False] 45.9731ms 45.4234ms 22.0151 Ops/s 23.2892 Ops/s $\textbf{\color{#d91a1a}-5.47\%}$
test_values[vec_td_lambda_return_estimate-True-False] 12.0987ms 11.1584ms 89.6182 Ops/s 91.3964 Ops/s $\color{#d91a1a}-1.95\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 10.1036ms 9.9930ms 100.0700 Ops/s 104.9391 Ops/s $\color{#d91a1a}-4.64\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7729ms 1.5773ms 633.9755 Ops/s 660.0740 Ops/s $\color{#d91a1a}-3.95\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4753ms 0.4368ms 2.2894 KOps/s 2.2645 KOps/s $\color{#35bf28}+1.10\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 29.8874ms 26.3122ms 38.0051 Ops/s 52.9964 Ops/s $\textbf{\color{#d91a1a}-28.29\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.9294ms 1.7043ms 586.7545 Ops/s 585.2096 Ops/s $\color{#35bf28}+0.26\%$
test_dqn_speed[False-None] 1.6185ms 1.4363ms 696.2538 Ops/s 706.1845 Ops/s $\color{#d91a1a}-1.41\%$
test_dqn_speed[False-backward] 2.0209ms 1.9470ms 513.6005 Ops/s 515.7591 Ops/s $\color{#d91a1a}-0.42\%$
test_dqn_speed[True-None] 0.6156ms 0.5559ms 1.7989 KOps/s 1.7906 KOps/s $\color{#35bf28}+0.46\%$
test_dqn_speed[True-backward] 1.0605ms 1.0212ms 979.2357 Ops/s 976.8853 Ops/s $\color{#35bf28}+0.24\%$
test_dqn_speed[reduce-overhead-None] 0.6847ms 0.5457ms 1.8327 KOps/s 1.8058 KOps/s $\color{#35bf28}+1.49\%$
test_ddpg_speed[False-None] 3.2790ms 2.8861ms 346.4921 Ops/s 347.7361 Ops/s $\color{#d91a1a}-0.36\%$
test_ddpg_speed[False-backward] 4.3483ms 4.1231ms 242.5363 Ops/s 242.7113 Ops/s $\color{#d91a1a}-0.07\%$
test_ddpg_speed[True-None] 1.6953ms 1.4366ms 696.0703 Ops/s 692.5522 Ops/s $\color{#35bf28}+0.51\%$
test_ddpg_speed[True-backward] 2.5386ms 2.4508ms 408.0247 Ops/s 410.2820 Ops/s $\color{#d91a1a}-0.55\%$
test_ddpg_speed[reduce-overhead-None] 1.5495ms 1.4136ms 707.3969 Ops/s 702.1881 Ops/s $\color{#35bf28}+0.74\%$
test_sac_speed[False-None] 8.7401ms 8.1585ms 122.5722 Ops/s 123.1197 Ops/s $\color{#d91a1a}-0.44\%$
test_sac_speed[False-backward] 11.8714ms 11.4042ms 87.6868 Ops/s 87.4200 Ops/s $\color{#35bf28}+0.31\%$
test_sac_speed[True-None] 2.7424ms 2.1696ms 460.9122 Ops/s 458.9877 Ops/s $\color{#35bf28}+0.42\%$
test_sac_speed[True-backward] 4.2288ms 4.0953ms 244.1846 Ops/s 192.3427 Ops/s $\textbf{\color{#35bf28}+26.95\%}$
test_sac_speed[reduce-overhead-None] 2.3106ms 2.1751ms 459.7581 Ops/s 458.2367 Ops/s $\color{#35bf28}+0.33\%$
test_redq_speed[False-None] 10.9768ms 10.5546ms 94.7455 Ops/s 94.7674 Ops/s $\color{#d91a1a}-0.02\%$
test_redq_speed[False-backward] 18.8013ms 17.9815ms 55.6128 Ops/s 56.1285 Ops/s $\color{#d91a1a}-0.92\%$
test_redq_speed[True-None] 4.6380ms 4.4670ms 223.8661 Ops/s 223.6544 Ops/s $\color{#35bf28}+0.09\%$
test_redq_speed[True-backward] 10.1771ms 9.8866ms 101.1471 Ops/s 102.1692 Ops/s $\color{#d91a1a}-1.00\%$
test_redq_speed[reduce-overhead-None] 4.7154ms 4.4657ms 223.9292 Ops/s 199.2463 Ops/s $\textbf{\color{#35bf28}+12.39\%}$
test_redq_deprec_speed[False-None] 11.8687ms 11.3612ms 88.0192 Ops/s 89.3784 Ops/s $\color{#d91a1a}-1.52\%$
test_redq_deprec_speed[False-backward] 16.9184ms 16.2971ms 61.3606 Ops/s 62.2166 Ops/s $\color{#d91a1a}-1.38\%$
test_redq_deprec_speed[True-None] 3.8817ms 3.7220ms 268.6698 Ops/s 266.9444 Ops/s $\color{#35bf28}+0.65\%$
test_redq_deprec_speed[True-backward] 8.0258ms 7.7428ms 129.1525 Ops/s 122.8908 Ops/s $\textbf{\color{#35bf28}+5.10\%}$
test_redq_deprec_speed[reduce-overhead-None] 3.8021ms 3.6709ms 272.4156 Ops/s 260.6343 Ops/s $\color{#35bf28}+4.52\%$
test_td3_speed[False-None] 8.4243ms 8.1741ms 122.3383 Ops/s 123.3898 Ops/s $\color{#d91a1a}-0.85\%$
test_td3_speed[False-backward] 11.6151ms 11.1034ms 90.0625 Ops/s 90.9083 Ops/s $\color{#d91a1a}-0.93\%$
test_td3_speed[True-None] 1.9337ms 1.8666ms 535.7390 Ops/s 528.4883 Ops/s $\color{#35bf28}+1.37\%$
test_td3_speed[True-backward] 4.2554ms 3.7410ms 267.3084 Ops/s 248.9343 Ops/s $\textbf{\color{#35bf28}+7.38\%}$
test_td3_speed[reduce-overhead-None] 1.9590ms 1.8378ms 544.1347 Ops/s 548.7210 Ops/s $\color{#d91a1a}-0.84\%$
test_cql_speed[False-None] 29.8942ms 26.5482ms 37.6674 Ops/s 37.9808 Ops/s $\color{#d91a1a}-0.83\%$
test_cql_speed[False-backward] 38.7504ms 35.9161ms 27.8426 Ops/s 28.2025 Ops/s $\color{#d91a1a}-1.28\%$
test_cql_speed[True-None] 15.1352ms 12.5406ms 79.7408 Ops/s 77.7450 Ops/s $\color{#35bf28}+2.57\%$
test_cql_speed[True-backward] 19.1692ms 18.6593ms 53.5925 Ops/s 55.6573 Ops/s $\color{#d91a1a}-3.71\%$
test_cql_speed[reduce-overhead-None] 12.7334ms 12.4776ms 80.1435 Ops/s 78.5062 Ops/s $\color{#35bf28}+2.09\%$
test_a2c_speed[False-None] 5.6693ms 5.5191ms 181.1901 Ops/s 182.0806 Ops/s $\color{#d91a1a}-0.49\%$
test_a2c_speed[False-backward] 12.3122ms 12.0686ms 82.8597 Ops/s 84.1177 Ops/s $\color{#d91a1a}-1.50\%$
test_a2c_speed[True-None] 3.8989ms 3.7355ms 267.7043 Ops/s 265.7272 Ops/s $\color{#35bf28}+0.74\%$
test_a2c_speed[True-backward] 8.8251ms 8.6450ms 115.6743 Ops/s 116.0023 Ops/s $\color{#d91a1a}-0.28\%$
test_a2c_speed[reduce-overhead-None] 3.8591ms 3.7484ms 266.7771 Ops/s 268.5529 Ops/s $\color{#d91a1a}-0.66\%$
test_ppo_speed[False-None] 6.1927ms 6.0086ms 166.4293 Ops/s 166.2338 Ops/s $\color{#35bf28}+0.12\%$
test_ppo_speed[False-backward] 12.9567ms 12.7169ms 78.6353 Ops/s 79.2761 Ops/s $\color{#d91a1a}-0.81\%$
test_ppo_speed[True-None] 3.8365ms 3.6737ms 272.2051 Ops/s 271.8522 Ops/s $\color{#35bf28}+0.13\%$
test_ppo_speed[True-backward] 8.6897ms 8.4869ms 117.8291 Ops/s 118.7639 Ops/s $\color{#d91a1a}-0.79\%$
test_ppo_speed[reduce-overhead-None] 3.7716ms 3.6537ms 273.6915 Ops/s 271.1671 Ops/s $\color{#35bf28}+0.93\%$
test_reinforce_speed[False-None] 4.8410ms 4.5380ms 220.3624 Ops/s 214.2805 Ops/s $\color{#35bf28}+2.84\%$
test_reinforce_speed[False-backward] 7.5555ms 7.3755ms 135.5849 Ops/s 134.5215 Ops/s $\color{#35bf28}+0.79\%$
test_reinforce_speed[True-None] 3.0462ms 2.9174ms 342.7754 Ops/s 339.4848 Ops/s $\color{#35bf28}+0.97\%$
test_reinforce_speed[True-backward] 8.0220ms 7.8211ms 127.8591 Ops/s 129.7306 Ops/s $\color{#d91a1a}-1.44\%$
test_reinforce_speed[reduce-overhead-None] 3.0381ms 2.8991ms 344.9292 Ops/s 343.5962 Ops/s $\color{#35bf28}+0.39\%$
test_iql_speed[False-None] 27.5565ms 20.4973ms 48.7869 Ops/s 49.8437 Ops/s $\color{#d91a1a}-2.12\%$
test_iql_speed[False-backward] 35.2710ms 30.7283ms 32.5433 Ops/s 32.8520 Ops/s $\color{#d91a1a}-0.94\%$
test_iql_speed[True-None] 9.1084ms 8.6123ms 116.1124 Ops/s 112.3479 Ops/s $\color{#35bf28}+3.35\%$
test_iql_speed[True-backward] 17.2291ms 16.8512ms 59.3429 Ops/s 59.1445 Ops/s $\color{#35bf28}+0.34\%$
test_iql_speed[reduce-overhead-None] 8.8885ms 8.6157ms 116.0665 Ops/s 114.3327 Ops/s $\color{#35bf28}+1.52\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3028ms 6.1018ms 163.8863 Ops/s 163.8213 Ops/s $\color{#35bf28}+0.04\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.7714ms 0.3466ms 2.8855 KOps/s 3.1267 KOps/s $\textbf{\color{#d91a1a}-7.71\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6283ms 0.3297ms 3.0328 KOps/s 3.1944 KOps/s $\textbf{\color{#d91a1a}-5.06\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0963ms 5.8669ms 170.4484 Ops/s 172.1881 Ops/s $\color{#d91a1a}-1.01\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0310ms 0.3505ms 2.8531 KOps/s 2.9835 KOps/s $\color{#d91a1a}-4.37\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6762ms 0.3239ms 3.0876 KOps/s 2.6879 KOps/s $\textbf{\color{#35bf28}+14.87\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6291ms 1.3730ms 728.3142 Ops/s 757.4304 Ops/s $\color{#d91a1a}-3.84\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5728ms 1.3381ms 747.3514 Ops/s 723.9300 Ops/s $\color{#35bf28}+3.24\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.1301ms 6.1958ms 161.3988 Ops/s 166.2646 Ops/s $\color{#d91a1a}-2.93\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1254ms 0.4674ms 2.1394 KOps/s 1.8436 KOps/s $\textbf{\color{#35bf28}+16.04\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8532ms 0.4415ms 2.2648 KOps/s 1.9271 KOps/s $\textbf{\color{#35bf28}+17.53\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9695ms 5.8853ms 169.9162 Ops/s 171.2749 Ops/s $\color{#d91a1a}-0.79\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.7188ms 0.3383ms 2.9556 KOps/s 3.2083 KOps/s $\textbf{\color{#d91a1a}-7.88\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5723ms 0.3379ms 2.9595 KOps/s 3.0070 KOps/s $\color{#d91a1a}-1.58\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1122ms 5.8063ms 172.2280 Ops/s 171.8255 Ops/s $\color{#35bf28}+0.23\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0519ms 0.3340ms 2.9944 KOps/s 3.5086 KOps/s $\textbf{\color{#d91a1a}-14.66\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4834ms 0.2783ms 3.5934 KOps/s 3.7519 KOps/s $\color{#d91a1a}-4.23\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2538ms 6.0161ms 166.2213 Ops/s 166.8205 Ops/s $\color{#d91a1a}-0.36\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8130ms 0.4814ms 2.0771 KOps/s 2.0306 KOps/s $\color{#35bf28}+2.29\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8426ms 0.4574ms 2.1864 KOps/s 2.0963 KOps/s $\color{#35bf28}+4.30\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3536ms 4.9479ms 202.1050 Ops/s 57.6747 Ops/s $\textbf{\color{#35bf28}+250.42\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.3855ms 2.0668ms 483.8384 Ops/s 507.8408 Ops/s $\color{#d91a1a}-4.73\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.1001ms 0.8947ms 1.1177 KOps/s 813.7066 Ops/s $\textbf{\color{#35bf28}+37.36\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5197s 15.4070ms 64.9054 Ops/s 197.8571 Ops/s $\textbf{\color{#d91a1a}-67.20\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 7.2794ms 1.9561ms 511.2244 Ops/s 474.8699 Ops/s $\textbf{\color{#35bf28}+7.66\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 10.9920ms 1.2768ms 783.2338 Ops/s 861.9290 Ops/s $\textbf{\color{#d91a1a}-9.13\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.7634ms 5.2174ms 191.6664 Ops/s 187.4113 Ops/s $\color{#35bf28}+2.27\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.8676ms 2.0428ms 489.5236 Ops/s 75.5616 Ops/s $\textbf{\color{#35bf28}+547.85\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.2167ms 1.0795ms 926.3354 Ops/s 919.4630 Ops/s $\color{#35bf28}+0.75\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 38.2428ms 36.5261ms 27.3777 Ops/s 27.3976 Ops/s $\color{#d91a1a}-0.07\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.3084ms 18.5369ms 53.9464 Ops/s 54.1727 Ops/s $\color{#d91a1a}-0.42\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.9362ms 37.8741ms 26.4033 Ops/s 26.7128 Ops/s $\color{#d91a1a}-1.16\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.6333ms 18.9445ms 52.7858 Ops/s 53.2730 Ops/s $\color{#d91a1a}-0.91\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.4222ms 40.0446ms 24.9722 Ops/s 25.5252 Ops/s $\color{#d91a1a}-2.17\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.4390ms 20.3014ms 49.2578 Ops/s 49.9747 Ops/s $\color{#d91a1a}-1.43\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8864ms 0.2208ms 4.5289 KOps/s 4.5930 KOps/s $\color{#d91a1a}-1.39\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7912ms 1.4152ms 706.5938 Ops/s 723.6730 Ops/s $\color{#d91a1a}-2.36\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.4846ms 2.3198ms 431.0799 Ops/s 421.0712 Ops/s $\color{#35bf28}+2.38\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1372ms 2.9332ms 340.9283 Ops/s 342.5039 Ops/s $\color{#d91a1a}-0.46\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2217ms 0.1402ms 7.1302 KOps/s 7.4652 KOps/s $\color{#d91a1a}-4.49\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3415ms 0.1805ms 5.5388 KOps/s 5.5075 KOps/s $\color{#35bf28}+0.57\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9190ms 1.7815ms 561.3092 Ops/s 554.9994 Ops/s $\color{#35bf28}+1.14\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5160ms 1.3460ms 742.9452 Ops/s 756.2999 Ops/s $\color{#d91a1a}-1.77\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2723ms 1.1355ms 880.6319 Ops/s 902.3457 Ops/s $\color{#d91a1a}-2.41\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8077ms 3.5875ms 278.7469 Ops/s 285.7362 Ops/s $\color{#d91a1a}-2.45\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.8826ms 5.6679ms 176.4335 Ops/s 172.8745 Ops/s $\color{#35bf28}+2.06\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.5767ms 7.3328ms 136.3731 Ops/s 138.6792 Ops/s $\color{#d91a1a}-1.66\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4339ms 0.2772ms 3.6074 KOps/s 3.6289 KOps/s $\color{#d91a1a}-0.59\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7396ms 1.5279ms 654.4769 Ops/s 671.6455 Ops/s $\color{#d91a1a}-2.56\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6659ms 2.4674ms 405.2875 Ops/s 400.4949 Ops/s $\color{#35bf28}+1.20\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3292ms 3.1411ms 318.3576 Ops/s 320.5882 Ops/s $\color{#d91a1a}-0.70\%$
test_collector_without_rb[100-img_shape0-atari] 35.4735ms 34.5200ms 28.9687 Ops/s 29.2869 Ops/s $\color{#d91a1a}-1.09\%$
test_collector_without_rb[200-img_shape1-large_batch] 68.6543ms 67.9917ms 14.7077 Ops/s 14.8069 Ops/s $\color{#d91a1a}-0.67\%$
test_collector_with_rb[100-img_shape0-atari] 40.2360ms 39.2800ms 25.4583 Ops/s 25.7892 Ops/s $\color{#d91a1a}-1.28\%$
test_collector_with_rb[200-img_shape1-large_batch] 77.0070ms 76.3268ms 13.1016 Ops/s 13.0740 Ops/s $\color{#35bf28}+0.21\%$
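
For readers checking the Change column, the sketch below reproduces it as the relative difference between "Ops" and "Ops on Repo HEAD" after converting both to the same unit. The 5% cut-off used to call a row improved or worsened is an assumption inferred from which rows are shown in bold; it is not stated in the comment, and the function names are hypothetical.

```python
# Scale factors for the two throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1_000.0}


def to_ops_per_second(value: float, unit: str) -> float:
    """Convert a reported throughput to plain operations per second."""
    return value * UNIT_SCALE[unit]


def percent_change(pr_ops: float, head_ops: float) -> float:
    """Relative throughput change of the PR against repo HEAD, in percent."""
    return (pr_ops - head_ops) / head_ops * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# Row test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]:
# 1.1177 KOps/s on the PR against 813.7066 Ops/s on HEAD, reported as +37.36%.
pr = to_ops_per_second(1.1177, "KOps/s")
head = to_ops_per_second(813.7066, "Ops/s")
change = percent_change(pr, head)
print(f"{change:+.2f}% -> {classify(change)}")
```

Since the table values are already rounded, a recomputed percentage can differ from the reported one in the last digit.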

@github-actions
Contributor

github-actions bot commented Feb 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}15$. Worsened: $\large\color{#d91a1a}26$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 83.3901μs 82.1304μs 12.1758 KOps/s 12.1174 KOps/s $\color{#35bf28}+0.48\%$
test_tensor_to_bytestream_speed[torch.save] 0.1459ms 0.1433ms 6.9799 KOps/s 7.1534 KOps/s $\color{#d91a1a}-2.43\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1129s 0.1105s 9.0471 Ops/s 8.9686 Ops/s $\color{#35bf28}+0.88\%$
test_tensor_to_bytestream_speed[numpy] 2.6846μs 2.6821μs 372.8414 KOps/s 386.1623 KOps/s $\color{#d91a1a}-3.45\%$
test_tensor_to_bytestream_speed[safetensors] 38.6420μs 38.2161μs 26.1670 KOps/s 25.4016 KOps/s $\color{#35bf28}+3.01\%$
test_simple 0.8094s 0.7969s 1.2548 Ops/s 1.2246 Ops/s $\color{#35bf28}+2.47\%$
test_transformed 1.5478s 1.4531s 0.6882 Ops/s 0.6889 Ops/s $\color{#d91a1a}-0.10\%$
test_serial 2.4553s 2.3507s 0.4254 Ops/s 0.4315 Ops/s $\color{#d91a1a}-1.41\%$
test_parallel 1.9378s 1.8336s 0.5454 Ops/s 0.5594 Ops/s $\color{#d91a1a}-2.50\%$
test_step_mdp_speed[True-True-True-True-True] 0.2920ms 45.9434μs 21.7659 KOps/s 22.9598 KOps/s $\textbf{\color{#d91a1a}-5.20\%}$
test_step_mdp_speed[True-True-True-True-False] 59.3710μs 26.7569μs 37.3735 KOps/s 40.2417 KOps/s $\textbf{\color{#d91a1a}-7.13\%}$
test_step_mdp_speed[True-True-True-False-True] 69.0810μs 27.7016μs 36.0990 KOps/s 41.2409 KOps/s $\textbf{\color{#d91a1a}-12.47\%}$
test_step_mdp_speed[True-True-True-False-False] 55.4610μs 15.4360μs 64.7838 KOps/s 72.7343 KOps/s $\textbf{\color{#d91a1a}-10.93\%}$
test_step_mdp_speed[True-True-False-True-True] 98.2220μs 50.7368μs 19.7096 KOps/s 21.2441 KOps/s $\textbf{\color{#d91a1a}-7.22\%}$
test_step_mdp_speed[True-True-False-True-False] 65.8910μs 29.8539μs 33.4964 KOps/s 36.9442 KOps/s $\textbf{\color{#d91a1a}-9.33\%}$
test_step_mdp_speed[True-True-False-False-True] 77.6710μs 29.9795μs 33.3562 KOps/s 36.7128 KOps/s $\textbf{\color{#d91a1a}-9.14\%}$
test_step_mdp_speed[True-True-False-False-False] 49.8410μs 17.3663μs 57.5829 KOps/s 60.5743 KOps/s $\color{#d91a1a}-4.94\%$
test_step_mdp_speed[True-False-True-True-True] 96.2320μs 51.8533μs 19.2852 KOps/s 20.1089 KOps/s $\color{#d91a1a}-4.10\%$
test_step_mdp_speed[True-False-True-True-False] 70.7110μs 31.1665μs 32.0858 KOps/s 33.0069 KOps/s $\color{#d91a1a}-2.79\%$
test_step_mdp_speed[True-False-True-False-True] 72.1710μs 30.2844μs 33.0203 KOps/s 36.5667 KOps/s $\textbf{\color{#d91a1a}-9.70\%}$
test_step_mdp_speed[True-False-True-False-False] 55.9610μs 17.3473μs 57.6458 KOps/s 60.7066 KOps/s $\textbf{\color{#d91a1a}-5.04\%}$
test_step_mdp_speed[True-False-False-True-True] 96.5410μs 55.7850μs 17.9260 KOps/s 18.9857 KOps/s $\textbf{\color{#d91a1a}-5.58\%}$
test_step_mdp_speed[True-False-False-True-False] 91.6310μs 34.0875μs 29.3363 KOps/s 30.5064 KOps/s $\color{#d91a1a}-3.84\%$
test_step_mdp_speed[True-False-False-False-True] 78.9310μs 32.5479μs 30.7240 KOps/s 33.8877 KOps/s $\textbf{\color{#d91a1a}-9.34\%}$
test_step_mdp_speed[True-False-False-False-False] 53.5110μs 20.4833μs 48.8202 KOps/s 52.1178 KOps/s $\textbf{\color{#d91a1a}-6.33\%}$
test_step_mdp_speed[False-True-True-True-True] 93.4720μs 51.9464μs 19.2506 KOps/s 19.9794 KOps/s $\color{#d91a1a}-3.65\%$
test_step_mdp_speed[False-True-True-True-False] 73.9610μs 31.6654μs 31.5802 KOps/s 32.5241 KOps/s $\color{#d91a1a}-2.90\%$
test_step_mdp_speed[False-True-True-False-True] 2.2124ms 33.8917μs 29.5057 KOps/s 31.8018 KOps/s $\textbf{\color{#d91a1a}-7.22\%}$
test_step_mdp_speed[False-True-True-False-False] 61.9910μs 19.0898μs 52.3841 KOps/s 55.1736 KOps/s $\textbf{\color{#d91a1a}-5.06\%}$
test_step_mdp_speed[False-True-False-True-True] 99.4920μs 54.8228μs 18.2406 KOps/s 18.6754 KOps/s $\color{#d91a1a}-2.33\%$
test_step_mdp_speed[False-True-False-True-False] 85.1910μs 34.2082μs 29.2328 KOps/s 29.9187 KOps/s $\color{#d91a1a}-2.29\%$
test_step_mdp_speed[False-True-False-False-True] 79.5810μs 35.4132μs 28.2381 KOps/s 29.7675 KOps/s $\textbf{\color{#d91a1a}-5.14\%}$
test_step_mdp_speed[False-True-False-False-False] 58.6010μs 21.7067μs 46.0687 KOps/s 48.1553 KOps/s $\color{#d91a1a}-4.33\%$
test_step_mdp_speed[False-False-True-True-True] 0.1051ms 57.0572μs 17.5263 KOps/s 18.0189 KOps/s $\color{#d91a1a}-2.73\%$
test_step_mdp_speed[False-False-True-True-False] 0.1090ms 36.5707μs 27.3443 KOps/s 27.6662 KOps/s $\color{#d91a1a}-1.16\%$
test_step_mdp_speed[False-False-True-False-True] 76.1510μs 35.2295μs 28.3853 KOps/s 29.5687 KOps/s $\color{#d91a1a}-4.00\%$
test_step_mdp_speed[False-False-True-False-False] 53.6010μs 21.1352μs 47.3145 KOps/s 48.4825 KOps/s $\color{#d91a1a}-2.41\%$
test_step_mdp_speed[False-False-False-True-True] 0.1036ms 57.9856μs 17.2457 KOps/s 17.5320 KOps/s $\color{#d91a1a}-1.63\%$
test_step_mdp_speed[False-False-False-True-False] 78.1110μs 39.3857μs 25.3899 KOps/s 25.8471 KOps/s $\color{#d91a1a}-1.77\%$
test_step_mdp_speed[False-False-False-False-True] 0.1149ms 36.7539μs 27.2080 KOps/s 27.6898 KOps/s $\color{#d91a1a}-1.74\%$
test_step_mdp_speed[False-False-False-False-False] 58.9210μs 23.5483μs 42.4659 KOps/s 43.0791 KOps/s $\color{#d91a1a}-1.42\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8917s 0.8006s 1.2491 Ops/s 1.2912 Ops/s $\color{#d91a1a}-3.26\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7372s 0.6480s 1.5432 Ops/s 1.5713 Ops/s $\color{#d91a1a}-1.79\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7950s 1.7222s 0.5807 Ops/s 0.5940 Ops/s $\color{#d91a1a}-2.25\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.6069s 1.4982s 0.6674 Ops/s 0.6834 Ops/s $\color{#d91a1a}-2.33\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0503s 1.9745s 0.5065 Ops/s 0.5117 Ops/s $\color{#d91a1a}-1.03\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.8599s 1.7745s 0.5635 Ops/s 0.5858 Ops/s $\color{#d91a1a}-3.80\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6750s 4.6258s 0.2162 Ops/s 0.2133 Ops/s $\color{#35bf28}+1.33\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5221s 4.4293s 0.2258 Ops/s 0.2261 Ops/s $\color{#d91a1a}-0.14\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9984s 1.9124s 0.5229 Ops/s 0.5277 Ops/s $\color{#d91a1a}-0.92\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7275s 1.6163s 0.6187 Ops/s 0.6241 Ops/s $\color{#d91a1a}-0.87\%$
test_values[generalized_advantage_estimate-True-True] 22.0253ms 21.4814ms 46.5519 Ops/s 49.1741 Ops/s $\textbf{\color{#d91a1a}-5.33\%}$
test_values[vec_generalized_advantage_estimate-True-True] 0.1256s 3.4345ms 291.1648 Ops/s 278.6910 Ops/s $\color{#35bf28}+4.48\%$
test_values[td0_return_estimate-False-False] 0.1112ms 84.2399μs 11.8709 KOps/s 11.9303 KOps/s $\color{#d91a1a}-0.50\%$
test_values[td1_return_estimate-False-False] 52.0021ms 50.7898ms 19.6890 Ops/s 20.4149 Ops/s $\color{#d91a1a}-3.56\%$
test_values[vec_td1_return_estimate-False-False] 1.3171ms 1.0908ms 916.7956 Ops/s 918.5752 Ops/s $\color{#d91a1a}-0.19\%$
test_values[td_lambda_return_estimate-True-False] 84.5117ms 83.2585ms 12.0108 Ops/s 12.4889 Ops/s $\color{#d91a1a}-3.83\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3094ms 1.1027ms 906.8574 Ops/s 923.2966 Ops/s $\color{#d91a1a}-1.78\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.3545ms 21.9552ms 45.5474 Ops/s 48.5765 Ops/s $\textbf{\color{#d91a1a}-6.24\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0388ms 0.7587ms 1.3180 KOps/s 1.3129 KOps/s $\color{#35bf28}+0.39\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8674ms 0.7061ms 1.4162 KOps/s 1.4722 KOps/s $\color{#d91a1a}-3.80\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.6187ms 1.5064ms 663.8208 Ops/s 670.6609 Ops/s $\color{#d91a1a}-1.02\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7529ms 0.6975ms 1.4337 KOps/s 1.4371 KOps/s $\color{#d91a1a}-0.24\%$
test_dqn_speed[False-None] 1.7041ms 1.5472ms 646.3451 Ops/s 649.6435 Ops/s $\color{#d91a1a}-0.51\%$
test_dqn_speed[False-backward] 2.3802ms 2.1709ms 460.6458 Ops/s 460.3754 Ops/s $\color{#35bf28}+0.06\%$
test_dqn_speed[True-None] 1.0536ms 0.5706ms 1.7525 KOps/s 1.6742 KOps/s $\color{#35bf28}+4.68\%$
test_dqn_speed[True-backward] 1.1467ms 1.1056ms 904.4799 Ops/s 825.0757 Ops/s $\textbf{\color{#35bf28}+9.62\%}$
test_dqn_speed[reduce-overhead-None] 0.7038ms 0.6206ms 1.6114 KOps/s 1.6010 KOps/s $\color{#35bf28}+0.65\%$
test_ddpg_speed[False-None] 3.2902ms 2.9663ms 337.1202 Ops/s 343.6536 Ops/s $\color{#d91a1a}-1.90\%$
test_ddpg_speed[False-backward] 4.5528ms 4.1504ms 240.9431 Ops/s 233.5103 Ops/s $\color{#35bf28}+3.18\%$
test_ddpg_speed[True-None] 1.4697ms 1.3341ms 749.5615 Ops/s 748.2629 Ops/s $\color{#35bf28}+0.17\%$
test_ddpg_speed[True-backward] 2.5604ms 2.3965ms 417.2786 Ops/s 393.8400 Ops/s $\textbf{\color{#35bf28}+5.95\%}$
test_ddpg_speed[reduce-overhead-None] 1.4862ms 1.3723ms 728.7253 Ops/s 735.0869 Ops/s $\color{#d91a1a}-0.87\%$
test_sac_speed[False-None] 8.9377ms 8.3842ms 119.2723 Ops/s 121.0274 Ops/s $\color{#d91a1a}-1.45\%$
test_sac_speed[False-backward] 11.9099ms 11.2920ms 88.5585 Ops/s 87.4422 Ops/s $\color{#35bf28}+1.28\%$
test_sac_speed[True-None] 2.0029ms 1.8395ms 543.6170 Ops/s 518.7265 Ops/s $\color{#35bf28}+4.80\%$
test_sac_speed[True-backward] 4.0438ms 3.6400ms 274.7287 Ops/s 273.9519 Ops/s $\color{#35bf28}+0.28\%$
test_sac_speed[reduce-overhead-None] 19.4361ms 10.8105ms 92.5024 Ops/s 82.3908 Ops/s $\textbf{\color{#35bf28}+12.27\%}$
test_redq_deprec_speed[False-None] 9.8169ms 9.2927ms 107.6114 Ops/s 107.6450 Ops/s $\color{#d91a1a}-0.03\%$
test_redq_deprec_speed[False-backward] 13.1573ms 12.5969ms 79.3849 Ops/s 79.2559 Ops/s $\color{#35bf28}+0.16\%$
test_redq_deprec_speed[True-None] 3.0185ms 2.5384ms 393.9556 Ops/s 378.2631 Ops/s $\color{#35bf28}+4.15\%$
test_redq_deprec_speed[True-backward] 4.7879ms 4.3661ms 229.0357 Ops/s 238.4026 Ops/s $\color{#d91a1a}-3.93\%$
test_redq_deprec_speed[reduce-overhead-None] 15.7665ms 9.7917ms 102.1278 Ops/s 101.7915 Ops/s $\color{#35bf28}+0.33\%$
test_td3_speed[False-None] 47.6986ms 8.5883ms 116.4378 Ops/s 121.1402 Ops/s $\color{#d91a1a}-3.88\%$
test_td3_speed[False-backward] 11.6227ms 10.8133ms 92.4786 Ops/s 92.6127 Ops/s $\color{#d91a1a}-0.14\%$
test_td3_speed[True-None] 1.8041ms 1.7405ms 574.5575 Ops/s 607.7198 Ops/s $\textbf{\color{#d91a1a}-5.46\%}$
test_td3_speed[True-backward] 3.3791ms 3.2934ms 303.6384 Ops/s 303.3251 Ops/s $\color{#35bf28}+0.10\%$
test_td3_speed[reduce-overhead-None] 61.7145ms 24.6218ms 40.6145 Ops/s 41.0374 Ops/s $\color{#d91a1a}-1.03\%$
test_cql_speed[False-None] 18.2252ms 17.3298ms 57.7042 Ops/s 58.0475 Ops/s $\color{#d91a1a}-0.59\%$
test_cql_speed[False-backward] 23.5662ms 22.8809ms 43.7046 Ops/s 30.6999 Ops/s $\textbf{\color{#35bf28}+42.36\%}$
test_cql_speed[True-None] 3.4639ms 3.3063ms 302.4557 Ops/s 303.1505 Ops/s $\color{#d91a1a}-0.23\%$
test_cql_speed[True-backward] 5.9315ms 5.5642ms 179.7217 Ops/s 177.2865 Ops/s $\color{#35bf28}+1.37\%$
test_cql_speed[reduce-overhead-None] 0.6855s 15.3647ms 65.0841 Ops/s 83.9618 Ops/s $\textbf{\color{#d91a1a}-22.48\%}$
test_a2c_speed[False-None] 3.9086ms 3.2454ms 308.1251 Ops/s 305.2437 Ops/s $\color{#35bf28}+0.94\%$
test_a2c_speed[False-backward] 6.7586ms 6.3641ms 157.1318 Ops/s 157.7261 Ops/s $\color{#d91a1a}-0.38\%$
test_a2c_speed[True-None] 1.4368ms 1.3437ms 744.2286 Ops/s 742.5772 Ops/s $\color{#35bf28}+0.22\%$
test_a2c_speed[True-backward] 3.1941ms 3.1419ms 318.2836 Ops/s 333.1554 Ops/s $\color{#d91a1a}-4.46\%$
test_a2c_speed[reduce-overhead-None] 1.1513ms 0.9961ms 1.0040 KOps/s 997.8429 Ops/s $\color{#35bf28}+0.61\%$
test_ppo_speed[False-None] 4.0161ms 3.8605ms 259.0317 Ops/s 258.0381 Ops/s $\color{#35bf28}+0.39\%$
test_ppo_speed[False-backward] 7.5356ms 7.1541ms 139.7801 Ops/s 145.0088 Ops/s $\color{#d91a1a}-3.61\%$
test_ppo_speed[True-None] 1.6841ms 1.4563ms 686.6682 Ops/s 696.7777 Ops/s $\color{#d91a1a}-1.45\%$
test_ppo_speed[True-backward] 3.5405ms 3.2973ms 303.2752 Ops/s 301.4842 Ops/s $\color{#35bf28}+0.59\%$
test_ppo_speed[reduce-overhead-None] 1.4779ms 1.0517ms 950.8275 Ops/s 924.5459 Ops/s $\color{#35bf28}+2.84\%$
test_reinforce_speed[False-None] 2.7361ms 2.2986ms 435.0519 Ops/s 434.9750 Ops/s $\color{#35bf28}+0.02\%$
test_reinforce_speed[False-backward] 3.8311ms 3.4185ms 292.5296 Ops/s 294.4584 Ops/s $\color{#d91a1a}-0.66\%$
test_reinforce_speed[True-None] 1.7445ms 1.3051ms 766.2258 Ops/s 773.7379 Ops/s $\color{#d91a1a}-0.97\%$
test_reinforce_speed[True-backward] 3.1248ms 3.0695ms 325.7908 Ops/s 317.6479 Ops/s $\color{#35bf28}+2.56\%$
test_reinforce_speed[reduce-overhead-None] 0.5711s 10.6680ms 93.7387 Ops/s 106.4299 Ops/s $\textbf{\color{#d91a1a}-11.92\%}$
test_iql_speed[False-None] 10.2864ms 9.4346ms 105.9925 Ops/s 105.7386 Ops/s $\color{#35bf28}+0.24\%$
test_iql_speed[False-backward] 13.7915ms 13.3819ms 74.7279 Ops/s 74.7336 Ops/s $-0.01\%$
test_iql_speed[True-None] 2.6205ms 2.2064ms 453.2288 Ops/s 450.6690 Ops/s $\color{#35bf28}+0.57\%$
test_iql_speed[True-backward] 5.3311ms 4.8963ms 204.2340 Ops/s 201.8753 Ops/s $\color{#35bf28}+1.17\%$
test_iql_speed[reduce-overhead-None] 17.6590ms 10.4587ms 95.6137 Ops/s 75.2150 Ops/s $\textbf{\color{#35bf28}+27.12\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3947ms 5.9221ms 168.8600 Ops/s 165.4683 Ops/s $\color{#35bf28}+2.05\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8274ms 0.2859ms 3.4975 KOps/s 3.5285 KOps/s $\color{#d91a1a}-0.88\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5004ms 0.2679ms 3.7322 KOps/s 3.7556 KOps/s $\color{#d91a1a}-0.62\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1536ms 5.6629ms 176.5895 Ops/s 171.2533 Ops/s $\color{#35bf28}+3.12\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.2617ms 0.3062ms 3.2662 KOps/s 3.2958 KOps/s $\color{#d91a1a}-0.90\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5386ms 0.3277ms 3.0513 KOps/s 3.2465 KOps/s $\textbf{\color{#d91a1a}-6.01\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7956ms 1.2802ms 781.1041 Ops/s 798.1099 Ops/s $\color{#d91a1a}-2.13\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5987ms 1.1754ms 850.7477 Ops/s 842.5013 Ops/s $\color{#35bf28}+0.98\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2880ms 5.8340ms 171.4085 Ops/s 165.2467 Ops/s $\color{#35bf28}+3.73\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0205ms 0.4351ms 2.2982 KOps/s 2.3115 KOps/s $\color{#d91a1a}-0.58\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8570ms 0.4136ms 2.4177 KOps/s 1.9157 KOps/s $\textbf{\color{#35bf28}+26.20\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1959ms 5.7891ms 172.7372 Ops/s 169.2744 Ops/s $\color{#35bf28}+2.05\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.1399ms 0.2868ms 3.4872 KOps/s 2.8300 KOps/s $\textbf{\color{#35bf28}+23.22\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4809ms 0.2694ms 3.7114 KOps/s 2.9800 KOps/s $\textbf{\color{#35bf28}+24.54\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.3614ms 5.7748ms 173.1674 Ops/s 171.7937 Ops/s $\color{#35bf28}+0.80\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1085ms 0.2845ms 3.5150 KOps/s 3.1034 KOps/s $\textbf{\color{#35bf28}+13.26\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4640ms 0.2755ms 3.6301 KOps/s 3.2973 KOps/s $\textbf{\color{#35bf28}+10.09\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1405ms 5.9785ms 167.2672 Ops/s 166.1122 Ops/s $\color{#35bf28}+0.70\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1860ms 0.4439ms 2.2526 KOps/s 651.5792 Ops/s $\textbf{\color{#35bf28}+245.72\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6156ms 0.4278ms 2.3374 KOps/s 2.1723 KOps/s $\textbf{\color{#35bf28}+7.60\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3896ms 4.9519ms 201.9420 Ops/s 197.6712 Ops/s $\color{#35bf28}+2.16\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.4183ms 2.1155ms 472.7009 Ops/s 509.4766 Ops/s $\textbf{\color{#d91a1a}-7.22\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.5744ms 0.9879ms 1.0123 KOps/s 1.0493 KOps/s $\color{#d91a1a}-3.52\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5903s 16.8055ms 59.5045 Ops/s 196.3271 Ops/s $\textbf{\color{#d91a1a}-69.69\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 10.0593ms 1.9974ms 500.6590 Ops/s 496.7615 Ops/s $\color{#35bf28}+0.78\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 10.1696ms 1.3348ms 749.1814 Ops/s 1.0401 KOps/s $\textbf{\color{#d91a1a}-27.97\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.1843ms 5.2340ms 191.0584 Ops/s 51.7900 Ops/s $\textbf{\color{#35bf28}+268.91\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.3481ms 2.0908ms 478.2971 Ops/s 496.0294 Ops/s $\color{#d91a1a}-3.57\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.1753ms 1.1833ms 845.1254 Ops/s 878.3999 Ops/s $\color{#d91a1a}-3.79\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.8482ms 35.8719ms 27.8770 Ops/s 27.5673 Ops/s $\color{#35bf28}+1.12\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9941ms 18.3848ms 54.3928 Ops/s 55.1962 Ops/s $\color{#d91a1a}-1.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 42.4274ms 37.1638ms 26.9079 Ops/s 26.5029 Ops/s $\color{#35bf28}+1.53\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 0.5527s 29.3649ms 34.0543 Ops/s 54.1843 Ops/s $\textbf{\color{#d91a1a}-37.15\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.8875ms 39.0580ms 25.6029 Ops/s 25.2755 Ops/s $\color{#35bf28}+1.30\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.4856ms 19.9496ms 50.1263 Ops/s 50.4413 Ops/s $\color{#d91a1a}-0.62\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8789ms 0.2258ms 4.4280 KOps/s 4.5522 KOps/s $\color{#d91a1a}-2.73\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.8347ms 1.3535ms 738.8081 Ops/s 686.3056 Ops/s $\textbf{\color{#35bf28}+7.65\%}$
test_storage_write_lazystack[100-img_shape2-large_img] 2.9824ms 2.3704ms 421.8714 Ops/s 431.3540 Ops/s $\color{#d91a1a}-2.20\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.3994ms 2.9444ms 339.6242 Ops/s 337.7618 Ops/s $\color{#35bf28}+0.55\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2513ms 0.1670ms 5.9890 KOps/s 6.0917 KOps/s $\color{#d91a1a}-1.69\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.6687ms 0.2283ms 4.3803 KOps/s 4.2318 KOps/s $\color{#35bf28}+3.51\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9704ms 1.8577ms 538.2895 Ops/s 534.6035 Ops/s $\color{#35bf28}+0.69\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5954ms 1.4035ms 712.4836 Ops/s 739.4067 Ops/s $\color{#d91a1a}-3.64\%$
test_collector_stack_then_write[50-img_shape0-small] 1.5918ms 1.1654ms 858.0637 Ops/s 871.6496 Ops/s $\color{#d91a1a}-1.56\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7084ms 3.6215ms 276.1310 Ops/s 273.0836 Ops/s $\color{#35bf28}+1.12\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.9718ms 5.8792ms 170.0925 Ops/s 175.1174 Ops/s $\color{#d91a1a}-2.87\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.6591ms 7.4954ms 133.4159 Ops/s 142.1125 Ops/s $\textbf{\color{#d91a1a}-6.12\%}$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4600ms 0.2717ms 3.6806 KOps/s 3.6133 KOps/s $\color{#35bf28}+1.86\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6904ms 1.4646ms 682.7867 Ops/s 637.4237 Ops/s $\textbf{\color{#35bf28}+7.12\%}$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6728ms 2.4971ms 400.4624 Ops/s 409.7371 Ops/s $\color{#d91a1a}-2.26\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3224ms 3.1857ms 313.8984 Ops/s 317.6811 Ops/s $\color{#d91a1a}-1.19\%$
test_collector_without_rb[100-img_shape0-atari] 35.1205ms 34.3045ms 29.1507 Ops/s 29.3452 Ops/s $\color{#d91a1a}-0.66\%$
test_collector_without_rb[200-img_shape1-large_batch] 69.6800ms 68.1851ms 14.6660 Ops/s 14.9899 Ops/s $\color{#d91a1a}-2.16\%$
test_collector_with_rb[100-img_shape0-atari] 39.9600ms 39.2535ms 25.4755 Ops/s 25.8875 Ops/s $\color{#d91a1a}-1.59\%$
test_collector_with_rb[200-img_shape1-large_batch] 78.7476ms 77.2681ms 12.9419 Ops/s 13.3243 Ops/s $\color{#d91a1a}-2.87\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 59.9985ms 59.4346ms 16.8252 Ops/s 17.1584 Ops/s $\color{#d91a1a}-1.94\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1186s 0.1175s 8.5112 Ops/s 8.7291 Ops/s $\color{#d91a1a}-2.50\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 61.9554ms 60.9052ms 16.4190 Ops/s 16.9203 Ops/s $\color{#d91a1a}-2.96\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1233s 0.1224s 8.1697 Ops/s 8.5299 Ops/s $\color{#d91a1a}-4.22\%$
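
The table's header row is not visible in this part of the page, so the column meaning has to be inferred from the rows. The figures are consistent with the third column being the mean time, the fourth being this PR's throughput (1 / mean), the fifth being the baseline throughput, and the last being the relative change between the two. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def ops_from_mean(mean_seconds: float) -> float:
    """Throughput in operations per second for a given mean time per call."""
    return 1.0 / mean_seconds


def relative_change(pr_ops: float, base_ops: float) -> float:
    """Percent change of the PR throughput relative to the baseline throughput."""
    return (pr_ops - base_ops) / base_ops * 100.0


# test_step_mdp_speed[False-True-True-True-False]: mean 31.6654 us -> ~31.58 KOps/s
print(f"{ops_from_mean(31.6654e-6) / 1e3:.2f} KOps/s")

# test_dqn_speed[True-backward]: 904.4799 Ops/s against 825.0757 Ops/s
print(f"{relative_change(904.4799, 825.0757):+.2f}%")  # prints +9.62%
```

Rows with a large gap between the first two timing columns (for example `test_cql_speed[reduce-overhead-None]`, 0.6855s against 15.3647ms) show a single slow call dominating the worst case, so a large swing on such a row may be noise rather than a real regression.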

vmoens and others added 3 commits February 7, 2026 15:38
Replace multiprocessing.Event (futex-based syscalls) with
multiprocessing.RawArray shared-memory byte flags for worker-to-parent
completion signaling on the hot path (step_and_maybe_reset).

- _start_workers: creates shm_done_flags RawArray, passes to workers
- _wait_for_workers: spin-polls done_flags instead of Event.wait()
- Worker: _signal_done() closure writes shm_done_flags[idx]=1
- _shutdown_workers: uses _wait_for_workers instead of Event.wait()

Measured impact:
- 10% FPS improvement (7,737 -> 8,509 fps) on H200 with 8 workers
- 28% reduction in penv.wait_for_workers overhead (2,622us -> 1,891us)
- ParallelEnv.close() fixed from 80s timeout to ~0.9s

Co-authored-by: Cursor <[email protected]>
ghstack-source-id: f29522a
Pull-Request: #3457
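
For readers unfamiliar with the pattern, the signaling scheme in this commit can be sketched with the standard library alone. This is not the TorchRL implementation: the function names, the timeout, and the choice to clear the flags after each wait are assumptions made for illustration. Only `multiprocessing.RawArray` and the idea of spin-polling one byte flag per worker come from the commit message.

```python
import multiprocessing as mp
import time


def make_done_flags(num_workers):
    """Allocate one zero-initialised unsigned byte per worker in shared memory."""
    return mp.RawArray("B", num_workers)


def signal_done(flags, idx):
    """Worker side: mark worker `idx` as finished with a plain memory write."""
    flags[idx] = 1


def wait_for_workers(flags, timeout=10.0):
    """Parent side: spin until every flag is set, then clear them (assumed reset policy)."""
    deadline = time.monotonic() + timeout
    while not all(flags):
        if time.monotonic() > deadline:
            raise TimeoutError("workers did not signal completion in time")
    for i in range(len(flags)):
        flags[i] = 0


def worker(flags, idx):
    time.sleep(0.01)  # stand-in for the environment step
    signal_done(flags, idx)


if __name__ == "__main__":
    num_workers = 4
    flags = make_done_flags(num_workers)
    procs = [mp.Process(target=worker, args=(flags, i)) for i in range(num_workers)]
    for p in procs:
        p.start()
    wait_for_workers(flags)
    for p in procs:
        p.join()
    print("all workers signalled")
```

The trade-off is that the parent burns CPU while it spins, in exchange for avoiding the wake-up latency of a blocking `Event.wait()`; that only pays off when steps are short, as on the hot path described above.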
Optimise the output-reading phase of step_and_maybe_reset when shared
memory and target device are both known and different (the common
CPU-shared -> CUDA case).

- When shared_device is not None and shared_device != device: use a
  single td.to(device) instead of _fast_apply with per-tensor check.
  Since .to() already creates new tensors, the extra .clone() is
  unnecessary.
- Keep the _fast_apply fallback for the mixed-device case.
- Move _sync_w2m() into a conditional - only called when a cross-device
  transfer actually happened.

Co-authored-by: Cursor <[email protected]>
ghstack-source-id: 07aba16
Pull-Request: #3458
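
The branch selection described in this commit can be written out as a small decision helper. The function name and the returned labels are hypothetical and exist only to make the condition explicit. Sending every remaining case to the per-tensor path is also an assumption, since the commit message only mentions the mixed-device fallback.

```python
from typing import Optional


def pick_output_read_path(shared_device: Optional[str], device: Optional[str]) -> str:
    """Return which read path applies; the labels are illustrative, not TorchRL names."""
    if shared_device is not None and device is not None and shared_device != device:
        # Buffers sit on one known device and the target is another: a single
        # td.to(device) already yields new tensors, so no extra clone is needed.
        return "single_to"
    # Anything else (for example mixed-device buffers) keeps the per-tensor path.
    return "fast_apply"


print(pick_output_read_path("cpu", "cuda:0"))  # single_to
print(pick_output_read_path(None, "cuda:0"))   # fast_apply
```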
Add two configuration knobs to the Dreamer example:

1. env.parallel_env_mode ("parallel" | "serial"): switches the train
   environment between ParallelEnv (uses IPC) and SerialEnv (no IPC
   overhead, better for cheap envs or when GPU contention between
   collector workers degrades throughput).

2. collector.compile block (enabled, backend, cudagraphs): passes
   compilation settings to MultiCollector via compile_policy and
   cudagraph_policy kwargs, enabling torch.compile + CUDA graphs for
   the policy in collector workers.

Parallel mode remains the default.

Co-authored-by: Cursor <[email protected]>
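
A rough sketch of how the two knobs might be consumed is shown below. It is not the code from this PR: the dataclasses, helper names, and default values are hypothetical, except that `"parallel"` is the stated default mode and `compile_policy` / `cudagraph_policy` are the kwargs named above. The sketch also assumes those kwargs accept a dict of compile options and a bool respectively, and that CUDA graphs are only requested when compilation is enabled.

```python
from dataclasses import dataclass


@dataclass
class EnvConfig:
    parallel_env_mode: str = "parallel"  # default per the PR description


@dataclass
class CompileConfig:
    enabled: bool = False      # assumed default
    backend: str = "inductor"  # assumed default (torch.compile's default backend)
    cudagraphs: bool = False   # assumed default


def select_env_class_name(cfg: EnvConfig) -> str:
    """Map the mode string to the batched-env class the example would build."""
    modes = {"parallel": "ParallelEnv", "serial": "SerialEnv"}
    if cfg.parallel_env_mode not in modes:
        raise ValueError(f"unknown parallel_env_mode: {cfg.parallel_env_mode!r}")
    return modes[cfg.parallel_env_mode]


def collector_compile_kwargs(cfg: CompileConfig) -> dict:
    """Build the extra collector kwargs; empty when compilation is disabled."""
    if not cfg.enabled:
        return {}
    return {
        "compile_policy": {"backend": cfg.backend},
        "cudagraph_policy": cfg.cudagraphs,
    }


print(select_env_class_name(EnvConfig(parallel_env_mode="serial")))  # SerialEnv
print(collector_compile_kwargs(CompileConfig(enabled=True, cudagraphs=True)))
```

Returning an empty dict when compilation is disabled lets the caller unpack the result into the collector constructor unconditionally.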
@vmoens vmoens force-pushed the gh/vmoens/220/head branch from 3dce31f to 3821082 Compare February 7, 2026 15:45
@github-actions
Contributor

github-actions bot commented Feb 7, 2026

⚠️ PR Title Label Error

Unknown or invalid prefix [Example].

Current title: [Example] Dreamer: SerialEnv mode and collector compile config

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

Prefix Label Applied Example
[BugFix] BugFix [BugFix] Fix memory leak in collector
[Feature] Feature [Feature] Add new optimizer
[Doc] or [Docs] Documentation [Doc] Update installation guide
[Refactor] Refactoring [Refactor] Clean up module imports
[CI] CI [CI] Fix workflow permissions
[Test] or [Tests] Tests [Tests] Add unit tests for buffer
[Environment] or [Environments] Environments [Environments] Add Gymnasium support
[Data] Data [Data] Fix replay buffer sampling
[Performance] or [Perf] Performance [Performance] Optimize tensor ops
[BC-Breaking] bc breaking [BC-Breaking] Remove deprecated API
[Deprecation] Deprecation [Deprecation] Mark old function

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
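
The rule the bot applies can be approximated as a case-sensitive prefix lookup. The snippet below is an illustration built from the table above, not the bot's actual source; `label_for_title` is a hypothetical name.

```python
from typing import Optional

# Prefix -> label, transcribed from the table in the bot comment.
SUPPORTED_PREFIXES = {
    "[BugFix]": "BugFix",
    "[Feature]": "Feature",
    "[Doc]": "Documentation",
    "[Docs]": "Documentation",
    "[Refactor]": "Refactoring",
    "[CI]": "CI",
    "[Test]": "Tests",
    "[Tests]": "Tests",
    "[Environment]": "Environments",
    "[Environments]": "Environments",
    "[Data]": "Data",
    "[Performance]": "Performance",
    "[Perf]": "Performance",
    "[BC-Breaking]": "bc breaking",
    "[Deprecation]": "Deprecation",
}


def label_for_title(title: str) -> Optional[str]:
    """Return the label for a PR title, or None if the prefix is not supported."""
    for prefix, label in SUPPORTED_PREFIXES.items():
        if title.startswith(prefix):  # case-sensitive by construction
            return label
    return None


print(label_for_title("[Example] Dreamer: SerialEnv mode and collector compile config"))  # None
print(label_for_title("[Feature] Add new optimizer"))  # Feature
```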

vmoens added a commit that referenced this pull request Feb 8, 2026
Add two configuration knobs to the Dreamer example:

1. env.parallel_env_mode ("parallel" | "serial"): switches the train
   environment between ParallelEnv (uses IPC) and SerialEnv (no IPC
   overhead, better for cheap envs or debugging).

2. collector.compile block (enabled, backend, cudagraphs): passes
   compilation settings to MultiCollector via compile_policy and
   cudagraph_policy kwargs, enabling torch.compile + CUDA graphs for
   the policy in collector workers.

Co-authored-by: Cursor <[email protected]>
ghstack-source-id: 4876b37
Pull-Request: #3459
@github-actions
Contributor

github-actions bot commented Feb 8, 2026

Rebasing gh/vmoens/220/head onto main (requested by @vmoens).

@github-actions
Contributor

github-actions bot commented Feb 8, 2026

Rebase failed.

Rebasing (1/1)
error: could not apply 382108239... [Example] Dreamer: SerialEnv mode and collector compile config
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 382108239... # [Example] Dreamer: SerialEnv mode and collector compile config

@vmoens
Collaborator Author

vmoens commented Feb 8, 2026

@torchrlbot rebase

@github-actions
Contributor

github-actions bot commented Feb 8, 2026

Rebasing gh/vmoens/220/head onto main (requested by @vmoens).

@github-actions
Contributor

github-actions bot commented Feb 8, 2026

Rebase failed.

Rebasing (1/1)
error: could not apply 382108239... [Example] Dreamer: SerialEnv mode and collector compile config
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 382108239... # [Example] Dreamer: SerialEnv mode and collector compile config

@vmoens vmoens closed this Feb 8, 2026
@vmoens vmoens deleted the gh/vmoens/220/head branch February 8, 2026 07:49
Labels

CLA Signed, sota-implementations/
