add Temp Scheduler #1624

hallerite · 2026-01-20T18:55:34Z

Note

Introduces scheduled sampling temperature and ensures it flows end-to-end from rollout to training.

Config/Changelog: Adds sampling.temperature_schedule (constant/linear/cosine) with validation and documents it in CHANGELOG.md
Runtime: Computes the step-wise temperature (compute_temperature), injects it into get_sampling_args, and logs sampling/temperature
Workers/Scheduler: EnvWorker now receives sampling_args and temperature; responses tag rollouts with their temperature; Scheduler.set_sampling_args(...) updates the args and temperature used for requests
Transport/Trajectories: TrainingSample gains an optional temperature field; interleave_rollout/branch_rollout attach it
Batching/Packer: prepare_batch now takes a list of temperatures; packing sorts and groups microbatches by temperature to avoid mixing; Single/Multi packers propagate temperature and group by it
Trainer: Scales logits by microbatch temperature when computing logprobs/entropy
Tests: Update/add unit tests for batching by temperature and EnvWorker API
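A minimal sketch of how a constant/linear/cosine schedule could map an orchestrator step to a sampling temperature. The name `compute_temperature` and the three schedule types come from the summary above; the `TemperatureSchedule` dataclass and its `start`/`end`/`total_steps` fields are hypothetical stand-ins, not the actual prime_rl config.

```python
import math
from dataclasses import dataclass
from typing import Literal


@dataclass
class TemperatureSchedule:
    """Hypothetical stand-in for sampling.temperature_schedule (field names are assumptions)."""

    type: Literal["constant", "linear", "cosine"] = "constant"
    start: float = 1.0  # temperature at step 0
    end: float = 1.0  # temperature once total_steps is reached (ignored for "constant")
    total_steps: int = 1  # number of steps over which to anneal

    def __post_init__(self) -> None:
        # Basic validation: known type, positive temperatures, horizon of at least one step.
        if self.type not in ("constant", "linear", "cosine"):
            raise ValueError(f"unknown schedule type: {self.type!r}")
        if self.start <= 0 or self.end <= 0:
            raise ValueError("temperatures must be positive")
        if self.total_steps < 1:
            raise ValueError("total_steps must be >= 1")


def compute_temperature(step: int, schedule: TemperatureSchedule) -> float:
    """Return the sampling temperature for a given step (sketch, not the real implementation)."""
    if schedule.type == "constant":
        return schedule.start
    # Fraction of the schedule completed, clamped to [0, 1] so the
    # temperature stays at `end` after total_steps.
    progress = min(max(step / schedule.total_steps, 0.0), 1.0)
    if schedule.type == "linear":
        return schedule.start + (schedule.end - schedule.start) * progress
    # Cosine: equals `start` at progress 0 and eases toward `end` along half a cosine period.
    return schedule.end + 0.5 * (schedule.start - schedule.end) * (1 + math.cos(math.pi * progress))
```

With `TemperatureSchedule("linear", start=1.0, end=0.5, total_steps=100)`, step 50 yields 0.75 and any step past 100 stays at 0.5.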
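The packing behavior described above (sort by temperature, never mix temperatures inside a microbatch) can be sketched as a greedy token-budget packer. `Sample` and `pack_by_temperature` are hypothetical names used only for illustration; the real packer also groups by run, so a real grouping key would likely be something like `(run, temperature)`.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Sample:
    """Hypothetical packed sample carrying only the fields needed for grouping."""

    input_ids: list[int]
    temperature: float


def pack_by_temperature(samples: list[Sample], max_tokens: int) -> list[list[Sample]]:
    """Greedily pack samples into microbatches that never mix temperatures (sketch)."""
    micro_batches: list[list[Sample]] = []
    # Sort so equal temperatures are adjacent, then pack each temperature group separately.
    ordered = sorted(samples, key=lambda s: s.temperature)
    for _, group in groupby(ordered, key=lambda s: s.temperature):
        current: list[Sample] = []
        current_tokens = 0
        for sample in group:
            n = len(sample.input_ids)
            # Close the current microbatch if this sample would exceed the token budget.
            if current and current_tokens + n > max_tokens:
                micro_batches.append(current)
                current, current_tokens = [], 0
            current.append(sample)
            current_tokens += n
        if current:
            micro_batches.append(current)
    return micro_batches
```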

Written by Cursor Bugbot for commit 9ffb15e. This will update automatically on new commits.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


src/prime_rl/orchestrator/config.py

Jackmin801 · 2026-01-20T20:52:31Z

hey @hallerite! nice PR, excited to be supporting temperature scheduling in the orchestrator. some quick comments:

You're probably going to have to resolve a lot of merge conflicts when [Refactor] rename Runs -> MultiRunManager and cleanup hooks #1619 merges soon. Sorry about that. It's mostly renaming, so I'd expect the agents to handle the conflict resolution, but yeah, sorry for the trouble.
I'm not sure grouping by temperature in the packer is necessary at the moment. We already group by run, and implicitly by step (we never pack across a step boundary for a run), so the prepare_batch loop should never see a single run with different temperatures in the same batch.

Though with async, I guess a run can have multiple sampling temperatures within a batch, with at most max async unique temperatures. In that case, I think the better solution is to make the loss accept a tensor of temperatures rather than packing so that samples with the same temperature end up together.
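The alternative suggested here, letting the loss take per-sample temperatures instead of packing by temperature, amounts to dividing each token's logits by its own temperature before the log-softmax. The pure-Python sketch below assumes one temperature per token; `token_logprobs` and `log_softmax` are hypothetical helpers, and a tensor implementation would broadcast a temperature tensor over the vocabulary dimension instead of looping.

```python
import math


def log_softmax(xs: list[float]) -> list[float]:
    """Numerically stable log-softmax over a single vocabulary row."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def token_logprobs(
    logits: list[list[float]],  # [num_tokens][vocab_size]
    target_ids: list[int],  # [num_tokens]
    temperatures: list[float],  # [num_tokens], one temperature per token
) -> list[float]:
    """Log-probabilities of target tokens under per-token temperatures (sketch).

    In PyTorch this would be roughly torch.log_softmax(logits / temps.unsqueeze(-1), dim=-1)
    followed by a gather on the target ids.
    """
    if not (len(logits) == len(target_ids) == len(temperatures)):
        raise ValueError("logits, target_ids and temperatures must have the same length")
    out = []
    for row, target, temp in zip(logits, target_ids, temperatures):
        scaled = [x / temp for x in row]  # scale this token's logits by its own temperature
        out.append(log_softmax(scaled)[target])
    return out
```

Because the temperature is applied per token, samples generated at different temperatures can share a microbatch without any special packing.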

src/prime_rl/orchestrator/config.py

samsja · 2026-01-21T04:28:27Z

src/prime_rl/orchestrator/env_worker.py

                    self._responses_since_restart += 1
                    if self._responses_since_restart >= 10 and self._restart_count > 0:
-                        logger.debug(f"Worker '{self.worker_name}' stable after {self._responses_since_restart} responses, resetting restart count")
+                        logger.debug(


not sure what is up with your ruff; did you use the pre-commit hook?

hallerite added 3 commits January 20, 2026 00:26

add temp schedule

8ceca2f

propagate rollout temperature; split microbatches by temp

4f36096

Merge branch 'main' into temp_sched

dc1ef85

hallerite marked this pull request as draft January 20, 2026 19:02

cursor bot reviewed Jan 20, 2026


src/prime_rl/orchestrator/config.py (resolved)

hallerite added 2 commits January 20, 2026 19:06

fix tests

12f0600

adjust changelog; delete temp file

9ffb15e

hallerite marked this pull request as ready for review January 20, 2026 19:26

hallerite marked this pull request as draft January 21, 2026 00:46

Merge branch 'main' into temp_sched

1f063d1

samsja reviewed Jan 21, 2026


src/prime_rl/orchestrator/config.py (outdated, resolved)

samsja reviewed Jan 21, 2026


hallerite added 4 commits January 21, 2026 11:52

fixformatting

c8fe00b

derive temps from step directly

0d0535e

rename to temp_scheduler

7f53cef

fix import

0586fbc

4 participants

add Temp Scheduler #1624

Are you sure you want to change the base?

add Temp Scheduler #1624

Uh oh!

Conversation

hallerite commented Jan 20, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Jackmin801 commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

samsja Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

hallerite Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hallerite commented Jan 20, 2026 •

edited by cursor bot

Loading

Jackmin801 commented Jan 20, 2026 •

edited

Loading