@hallerite
Contributor

@hallerite hallerite commented Jan 20, 2026

Note

Introduces scheduled sampling temperature and ensures it flows end-to-end from rollout to training.

  • Config/Changelog: Adds sampling.temperature_schedule (constant/linear/cosine) with validation; documents in CHANGELOG.md
  • Runtime: Computes step-wise temperature (compute_temperature) and injects into get_sampling_args; logs sampling/temperature
  • Workers/Scheduler: EnvWorker now receives sampling_args and temperature; responses tag rollouts with temperature; Scheduler.set_sampling_args(...) updates args and temperature used for requests
  • Transport/Trajectories: TrainingSample gains optional temperature; interleave_rollout/branch_rollout attach it
  • Batching/Packer: prepare_batch now takes temperatures list; packing sorts and groups microbatches by temperature to avoid mixing; Single/Multi packers propagate and group by temperature
  • Trainer: Scales logits by microbatch temperature when computing logprobs/entropy
  • Tests: Update/add unit tests for batching by temperature and EnvWorker API
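To make the schedule options concrete, here is a minimal sketch of how a step-wise temperature could be computed for the constant, linear and cosine cases. The PR names `compute_temperature` and `sampling.temperature_schedule`, but the signature, the parameter names (`total_steps`, `start`, `end`) and the exact interpolation below are assumptions for illustration, not the PR's actual code.

```python
import math


def compute_temperature(step, total_steps, start, end, schedule="constant"):
    """Hypothetical helper: return the sampling temperature for a given step.

    All parameter names are illustrative assumptions, not the PR's API.
    """
    if schedule == "constant":
        return start
    # Fraction of the schedule completed, clamped to [0, 1].
    if total_steps <= 1:
        progress = 1.0
    else:
        progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    if schedule == "linear":
        # Straight-line interpolation from start to end.
        return start + (end - start) * progress
    if schedule == "cosine":
        # Half-cosine interpolation: equals start at progress 0, end at progress 1.
        return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {schedule!r}")


if __name__ == "__main__":
    for s in (0, 5, 10):
        print(s, compute_temperature(s, 11, start=1.0, end=0.5, schedule="cosine"))
```

Clamping the progress fraction keeps the temperature at `end` if the run continues past `total_steps`; whether the PR does the same is not stated in the summary.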

Written by Cursor Bugbot for commit 9ffb15e.

@hallerite hallerite marked this pull request as draft January 20, 2026 19:02

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


@hallerite hallerite marked this pull request as ready for review January 20, 2026 19:26
@Jackmin801
Member

Jackmin801 commented Jan 20, 2026

hey @hallerite! nice PR, excited to be supporting temperature scheduling in the orchestrator. some quick comments:

  1. You're probably going to have to resolve a lot of merge conflicts when [Refactor] rename Runs -> MultiRunManager and cleanup hooks #1619 merges soon. Sorry about that. It's mostly renaming, so I'd expect the agents to be capable of figuring out the conflict resolution, but yeah, sorry for the trouble.

  2. I'm not sure grouping by temperature in the packer is necessary atm. We already group by run and implicitly by step (we never pack across a step boundary for a run), so the prepare_batch loop will never see a single run with different temps in the same batch.

Though with async I guess we can have multiple sampling temperatures within a batch for a run with max unique temps <= max async. But in that case, I would think the better solution is to make the loss support passing a tensor of temperatures rather than packing such that similar temperatures are together.
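A rough sketch of the alternative suggested above, where the logprob computation takes one temperature per position instead of relying on the packer to keep temperatures apart. It is written in plain Python for readability; the function names and the list-based layout are assumptions for illustration, and a real trainer would do the same division on tensors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_logprobs(logits, targets, temperatures):
    """Hypothetical helper: logprob of each target token under its own temperature.

    logits:       one list of vocabulary logits per position
    targets:      the sampled token id at each position
    temperatures: the sampling temperature used for each position
    """
    out = []
    for row, target, temp in zip(logits, targets, temperatures):
        # Scale logits by the temperature this token was sampled with.
        scaled = [x / temp for x in row]
        out.append(log_softmax(scaled)[target])
    return out


if __name__ == "__main__":
    # Two positions sampled at different temperatures in the same microbatch.
    print(token_logprobs([[0.0, 0.0], [2.0, 0.0]], [0, 0], [1.0, 2.0]))
```

With per-position temperatures, samples from different async steps can share a microbatch, so the packer would not need to sort or group by temperature.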

@hallerite hallerite marked this pull request as draft January 21, 2026 00:46
self._responses_since_restart += 1
if self._responses_since_restart >= 10 and self._restart_count > 0:
logger.debug(f"Worker '{self.worker_name}' stable after {self._responses_since_restart} responses, resetting restart count")
logger.debug(
Member

not sure what is up with your ruff, did you use the pre-commit hook?

Contributor Author

I did now
