Add support for dynamically setting the number of steps for GRPO.#1257
niting wants to merge 1 commit into google:main
Conversation
tunix/cli/grpo_main.py
Outdated
    dataset=self.config["dataset_name"],
    tfds_download=self.config["tfds_download"],
)
self.compute_params(len(dataset))
I believe not all datasets implement len(), which is why we might have to rely on the config to provide the accurate length if we don't want to go through the dataset once.
That makes sense; if len(...) is not implemented and num_steps is not specified, then we should throw an error, but if dataset.__len__ exists, then we can allow num_steps to be unspecified?
Pretty sure more things will break if len were not implemented. See tunix/cli/utils/data.py:177, which splits the train and test sets. It would be odd for a dataset not to have it implemented, since they are typically just iterator types.
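For context, a split like the one referenced in data.py:177 depends on knowing the dataset length up front. This is a minimal hypothetical sketch (not the actual tunix code) showing why a dataset without __len__ breaks such a split:

```python
def split_dataset(dataset, train_fraction=0.8):
    # len() raises TypeError for plain iterators/generators, which is
    # exactly the breakage being discussed in this thread.
    n_train = int(len(dataset) * train_fraction)
    return dataset[:n_train], dataset[n_train:]


train, test = split_dataset(list(range(10)))  # works: lists implement __len__
```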
Grain supports datasets without len().
Done. I now check whether len is available and enforce that max_steps is required when it isn't. Note that post_init_dataset in tunix/cli/utils/data.py will still break if len is not available; I can fix that separately since it's unrelated to this PR.
Thank you! The fix makes sense to me.
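The agreed-upon behavior could be sketched roughly as follows. This is a hypothetical illustration, not the actual PR code; the names (compute_max_steps, max_steps, batch_size, num_train_epochs, train_fraction) are assumptions:

```python
import math


def compute_max_steps(dataset, batch_size, num_train_epochs,
                      train_fraction=0.8, max_steps=None):
    # Datasets without __len__ (e.g. plain iterators) raise TypeError here.
    try:
        dataset_len = len(dataset)
    except TypeError:
        dataset_len = None

    if dataset_len is None:
        # Without a known length, the user must supply max_steps explicitly.
        if max_steps is None:
            raise ValueError(
                "max_steps must be specified when the dataset does not "
                "implement len()")
        return max_steps

    # A user-specified max_steps takes precedence over the derived value.
    if max_steps is not None:
        return max_steps

    # Otherwise derive the step count from the dataset length.
    num_batches = math.ceil(dataset_len / batch_size)
    return int(num_batches * num_train_epochs * train_fraction)
```

For example, with a 100-example dataset, batch size 8, one epoch, and train_fraction 0.8, this yields int(13 * 1 * 0.8) = 10 steps.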
train_fraction = self.config.get("train_fraction")
if not train_fraction:
  train_fraction = 0.8
if not max_steps:
Shall we check max_steps against int(num_batches * num_train_epochs * train_fraction) when max_steps is provided?
I can, but I was assuming that the user might specify max_steps when they want to try out the behavior with a different number of steps. I could cap max_steps at that value or just leave it as is for now. What do you prefer?
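The capping option being discussed could look like this sketch (hypothetical names, not the actual implementation):

```python
import math


def cap_max_steps(max_steps, dataset_len, batch_size, num_train_epochs,
                  train_fraction):
    # Upper bound: the number of steps implied by the dataset length,
    # batch size, epoch count, and train split fraction.
    num_batches = math.ceil(dataset_len / batch_size)
    upper_bound = int(num_batches * num_train_epochs * train_fraction)
    # A user-supplied max_steps smaller than the bound is kept as is;
    # anything larger is clamped down to the bound.
    return min(max_steps, upper_bound)
```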
rl_training_config.actor_optimizer_config.schedule_type="warmup_cosine_decay_schedule" \
rl_training_config.actor_optimizer_config.init_value=0.0 \
rl_training_config.actor_optimizer_config.end_value=0.0 \
rl_training_config.actor_optimizer_config.warmup_ratio=$warmup_ratio \
Maybe we should still set warmup_ratio to 0.1 instead of relying on the default value?
Done. Reverted this change.
examples/rl/grpo/gsm8k/run_qwen3.sh
Outdated
batch_size=${batch_size:-8}
num_train_epochs=${num_train_epochs:-1}
warmup_ratio=${warmup_ratio:-0.1}
train_fraction=${train_fraction:-1.0}
I think we should set train_fraction? The default value is 0.8.
The train_fraction was 1.0. I updated it to 0.8.
Force-pushed from 244a257 to 84827a6.
The existing implementation requires these to be specified by the user. We want users to be able to point to their dataset and have the implementation identify the dataset length. The dataset length is then used to adjust the number of steps required, given the batch size. Updates the Qwen script to use the feature.
Force-pushed from 84827a6 to 9ea04cb.
Reference
Colab Notebook
Checklist
This change has been tested locally by doing a GRPO run and running the Qwen script.