@DNXie DNXie commented Sep 9, 2025

  • Add yaml config for grpo.main
  • Add default values for dataclasses ReplayBuffer and DatasetActor
  • Fix the bug regarding services being passed as part of config.
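
A minimal sketch of what the last two bullets could look like in practice. The field names, the default values, the `services` key layout, and the `split_services` helper are all hypothetical illustrations, not the code from this PR. It assumes one plausible reading of the bug: service launch options leaked into the constructor arguments.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ReplayBuffer:
    # Hypothetical fields and defaults, for illustration only.
    batch_size: int = 4
    max_policy_age: int = 0


@dataclass
class DatasetActor:
    # Hypothetical fields and defaults, for illustration only.
    path: str = ""
    split: str = "train"


def split_services(section: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate launch options (hypothetical "services" key) from constructor kwargs."""
    kwargs = dict(section)  # shallow copy so the caller's config is not mutated
    services = kwargs.pop("services", {})
    return services, kwargs


# Stand-in for one section of a parsed YAML config.
section = {"batch_size": 8, "services": {"num_replicas": 1}}
services, kwargs = split_services(section)

# Only constructor fields reach the dataclass; omitted ones fall back to defaults.
buffer = ReplayBuffer(**kwargs)
```

With defaults on every field, a YAML section only needs to list the values it overrides, and keeping launch options out of `kwargs` avoids an unexpected-keyword `TypeError` in the constructor.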

Test

  1. python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
Generated 10 rollouts w/ average reward 0.05
Completed 10 training steps
Latest loss: 0.054432570934295654
Generated 20 rollouts w/ average reward 0.025
Completed 20 training steps
Latest loss: 0.01553526520729065
Generated 30 rollouts w/ average reward 0.05
Completed 30 training steps
Latest loss: 0.004179835319519043
Generated 40 rollouts w/ average reward 0.025
Completed 40 training steps
Latest loss: 0.0014289617538452148
  2. python -m apps.vllm.main --config apps/vllm/llama3_8b.yaml

Generation Results:
================================================================================
Sample 1:
User: Tell me a joke
Assistant: . I need a laugh.
Here's one: A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?"
...
--------------------------------------------------------------------------------
...

@meta-cla meta-cla bot added the CLA Signed label Sep 9, 2025

@joecummings joecummings left a comment

Many comments, but mostly minor

Why do we need this?

Member Author

Interesting. I copied it from the other main files, but it looks like it runs fine without sys.exit(xx). Removed.
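
For context, a short sketch of why the wrapper is optional when the entry point returns nothing: `sys.exit(main())` raises `SystemExit` carrying whatever `main()` returns, and a `None` code is treated as exit status 0, the same status a script gets when it simply runs to completion. The `main` below is a hypothetical stand-in, not the one in this PR.

```python
import sys


def main() -> None:
    """Hypothetical entry point that returns nothing."""
    print("done")


def run_with_exit() -> object:
    """Call sys.exit(main()) and return the code carried by SystemExit."""
    try:
        sys.exit(main())
    except SystemExit as exc:
        return exc.code


# None here means the interpreter would exit with status 0.
exit_code = run_with_exit()
```

The wrapper only earns its place when `main()` returns a meaningful status (an integer or an error message) that should reach the shell.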

Fun - we may want to put all of this in its own "frontend" file called omegaconf_frontend (idk, smth like that) so people can keep track of things they might want to switch out with their own frontend.
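
A rough sketch of how such a swappable frontend could be shaped, using only the standard library. The names `ConfigFrontend`, `JsonFrontend`, `InlineFrontend`, and `run` are hypothetical; an OmegaConf-backed frontend would be one more class with the same `load` method, and nothing here reflects the actual layout of the repository.

```python
import json
from typing import Any, Protocol


class ConfigFrontend(Protocol):
    """Anything that turns command-line arguments into a plain config dict."""

    def load(self, argv: list[str]) -> dict[str, Any]: ...


class JsonFrontend:
    """Stand-in file frontend: reads `--config <path>` and parses it as JSON."""

    def load(self, argv: list[str]) -> dict[str, Any]:
        path = argv[argv.index("--config") + 1]
        with open(path) as f:
            return json.load(f)


class InlineFrontend:
    """Alternative frontend: builds the config from `key=value` arguments."""

    def load(self, argv: list[str]) -> dict[str, Any]:
        return dict(arg.split("=", 1) for arg in argv if "=" in arg)


def run(frontend: ConfigFrontend, argv: list[str]) -> dict[str, Any]:
    """The app depends only on `load`, so frontends can be swapped freely."""
    return frontend.load(argv)


cfg = run(InlineFrontend(), ["model=qwen", "steps=10"])
```

Keeping the parsing behind one small interface means the main script never imports a specific config library, which is the property the comment above is after.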

@Jack-Khuu

Thanks for splitting up the PRs

@DNXie DNXie merged commit 2bcf56c into meta-pytorch:main Sep 9, 2025
5 checks passed
@DNXie DNXie deleted the add_grpo_config branch September 10, 2025 19:13