@allenwang28 allenwang28 commented Oct 9, 2025

A reasonable starting point for 32B for perf testing:

  • seq_len = 2048
  • group_size = 16
  • batch_size = 32
  • trainer tp=8
  • 4 policy replicas

Also updates local_batch_size for the MAST configs
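
For a rough sense of scale, the settings above can be turned into upper bounds on token counts. This is only a sketch: the `PerfConfig` class and its method names are hypothetical, and it assumes `batch_size` counts sequences per training step, `group_size` counts sequences generated per prompt, and every sequence is filled to `seq_len`. None of this is confirmed by the configs themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfConfig:
    """Hypothetical container for the starting-point values listed above."""
    seq_len: int = 2048
    group_size: int = 16
    batch_size: int = 32
    trainer_tp: int = 8
    policy_replicas: int = 4

    def max_tokens_per_batch(self) -> int:
        # Upper bound: assumes batch_size sequences, each filled to seq_len.
        return self.batch_size * self.seq_len

    def max_tokens_per_group(self) -> int:
        # Upper bound: assumes group_size sequences per prompt, each filled to seq_len.
        return self.group_size * self.seq_len


cfg = PerfConfig()
print(cfg.max_tokens_per_batch())  # 65536
print(cfg.max_tokens_per_group())  # 32768
```

Real batches will usually carry fewer tokens than these bounds, since responses rarely reach the full sequence length.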

@meta-cla meta-cla bot added the CLA Signed label Oct 9, 2025

# Main loop configuration
rollout_threads: 1 # Recommended to set equal to policy.num_replicas
rollout_threads: 8

I think we could double this. I set rollout_threads to 4 with 1 policy replica, and the time spent waiting for the buffer dropped from 300s to 60s.
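
To make the arithmetic behind this suggestion concrete, here is a minimal sketch that scales `rollout_threads` with the number of policy replicas. The helper name `rollout_threads_for` and the `threads_per_replica` parameter are hypothetical and not part of the config schema. The ratio of 4 threads per replica comes from the single-replica experiment described above and may not carry over to other setups.

```python
def rollout_threads_for(num_replicas: int, threads_per_replica: int = 1) -> int:
    """Return a rollout thread count proportional to the policy replica count.

    threads_per_replica=1 matches the config comment's recommendation of
    setting rollout_threads equal to policy.num_replicas.
    """
    if num_replicas < 1 or threads_per_replica < 1:
        raise ValueError("num_replicas and threads_per_replica must be >= 1")
    return num_replicas * threads_per_replica


# With the 4 policy replicas from the PR description:
print(rollout_threads_for(4, 2))  # 8, the value in this diff
print(rollout_threads_for(4, 4))  # 16, the doubled value suggested here

# Reduction in buffer wait time reported for the single-replica experiment:
print(300 / 60)  # 5.0
```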

@allenwang28 allenwang28 requested a review from LucasLLC October 10, 2025 19:33