Skip to content

Conversation

DNXie
Copy link
Member

@DNXie DNXie commented Aug 28, 2025

Added a dp_size dimension in replay buffer sampling to enable data parallel.

Updated GRPO/main.py accordingly

Test:

  1. pytest tests/unit_tests/test_replay_buffer.py
 7 passed, 7 warnings in 23.35s

python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml

All services initialized successfully!
Starting GRPO training loops...
Generated 10 rollouts w/ average reward 0.275
Completed 10 training steps
Latest loss: 0.01575610041618347
Generated 20 rollouts w/ average reward 0.05
Completed 20 training steps
Latest loss: 0.0045821815729141235

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 28, 2025
Copy link
Contributor

@pbontrager pbontrager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just need to fix one thing to not return sorted samples

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to return a sorted sample here as that reduces variability in the sample. You need to get the index of the sorted array and then probably do this as a nested for loop to be easier to read.

batch = []
for rank in self.dp_size:
	local_batch = []
	for i in bsz:
		e = sampled_episodes[sort_order[rank*i]]
		local_batch.append(e)
	batch.append(local_batch)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out this issue. I have updated this part. Please review.

@DNXie DNXie requested a review from pbontrager September 10, 2025 17:53
Copy link
Contributor

@pbontrager pbontrager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now. There's a few more small things to fix but I'll pre-approve it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cleaner but is it moving the data twice? It's probably fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code to make it more efficient.

@DNXie DNXie merged commit 029b1bc into meta-pytorch:main Sep 10, 2025
5 checks passed
@DNXie DNXie deleted the replay_buffer_dp_size branch September 10, 2025 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants