Commit 2cc6c6b

kl coeff = 1e-5
1 parent 4318d83 commit 2cc6c6b

1 file changed: +1 -1 lines changed

apps/grpo/main.py

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ def simple_grpo_loss(
     ref_logprobs: torch.Tensor,
     advantages: torch.Tensor,
     padding_mask: torch.Tensor,
-    beta: float = 1e-4,
+    beta: float = 1e-5,
 ) -> torch.Tensor:
     logprobs: torch.Tensor = compute_logprobs(logits, response)
     kl = torch.exp(ref_logprobs - logprobs) - (ref_logprobs - logprobs) - 1
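The `kl` line in this hunk computes exp(d) - d - 1 with d = ref_logprobs - logprobs, which is zero when the two log-probabilities agree and positive otherwise. The scalar sketch below illustrates what the new default would change, under one assumption: that `beta` simply multiplies this term in the loss. The hunk does not show how `beta` is applied; the commit message only calls it the KL coefficient. The sketch uses `math` rather than `torch` to stay dependency-free, and `kl_estimate` and `kl_penalty` are hypothetical names, not functions from `apps/grpo/main.py`.

```python
import math


def kl_estimate(ref_logprob: float, logprob: float) -> float:
    """Scalar version of the `kl` expression in the hunk: exp(d) - d - 1."""
    d = ref_logprob - logprob
    return math.exp(d) - d - 1.0


def kl_penalty(ref_logprob: float, logprob: float, beta: float) -> float:
    """ASSUMPTION: beta scales the KL term linearly (not shown in the diff)."""
    return beta * kl_estimate(ref_logprob, logprob)


if __name__ == "__main__":
    ref_lp, lp = -1.0, -2.0  # illustrative per-token log-probabilities
    old = kl_penalty(ref_lp, lp, beta=1e-4)  # default before this commit
    new = kl_penalty(ref_lp, lp, beta=1e-5)  # default after this commit
    print(f"old penalty: {old:.3e}, new penalty: {new:.3e}")
```

If the assumption holds, the commit shrinks the KL penalty by a factor of ten for callers that rely on the default, leaving the policy freer to drift from the reference model.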
