2 files changed: +1 -103 lines changed
@@ -107,20 +107,7 @@ Check these metrics in Weights & Biases:
 - `reward/evaluate_response/avg_MathReward_reward` - should stay reasonably high
 - `reward/evaluate_response/avg_ThinkingReward_reward` - should increase quickly
 
-### 5. Quick Debug Test
-
-Run the debug script to verify the reward function works:
-```bash
-python sandbox/grpo_language/debug_reward.py
-```
-
-Expected output:
-- Japanese text → reward 1.0
-- English text → reward 0.0
-- Multiple Japanese blocks → reward 0.5
-- No blocks but Japanese response → reward 0.2
-
-### 6. Alternative: Start with English, then transition
+### 5. Alternative: Start with English, then transition
 
 If Japanese isn't working, you could:
 
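For readers without access to the deleted script, the four expected rewards listed in the removed "Quick Debug Test" section could be produced by a function along the following lines. This is only a sketch under stated assumptions, not the code from `debug_reward.py`: the `<think>` delimiter, the names `is_japanese` and `language_reward`, and the 50% character threshold are all hypothetical and do not appear in this diff.

```python
import re

# Hypothetical delimiter: the tag the project actually uses is not shown in this diff.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def is_japanese(text: str, threshold: float = 0.5) -> bool:
    """Return True if at least `threshold` of the letters are kana or CJK ideographs."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    japanese = sum(
        1
        for ch in letters
        if "\u3040" <= ch <= "\u30ff"  # hiragana and katakana
        or "\u4e00" <= ch <= "\u9fff"  # CJK unified ideographs
    )
    return japanese / len(letters) >= threshold


def language_reward(response: str) -> float:
    """Assumed scoring rule that matches the four documented cases."""
    blocks = THINK_BLOCK.findall(response)
    if not blocks:
        # No blocks: small partial credit if the response itself is Japanese.
        return 0.2 if is_japanese(response) else 0.0
    if not all(is_japanese(block) for block in blocks):
        return 0.0
    # A single Japanese block earns full reward; several earn half.
    return 1.0 if len(blocks) == 1 else 0.5
```

Under these assumptions, a single Japanese block scores 1.0, an English block 0.0, several Japanese blocks 0.5, and a Japanese response with no blocks 0.2, which matches the values the removed section listed.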
This file was deleted.