# Troubleshooting LanguageReward Training

## Issue: Language Reward is Always Zero

If you're seeing the LanguageReward constantly at 0.0 during training, here's how to debug:

### 1. Check What the Model is Generating

The updated `main.py` includes debug logging. When you run training, look for lines like:

```
[LanguageReward Debug] Reward=0.00 | Blocks=1 | Lang=en | Sample: <think>Let me solve this step by step...</think>...
```

This tells you:
- **Reward**: The actual reward value
- **Blocks**: Number of thinking blocks found
- **Lang**: Language detected by langid
- **Sample**: First 80 chars of the response
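As a rough sketch of how the `Blocks` and `Sample` fields could be derived (the actual `main.py` logic may differ, and `langid.classify` supplies the `Lang` field there; it is omitted here so the snippet runs without that dependency):

```python
import re

# Hypothetical reconstruction of the debug fields, not the real main.py code.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def debug_fields(response: str) -> dict:
    """Compute the Blocks and Sample fields of the debug log line."""
    blocks = THINK_RE.findall(response)
    return {
        "blocks": len(blocks),    # "Blocks": number of <think>...</think> blocks
        "sample": response[:80],  # "Sample": first 80 chars of the response
    }

fields = debug_fields("<think>Let me solve this step by step...</think><answer>4</answer>")
print(fields["blocks"])  # 1
```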
### 2. Common Causes and Solutions

#### Cause 1: Model is Thinking in English

**Symptom**: `Lang=en` in debug output

**Why**: The model defaults to English because:
- The dataset (GSM8K) is in English
- Most models are English-dominant
- The instruction might not be strong enough

**Solutions**:

A) **Strengthen the system prompt** (edit `main.py` lines 217-220):
```python
system_prompt = """
あなたは数学の問題を解くAIです。<think>タグの中で日本語で考えてください。これは必須です。
Put all your scratchpad work between <think> and </think> tags. You MUST think in Japanese (日本語) inside the <think> tags.
Your final answer should be between <answer> and </answer> tags, otherwise it will not be scored.

Example:
<think>この問題を解きましょう。2 + 2 = 4です。</think>
<answer>4</answer>
"""
```

B) **Start with a higher language reward weight**:
In `main.py` line 327, you could add multiple LanguageReward instances:
```python
reward_functions=[
    MathReward(),
    ThinkingReward(),
    LanguageReward(target_language="ja"),
    LanguageReward(target_language="ja"),  # Duplicate instance doubles the language weight
]
```

C) **Use few-shot examples in the prompt**:
Add Japanese reasoning examples to each problem in the dataset transform.
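A minimal sketch of option C, assuming the transform receives dicts with a `question` field (the field name and the few-shot text are illustrative, not taken from the actual GSM8K transform):

```python
# Hypothetical dataset transform: prepend a Japanese worked example to each
# question so the model sees Japanese reasoning before it answers.
FEW_SHOT = (
    "Question: What is 5 + 3?\n"
    "<think>5と3を足します。5 + 3 = 8です。答えは8です。</think>\n"
    "<answer>8</answer>\n\n"
)

def add_few_shot(sample: dict) -> dict:
    """Return a copy of the sample with a Japanese few-shot example prepended."""
    sample = dict(sample)  # avoid mutating the original record
    sample["question"] = FEW_SHOT + "Question: " + sample["question"]
    return sample
```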

#### Cause 2: Model Not Using Thinking Blocks

**Symptom**: `Blocks=0` in debug output

**Why**: The model hasn't learned to use `<think>` tags yet.

**Solution**: This should improve as ThinkingReward trains the model; be patient for the first few hundred steps. The fallback reward (0.2) should help when there are no blocks but the response contains Japanese.
#### Cause 3: Empty or Very Short Thinking Blocks

**Symptom**: `Lang=en` with very short content, Reward=0.00

**Why**: langid needs sufficient text to detect a language reliably. Very short text (under ~10 characters) often defaults to English.

**Solution**:
- Wait for the model to generate longer reasoning (this improves with training)
- The ThinkingReward encourages substantial content in thinking blocks
#### Cause 4: Mixed Language Content

**Symptom**: Reward fluctuates between 1.0 and 0.0

**Why**: When English and Japanese are mixed, langid reports whichever language is dominant.

**Solution**: This should stabilize as training progresses and the model learns to be consistent.
### 3. Expected Training Progression

**Steps 0-200**: Language reward often 0.0
- Model learning to use `<think>` tags (ThinkingReward)
- Model thinking in English (natural default)
- Fallback rewards (0.2) when Japanese appears elsewhere

**Steps 200-500**: Language reward starting to increase
- Some responses have Japanese thinking → partial/full rewards
- Model learning the association between Japanese and reward

**Steps 500+**: Language reward should stabilize around 0.5-1.0
- Consistent Japanese thinking
- Proper single-block format
### 4. Monitoring in W&B

Check these metrics in Weights & Biases:
- `reward/evaluate_response/avg_LanguageReward_reward` - should increase over time
- `reward/evaluate_response/std_LanguageReward_reward` - variance (high early, lower later)
- `reward/evaluate_response/avg_MathReward_reward` - should stay reasonably high
- `reward/evaluate_response/avg_ThinkingReward_reward` - should increase quickly
### 5. Quick Debug Test

Run the debug script to verify the reward function works:
```bash
python sandbox/grpo_language/debug_reward.py
```

Expected output:
- Japanese text → reward 1.0
- English text → reward 0.0
- Multiple Japanese blocks → reward 0.5
- No blocks but Japanese response → reward 0.2
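If you want to sanity-check those scoring rules without running the full training stack, here is a rough reimplementation. It substitutes a crude Japanese-character test for langid, so treat it as an approximation of LanguageReward's logic, not the actual reward function:

```python
import re

# Approximate reimplementation of the reward table above (assumption: the
# real LanguageReward uses langid; a kana/kanji character check stands in
# for it here so the sketch has no third-party dependency).
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
JA_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")  # hiragana, katakana, kanji

def looks_japanese(text: str) -> bool:
    return bool(JA_CHARS.search(text))

def language_reward(response: str) -> float:
    blocks = THINK_RE.findall(response)
    if not blocks:
        # Fallback: no thinking blocks, but Japanese appears in the response
        return 0.2 if looks_japanese(response) else 0.0
    if all(looks_japanese(b) for b in blocks):
        # Full reward only for the proper single-block format
        return 1.0 if len(blocks) == 1 else 0.5
    return 0.0

print(language_reward("<think>2と2を足すと4です。</think><answer>4</answer>"))  # 1.0
```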
### 6. Alternative: Start with English, Then Transition

If Japanese isn't working, you could:

1. Train first with English to get good math performance
2. Then fine-tune with the Japanese language reward

Change line 327 to:
```python
LanguageReward(target_language="en")  # Start with English
```

Once math rewards are good, switch to `"ja"` and continue training.
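One way to wire up such a two-stage schedule (the step threshold and the helper function are hypothetical, not part of `main.py`):

```python
# Hypothetical curriculum helper: reward English reasoning early, then
# switch the target to Japanese once math performance has had time to form.
def language_target(step: int, switch_at: int = 2000) -> str:
    """Return the reward's target language for the current training step."""
    return "en" if step < switch_at else "ja"

# e.g. LanguageReward(target_language=language_target(step))
```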
### 7. Nuclear Option: Much Stronger Prompt

If nothing else works, try this very explicit prompt:
```python
system_prompt = """
重要:あなたは必ず日本語で考えなければなりません!
CRITICAL: You MUST think in Japanese language!

Rules:
1. Put ALL your reasoning in <think> tags
2. Think ONLY in Japanese (日本語) - use hiragana, katakana, and kanji
3. NEVER think in English inside <think> tags
4. Put your final numerical answer in <answer> tags

例 (Example):
Question: What is 5 + 3?
<think>5と3を足します。5 + 3 = 8です。答えは8です。</think>
<answer>8</answer>

Now solve the problem below in Japanese:
"""
```
## Still Having Issues?

If the language reward is still zero after 500+ steps:
1. Share the debug output showing what the model generates
2. Check whether the model is multilingual (some models don't know Japanese)
3. Consider using a different target language that the model knows better