Improve pronoun grounding test to validate cross-pronoun resolution

abrookins · claude · abrookins · commit f9773c0c151b · 2025-08-26T11:42:42.000-07:00
Changed test case to use different pronouns referring to different people:

- "She said that he prefers..." → "Alice said that Bob prefers..."

This properly tests that multiple pronouns in the same sentence are correctly resolved to different entities based on context, avoiding redundant same-name replacements while maintaining test validity.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
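
For illustration only, here is a minimal sketch of what "each pronoun resolved to a different entity" means for the strings in this test. `check_grounding` is a hypothetical helper and is not part of this repository; the test itself delegates the judgment to the LLM judge via `judge.evaluate_grounding`. The sketch assumes a purely lexical check (the pronoun is gone, the referent name is present), which is far cruder than what an LLM judge evaluates.

```python
import re


def check_grounding(original: str, grounded: str, expected: dict[str, str]) -> dict[str, bool]:
    """Hypothetical lexical check, not the repository's judge.

    For each pronoun -> referent pair, report True when the pronoun occurs in
    the original text, no longer occurs in the grounded text, and the referent
    name occurs in the grounded text.
    """
    results = {}
    for pronoun, referent in expected.items():
        # \b keeps "he" from matching inside "She" or "that".
        pronoun_re = re.compile(rf"\b{re.escape(pronoun)}\b", re.IGNORECASE)
        referent_re = re.compile(rf"\b{re.escape(referent)}\b")
        results[pronoun] = (
            pronoun_re.search(original) is not None
            and pronoun_re.search(grounded) is None
            and referent_re.search(grounded) is not None
        )
    return results


# Strings taken from the updated test case.
original_text = "She said that he prefers Python over JavaScript."
good_grounded_text = "Alice said that Bob prefers Python over JavaScript."
expected_grounding = {"she": "Alice", "he": "Bob"}

results = check_grounding(original_text, good_grounded_text, expected_grounding)

# A partially grounded sentence: "she" is resolved, "he" is left in place.
partial = check_grounding(
    original_text,
    "Alice said that he prefers Python over JavaScript.",
    expected_grounding,
)
```

With two distinct pronouns mapped to two distinct names, a partially grounded sentence is distinguishable from a fully grounded one, which is the property the updated test case is meant to exercise.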
diff --git a/tests/test_llm_judge_evaluation.py b/tests/test_llm_judge_evaluation.py
@@ -263,13 +263,13 @@ async def test_judge_pronoun_grounding_evaluation(self):
 
         # Test case: good pronoun grounding
         context_messages = [
-            "John is a software engineer at Google.",
-            "Sarah works with him on the AI team.",
+            "Alice is the team lead for the project.",
+            "Bob is a junior developer working under her.",
         ]
 
-        original_text = "He mentioned that he prefers Python over JavaScript."
-        good_grounded_text = "John mentioned that he prefers Python over JavaScript."
-        expected_grounding = {"he": "John"}
+        original_text = "She said that he prefers Python over JavaScript."
+        good_grounded_text = "Alice said that Bob prefers Python over JavaScript."
+        expected_grounding = {"she": "Alice", "he": "Bob"}
 
         evaluation = await judge.evaluate_grounding(
             context_messages=context_messages,