Commit 41c0c28

i-w-a and Shahules786 authored
[patch] Reduce NaN Occurrences by Simple Prompt Modification for JSON Output for context_precision (#581)
## Overview

During the calculation of `context_precision`, an issue was observed where increasing the amount of context led to a surge in NaN occurrences. Comparatively, `context_recall` does not exhibit this problem. An investigation into the cause of the difference uncovered that the issue stems from whether the prompts specify outputting in JSON format.

## Discovery

It was found that simply specifying JSON output for `context_precision`, as is already done for `context_recall`, significantly reduces the incidence of NaN. Using JSON mode appears to be crucial, as noted in the OpenAI text generation reference for JSON mode:

> "If you don't include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit."

[OpenAI Text Generation JSON Mode Documentation](https://platform.openai.com/docs/guides/text-generation/json-mode)

A minimal sketch of this JSON-mode behavior at the API level is included after this message.

## Solution

To align with best practices and address the NaN generation issue, I propose updating the prompt for `context_precision` to explicitly instruct the model to produce output in JSON format. This small but impactful change brings `context_precision` in line with how `context_recall` operates and ensures more stable and predictable outcomes when handling larger context volumes.

## Impact

By making this explicit switch to JSON output, we not only follow the guideline provided by OpenAI but also prevent the uncontrolled behavior that can result in a flood of NaN values. This improvement should increase the reliability of calculations within our system and significantly decrease the time spent debugging NaN-related issues.

I look forward to your review and approval of this change, which will help us maintain robustness in our context precision calculations.

Best,
i-w-a

---------

Co-authored-by: Shahules786 <[email protected]>
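
Below is the sketch referenced above: a minimal illustration of the JSON-mode behavior the quoted documentation describes, using the OpenAI SDK directly rather than the code path ragas itself uses; the model name and prompt text are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# JSON mode constrains the completion to valid JSON, but as the quoted docs
# note, the word "JSON" must also appear in the messages themselves --
# omitting the explicit instruction is the runaway-whitespace failure mode
# behind the NaN verdicts.
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",  # illustrative; any JSON-mode-capable model
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "user",
            "content": (
                "Verify if the context was useful in arriving at the answer. "
                'Give verdict as "1" if useful and "0" if not. '
                "Output in only valid JSON format."
            ),
        }
    ],
)
print(response.choices[0].message.content)  # e.g. {"verdict": "1"}
```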
1 parent 7dc3d50 commit 41c0c28

File tree

2 files changed: +7 -2 lines changed


src/ragas/llms/prompt.py

Lines changed: 6 additions & 1 deletion
```diff
@@ -91,7 +91,12 @@ def to_string(self) -> str:
         """
         Generate the prompt string from the variables.
         """
-        prompt_str = self.instruction + "\n"
+        added_json_instruction = (
+            "\nOutput in only valid JSON format."
+            if self.output_type.lower() == "json"
+            else ""
+        )
+        prompt_str = self.instruction + added_json_instruction + "\n"
 
         if self.examples:
             # Format the examples to match the Langchain prompt template
```
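
As a quick illustration of the new branch, here is a hypothetical `Prompt` rendered with `to_string()`; the field values are made up, and constructor arguments beyond those visible in the diffs are assumptions about the `Prompt` API.

```python
from ragas.llms.prompt import Prompt

# Hypothetical prompt; only `instruction`, `examples`, and `output_type`
# are visible in this commit -- the other fields are assumed.
demo = Prompt(
    name="demo",
    instruction="Classify the sentiment of the text.",
    examples=[],
    input_keys=["text"],
    output_key="sentiment",
    output_type="json",
)

# With output_type == "json", to_string() now begins with:
#   Classify the sentiment of the text.
#   Output in only valid JSON format.
print(demo.to_string())
```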

src/ragas/metrics/_context_precision.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,7 +18,7 @@
 
 CONTEXT_PRECISION = Prompt(
     name="context_precision",
-    instruction="""Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not. """,
+    instruction="""Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output. """,
     examples=[
         {
             "question": """What can you tell me about albert Albert Einstein?""",
```
