Enhance prompts and input structure for improved analysis: clarify segment ID extraction and relevance guidelines, and shorten the multi-aspect summary to 2-3 paragraphs. Update the test input with additional segment IDs, rename user_prompt to user_input, and add user_input_description. Reformat the Directus client initialization in utils and add TODO notes.

roy · roy · commit 48b5da92112d · 2025-07-18T06:40:11.000Z
diff --git a/prompts.py b/prompts.py
@@ -268,9 +268,17 @@
 
 **references**: ARRAY
 For each data segment that supports your analysis, provide:
-- **segment_id**: int - The numerical identifier of the data segment (must be accurate)
+- **segment_id**: int - The numerical identifier of the data segment (must be accurate). Each segment_id looks like "SEGMENT_ID_<number>" in the input data - extract only the number portion (e.g., from "SEGMENT_ID_123" use 123)
 - **description**: string - Explain how this segment contributes to your overall analysis and its specific relevance to the topic
 
+## Critical Reference Guidelines
+- **ONLY include segment IDs that are explicitly mentioned in the input data with the format "SEGMENT_ID_<number>"**
+- **ONLY include segments that directly support claims, insights, or evidence in your analysis**
+- **DO NOT include a segment reference unless you can clearly explain its specific relevance to your findings**
+- **Extract the numeric ID correctly**: From "SEGMENT_ID_123" use 123, from "SEGMENT_ID_456" use 456
+- **Quality over quantity**: It's better to have fewer, highly relevant references than many irrelevant ones
+- **Verify relevance**: Each reference must correspond to content you actually analyzed and cited in your summary
+
 ## Quality Standards
 - **Depth**: Provide comprehensive analysis that goes beyond surface-level summarization
 - **Variety**: Use diverse language and avoid repetitive phrases
@@ -348,7 +356,7 @@
 - Explain what the investigation covers and why it matters
 - Should orient readers to the scope and purpose of the multi-aspect analysis
 
-**summary**: string (4-6 paragraphs with markdown formatting)
+**summary**: string (2-3 paragraphs with markdown formatting)
 - Develop an in-depth, multi-section analysis with proper markdown formatting
 - Include clear subsections with descriptive headings
 - Present findings in logical progression from key insights to supporting details
@@ -451,8 +459,17 @@
 - Address the broader implications and significance of the findings
 
 ### Segments
-- For each relevant document summary, provide accurate segment_id and description
-- Explain how each segment contributes to the overall analysis and its specific relevance
+For each data segment that supports your analysis, provide:
+- **segment_id**: int - The numerical identifier of the data segment (must be accurate). Each segment_id looks like "SEGMENT_ID_<number>" in the input data - extract only the number portion (e.g., from "SEGMENT_ID_123" use 123)
+- **description**: string - Explain how this segment contributes to your overall analysis and its specific relevance to the topic
+
+## Critical Segment Reference Guidelines
+- **ONLY include segment IDs that are explicitly mentioned in the document summaries with the format "SEGMENT_ID_<number>"**
+- **Extract the numeric ID correctly**: From "SEGMENT_ID_123" use 123, from "SEGMENT_ID_456" use 456
+- **ONLY include segments that directly support claims, insights, or evidence in your analysis**
+- **DO NOT include a segment reference unless you can clearly explain its specific relevance to your findings**
+- **Quality over quantity**: It's better to have fewer, highly relevant references than many irrelevant ones
+- **Verify relevance**: Each reference must correspond to content you actually analyzed and cited in your summary
 
 ## Quality Standards
 - **Depth**: Provide comprehensive analysis that goes beyond surface-level summarization
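The reference guidelines above are enforced only through prompt text, so a post-processing check could catch IDs the model invents or forgets to strip. Below is a minimal sketch that assumes the model returns `references` as a list of dicts with `segment_id` and `description` keys and that the raw input text containing the "SEGMENT_ID_<number>" markers is available. `extract_segment_ids`, `normalize_segment_id`, and `filter_references` are hypothetical helpers and are not part of this commit.

```python
from __future__ import annotations

import re
from typing import Any, Optional

# Matches the "SEGMENT_ID_<number>" markers described in the prompts; captures the digits.
SEGMENT_ID_PATTERN = re.compile(r"SEGMENT_ID_(\d+)")


def extract_segment_ids(text: str) -> set[int]:
    """Return every numeric ID that appears as SEGMENT_ID_<number> in the text."""
    return {int(digits) for digits in SEGMENT_ID_PATTERN.findall(text)}


def normalize_segment_id(raw: Any) -> Optional[int]:
    """Coerce a model-supplied segment_id to an int, or None if it is unusable.

    Accepts 123, "123", or "SEGMENT_ID_123" (in case the prefix was not stripped).
    """
    if isinstance(raw, bool):  # bool is a subclass of int; reject it explicitly
        return None
    if isinstance(raw, int):
        return raw
    if isinstance(raw, str):
        value = raw.strip()
        match = SEGMENT_ID_PATTERN.fullmatch(value)
        if match:
            return int(match.group(1))
        if value.isdigit():
            return int(value)
    return None


def filter_references(
    references: list[dict[str, Any]], input_text: str
) -> list[dict[str, Any]]:
    """Keep only references whose ID appears in the input and that state a relevance."""
    known_ids = extract_segment_ids(input_text)
    kept = []
    for ref in references:
        segment_id = normalize_segment_id(ref.get("segment_id"))
        description = str(ref.get("description", "")).strip()
        # Drop IDs absent from the input and references with an empty description.
        if segment_id in known_ids and description:
            kept.append({"segment_id": segment_id, "description": description})
    return kept
```

Filtering in code rather than relying only on the prompt means a hallucinated ID is dropped even when the model ignores the "ONLY include" rules.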
diff --git a/test_input.json b/test_input.json
@@ -1,9 +1,10 @@
 {
     "input": {
       "response_language": "en",
-      "segment_ids": [1,2,3,4],
-      "user_prompt": "Please summarise all the topics.",
-      "project_analysis_run_id": "1b15b167-166c-4c0e-8fb9-c3bf5d930f3e"
+      "segment_ids": [1,2,3,4,5,6,7],
+      "user_input": "Please summarise all the topics.",
+      "user_input_description": "Please summarise all the topics.",
+      "project_analysis_run_id": "39742451-b083-4c3e-a214-4431cce3957b"
     }
   }
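The test input now uses `user_input` and `user_input_description` in place of `user_prompt`, and a TODO in utils.py notes that these fields are not yet reaching Directus. Validating the payload keys before a run starts is one way to surface a stale or incomplete input early. This is a minimal sketch that assumes the payload shape shown in test_input.json; `AnalysisInput` and `load_analysis_input` are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Keys expected under "input", taken from the updated test_input.json.
REQUIRED_KEYS = (
    "response_language",
    "segment_ids",
    "user_input",
    "user_input_description",
    "project_analysis_run_id",
)


@dataclass
class AnalysisInput:
    response_language: str
    segment_ids: list[int]
    user_input: str
    user_input_description: str
    project_analysis_run_id: str


def load_analysis_input(raw: str) -> AnalysisInput:
    """Parse the test-input JSON and fail loudly on missing or renamed fields."""
    payload = json.loads(raw)["input"]
    # Catch payloads still using the pre-rename key.
    if "user_prompt" in payload and "user_input" not in payload:
        raise ValueError("'user_prompt' was renamed to 'user_input'")
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise KeyError(f"missing input fields: {missing}")
    ids = payload["segment_ids"]
    if not all(isinstance(i, int) and not isinstance(i, bool) for i in ids):
        raise TypeError("segment_ids must be a list of integers")
    return AnalysisInput(**{key: payload[key] for key in REQUIRED_KEYS})
```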
   
diff --git a/utils.py b/utils.py
@@ -1,3 +1,7 @@
+# [ ] TODO: Add retry logic in rag calls
+# [ ] TODO: Check why user_input and user_input_description are not being populated in directus
+# [ ] TODO: Change backend of echo to respond only with data not prompt
+
 import os
 import json
 import uuid
@@ -45,7 +49,9 @@
 DIRECTUS_USERNAME = str(os.getenv("DIRECTUS_USERNAME"))
 DIRECTUS_PASSWORD = str(os.getenv("DIRECTUS_PASSWORD"))
 
-directus = DirectusClient(url=DIRECTUS_BASE_URL, email=DIRECTUS_USERNAME, password=DIRECTUS_PASSWORD)
+directus = DirectusClient(
+    url=DIRECTUS_BASE_URL, email=DIRECTUS_USERNAME, password=DIRECTUS_PASSWORD
+)
 
 
 def generate_uuid() -> str:
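The first TODO in utils.py asks for retry logic around the RAG calls. The commit does not show how those calls are made, so this is only a generic sketch: `call_with_retries` is a hypothetical helper that wraps any zero-argument callable and retries it with exponential backoff plus a little jitter. Which exceptions count as transient (`retry_on`) is an assumption to adjust for the real client.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

A call might look like `call_with_retries(lambda: query_rag(prompt), retry_on=(TimeoutError, ConnectionError))`, where `query_rag` is a placeholder for whatever function performs the RAG request. The `sleep` parameter exists so tests can record delays instead of waiting.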
