@@ -1,10 +1,18 @@
+---
 metadata:
   name: "response_evaluation"
-  version: "2.0.0"
-  description: "Evaluate answer quality and completeness"
+  version: "1.0.0"
+  description: "Evaluate response quality and completeness"
+  author: "RAG Team"
+  created_date: "2025-06-10"
+  last_modified: "2025-06-10"
+  tags: ["evaluation", "reflexion", "quality-control"]
+
 config:
-  temperature: 0.0
-  max_tokens: 300
+  temperature: 0.3
+  max_tokens: 1000
+  model_type: "evaluation"
+
 variables:
   - name: "query"
     type: "string"
@@ -15,16 +23,53 @@ variables:
   - name: "docs_summary"
     type: "string"
     required: true
+  - name: "cycle_number"
+    type: "integer"
+    required: true
+  - name: "confidence_threshold"
+    type: "float"
+    required: true
+
 prompt_template: |
-  Evaluate the answer below for question {{query}}:
+  You are an expert evaluator assessing the quality and completeness of AI responses.
 
-  Answer:
+  EVALUATION TASK:
+  Assess if the following response sufficiently answers the user's question.
+
+  Original Question: {{query}}
+
+  Current Response (Cycle {{cycle_number}}):
   {{partial_answer}}
 
-  Context Summary:
-  {{docs_summary}}
+  Available Context: {{docs_summary}}
+
+  EVALUATION CRITERIA:
+  1. Completeness: Does the response address all aspects of the question?
+  2. Accuracy: Is the response supported by the available documents?
+  3. Confidence: Does the response contain uncertain or vague language?
+  4. Specificity: Are there specific sub-questions that need more detail?
+
+  RESPONSE FORMAT (JSON):
+  {
+    "confidence_score": 0.35,
+    "decision": "continue|refine_query|complete|insufficient_data",
+    "reasoning": "Detailed explanation of the assessment",
+    "covered_aspects": ["aspect1", "aspect2"],
+    "missing_aspects": ["missing1", "missing2"],
+    "uncertainty_phrases": ["phrase1", "phrase2"],
+    "specific_gaps": ["What specific details are missing?"]
+  }
+
+  DECISION GUIDELINES:
+  - confidence_score: 0.0-1.0 (how well the question is answered)
+  - "complete": confidence >= {{confidence_threshold}} and no major gaps
+  - "continue": confidence < {{confidence_threshold}} but retrievable information exists
+  - "refine_query": need more specific queries for missing aspects
+  - "insufficient_data": fundamental information is missing from knowledge base
 
-  Provide JSON with:
-  {"confidence":0.0-1.0,"decision":"continue|complete|insufficient_data","reason":"brief"}
+  INSTRUCTIONS:
+  1. Be very strict throughout the evaluation.
+  2. Always lower the confidence score for every mistake you find.
+  3. Give a strict, candid assessment so the application can improve its replies.
 
-  - If unsure, choose “insufficient_data.”
+  Provide your evaluation as valid JSON:
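The `variables:` block gives each placeholder in `prompt_template` a name, a type, and a `required` flag. Below is a minimal Python sketch of how a caller might validate those declarations and fill the `{{name}}` placeholders. It is not the RAG Team's actual loader. `VARIABLES`, `validation_errors`, and `render` are hypothetical names. The diff shows only part of the declarations for `query` and `partial_answer`, so the sketch assumes both are required strings.

```python
import re

# Hypothetical mirror of the `variables:` block in the diff above.
# ASSUMPTION: `query` and `partial_answer` are required strings; their full
# declarations sit in unchanged lines that the diff does not show.
VARIABLES = [
    {"name": "query", "type": "string", "required": True},
    {"name": "partial_answer", "type": "string", "required": True},
    {"name": "docs_summary", "type": "string", "required": True},
    {"name": "cycle_number", "type": "integer", "required": True},
    {"name": "confidence_threshold", "type": "float", "required": True},
]

# Map the declared type names to Python types; "float" also accepts ints.
TYPE_MAP = {"string": (str,), "integer": (int,), "float": (int, float)}

# Matches only double-brace placeholders, so the single braces of the
# JSON example inside the template are left untouched.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def validation_errors(values: dict) -> list:
    """Return a list of problems with `values`; an empty list means valid."""
    errors = []
    for var in VARIABLES:
        name = var["name"]
        if name not in values:
            if var["required"]:
                errors.append(f"missing required variable: {name}")
            continue
        value = values[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPE_MAP[var["type"]]):
            errors.append(f"{name}: expected {var['type']}, got {type(value).__name__}")
    return errors


def render(template: str, values: dict) -> str:
    """Validate `values`, then substitute every {{name}} placeholder."""
    errors = validation_errors(values)
    if errors:
        raise ValueError("; ".join(errors))

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"template uses undeclared variable: {name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)
```

The sketch matches only double braces on purpose. The JSON example in the template contains single braces, and Python's `str.format` would fail on those instead of leaving them intact.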
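The DECISION GUIDELINES tie each `decision` value to `confidence_score` and `{{confidence_threshold}}`, so a caller can check the parsed reply mechanically. The Python sketch below is one possible check under stated assumptions. `parse_evaluation` and `guideline_violations` are hypothetical helpers. "No major gaps" is approximated as an empty `missing_aspects` list. Only the `complete` and `continue` rules are enforced, because `refine_query` and `insufficient_data` have no numeric condition in the prompt.

```python
import json

# Decision values listed in the RESPONSE FORMAT section of the prompt.
DECISIONS = {"continue", "refine_query", "complete", "insufficient_data"}


def parse_evaluation(raw: str) -> dict:
    """Extract the JSON object from the evaluator's reply.

    Slicing from the first '{' to the last '}' tolerates stray prose around
    the object; it does not repair malformed JSON.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in evaluator output")
    return json.loads(raw[start:end + 1])


def guideline_violations(result: dict, threshold: float) -> list:
    """Check a parsed evaluation against the DECISION GUIDELINES.

    ASSUMPTION: "no major gaps" is approximated as an empty
    `missing_aspects` list; the prompt does not define it further.
    """
    problems = []
    score = result.get("confidence_score")
    decision = result.get("decision")
    # Reject non-numeric scores (including bools) and anything outside [0, 1].
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("confidence_score must be a number in [0.0, 1.0]")
        return problems
    if decision not in DECISIONS:
        problems.append(f"unknown decision: {decision!r}")
    elif decision == "complete":
        if score < threshold:
            problems.append("complete requires confidence_score >= threshold")
        if result.get("missing_aspects"):
            problems.append("complete requires an empty missing_aspects list")
    elif decision == "continue" and score >= threshold:
        problems.append("continue requires confidence_score < threshold")
    return problems
```

The function returns a list of violations instead of silently rewriting `decision`. That leaves the retry or fallback policy to whatever loop consumes this prompt across cycles.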