}
----

The code above is from the example application located https://github.com/rd-1-2022/ai-azure-rag.git[here].

== CorrectnessEvaluator

Whereas `RelevancyEvaluator` establishes whether the generated content is relevant to the input, `CorrectnessEvaluator` determines whether the generated content is correct when compared with a reference answer that is known to be correct. It also produces a score, on a scale of 1 to 5, that gauges how correct the generated content is.

The `CorrectnessEvaluator` submits the following system prompt to the AI model as guidelines for determining correctness:

[source,text]
----
You are an expert evaluation system for a question answering chatbot.
You are given the following information:
- a user query, and
- a generated answer
You may also be given a reference answer to use for reference in your evaluation.
Your job is to judge the relevance and correctness of the generated answer.
Output a single score that represents a holistic evaluation.
Follow these guidelines for scoring:
- Your score has to be between 1 and 5, where 1 is the worst and 5 is the best.
- If the generated answer is not relevant to the user query,
you should give a score of 1.
- If the generated answer is relevant but contains mistakes,
you should give a score between 2 and 3.
- If the generated answer is relevant and fully correct,
you should give a score between 4 and 5.
Example Response:
4.0
The generated answer has the exact same metrics as the reference answer,
but it is not as concise.
----

Along with the system prompt, the query input, the generated answer, and the reference answer are provided in the user prompt:

[source,text]
----
{query}
## Reference Answer
{reference_answer}
## Generated Answer
{generated_answer}
----
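
For a concrete picture of how these placeholders are filled, the following sketch renders the same template with Spring AI's `PromptTemplate`. This is illustrative only: the section does not state how `CorrectnessEvaluator` performs this substitution internally, and the sample values are invented.

[source,java]
----
import java.util.Map;

import org.springframework.ai.chat.prompt.PromptTemplate;

public class UserPromptRenderingSketch {

	public static void main(String[] args) {
		// The user prompt template shown above, with its three placeholders.
		String template = """
				{query}
				## Reference Answer
				{reference_answer}
				## Generated Answer
				{generated_answer}
				""";

		// Render the template with illustrative values; a CorrectnessEvaluator
		// would substitute the real query, reference answer, and generated answer.
		String userPrompt = new PromptTemplate(template).render(Map.of(
				"query", "What is the capital of France?",
				"reference_answer", "The capital of France is Paris.",
				"generated_answer", "Paris is the capital of France."));

		System.out.println(userPrompt);
	}
}
----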

Here is an example of a JUnit test, abbreviated here to its final assertion, that performs a RAG query over a PDF document loaded into a Vector Store and then evaluates whether the response is correct when compared with a reference answer:

[source,java]
----
	// ...
	assertTrue(evaluationResponse.isPass(), "Response is incorrect");
}
----

The `CorrectnessEvaluator` is created with a `ChatClient` and a threshold; for the evaluation to be considered correct, the score must be greater than or equal to that threshold.
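
For illustration, here is a minimal, hypothetical sketch of such a test. The `CorrectnessEvaluator` constructor signature, its package, and the use of the `EvaluationRequest` data list to carry the reference answer are assumptions drawn from the description above rather than confirmed API details, and the sample question and answers are invented.

[source,java]
----
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.List;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.CorrectnessEvaluator; // assumed package
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class CorrectnessEvaluationTest {

	@Autowired
	private ChatModel chatModel;

	@Test
	void responseIsCorrect() {
		// Illustrative inputs; in a real test the generated answer would come
		// from the application under test (for example, a RAG pipeline).
		String userText = "What is the capital of France?";
		String referenceAnswer = "The capital of France is Paris.";
		String generatedAnswer = "Paris is the capital of France.";

		// Assumed constructor, per the description above: a ChatClient that
		// performs the judging call, and a threshold that the 1-5 score must
		// be greater than or equal to for the evaluation to pass.
		var correctnessEvaluator = new CorrectnessEvaluator(
				ChatClient.builder(chatModel).build(), 4.0f);

		// Assumption: the reference answer travels in the request's data list;
		// this section does not specify how it reaches {reference_answer}.
		EvaluationRequest evaluationRequest = new EvaluationRequest(
				userText, List.of(new Document(referenceAnswer)), generatedAnswer);

		EvaluationResponse evaluationResponse = correctnessEvaluator.evaluate(evaluationRequest);

		assertTrue(evaluationResponse.isPass(), "Response is incorrect");
	}
}
----

The threshold chosen here (4.0) controls how strict the pass/fail decision is: per the scoring guidelines above, scores of 4 and 5 are reserved for answers that are relevant and fully correct.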