-You are comparing a submitted answer to an expert answer on a given question. Here is the data:
+You are comparing a submitted answer to an expert's rubric on a given question. Here is the data:
 [BEGIN DATA]
 ************
 [Question]: ${input}
 ************
-[Expert]: ${expected}
+[Expert Rubric]: ${expected}
 ************
 [Submission]: ${output}
 ************
 [END DATA]
+
+Submissions contain message metadata inside of the <message_content> XML tags.
+The attribute \`type=text\` indicates text content. The attribute \`type=tool-call\` indicates a tool call.
+Use this metadata to determine the accuracy of the response.
 
-Compare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.
-The submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
-(A) The submitted answer is a subset of the expert answer and is fully consistent with it.
-(B) The submitted answer is a superset of the expert answer and is fully consistent with it.
-(C) The submitted answer contains all the same details as the expert answer.
-(D) There is a disagreement between the submitted answer and the expert answer.
+Compare the factual content of the submitted answer with the expert's answer rubric. Ignore any differences in style, grammar, or punctuation.
+The submitted answer may either be a subset or superset of the expert's expected answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
+(A) The submitted answer is a subset of the answer the expert's rubric describes and is fully consistent with it.
+(B) The submitted answer is a superset of the answer the expert's rubric describes and is fully consistent with it.
+(C) The submitted answer contains all the same details of the answer the expert's rubric describes.
+(D) There is a disagreement between the submitted answer and the expert's rubric.
 (E) The answers differ, but these differences don't matter from the perspective of factuality.
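For illustration only, here is a minimal sketch of how a template like the one in this diff might be filled in and how the grader's reply might be reduced to one of the options (A) through (E). The helper names `renderPrompt` and `parseChoice` are hypothetical and do not come from the commit. Only the `${input}`, `${expected}`, and `${output}` placeholders and the five option letters are taken from the prompt text. The sketch assumes the reply either contains the chosen letter in parentheses or consists of the bare letter.

```typescript
// Hypothetical helpers, not part of the commit shown above.
type Choice = "A" | "B" | "C" | "D" | "E";

// Substitute ${name} placeholders from a map; unknown placeholders are left as-is.
function renderPrompt(template: string, vars: Map<string, string>): string {
  return template.replace(
    /\$\{(\w+)\}/g,
    (match: string, name: string) => vars.get(name) ?? match
  );
}

// Extract the selected option from a grader reply.
// Assumption: the reply contains "(X)" or is exactly the letter X.
function parseChoice(reply: string): Choice | null {
  const parenthesised = reply.match(/\(([A-E])\)/);
  if (parenthesised) {
    return parenthesised[1] as Choice;
  }
  const bare = reply.trim();
  return /^[A-E]$/.test(bare) ? (bare as Choice) : null;
}

// Usage with a shortened copy of the data section of the prompt.
const dataSection =
  "[Question]: ${input}\n[Expert Rubric]: ${expected}\n[Submission]: ${output}";
const filled = renderPrompt(
  dataSection,
  new Map([
    ["input", "What is the capital of France?"],
    ["expected", "The answer should name Paris."],
    ["output", "Paris"],
  ])
);
console.log(filled);
console.log(parseChoice("(C)"));
```

How a selected letter is turned into a score is not shown in this diff, so the sketch stops at extracting the letter.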