Commit 7417db9

Added jinja-compatible positive-judgement example to math templates (#158)
MathVerse and MathVista scoring templates had only contextual examples of Judgement 0 (where the model answer does not match the ground-truth). The [original MathVerse template](https://github.com/ZrrSkywalker/MathVerse/blob/937b090597aeafb8e82b35d310a4bc5b9e2ea29d/evaluation/prompts.py) included one example of Judgement 1, but it is incompatible with jinja due to the caret (^) character. This PR implements an example of Judgement 1 which is close to that of the original MathVerse template, but compatible with jinja. I also fixed a typo and standardized the MathVerse/MathVista instructions.
1 parent dad6d84 commit 7417db9

2 files changed: +16 -6 lines changed

eureka_ml_insights/prompt_templates/mathverse_templates/scoring_prompt.jinja

Lines changed: 9 additions & 4 deletions
@@ -1,6 +1,6 @@
 Below are two answers to a math question. Question is [Question], [Standard Answer] is the standard answer to the question, and [Model_answer] is the answer extracted from a model's output to this question. Determine whether these two answers are consistent.
-Please note that only when the [Model_answer] completely matches the [Standard Answer] means they are consistent. For non-multiple-choice questions, if the meaning is expressed in the same way, it is also considered consistent, for example, 0.5m and 50cm.
-If they are consistent, Judgment is 1; if they are different, Judgment is 0.
+Please note that only when the [Model_answer] completely matches the [Standard Answer] means they are consistent. For multiple choice questions, consider that the model answer may contain the letter representing the answer value. For non-multiple-choice questions, if the meaning is expressed in the same way, it is also considered consistent, for example, 0.5m and 50cm.
+If they are consistent, Judgement is 1; if they are different, Judgement is 0.
 
 [Question]: Write the set of numbers represented on the number line in interval notation.
 [Standard Answer]: (-2,1]
@@ -22,7 +22,12 @@ Judgement: 0
 [Model_answer] : null
 Judgement: 0
 
-[Question]: {{question}}
+[Question]: Given the graph of the line that intersects with x-axis at -3 and with y-axis at 4, determine its equation. A. y = \\frac{{4}}{{3}}x + 4 B. Cannot determine.\n
+[Standard Answer]: A
+[Model_answer] : y = \\frac{{4}}{{3}}x + 4
+Judgement: 1
+
+[Question]: {{query}}
 [Standard Answer]: {{answer}}
 [Model_answer] : {{extraction}}
-Judgement:
+Judgement:
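
For readers unfamiliar with how the three placeholders at the end of the template are used, here is a minimal sketch of filling them in. The templates are Jinja files, but this sketch deliberately uses a small regex substitute instead of Jinja so that it runs with the standard library alone; it is not the repository's rendering code. The names `TEMPLATE_TAIL` and `fill_placeholders`, and the sample values, are hypothetical.

```python
import re

# Last four lines of the scoring template, copied from the diff.
TEMPLATE_TAIL = (
    "[Question]: {{query}}\n"
    "[Standard Answer]: {{answer}}\n"
    "[Model_answer] : {{extraction}}\n"
    "Judgement:"
)


def fill_placeholders(template: str, values: dict) -> str:
    """Replace each {{name}} with values[name].

    A stand-in for Jinja rendering (assumption: only simple
    {{name}} placeholders need to be handled).
    """
    def substitute(match: re.Match) -> str:
        # A missing key raises KeyError rather than silently leaving a gap.
        return str(values[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical sample values, echoing the 0.5m / 50cm case from the instructions.
prompt = fill_placeholders(
    TEMPLATE_TAIL,
    {"query": "How long is the rod?", "answer": "50cm", "extraction": "0.5m"},
)
print(prompt)
```

The filled prompt ends with `Judgement:`, so the scoring model is expected to continue with the judgement value.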

eureka_ml_insights/prompt_templates/mathvista_templates/scoring_prompt.jinja

Lines changed: 7 additions & 2 deletions
@@ -1,6 +1,6 @@
 Below are two answers to a math question. Question is [Question], [Standard Answer] is the standard answer to the question, and [Model_answer] is the answer extracted from a model's output to this question. Determine whether these two answers are consistent.
 Please note that only when the [Model_answer] completely matches the [Standard Answer] means they are consistent. For multiple choice questions, consider that the model answer may contain the letter representing the answer value. For non-multiple-choice questions, if the meaning is expressed in the same way, it is also considered consistent, for example, 0.5m and 50cm.
-If they are consistent, Judgment is 1; if they are different, Judgment is 0.
+If they are consistent, Judgement is 1; if they are different, Judgement is 0.
 
 [Question]: Write the set of numbers represented on the number line in interval notation.
 [Standard Answer]: (-2,1]
@@ -22,7 +22,12 @@ Judgement: 0
 [Model_answer] : null
 Judgement: 0
 
+[Question]: Given the graph of the line that intersects with x-axis at -3 and with y-axis at 4, determine its equation. A. y = \\frac{{4}}{{3}}x + 4 B. Cannot determine.\n
+[Standard Answer]: A
+[Model_answer] : y = \\frac{{4}}{{3}}x + 4
+Judgement: 1
+
 [Question]: {{query}}
 [Standard Answer]: {{answer}}
 [Model_answer] : {{extraction}}
-Judgement:
+Judgement:
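
Because both templates end with `Judgement:`, the scoring model's reply should reduce to 0 or 1. The sketch below shows one possible way to turn such a reply into an integer. It is an illustration only: the function name `parse_judgement` is hypothetical, and the commit does not show how the repository actually parses the reply.

```python
import re
from typing import Optional


def parse_judgement(completion: str) -> Optional[int]:
    """Return 0 or 1 from a judge completion, or None if neither is found."""
    # Accept both spellings, since the diff replaces "Judgment" with "Judgement".
    labelled = re.search(r"Judge?ment\s*:\s*([01])\b", completion, re.IGNORECASE)
    if labelled:
        return int(labelled.group(1))
    # The prompt already ends with "Judgement:", so the reply may be just the digit.
    bare = re.fullmatch(r"\s*([01])\s*", completion)
    return int(bare.group(1)) if bare else None


print(parse_judgement(" 1"))
print(parse_judgement("Judgement: 0"))
print(parse_judgement("not sure"))
```

Returning `None` for anything else keeps malformed replies distinguishable from a genuine Judgement 0.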
