Commit 1a13a34

Added Quran Eval
sakher committed
1 parent 2420c62 commit 1a13a34

7 files changed, +84 -0 lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d102c8042318245beb349794ff93a27ce7f2c76cb6d9d0a11a2c81c2b3b7ce9c
+size 157821
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:26c157f176fe241c96a2f38a9eab980e21cb28393d74e903032fba3b28314bad
+size 180906
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:219103374aaef27cb66e295896b3afbb1af2543ba7315639b3b72e4df49b09a5
+size 823143
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d0ad95b6ac9ef3b3e6e0bd24e5a00a3967a7f5ffaacb6c2f7337090db3a1aa88
+size 200064
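The four three-line files above are Git LFS pointer files: each stores the spec version, the SHA-256 object id, and the size in bytes of the real file, which lives in LFS storage. The file names did not survive in this view, so which pointer belongs to which sample file is not stated here. As a minimal sketch, a pointer can be read like this (the helper name `parse_lfs_pointer` is hypothetical, not part of any Git LFS tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# First pointer from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d102c8042318245beb349794ff93a27ce7f2c76cb6d9d0a11a2c81c2b3b7ce9c
size 157821
"""
info = parse_lfs_pointer(pointer)
```

After fetching the real file, its SHA-256 digest and byte length could be compared against `info["digest"]` and `info["size"]` to confirm the download is intact.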
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+quran-evals:
+  evals:
+    - guess_quran_verse_name
+    - guess_quran_verse_type
+    - guess_which_text_is_from_quran
+    - masked_quranic_text
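The eval set groups the four evals under the single name `quran-evals`. The sketch below mirrors that file as a plain Python dict and expands the set into one `oaieval` command per eval. The model name `gpt-3.5-turbo` is only an assumed placeholder, and `expand_eval_set` is a hypothetical helper, not the framework's own loader.

```python
# Plain-dict mirror of the eval set file added in this commit.
EVAL_SETS = {
    "quran-evals": {
        "evals": [
            "guess_quran_verse_name",
            "guess_quran_verse_type",
            "guess_which_text_is_from_quran",
            "masked_quranic_text",
        ]
    }
}


def expand_eval_set(set_name: str, eval_sets: dict = EVAL_SETS) -> list:
    """Return the eval names listed under an eval set."""
    return list(eval_sets[set_name]["evals"])


names = expand_eval_set("quran-evals")
# Assumed model name; substitute whichever completion function you run against.
commands = [f"oaieval gpt-3.5-turbo {name}" for name in names]
```

Running the evals one by one this way is an alternative to invoking the whole set at once.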
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
guess_quran_verse_name:
2+
id: guess_quran_verse_name.dev.v0
3+
description: Tests the model's ability to guess the name of a Quranic verse.
4+
metrics: [accuracy]
5+
guess_quran_verse_name.dev.v0:
6+
class: evals.elsuite.modelgraded.classify:ModelBasedClassify
7+
args:
8+
samples_jsonl: quran_eval/guess_quran_verse_name.jsonl
9+
eval_type: cot_classify
10+
modelgraded_spec: simple_fact
11+
12+
guess_quran_verse_type:
13+
id: guess_quran_verse_type.dev.v0
14+
description: Tests the model's ability to guess the type of a Quranic verse (e.g. Meccan or Medinan)
15+
metrics: [accuracy]
16+
guess_quran_verse_type.dev.v0:
17+
class: evals.elsuite.modelgraded.classify:ModelBasedClassify
18+
args:
19+
samples_jsonl: quran_eval/guess_quran_verse_type.jsonl
20+
eval_type: cot_classify
21+
modelgraded_spec: simple_fact
22+
23+
guess_which_text_is_from_quran:
24+
id: guess_which_text_is_from_quran.dev.v0
25+
description: Tests the model's ability to guess which text is from the Quran.
26+
metrics: [accuracy]
27+
guess_which_text_is_from_quran.dev.v0:
28+
class: evals.elsuite.modelgraded.classify:ModelBasedClassify
29+
args:
30+
samples_jsonl: quran_eval/guess_which_text_is_from_quran.jsonl
31+
eval_type: cot_classify
32+
modelgraded_spec: simple_fact
33+
34+
masked_quranic_text:
35+
id: masked_quranic_text.dev.v0
36+
description: Tests the model's ability to predict masked Quranic text.
37+
metrics: [accuracy]
38+
masked_quranic_text.dev.v0:
39+
class: evals.elsuite.modelgraded.classify:ModelBasedClassify
40+
args:
41+
samples_jsonl: quran_eval/masked_quranic_text.jsonl
42+
eval_type: cot_classify
43+
modelgraded_spec: simple_fact
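Each eval in this file is registered twice: a short alias that points at a versioned id, and the versioned entry that names the implementing class and its arguments. The following sketch rebuilds that structure as Python dicts and follows an alias to its class path. It illustrates the layout only; `build_registry` and `resolve` are hypothetical helpers and do not reproduce the framework's actual registry code.

```python
EVAL_NAMES = [
    "guess_quran_verse_name",
    "guess_quran_verse_type",
    "guess_which_text_is_from_quran",
    "masked_quranic_text",
]


def build_registry(names: list) -> dict:
    """Mirror the registry file: one alias entry and one versioned entry per eval."""
    registry = {}
    for name in names:
        versioned_id = f"{name}.dev.v0"
        # Alias entry: points at the versioned id (descriptions omitted here).
        registry[name] = {"id": versioned_id, "metrics": ["accuracy"]}
        # Versioned entry: implementing class plus its arguments.
        registry[versioned_id] = {
            "class": "evals.elsuite.modelgraded.classify:ModelBasedClassify",
            "args": {
                "samples_jsonl": f"quran_eval/{name}.jsonl",
                "eval_type": "cot_classify",
                "modelgraded_spec": "simple_fact",
            },
        }
    return registry


def resolve(registry: dict, name: str) -> tuple:
    """Follow an alias to its versioned entry and split "module:Class"."""
    entry = registry[name]
    if "id" in entry:  # alias entries carry an "id"; versioned ones do not
        entry = registry[entry["id"]]
    module_path, _, class_name = entry["class"].partition(":")
    return module_path, class_name, entry["args"]


registry = build_registry(EVAL_NAMES)
module_path, class_name, args = resolve(registry, "masked_quranic_text")
```

All four evals share the same class and grading spec and differ only in their samples file.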
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
simple_fact:
2+
prompt: |-
3+
You are tasked with comparing a submitted answer to an expert answer on a specific question. Here are the details provided:
4+
[BEGIN DATA]
5+
************
6+
[Question]: {input}
7+
************
8+
[Expert Answer]: {ideal}
9+
************
10+
[Submitted Answer]: {completion}
11+
************
12+
[END DATA]
13+
14+
Based on the factual content of the submitted answer compared to the expert answer, determine if the submitted answer is valid and complete. Consider only the factual accuracy and completeness of the answer, ignoring differences in style, grammar, or punctuation.
15+
Provide your judgment as either 'Y' for a valid and complete answer or 'N' for an invalid or incomplete answer. Your evaluation should focus on the following criteria:
16+
1. Does the submitted answer accurately reflect the factual content of the expert answer?
17+
2. Is the submitted answer complete, without missing any crucial details provided in the expert answer?
18+
choice_scores:
19+
"Y": 1.0
20+
"N": 0.0
21+
choice_strings: YN
22+
input_outputs:
23+
input: completion
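The `simple_fact` spec fills `{input}`, `{ideal}`, and `{completion}` into the grading prompt, expects the grader to answer with one of the `choice_strings`, and maps that choice to a score through `choice_scores`. A rough sketch of that flow, using a shortened copy of the template, is given below. Taking the last non-empty line of the reply as the verdict is an assumption about how a chain-of-thought grader's output would be read, not the framework's actual parsing, and the helper names and placeholder strings are hypothetical.

```python
# Shortened copy of the data section of the simple_fact prompt.
PROMPT_TEMPLATE = (
    "[BEGIN DATA]\n"
    "************\n"
    "[Question]: {input}\n"
    "************\n"
    "[Expert Answer]: {ideal}\n"
    "************\n"
    "[Submitted Answer]: {completion}\n"
    "************\n"
    "[END DATA]"
)
CHOICE_STRINGS = "YN"
CHOICE_SCORES = {"Y": 1.0, "N": 0.0}


def render_prompt(question: str, ideal: str, completion: str) -> str:
    """Fill the three placeholders used by the spec."""
    return PROMPT_TEMPLATE.format(input=question, ideal=ideal, completion=completion)


def score_reply(grader_reply: str):
    """Assumption: the verdict is the last non-empty line of the grader's reply."""
    lines = [ln.strip() for ln in grader_reply.splitlines() if ln.strip()]
    choice = lines[-1] if lines else ""
    if choice not in tuple(CHOICE_STRINGS):
        return None  # unparseable verdict
    return CHOICE_SCORES[choice]


def accuracy(scores: list) -> float:
    """Mean score over the replies that could be parsed."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else 0.0


# Placeholder strings only; real samples come from the JSONL files.
prompt = render_prompt("placeholder question", "placeholder answer", "placeholder answer")
scores = [score_reply("Some reasoning.\nY"), score_reply("N"), score_reply("unclear")]
```

With `"Y"` worth 1.0 and `"N"` worth 0.0, the mean score is the fraction of submitted answers the grader judged valid and complete.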
