Open
Hi,
The ARC benchmark, implemented in deepeval/benchmarks/arc/arc.py, should support several modes per question, each corresponding to a different confinement instruction. Concretely, the attribute self.confinement_instructions could become a dict, similar to the TruthfulQA implementation in deepeval/benchmarks/truthful_qa/truthful_qa.py. The reason is that most questions use lettered choices, for example:
'Which are two parts of the carbon cycle?
A. freezing and thawing
B. growth and reproduction
C. evaporation and precipitation
D. photosynthesis and respiration
Answer: '
Some questions, however, use numbered choices:
'Different species of carnivorous animals that share the same habitat
in an ecosystem may
1. become decomposers
2. compete for food
3. produce their own food
4. mate with each other
Answer: '
Regardless of the question, self.confinement_instructions is always "Output 'A', 'B', 'C', or 'D'. Full answer not needed.". For numbered questions this necessarily leads the model to err: it tries to answer with A, B, C, or D, none of which matches a listed choice.
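To illustrate the idea, here is a minimal sketch of what a dict of confinement instructions keyed by label style could look like. The names CONFINEMENT_INSTRUCTIONS, detect_label_style and confinement_for are hypothetical and not taken from deepeval, and the wording of the numeric instruction is only an assumption modeled on the existing one:

```python
import re

# Hypothetical layout: one instruction per label style. This is not deepeval's
# actual API, and the "numbers" wording is an assumption.
CONFINEMENT_INSTRUCTIONS = {
    "letters": "Output 'A', 'B', 'C', or 'D'. Full answer not needed.",
    "numbers": "Output '1', '2', '3', or '4'. Full answer not needed.",
}

# Matches a choice label at the start of a line, e.g. "A. " or "1. ".
_CHOICE_LABEL = re.compile(r"^\s*([A-Z]|\d+)\.\s", re.MULTILINE)


def detect_label_style(prompt: str) -> str:
    """Return "numbers" if every choice label in the prompt is numeric, else "letters"."""
    labels = _CHOICE_LABEL.findall(prompt)
    if labels and all(label.isdigit() for label in labels):
        return "numbers"
    return "letters"


def confinement_for(prompt: str) -> str:
    """Pick the confinement instruction that matches the prompt's label style."""
    return CONFINEMENT_INSTRUCTIONS[detect_label_style(prompt)]
```

With something like this, the benchmark could append confinement_for(prompt) to each prompt instead of a single fixed string. Falling back to "letters" when no labels are found keeps the current behavior for any question the pattern does not recognize.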
Thank you!