ARC benchmark should have several "modes" per question

Hi,

The ARC benchmark, implemented in [deepeval/benchmarks/arc/arc.py](https://github.com/confident-ai/deepeval/blob/8f7aa0a5accf4148c51f5282bb7fd39f37c4661b/deepeval/benchmarks/arc/arc.py), should support several modes *per question*, each corresponding to a different `self.confinement_instructions`. More precisely, the attribute `self.confinement_instructions` could be turned into a dict, similar to the TruthfulQA implementation in [deepeval/benchmarks/truthful_qa/truthful_qa.py](https://github.com/confident-ai/deepeval/blob/8f7aa0a5accf4148c51f5282bb7fd39f37c4661b/deepeval/benchmarks/truthful_qa/truthful_qa.py). The reason is that the majority of questions label their options with letters, for example:

```
'Which are two parts of the carbon cycle?

        A. freezing and thawing

        B. growth and reproduction

        C. evaporation and precipitation

        D. photosynthesis and respiration

        Answer: '
```

However, some questions label their options with numbers instead:

```
'Different species of carnivorous animals that share the same habitat
        in an ecosystem may

        1. become decomposers

        2. compete for food

        3. produce their own food

        4. mate with each other

        Answer: '
```

Meanwhile, `self.confinement_instructions` is always "Output 'A', 'B', 'C', or 'D'. Full answer not needed.", irrespective of the question. For numbered questions, this pushes the model to answer with A, B, C, or D even though none of those labels appear among the options, which necessarily leads it to err.
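To make the proposal concrete, here is a minimal sketch of what a per-question choice could look like. It is not taken from the deepeval codebase: the dict `CONFINEMENT_INSTRUCTIONS`, the helpers `detect_label_mode` and `confinement_for`, and the wording of the numeric instruction are all hypothetical names and text I made up for illustration. Only the letter instruction is the existing string.

```python
import re

# Hypothetical dict replacing the single instruction string.
# The "letters" entry is the current text; the "digits" entry is an assumed wording.
CONFINEMENT_INSTRUCTIONS = {
    "letters": "Output 'A', 'B', 'C', or 'D'. Full answer not needed.",
    "digits": "Output '1', '2', '3', or '4'. Full answer not needed.",
}

# Matches an option label at the start of a line, e.g. "   A. " or "   2. ".
_LABEL_RE = re.compile(r"^\s*([A-E1-5])\.\s", re.MULTILINE)


def detect_label_mode(prompt: str) -> str:
    """Return "digits" if every option label in the prompt is numeric, else "letters"."""
    labels = _LABEL_RE.findall(prompt)
    if labels and all(label.isdigit() for label in labels):
        return "digits"
    return "letters"


def confinement_for(prompt: str) -> str:
    """Pick the confinement instruction that matches the prompt's label style."""
    return CONFINEMENT_INSTRUCTIONS[detect_label_mode(prompt)]
```

With something like this, the benchmark could call `confinement_for(prompt)` for each question instead of appending one fixed string. Prompts with no recognizable labels fall back to the letter instruction, which matches the current behaviour.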

Thank you!

ARC benchmark should have several "modes" per question #2352
