Refactoring the Contamination Scenario

Hi, @yifanmai!

Sorry for the delayed response. Getting back to the discussion in the [pull request](https://github.com/stanford-crfm/helm/pull/3627), we would like to follow your suggestion to make the implementation more efficient and better integrated into HELM.


# Proposal

We plan to refactor our solution so that the contamination index computation becomes a scenario instead of a nested HELM run. The proposed command is:

```
helm-run --run-entries contamination:dataset=bluex,model=ibm/granite-3.3-8b-instruct,strategy=ts_guessing_question_multichoice,language=pt --suite my-suite --max-eval-instances 1500
```

In this new format:

- `contamination` indicates that model contamination will be computed for a given scenario;
- `dataset` specifies the scenario being used;
- `strategy` defines the contamination method applied;
- `model` indicates the model to be evaluated;
- `language` specifies the prompt language.
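To make the argument layout concrete, here is a minimal sketch of how a run entry of this shape splits into a name and key/value arguments. It is illustrative only: `parse_run_entry` is a hypothetical helper written for this issue, not HELM's own parser, and HELM itself may route some keys (such as `model`) differently from scenario arguments.

```python
from typing import Dict, Tuple


def parse_run_entry(entry: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name:key=value,key=value' into (name, {key: value}).

    Hypothetical helper for illustration; not part of HELM.
    """
    name, _, arg_string = entry.partition(":")
    args: Dict[str, str] = {}
    for pair in arg_string.split(","):
        if not pair:
            continue  # tolerate an empty argument list or a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got {pair!r}")
        args[key] = value
    return name, args


name, args = parse_run_entry(
    "contamination:dataset=bluex,model=ibm/granite-3.3-8b-instruct,"
    "strategy=ts_guessing_question_multichoice,language=pt"
)
print(name)
print(args)
```

With the proposed command, `name` is `contamination` and `args` holds the four keys described above.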

# Impact

No existing files in HELM will need to be modified. Only new files will be added, following the same pattern used for adding new scenarios. The following files will be included:

- contamination_run_specs.py
- contamination_scenario.py
- test_contamination_scenario.py

However, since the contamination scenario needs to access one of HELM’s datasets to perform word/option masking, it will also require adding a few auxiliary files:

- contamination_utils.py
- contamination_base.py
- prompt_translations.py
- ts_guessing_question_based.py
- ts_guessing_question_multichoice.py

These additional files provide the helpers and strategy implementations that the contamination computation depends on. Under this proposal, contamination becomes a scenario, and the specified dataset acts as a meta-scenario.
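As a rough illustration of the option masking mentioned above, the sketch below masks one incorrect option of a multiple-choice item and keeps the hidden text as the reference answer. It assumes the multichoice strategy follows the usual TS-Guessing idea of hiding a wrong option; `MultipleChoiceItem`, `MaskedItem`, `mask_wrong_option`, and `MASK_TOKEN` are hypothetical names for this example and do not reflect HELM's classes or our final implementation.

```python
import random
from dataclasses import dataclass
from typing import List

MASK_TOKEN = "[MASK]"  # hypothetical placeholder shown to the model


@dataclass(frozen=True)
class MultipleChoiceItem:
    """Simplified stand-in for an instance loaded from the underlying dataset."""
    question: str
    options: List[str]
    correct_index: int


@dataclass(frozen=True)
class MaskedItem:
    prompt: str          # question plus options, with one option hidden
    masked_option: str   # hidden text, used as the reference answer


def mask_wrong_option(item: MultipleChoiceItem, rng: random.Random) -> MaskedItem:
    """Hide one incorrect option (assumed behaviour of the multichoice strategy)."""
    wrong_indices = [i for i in range(len(item.options)) if i != item.correct_index]
    if not wrong_indices:
        raise ValueError("Need at least one incorrect option to mask")
    target = rng.choice(wrong_indices)
    lines = [item.question]
    for i, option in enumerate(item.options):
        label = chr(ord("A") + i)
        lines.append(f"{label}. {MASK_TOKEN if i == target else option}")
    return MaskedItem(prompt="\n".join(lines), masked_option=item.options[target])


example = MultipleChoiceItem(
    question="Which planet is closest to the Sun?",
    options=["Venus", "Mercury", "Mars", "Earth"],
    correct_index=1,
)
print(mask_wrong_option(example, random.Random(0)).prompt)
```

The contamination scenario would then emit such masked prompts as its instances, with the underlying dataset supplying the original questions.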

The new file structure is organized as follows:

```
src/
└── helm/
    └── benchmark/
        ├── run_specs/
        │   └── contamination_run_specs.py
        └── scenarios/
            └── ⭐ contamination/
                ├── contamination_scenario.py
                ├── contamination_utils.py
                ├── contamination_base.py
                ├── prompt_translations.py
                ├── ts_guessing_question_based.py
                ├── ts_guessing_question_multichoice.py
                └── test_contamination_scenario.py
```

We would like to confirm whether this proposal is aligned with HELM’s architecture and whether you think we can proceed with refactoring the module in this direction.

