Evals: Fix Fewshots #446

@pefontana

Description


We need to double-check our evaluations with fewshots.

  1. We are tokenizing the fewshots and the question separately and then extending the String:
    https://github.com/PsycheFoundation/psyche/blob/main/shared/eval/src/harness.rs#L135
    https://github.com/PsycheFoundation/psyche/blob/main/shared/eval/src/harness.rs#L148

That can lead to mismatches in token generation.
For example, in our implementation of ARC-Easy and ARC-Challenge, if the eval has fewshots and we decode the eval request, we get an additional space:
' Answer: ....'
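To make the failure mode concrete, here is a minimal sketch of the invariant that breaks when the pieces are encoded separately and the ids are extended (using the `tokenizers` crate and a GPT-2 tokenizer purely for illustration; the strings are made up):

```rust
// Illustrative only: does "encode pieces separately, then extend" produce the
// same ids as encoding the joined prompt once (which is what lm_eval scores)?
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // Any tokenizer works here; GPT-2 is just a convenient public one.
    let tokenizer = Tokenizer::from_pretrained("gpt2", None)?;

    let fewshots = "Question: Which gas do plants absorb?\nAnswer: carbon dioxide\n\n";
    let question = "Question: What do bees collect?\nAnswer:";

    // Current approach: encode each piece on its own and extend the id buffer.
    let mut extended = tokenizer.encode(fewshots, false)?.get_ids().to_vec();
    extended.extend(tokenizer.encode(question, false)?.get_ids());

    // lm_eval-style: build the full prompt string first, encode once.
    let joined = tokenizer.encode(format!("{fewshots}{question}"), false)?;

    // Depending on the tokenizer (BPE merges across the piece boundary,
    // prefix-space handling, per-call special tokens), these two id sequences
    // can differ, which is what shows up as the stray space when we decode.
    assert_eq!(
        extended,
        joined.get_ids(),
        "piecewise encoding diverges from whole-prompt encoding"
    );
    Ok(())
}
```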

For comparison, you can see the Hugging Face request (batched_inps) here:
https://github.com/EleutherAI/lm-evaluation-harness/blob/cd9bac7c27f3c876bb8e60dca8ee3b6de6b33c35/lm_eval/models/huggingface.py#L1296-L1297

  2. I think we need an extra space here (see the sketch after this snippet):
    .map(|x| format!("{}{}", x.text, x.choices[x.answer]))
    .collect::<Vec<_>>()
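For concreteness, a minimal self-contained sketch of that change; the `Doc` struct and field names just mirror the quoted snippet and are not the actual harness types:

```rust
// Sketch of the proposed fix: join prompt text and selected choice with a space.
struct Doc {
    text: String,
    choices: Vec<String>,
    answer: usize,
}

fn render_targets(docs: &[Doc]) -> Vec<String> {
    docs.iter()
        // was: format!("{}{}", x.text, x.choices[x.answer])
        .map(|x| format!("{} {}", x.text, x.choices[x.answer]))
        .collect::<Vec<_>>()
}

fn main() {
    let doc = Doc {
        text: "Question: What do bees collect?\nAnswer:".to_string(),
        choices: vec!["rocks".to_string(), "nectar".to_string()],
        answer: 1,
    };
    // With the space this prints "... Answer: nectar" rather than "... Answer:nectar".
    println!("{}", render_targets(&[doc])[0]);
}
```

Whether the space belongs here or in the prompt text itself depends on how lm_eval joins context and continuation for each task, so this needs to be confirmed against their templates.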

We should check all the evals and verify that our implementation produces the same request format as lm_eval.
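One way to run that audit (a sketch, assuming the `tokenizers` crate; `check_pieces` and its signature are hypothetical, not existing harness code): for each task, take the exact strings the harness concatenates and verify that piecewise encoding yields the same ids as encoding the joined prompt once.

```rust
// Hypothetical audit helper: `pieces` are the strings a task concatenates
// (fewshot block, question, choice). None of this is existing harness code.
use tokenizers::Tokenizer;

fn check_pieces(tokenizer: &Tokenizer, task: &str, pieces: &[&str]) -> Result<(), String> {
    let joined: String = pieces.concat();

    // Ids from encoding each piece separately and extending, as the harness does today.
    let mut extended: Vec<u32> = Vec::new();
    for piece in pieces {
        let enc = tokenizer.encode(*piece, false).map_err(|e| e.to_string())?;
        extended.extend(enc.get_ids());
    }

    // Ids from encoding the joined prompt once, which is what lm_eval scores.
    let reference = tokenizer
        .encode(joined.as_str(), false)
        .map_err(|e| e.to_string())?;

    if extended != reference.get_ids() {
        // Decode ours so the mismatch (e.g. a stray space before "Answer:") is visible.
        let ours = tokenizer.decode(&extended, false).map_err(|e| e.to_string())?;
        return Err(format!(
            "{task}: request differs from lm_eval format\n ours:   {ours:?}\n joined: {joined:?}"
        ));
    }
    Ok(())
}
```

Running this over every task's fewshot/question/choice pieces would tell us exactly which evals need the same treatment as ARC.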
