We need to double-check our evaluations with fewshots.
- We are tokenizing the fewshots and the question separately and then extending the string:
https://github.com/PsycheFoundation/psyche/blob/main/shared/eval/src/harness.rs#L135
https://github.com/PsycheFoundation/psyche/blob/main/shared/eval/src/harness.rs#L148
That can lead to mismatches in the generated tokens (see the sketch below).
For example, in our implementation of arc_easy and arc_challenge, if the eval has fewshots and we decode the eval request, we get an additional space:
' Answer: ....'
You can compare with the Hugging Face request (`batched_inps`) here:
https://github.com/EleutherAI/lm-evaluation-harness/blob/cd9bac7c27f3c876bb8e60dca8ee3b6de6b33c35/lm_eval/models/huggingface.py#L1296-L1297
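
To make the failure mode concrete, here is a minimal sketch of the separate-vs-joint encoding comparison. It assumes the Hugging Face `tokenizers` Rust crate (with the `http` feature for `from_pretrained`) and an illustrative `gpt2` tokenizer; the prompt strings are made up:

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative tokenizer; any BPE tokenizer can show the boundary effect.
    let tokenizer = Tokenizer::from_pretrained("gpt2", None)?;

    // Made-up fewshot context and continuation meeting at a seam.
    let fewshots = "Question: what is 3+3?\nAnswer:";
    let continuation = " 6";

    // Tokenizing the pieces separately, then extending the id vector
    // (the pattern described in this issue).
    let mut separate = tokenizer.encode(fewshots, false)?.get_ids().to_vec();
    separate.extend_from_slice(tokenizer.encode(continuation, false)?.get_ids());

    // Tokenizing the concatenated string once; BPE merges crossing the
    // seam can produce a different id sequence.
    let joined = tokenizer
        .encode(format!("{fewshots}{continuation}"), false)?
        .get_ids()
        .to_vec();

    println!("separate: {separate:?}");
    println!("joined:   {joined:?}");
    println!("identical: {}", separate == joined);
    Ok(())
}
```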
- I think we need an extra space here:
psyche/shared/eval/src/harness.rs, lines 243 to 244 in abb4335:

```rust
.map(|x| format!("{}{}", x.text, x.choices[x.answer]))
.collect::<Vec<_>>()
```
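
For reference, a hypothetical version of that line with the space inserted, assuming the goal is to match lm_eval's default single-space target delimiter between context and target:

```rust
// Hypothetical change: join text and answer with the single-space target
// delimiter that lm_eval uses by default, instead of concatenating directly.
.map(|x| format!("{} {}", x.text, x.choices[x.answer]))
.collect::<Vec<_>>()
```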
We should check all the evals and verify that our implementation produces the same format as lm_eval.
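
One way to run that check, sketched with the same assumed `tokenizers` crate (the helper names here are illustrative, not psyche's actual API): decode each built request back to a string and diff it against the prompt lm_eval builds (e.g. `batched_inps`):

```rust
use tokenizers::Tokenizer;

// Illustrative helper: decode a built request back to text so it can be
// diffed against the corresponding lm_eval prompt string.
fn decoded_request(
    tokenizer: &Tokenizer,
    ids: &[u32],
) -> Result<String, tokenizers::Error> {
    tokenizer.decode(ids, false)
}

// Illustrative check: the decoded request should match the string
// lm_eval would build for the same document.
fn matches_lm_eval(
    tokenizer: &Tokenizer,
    ids: &[u32],
    expected_prompt: &str,
) -> Result<bool, tokenizers::Error> {
    Ok(decoded_request(tokenizer, ids)? == expected_prompt)
}
```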