We need to double-check our evaluations with fewshots.
- We are tokenizing the fewshots and the question separately and then extending the string:
https://github.com/PsycheFoundation/psyche/blob/main/shared/eval/src/harness.rs#L135
https://github.com/PsycheFoundation/psyche/blob/main/shared/eval/src/harness.rs#L148
That can lead to mismatches in the generated tokens (see the sketch below).
For example, in our implementation of arc_easy and arc_challenge, if the eval has fewshots and we decode the eval request, we get an additional space:
' Answer: ....'
You can compare with the Hugging Face request (`batched_inps`) here:
https://github.com/EleutherAI/lm-evaluation-harness/blob/cd9bac7c27f3c876bb8e60dca8ee3b6de6b33c35/lm_eval/models/huggingface.py#L1296-L1297
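
To make the failure mode concrete, here is a minimal sketch of the separate-vs-joint encoding comparison. It assumes the Hugging Face `tokenizers` Rust crate (with the `http` feature for `from_pretrained`) and an illustrative `gpt2` tokenizer; the prompt strings are made up:

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative tokenizer; any BPE tokenizer can show the boundary effect.
    let tokenizer = Tokenizer::from_pretrained("gpt2", None)?;

    // Made-up fewshot context and continuation meeting at a seam.
    let fewshots = "Question: what is 3+3?\nAnswer:";
    let continuation = " 6";

    // Tokenizing the pieces separately, then extending the id vector
    // (the pattern described in this issue).
    let mut separate = tokenizer.encode(fewshots, false)?.get_ids().to_vec();
    separate.extend_from_slice(tokenizer.encode(continuation, false)?.get_ids());

    // Tokenizing the concatenated string once; BPE merges crossing the
    // seam can produce a different id sequence.
    let joined = tokenizer
        .encode(format!("{fewshots}{continuation}"), false)?
        .get_ids()
        .to_vec();

    println!("separate: {separate:?}");
    println!("joined:   {joined:?}");
    println!("identical: {}", separate == joined);
    Ok(())
}
```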
- I think we need an extra space here:
psyche/shared/eval/src/harness.rs, lines 243 to 244 in abb4335:

```rust
.map(|x| format!("{}{}", x.text, x.choices[x.answer]))
.collect::<Vec<_>>()
```
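
For reference, a hypothetical version of that line with the space inserted, assuming the goal is to match lm_eval's default single-space target delimiter between context and target:

```rust
// Hypothetical change: join text and answer with the single-space target
// delimiter that lm_eval uses by default, instead of concatenating directly.
.map(|x| format!("{} {}", x.text, x.choices[x.answer]))
.collect::<Vec<_>>()
```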
We should check all the evals and verify that our implementation produces the same format as lm_eval.
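
One way to run that check, sketched with the same assumed `tokenizers` crate (the helper names here are illustrative, not psyche's actual API): decode each built request back to a string and diff it against the prompt lm_eval builds (e.g. `batched_inps`):

```rust
use tokenizers::Tokenizer;

// Illustrative helper: decode a built request back to text so it can be
// diffed against the corresponding lm_eval prompt string.
fn decoded_request(
    tokenizer: &Tokenizer,
    ids: &[u32],
) -> Result<String, tokenizers::Error> {
    tokenizer.decode(ids, false)
}

// Illustrative check: the decoded request should match the string
// lm_eval would build for the same document.
fn matches_lm_eval(
    tokenizer: &Tokenizer,
    ids: &[u32],
    expected_prompt: &str,
) -> Result<bool, tokenizers::Error> {
    Ok(decoded_request(tokenizer, ids)? == expected_prompt)
}
```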