Perplexity implementation #2356
elements72 started this conversation in General
Replies: 1 comment
-
Thanks for catching those issues. Would you be open to submitting a PR to fix those two issues?
-
Hi! I was looking at the implementation of perplexity and I had a couple of questions.
The first is that the current implementation of windowed perplexity differs slightly from the one on HF. The HF code accumulates the token-level losses and takes a single average over all tokens at the end, while Axolotl takes an average of the per-window losses.
The second is about performance: from what I understand, Axolotl computes the perplexity over all the samples of the eval_dataset at once. Wouldn't that lead to OOM if the eval set contains many samples?
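To make the difference concrete, here is a minimal sketch of the two aggregation strategies described above. The per-window numbers are invented for illustration and the variable names are hypothetical, not taken from either codebase; the point is only that the two averages disagree whenever windows contain different numbers of tokens.

```python
import math

# Hypothetical per-window data: (sum of token NLLs, number of tokens).
windows = [(10.0, 5), (30.0, 10)]

# Average of per-window mean losses (the behavior described for Axolotl).
mean_of_means = sum(nll / n for nll, n in windows) / len(windows)
ppl_mean_of_means = math.exp(mean_of_means)

# Pool all token NLLs, then divide by the total token count
# (the behavior described for the HF code).
total_nll = sum(nll for nll, _ in windows)
total_tokens = sum(n for _, n in windows)
ppl_pooled = math.exp(total_nll / total_tokens)

print(ppl_mean_of_means)  # exp((2.0 + 3.0) / 2) = exp(2.5)
print(ppl_pooled)         # exp(40.0 / 15)       = exp(2.666...)
```

With equal-sized windows the two coincide; with ragged windows (common when the last window is short) they do not, which is presumably the discrepancy being reported.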
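On the memory point, one standard fix is to stream the eval set in small batches and keep only running sums, so peak memory is independent of the eval-set size. A minimal sketch, assuming a hypothetical interface where each batch yields its summed token NLL and token count (e.g. from one forward pass):

```python
import math
from typing import Iterable, Tuple

def eval_perplexity(batches: Iterable[Tuple[float, int]]) -> float:
    """Compute perplexity from an iterable of (sum_nll, n_tokens) pairs.

    Each pair would in practice come from one forward pass over a small
    batch of the eval set (hypothetical interface, for illustration).
    Only two scalars are accumulated, so memory stays constant no matter
    how many eval samples there are.
    """
    total_nll, total_tokens = 0.0, 0
    for sum_nll, n_tokens in batches:
        total_nll += sum_nll
        total_tokens += n_tokens
    return math.exp(total_nll / total_tokens)

print(eval_perplexity([(10.0, 5), (30.0, 10)]))  # exp(40.0 / 15)
```

This also happens to produce the pooled all-token average, matching the HF-style aggregation from the first question.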