Perplexity implementation #2356
elements72 started this conversation in General
Replies: 1 comment
-
Thanks for catching those issues. Would you be open to submitting a PR to fix those two issues?
-
Hi! I was looking at the implementation of perplexity and I had a couple of questions.
The first is that the current implementation of windowed perplexity differs slightly from the one on HF. The HF code accumulates the token-level losses and takes a single average over all tokens at the end, while Axolotl takes an average of the per-window losses.
The second is about performance: from what I understand, Axolotl computes the perplexity over all the samples of the eval_dataset at once. Wouldn't that lead to OOM if the eval set contains many samples?
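To make the difference concrete, here is a minimal sketch of the two aggregation strategies described above. The per-window numbers are invented for illustration and the variable names are hypothetical, not taken from either codebase; the point is only that the two averages disagree whenever windows contain different numbers of tokens.

```python
import math

# Hypothetical per-window data: (sum of token NLLs, number of tokens).
windows = [(10.0, 5), (30.0, 10)]

# Average of per-window mean losses (the behavior described for Axolotl).
mean_of_means = sum(nll / n for nll, n in windows) / len(windows)
ppl_mean_of_means = math.exp(mean_of_means)

# Pool all token NLLs, then divide by the total token count
# (the behavior described for the HF code).
total_nll = sum(nll for nll, _ in windows)
total_tokens = sum(n for _, n in windows)
ppl_pooled = math.exp(total_nll / total_tokens)

print(ppl_mean_of_means)  # exp((2.0 + 3.0) / 2) = exp(2.5)
print(ppl_pooled)         # exp(40.0 / 15)       = exp(2.666...)
```

With equal-sized windows the two coincide; with ragged windows (common when the last window is short) they do not, which is presumably the discrepancy being reported.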
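On the memory point, one standard fix is to stream the eval set in small batches and keep only running sums, so peak memory is independent of the eval-set size. A minimal sketch, assuming a hypothetical interface where each batch yields its summed token NLL and token count (e.g. from one forward pass):

```python
import math
from typing import Iterable, Tuple

def eval_perplexity(batches: Iterable[Tuple[float, int]]) -> float:
    """Compute perplexity from an iterable of (sum_nll, n_tokens) pairs.

    Each pair would in practice come from one forward pass over a small
    batch of the eval set (hypothetical interface, for illustration).
    Only two scalars are accumulated, so memory stays constant no matter
    how many eval samples there are.
    """
    total_nll, total_tokens = 0.0, 0
    for sum_nll, n_tokens in batches:
        total_nll += sum_nll
        total_tokens += n_tokens
    return math.exp(total_nll / total_tokens)

print(eval_perplexity([(10.0, 5), (30.0, 10)]))  # exp(40.0 / 15)
```

This also happens to produce the pooled all-token average, matching the HF-style aggregation from the first question.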