perplexity: Add option to ignore context window overflow errors and continue score calculation #12512

EAddario · 2025-03-22T08:37:14Z

This PR adds an option to ignore the "task N does not fit in the context window" error, log the offending task, and continue when calculating ARC, HellaSwag, MMLU, TruthfulQA and WinoGrande scores.

For full background on the PR, please see: Squeezing Tensor Bits: the quest for smaller LLMs

slaren · 2025-03-29T13:14:51Z

examples/perplexity/perplexity.cpp

    if (params.winogrande_tasks > 0 && params.winogrande_tasks < data.size()) {
        LOG_INF("%s : selecting %zu random tasks\n", __func__, params.winogrande_tasks);
-        std::mt19937 rng(1);
+        std::mt19937 rng(std::random_device{}());


I suspect that using a fixed seed was intentional so that the results are repeatable.

EAddario · 2025-03-30

I thought so too, but the rationale for the proposed change is that "freezing" the test cases lessens the validity of the test. If repeatable results are indeed the objective, why go through the extra trouble of randomizing in the first place, instead of just processing test cases sequentially?

I realise this is more of a personal preference and I'm OK to revert the change back if needed.
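To make the trade-off in this thread concrete, here is a minimal sketch of seeded task selection. `select_tasks` and `fresh_seed` are hypothetical helpers written for illustration, not functions from `perplexity.cpp`; only `std::mt19937` and `std::random_device` come from the diff above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of perplexity.cpp): choose n_select distinct
// task indices out of n_total, driven entirely by the given seed.
std::vector<size_t> select_tasks(size_t n_total, size_t n_select, uint32_t seed) {
    std::vector<size_t> idx(n_total);
    std::iota(idx.begin(), idx.end(), 0);       // 0, 1, ..., n_total - 1
    std::mt19937 rng(seed);                     // same seed => same shuffle on a given toolchain
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(std::min(n_select, n_total));    // keep the first n_select shuffled indices
    return idx;
}

// Hypothetical helper: a non-deterministic seed, as in the proposed change.
uint32_t fresh_seed() {
    return std::random_device{}();
}
```

Calling `select_tasks(n, k, 1)` gives the same subset on every run with the same standard library, which is what makes scores comparable between runs. Calling `select_tasks(n, k, fresh_seed())` usually gives a different subset each time. One possible middle ground, not proposed in this PR, would be to log the seed that was used so that a randomized run can still be reproduced.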

m18coppola · 2025-04-04T21:26:29Z

Why skip the offending task? You could just make sure you have enough parallel decoding sequences for the dataset you're evaluating the model against. For example, some of the multiple-choice questions in ARC have 5 possible answers, which will cause your "task N does not fit in the context window" error. You can add the command line parameter -np 5 to ensure you can decode all of the possible answers instead of getting an error.
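The constraint described in this comment can be sketched as follows. This is a simplified model, not the actual batching code in `perplexity.cpp`: it assumes that each answer choice is decoded in its own sequence, and the `Task` struct, `task_fits` and `min_parallel_sequences` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of a multiple-choice task.
struct Task {
    size_t n_answers;  // number of answer choices (assumed: one sequence each)
    size_t n_tokens;   // total tokens the task needs in the context
};

// Assumed rule: a task can be scheduled only if all of its answers fit in the
// available parallel sequences and its tokens fit in the context window.
bool task_fits(const Task & t, size_t n_seq_max, size_t n_ctx) {
    return t.n_answers <= n_seq_max && t.n_tokens <= n_ctx;
}

// Smallest number of parallel sequences that lets every task in the dataset
// be scheduled under the assumption above: the largest answer count.
size_t min_parallel_sequences(const std::vector<Task> & tasks) {
    size_t need = 0;
    for (const Task & t : tasks) {
        need = std::max(need, t.n_answers);
    }
    return need;
}
```

Under this model, a 5-answer ARC question cannot be scheduled with only 4 sequences no matter how large the context is, while raising the sequence count to 5 (the `-np 5` suggested above) lets it through without skipping the task.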

EAddario · 2025-04-06T18:15:15Z

Thanks for flagging this, @m18coppola. I had not connected the dots. I will check and close the PR if it is not needed.

EAddario · 2025-04-09T19:05:53Z

Closing this PR as the desired functionality already exists. For reference, see @m18coppola's comment above.

EAddario added 7 commits March 15, 2025 13:03

Ensure tasks are randomized across test runs

41a63f4

Error log offending task

7fbbadc

Add --ignore-context-overflow option for llama-perplexity

fa7d1c6

Add option to ignore context overflow during score computation

1035c6e

Merge branch 'master' into perplexity

8a827f8

Merge branch 'master' into perplexity

f348628

Merge branch 'master' into perplexity

e354bc3

github-actions bot added the examples label Mar 22, 2025

Merge branch 'master' into perplexity

d41f015

slaren reviewed Mar 29, 2025

EAddario marked this pull request as draft April 7, 2025 06:00

Merge branch 'master' into perplexity

c5cd061

EAddario closed this Apr 9, 2025

EAddario deleted the perplexity branch April 14, 2025 07:26
