@EAddario
Contributor

This PR adds an option to ignore the "task N does not fit in the context window" error, and to display the offending task, during ARC, HellaSwag, MMLU, TruthfulQA and WinoGrande score calculation.

For full background on the PR, please see: Squeezing Tensor Bits: the quest for smaller LLMs

  if (params.winogrande_tasks > 0 && params.winogrande_tasks < data.size()) {
      LOG_INF("%s : selecting %zu random tasks\n", __func__, params.winogrande_tasks);
-     std::mt19937 rng(1);
+     std::mt19937 rng(std::random_device{}());
Member

I suspect that using a fixed seed was intentional so that the results are repeatable.

Contributor Author

I thought so too, but the rationale for the proposed change is that "freezing" the selection of test cases with a fixed seed lessens the validity of the test, since every run scores the same subset. If repeatable results are indeed the objective, why go to the extra trouble of randomizing in the first place, instead of just processing test cases sequentially?

I realise this is more of a personal preference and I'm OK to revert the change if needed.
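
To make the trade-off in this thread concrete, here is a minimal sketch of seed-controlled task selection. It is not the code from this PR: `select_tasks` and its parameters are hypothetical names chosen for illustration, and the shuffle is a plain partial Fisher-Yates rather than whatever the existing scoring code does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of the PR): pick n distinct task indices
// out of [0, total) using a caller-supplied seed.
std::vector<size_t> select_tasks(size_t total, size_t n, uint32_t seed) {
    assert(n <= total);

    // Start from the identity ordering 0, 1, ..., total - 1.
    std::vector<size_t> idx(total);
    std::iota(idx.begin(), idx.end(), size_t{0});

    // The same seed on the same build yields the same sequence of draws.
    std::mt19937 rng(seed);

    // Partial Fisher-Yates shuffle: only the first n slots are finalized.
    for (size_t i = 0; i < n; ++i) {
        std::uniform_int_distribution<size_t> dist(i, total - 1);
        std::swap(idx[i], idx[dist(rng)]);
    }

    idx.resize(n);
    return idx;
}
```

Calling `select_tasks(data.size(), params.winogrande_tasks, 1)` would give a repeatable subset, which matches the fixed-seed behaviour the reviewer describes. Passing `std::random_device{}()` as the seed would give a different subset on each run. A possible middle ground, again only a suggestion, is to draw a random seed once and log it so that a given run can still be reproduced.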

@m18coppola
Contributor

Why skip the offending task? You could just make sure you have enough parallel decoding sequences for the dataset you're evaluating the model against. For example, some of the multiple-choice questions in ARC have 5 possible answers, which will cause your "task N does not fit in the context window" error. You can add the command line parameter -np 5 to ensure you can decode all of the possible answers instead of getting an error.
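
As a rough illustration of the point above, the sketch below scans a dataset for the largest number of answer choices, which would be the value to pass to -np. It assumes, as the comment suggests, that each answer choice needs its own decoding sequence. The `Task` struct and both function names are hypothetical and do not mirror the actual data structures in the scoring code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical task layout, for illustration only.
struct Task {
    std::string question;
    std::vector<std::string> answers;
};

// Largest number of answer choices found in the dataset.
// Under the stated assumption, this is the minimum number of
// parallel sequences needed to score every task.
size_t max_answers(const std::vector<Task> & tasks) {
    size_t n = 0;
    for (const auto & t : tasks) {
        n = std::max(n, t.answers.size());
    }
    return n;
}

// Indices of tasks whose answer count exceeds the available sequences.
std::vector<size_t> tasks_exceeding(const std::vector<Task> & tasks, size_t n_seq) {
    std::vector<size_t> out;
    for (size_t i = 0; i < tasks.size(); ++i) {
        if (tasks[i].answers.size() > n_seq) {
            out.push_back(i);
        }
    }
    return out;
}
```

With a dataset where the largest question has 5 answers, `max_answers` returns 5, and `tasks_exceeding(tasks, 4)` lists the tasks that would not fit with only 4 sequences.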

@EAddario
Contributor Author

EAddario commented Apr 6, 2025

Thanks for flagging this, @m18coppola. I had not connected the dots. I will check and close the PR if it is not needed.

@EAddario EAddario marked this pull request as draft April 7, 2025 06:00
@EAddario
Contributor Author

EAddario commented Apr 9, 2025

Closing this PR as the desired functionality already exists. For reference, see @m18coppola's comment above.

@EAddario EAddario closed this Apr 9, 2025
@EAddario EAddario deleted the perplexity branch April 14, 2025 07:26