perplexity: Add option to ignore context window overflow errors and continue score calculation #12512
  if (params.winogrande_tasks > 0 && params.winogrande_tasks < data.size()) {
      LOG_INF("%s : selecting %zu random tasks\n", __func__, params.winogrande_tasks);
-     std::mt19937 rng(1);
+     std::mt19937 rng(std::random_device{}());
I suspect that using a fixed seed was intentional so that the results are repeatable.
I thought so too, but the rationale for the proposed change is that "freezing" the test cases lessens the validity of the test. If repeatable results are indeed the objective, why go to the extra trouble of randomizing in the first place, instead of just processing test cases sequentially?
I realise this is more of a personal preference and I'm OK to revert the change back if needed.
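To make the trade-off in this thread concrete, here is a minimal sketch of seeded task selection. The helper `select_tasks` is hypothetical and uses `std::shuffle` for brevity; it is not the selection loop from the perplexity tool. It only illustrates that a fixed seed gives the same subset on every run (on a given standard library implementation), while a seed taken from `std::random_device` generally does not.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of llama.cpp): pick n_select distinct task
// indices out of n_total, using the given seed for the random engine.
std::vector<size_t> select_tasks(size_t n_total, size_t n_select, unsigned seed) {
    std::vector<size_t> idx(n_total);
    std::iota(idx.begin(), idx.end(), size_t{0});  // 0, 1, ..., n_total - 1
    std::mt19937 rng(seed);                        // same seed -> same engine state
    std::shuffle(idx.begin(), idx.end(), rng);     // deterministic for a given seed
    idx.resize(std::min(n_select, n_total));       // keep the first n_select indices
    return idx;
}
```

Calling `select_tasks(n, k, 1)` corresponds to the original `rng(1)` behaviour, while `select_tasks(n, k, std::random_device{}())` corresponds to the proposed change. A middle ground, under the same assumptions, would be to take the seed from a user-supplied parameter so that a run is random by default but still reproducible when needed.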
Why skip the offending task? You could instead make sure you have enough parallel decoding sequences for the dataset you're evaluating the model against. For example, some of the multiple-choice questions in ARC have 5 possible answers, which will cause your "task N does not fit in the context window" error. You can add the command line parameter
Thanks for flagging this @m18coppola. I had not connected the dots. Will check and close the PR if not needed.
Closing this PR as the desired functionality already exists. For reference, see @m18coppola's comment above.
This PR adds an option to ignore the "task N does not fit in the context window" error and display the offending task during ARC, HellaSwag, MMLU, TruthfulQA and WinoGrande score calculation.
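As a rough illustration of the behaviour described, the sketch below contrasts aborting on the first task that does not fit with skipping it and continuing. Everything here is an assumption made for illustration: the `Task` and `EvalResult` types, the `evaluate` function, the `ignore_overflow` flag, and the simple "required tokens versus context size" check are hypothetical and do not reproduce the PR's code or the real fit check in the perplexity tool.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical task description: only the number of tokens it needs.
struct Task {
    size_t required_tokens;
};

// Hypothetical summary of an evaluation run.
struct EvalResult {
    size_t n_scored;   // tasks that fit and were scored
    size_t n_skipped;  // tasks skipped because they did not fit
    bool   ok;         // false if the run was aborted
};

// Assumed logic: a task "fits" when it needs no more than n_ctx tokens.
EvalResult evaluate(const std::vector<Task> & tasks, size_t n_ctx, bool ignore_overflow) {
    EvalResult res{0, 0, true};
    for (size_t i = 0; i < tasks.size(); ++i) {
        if (tasks[i].required_tokens > n_ctx) {
            // Report the offending task either way.
            std::fprintf(stderr, "task %zu does not fit in the context window\n", i);
            if (!ignore_overflow) {
                res.ok = false;  // default behaviour in this sketch: stop at the first overflow
                return res;
            }
            ++res.n_skipped;     // with the option set: skip and keep going
            continue;
        }
        ++res.n_scored;          // placeholder for the actual scoring step
    }
    return res;
}
```

Note that skipping tasks changes which items contribute to the final score, which is one reason the alternative suggested in the thread (giving the evaluation enough capacity so that every task fits) may be preferable.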
For full background on the PR, please see: Squeezing Tensor Bits: the quest for smaller LLMs