Commit b06bf56: Improve stability of flaky perplexity test (#1884)
SUMMARY:
`tests/llmcompressor/transformers/compression/test_quantization.py:test_perplexity`
is currently flaky, with the test [occasionally
failing](https://github.com/neuralmagic/llm-compressor-testing/actions/runs/17994161150/job/51234264145)
due to the recorded `avg_ppl` exceeding the test threshold.
Through debugging, it appears that most of the high-perplexity samples are
ones where most of the target labels are masked out (i.e. set to
`-100`). The loss is then averaged over only the few remaining tokens,
which makes the calculation more volatile and can produce high recorded
perplexity values.
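For context, perplexity here is the exponential of the mean per-token cross-entropy, taken only over tokens whose label is not `-100`. A toy sketch (with made-up loss values, not taken from the test) of why a small number of valid tokens makes that mean volatile:

```python
import math

# Hypothetical per-token negative log-likelihoods, including one outlier.
# With many supervised tokens, the outlier barely moves the average:
many = [2.0] * 99 + [10.0]
print(math.exp(sum(many) / len(many)))  # exp(2.08) ~= 8.0

# With only a few supervised tokens (most labels were -100 and excluded),
# the same outlier dominates the mean and inflates the perplexity:
few = [2.0] * 3 + [10.0]
print(math.exp(sum(few) / len(few)))  # exp(4.0) ~= 54.6
```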
To correct this, I added a check that filters out samples where fewer
than `25%` of the tokens have training labels. This should make the
perplexity calculation more consistent, while still verifying that the
model's perplexity is reasonable. A sketch of the check is shown below.
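Since the diff body is not reproduced below, here is a minimal sketch of the kind of filter described above; the function name, the `sample["labels"]` structure, and the constant name are assumptions, not the actual code from the PR:

```python
# Hypothetical sketch: skip samples where fewer than 25% of tokens
# carry training labels (labels set to -100 are ignored by the loss).
MIN_LABELED_FRACTION = 0.25  # assumed name; the PR states a 25% threshold

def has_enough_labels(labels: list[int]) -> bool:
    """Return True if at least 25% of the tokens are supervised."""
    labeled = sum(1 for label in labels if label != -100)
    return labeled / len(labels) >= MIN_LABELED_FRACTION

# In the test loop, only samples passing the check would contribute
# to the recorded avg_ppl:
#     if not has_enough_labels(sample["labels"]):
#         continue
```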
TEST PLAN:
Ran the test locally and all cases passed. Since the test is flaky,
however, a passing local run doesn't guarantee the problem is solved.
---------
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
1 file changed: `tests/llmcompressor/transformers/compression/test_quantization.py` (+11, −7 lines)