Commit f54496a
Fix vLLM slow test OOM by reducing GPU memory utilization and improving cleanup
The vLLM slow tests were failing with OOM errors when running after
accelerate tests. The issue was:
1. vLLM V1 engine requires a specific amount of free GPU memory at startup
2. After accelerate tests, only 5.89 GiB was free (out of 14.74 GiB)
3. vLLM with gpu_memory_utilization=0.6 wanted 8.84 GiB
Fixes:
- Reduce gpu_memory_utilization from 0.6 to 0.35 in test config (needs 5.16 GiB)
- Add GPU memory cleanup fixture in conftest.py that runs before/after slow tests
- Improve AsyncVLLMModel.cleanup() to properly delete model object
The gpu_memory_utilization parameter only affects KV cache allocation and
does not impact model outputs with temperature=0.0, so this change is safe.

1 parent e438e2d · commit f54496a
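The figures quoted above are internally consistent, as a quick sanity check shows: `gpu_memory_utilization` is a fraction of total GPU memory, so 0.6 of 14.74 GiB exceeds the 5.89 GiB that was free, while 0.35 of it fits.

```python
# Sanity check of the memory figures quoted in the commit message.
TOTAL_GIB = 14.74  # total GPU memory on the test runner
FREE_GIB = 5.89    # free memory after the accelerate tests

old_request = 0.6 * TOTAL_GIB   # what gpu_memory_utilization=0.6 asks for
new_request = 0.35 * TOTAL_GIB  # what gpu_memory_utilization=0.35 asks for

print(round(old_request, 2))  # 8.84 GiB -> more than 5.89 GiB free: OOM
print(round(new_request, 2))  # 5.16 GiB -> fits in the free memory
```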
File tree (3 files changed: +34 −1 lines)
- examples/model_configs
- src/lighteval/models/vllm
- tests
[Diff content was not captured in this page extract. Summary of hunks:
- examples/model_configs: 1 line replaced (line 8)
- src/lighteval/models/vllm: 2 lines added (lines 547–548)
- tests: 31 lines added (lines 5–6 and 26–54)]
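The `AsyncVLLMModel.cleanup()` improvement mentioned in the commit message might look like the sketch below. The class body and attribute names are assumed for illustration; the real class wraps a vLLM async engine, stubbed here with a plain object.

```python
# Hypothetical sketch of the improved AsyncVLLMModel.cleanup(); the real
# class holds a vLLM engine handle, stubbed here with a plain object.
import gc


class AsyncVLLMModel:
    def __init__(self, model):
        self.model = model  # stand-in for the vLLM engine handle

    def cleanup(self):
        # Drop the reference so the engine (and its GPU allocations)
        # becomes collectable, then collect and empty the CUDA cache.
        self.model = None
        gc.collect()
        try:
            import torch

            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass  # no torch: nothing GPU-side to release


m = AsyncVLLMModel(model=object())
m.cleanup()
print(m.model)  # None
```

Without explicitly dropping the model reference, the engine object can stay alive through lingering references until much later, which is why the commit calls this out as part of the OOM fix.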