Commit 5b3ddff
Fix kv cache issue (vllm-project#1797)
SUMMARY:
With the newest transformers change,
`test_kv_cache_gptq_model_state_dict_attr` is failing because it
initializes empty weights on the meta device and then attempts to
decompress on the meta device. I don't think this is the expected
usage: by the time model_decompress is called, the weights should
already be fully loaded.
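A minimal sketch (not the actual llm-compressor code) of why decompression cannot run on the meta device: tensors on torch's "meta" device carry only shape and dtype metadata with no backing storage, so any step that must read real weight values fails until the weights are materialized on a real device.

```python
import torch

# A "weight" on the meta device: shape/dtype only, no data.
meta_w = torch.empty(4, 4, device="meta")
print(meta_w.is_meta)  # True

# Reading values out of a meta tensor (as decompression would need to)
# raises, because there is no underlying storage to copy from.
try:
    meta_w.cpu()
except NotImplementedError as e:
    print("cannot materialize meta tensor:", e)

# Once weights are loaded onto a real device, the same operation works.
real_w = torch.empty(4, 4, device="cpu")
print(real_w.cpu().shape)  # torch.Size([4, 4])
```

This is why the fix defers decompression until after weight loading finishes, rather than calling it while the model still lives on the meta device.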
TEST PLAN:
Tested locally with the following command, which passed:
pytest tests/llmcompressor/transformers/kv_cache/test_kv_cache.py::test_kv_cache_gptq_model_state_dict_attr
---------
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

1 parent 983bcc6 commit 5b3ddff
1 file changed: 4 additions & 8 deletions