Commit 5b3ddff

Fix kv cache issue (#1797)
SUMMARY: With the newest transformers change, `test_kv_cache_gptq_model_state_dict_attr` was failing because it initialized empty weights on the meta device and then attempted to decompress on the meta device. I don't think this is the expected usage: by the time `model_decompress` is called, the weights should already be fully loaded.

TEST PLAN: Tested locally with the following command, which passed:
pytest tests/llmcompressor/transformers/kv_cache/test_kv_cache.py::test_kv_cache_gptq_model_state_dict_attr

Signed-off-by: shanjiaz <[email protected]>
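For context, a minimal sketch (not part of the PR) of why decompressing under init_empty_weights() goes wrong: accelerate's init_empty_weights() places parameters on the meta device, where tensors carry shape and dtype but no data, so decompression has nothing real to operate on.

import torch
from accelerate import init_empty_weights

# Any module created inside this context gets meta-device parameters:
# they have shapes and dtypes but no backing storage.
with init_empty_weights():
    layer = torch.nn.Linear(4, 4)

print(layer.weight.device)  # meta
# Computing with such weights (e.g. layer(torch.randn(1, 4))) fails,
# because there is no actual data to operate on or decompress.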
1 parent: 983bcc6

File tree

1 file changed: 4 additions, 8 deletions


tests/llmcompressor/transformers/kv_cache/test_kv_cache.py

Lines changed: 4 additions & 8 deletions
@@ -231,14 +231,10 @@ def test_kv_cache_gptq_model_state_dict_attr(kv_cache_fixture, tmp_path):

     output_dir, _ = next(kv_cache_fixture(recipe, tmp_path))

-    with init_empty_weights():
-        # TODO: There is a bug in `apply_quantization_config` which means that, if using
-        # CompressedLinears, the compression status is inferred to `compressed` and
-        # therefore the attention kvcache parameters never undergo initializations
-        model = AutoModelForCausalLM.from_pretrained(
-            output_dir,
-            quantization_config=CompressedTensorsConfig(run_compressed=False),
-        )
+    model = AutoModelForCausalLM.from_pretrained(
+        output_dir,
+        quantization_config=CompressedTensorsConfig(run_compressed=False),
+    )

     counts = 0
     for name, submodule in model.named_modules():
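For reference, a minimal standalone sketch of the loading pattern the test now uses; the checkpoint path here is a placeholder, whereas the actual test obtains output_dir from its kv_cache_fixture. Loading directly (without init_empty_weights()) ensures the weights are fully materialized before decompression runs.

from transformers import AutoModelForCausalLM
from transformers.utils.quantization_config import CompressedTensorsConfig

output_dir = "/path/to/compressed_checkpoint"  # placeholder path

# Load with real (non-meta) weights; run_compressed=False asks transformers
# to decompress the compressed-tensors checkpoint into regular modules.
model = AutoModelForCausalLM.from_pretrained(
    output_dir,
    quantization_config=CompressedTensorsConfig(run_compressed=False),
)

# The test then walks model.named_modules() to check the kv-cache
# quantization attributes on the attention submodules.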
