Commit 6d7a103

Turn off 2:4 sparse compression until supported in vllm
1 parent e48d9db commit 6d7a103

1 file changed: +3 -1 lines changed

examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py

Lines changed: 3 additions & 1 deletion
@@ -116,5 +116,7 @@ def get_recipe(fp8_enabled):
 print("==========================================\n")
 
 # Save compressed model and tokenizer
-model.save_pretrained(save_dir, save_compressed=args.fp8)
+model.save_pretrained(
+    save_dir, save_compressed=args.fp8, disable_sparse_compression=True
+)
 tokenizer.save_pretrained(save_dir)
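
A minimal sketch, not part of this commit, of how the hard-coded flag could be made switchable once vLLM supports 2:4 sparse compression. The helper `build_save_kwargs` and its parameter `vllm_supports_sparse_24` are hypothetical names introduced for illustration; only the `save_compressed` and `disable_sparse_compression` keyword arguments come from the diff above.

```python
def build_save_kwargs(fp8_enabled: bool, vllm_supports_sparse_24: bool = False) -> dict:
    """Build keyword arguments for model.save_pretrained(save_dir, **kwargs).

    Hypothetical helper: the commit itself hard-codes
    disable_sparse_compression=True.
    """
    return {
        # As in the example script, compress on save only when FP8 is enabled.
        "save_compressed": fp8_enabled,
        # Keep 2:4 sparse compression off until the serving runtime can load it.
        "disable_sparse_compression": not vllm_supports_sparse_24,
    }


# With the default, this reproduces the arguments passed in the commit
# (assuming args.fp8 is True).
kwargs = build_save_kwargs(fp8_enabled=True)
```

Under these assumptions, the save call in the script would read `model.save_pretrained(save_dir, **kwargs)`, and re-enabling sparse compression later would be a one-argument change rather than another edit to the call site.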
