Commit 597bc15

llama.cpp : fix --leave-output-tensor for llama-quantize.
* Tweaked how llama-quantize's --leave-output-tensor parameter affects llama_model_quantize_internal(): it now excludes any tensor named "*output.weight" rather than just "output.weight".
1 parent 7eee341 commit 597bc15
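
For context, --leave-output-tensor is the llama-quantize flag that keeps the output tensor at its original precision while the rest of the model is quantized. A typical invocation looks like the following (the model paths and the Q4_K_M quantization type are illustrative, not taken from the commit):

    llama-quantize --leave-output-tensor ./models/model-f16.gguf ./models/model-q4_k_m.gguf Q4_K_M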


src/llama.cpp

Lines changed: 4 additions & 1 deletion
@@ -18512,7 +18512,10 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
         // do not quantize norm tensors
         quantize &= name.find("_norm.weight") == std::string::npos;
 
-        quantize &= params->quantize_output_tensor || name != "output.weight";
+        // While there's an effort to avoid hardcoded tensor names,
+        // --leave-output-tensor should still exclude any tensor named
+        // *output.weight instead of just output.weight.
+        quantize &= params->quantize_output_tensor || (name.find("output.weight") == std::string::npos);
         quantize &= !params->only_copy;
 
         // do not quantize expert gating tensors
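
The practical effect of the change is easiest to see on concrete tensor names. Below is a minimal, self-contained C++ sketch (illustrative, not part of the commit) contrasting the old exact-match condition with the new find()-based one; the prefixed tensor names are GGUF-style examples.

    // Minimal standalone sketch (not from the repository) comparing the old
    // exact-match check with the new substring check. "quantize" means the
    // tensor would still be quantized with --leave-output-tensor set; "keep"
    // means it is left at its original precision.
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        const std::vector<std::string> names = {
            "output.weight",            // the classic output tensor name
            "blk.0.attn_output.weight", // also contains "output.weight"
            "token_embd.weight",        // unaffected by either check
        };
        for (const auto & name : names) {
            // old condition: only the exact name "output.weight" is spared
            const bool old_quant = name != "output.weight";
            // new condition: any name containing "output.weight" is spared
            const bool new_quant = name.find("output.weight") == std::string::npos;
            std::cout << name
                      << "  old: " << (old_quant ? "quantize" : "keep")
                      << "  new: " << (new_quant ? "quantize" : "keep") << '\n';
        }
        return 0;
    }

Running this prints "keep" under both checks for output.weight, but only the new check also spares blk.0.attn_output.weight. That widening is deliberate per the commit message's "*output.weight" wording; a narrower alternative would be a suffix test if only names ending in output.weight were meant.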
