printf(" model-f32.gguf [model-quant.gguf] type [nthreads]\n\n");
123
-
printf(" --allow-requantize: Allows requantizing tensors that have already been quantized. Warning: This can severely reduce quality compared to quantizing from 16bit or 32bit\n");
124
-
printf(" --leave-output-tensor: Will leave output.weight un(re)quantized. Increases model size but may also increase quality, especially when requantizing\n");
125
-
printf(" --pure: Disable k-quant mixtures and quantize all tensors to the same type\n");
123
+
printf(" --allow-requantize: allows requantizing tensors that have already been quantized. Warning: This can severely reduce quality compared to quantizing from 16bit or 32bit\n");
124
+
printf(" --leave-output-tensor: will leave output.weight un(re)quantized. Increases model size but may also increase quality, especially when requantizing\n");
125
+
printf(" --pure: disable k-quant mixtures and quantize all tensors to the same type\n");
126
126
printf(" --imatrix file_name: use data in file_name as importance matrix for quant optimizations\n");
127
127
printf(" --include-weights tensor_name: use importance matrix for this/these tensor(s)\n");
128
128
printf(" --exclude-weights tensor_name: use importance matrix for this/these tensor(s)\n");