
Commit 30fb095

[Minor] Add more detailed explanation on quantization argument (#2145)
1 parent 3a765bd commit 30fb095

2 files changed: +10, -4 lines


vllm/engine/arg_utils.py

Lines changed: 6 additions & 1 deletion
@@ -183,7 +183,12 @@ def add_cli_args(
                             type=str,
                             choices=['awq', 'gptq', 'squeezellm', None],
                             default=None,
-                            help='Method used to quantize the weights')
+                            help='Method used to quantize the weights. If '
+                            'None, we first check the `quantization_config` '
+                            'attribute in the model config file. If that is '
+                            'None, we assume the model weights are not '
+                            'quantized and use `dtype` to determine the data '
+                            'type of the weights.')
         parser.add_argument('--enforce-eager',
                             action='store_true',
                             help='Always use eager-mode PyTorch. If False, '
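
The new help text documents a three-step fallback for picking the quantization method. Below is a minimal Python sketch of that order; the helper name `resolve_quantization` and the `quant_method` key are assumptions for illustration, not vLLM's actual implementation.

from typing import Optional

def resolve_quantization(cli_quantization: Optional[str],
                         hf_config) -> Optional[str]:
    """Pick the quantization method in the order the help text documents."""
    # 1. An explicit --quantization value (or `quantization` kwarg) wins.
    if cli_quantization is not None:
        return cli_quantization
    # 2. Otherwise, check the `quantization_config` attribute in the
    #    model's config file (e.g. config.json from the Hugging Face Hub).
    quant_config = getattr(hf_config, "quantization_config", None)
    if quant_config is not None:
        # Reading a `quant_method` key is an assumption about the
        # config layout, shown here only to make the sketch concrete.
        return quant_config.get("quant_method")
    # 3. Neither is set: treat the weights as unquantized, so `dtype`
    #    alone determines the data type of the weights.
    return None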

vllm/entrypoints/llm.py

Lines changed: 4 additions & 3 deletions
@@ -38,9 +38,10 @@ class LLM:
             However, if the `torch_dtype` in the config is `float32`, we will
             use `float16` instead.
         quantization: The method used to quantize the model weights. Currently,
-            we support "awq", "gptq" and "squeezellm". If None, we assume the
-            model weights are not quantized and use `dtype` to determine the
-            data type of the weights.
+            we support "awq", "gptq" and "squeezellm". If None, we first check
+            the `quantization_config` attribute in the model config file. If
+            that is None, we assume the model weights are not quantized and use
+            `dtype` to determine the data type of the weights.
         revision: The specific model version to use. It can be a branch name,
             a tag name, or a commit id.
         tokenizer_revision: The specific tokenizer version to use. It can be a
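
Seen from the user side, the updated docstring implies the following behavior; the model names below are illustrative examples and are not part of the commit.

from vllm import LLM

# Explicit: force the AWQ path regardless of the model config file.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")

# Implicit: with quantization=None (the default), the engine first checks
# the model config's `quantization_config`; if that is also None, the
# weights are treated as unquantized and `dtype` picks their data type.
llm = LLM(model="meta-llama/Llama-2-7b-hf")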
