
Commit 439d61f

Set default 4-bit compression ratio to 1.0 (#815)
* Set default 4-bit compression ratio to 1.0
* update doc
* set default ratio using default config
1 parent 01b069f commit 439d61f

File tree

2 files changed: +3 −3 lines


docs/source/openvino/export.mdx (1 addition, 1 deletion)

```diff
@@ -60,7 +60,7 @@ Optional arguments:
   --pad-token-id PAD_TOKEN_ID
                         This is needed by some models, for some tasks. If not provided, will attempt to use the tokenizer to guess it.
   --ratio RATIO         A parameter used when applying 4-bit quantization to control the ratio between 4-bit and 8-bit quantization. If set to 0.8, 80% of the layers will be quantized to int4 while
-                        20% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 0.8.
+                        20% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 1.0.
   --sym                 Whether to apply symmetric quantization
   --group-size GROUP_SIZE
                         The group size to use for int4 quantization. Recommended value is 128 and -1 will results in per-column quantization.
```
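As a quick illustration of what `--ratio` controls, the sketch below splits a layer count between int4 and int8. This is a hypothetical helper for illustration only, not part of optimum; the real assignment is made by the quantization backend and may weigh layers by size or sensitivity rather than simple counting.

```python
def split_layers_by_ratio(num_layers: int, ratio: float) -> tuple:
    """Return (int4_layers, int8_layers) for a given 4-bit ratio.

    Illustrative arithmetic only: the actual layer selection is done by
    the quantization backend, not by naive counting like this.
    """
    n_int4 = round(num_layers * ratio)
    return n_int4, num_layers - n_int4

# With ratio 0.8, 8 of 10 layers go to int4 and 2 stay int8.
print(split_layers_by_ratio(10, 0.8))
# The new default of 1.0 sends every layer to int4.
print(split_layers_by_ratio(10, 1.0))
```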

optimum/commands/export/openvino.py (2 additions, 2 deletions)

```diff
@@ -102,7 +102,7 @@ def parse_args_openvino(parser: "ArgumentParser"):
         default=None,
         help=(
             "A parameter used when applying 4-bit quantization to control the ratio between 4-bit and 8-bit quantization. If set to 0.8, 80%% of the layers will be quantized to int4 "
-            "while 20%% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 0.8."
+            "while 20%% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 1.0."
         ),
     )
     optional_group.add_argument(
@@ -277,7 +277,7 @@ def _get_default_int4_config(model_id_or_path, library_name):
         else:
             quantization_config = {
                 "bits": 8 if is_int8 else 4,
-                "ratio": 1 if is_int8 else (self.args.ratio or 0.8),
+                "ratio": 1 if is_int8 else (self.args.ratio or _DEFAULT_4BIT_CONFIG["ratio"]),
                 "sym": self.args.sym or False,
                 "group_size": -1 if is_int8 else self.args.group_size,
                 "all_layers": None if is_int8 else self.args.all_layers,
```

0 commit comments
