Commit ee19a7e

Add command-line options in Windows' llm-ptq example for Gather nodes' INT4 ONNX quantization (#418)
Signed-off-by: vipandya <[email protected]>
1 parent 5b02483 commit ee19a7e

2 files changed: +17 -1 lines changed

examples/windows/onnx_ptq/genai_llm/README.md

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,8 @@ The table below lists key command-line arguments of the ONNX PTQ example script.
 | `--no_position_ids` | Default: position_ids input enabled | Use this option to disable position_ids input in calibration data|
 | `--enable_mixed_quant` | Default: disabled mixed quant | Use this option to enable mixed precsion quantization|
 | `--layers_8bit` | Default: None | Use this option to Overrides default mixed quant strategy|
+| `--gather_quantize_axis` | Default: None | Use this option to enable INT4 quantization of Gather nodes - choose 0 or 1|
+| `--gather_block_size` | Default: 32 | Block-size for Gather node's INT4 quantization (when its enabled using gather_quantize_axis option)|
 
 Run the following command to view all available parameters in the script:
 
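
To make the two new options concrete, here is a minimal sketch of what block-wise symmetric INT4 quantization along a chosen axis could look like for a Gather (embedding) weight table. It is plain Python for illustration only and is not the implementation used by the example script: the helper name `quantize_blockwise_int4`, the symmetric scaling scheme with integer range [-8, 7], and the toy weights are all assumptions.

```python
def quantize_blockwise_int4(weight, axis, block_size):
    """Hypothetical helper: symmetric INT4 quantization with one scale per block.

    weight: 2-D list of floats (rows x cols), e.g. a small embedding table.
    axis:   0 groups `block_size` consecutive rows within each column,
            1 groups `block_size` consecutive columns within each row.
    Returns (quantized, scales); `quantized` has the same shape as `weight`.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    # Transpose for axis 0 so that blocks always run along the inner dimension.
    w = weight if axis == 1 else [list(col) for col in zip(*weight)]
    quantized, scales = [], []
    for vec in w:
        q_vec, s_vec = [], []
        for start in range(0, len(vec), block_size):
            block = vec[start:start + block_size]
            max_abs = max(abs(x) for x in block)
            # Assumed scheme: map the largest magnitude in the block to 7.
            scale = max_abs / 7.0 if max_abs > 0 else 1.0
            s_vec.append(scale)
            q_vec.extend(max(-8, min(7, round(x / scale))) for x in block)
        quantized.append(q_vec)
        scales.append(s_vec)
    if axis == 0:
        # Undo the transpose so outputs line up with the original layout.
        quantized = [list(row) for row in zip(*quantized)]
        scales = [list(row) for row in zip(*scales)]
    return quantized, scales


# Toy 1 x 4 table, blocks of 2 along axis 1: two blocks, hence two scales.
q, s = quantize_blockwise_int4([[7.0, -3.0, 14.0, 2.0]], axis=1, block_size=2)
print(q)  # [[7, -3, 7, 1]]
print(s)  # [[1.0, 2.0]]
```

In this sketch a smaller block size yields more scale values per tensor (finer granularity, more metadata), and the axis decides whether blocks are formed down the rows or across the columns of the table.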

examples/windows/onnx_ptq/genai_llm/quantize.py

Lines changed: 15 additions & 1 deletion
@@ -441,6 +441,8 @@ def main(args):
         awqclip_bsz_col=args.awqclip_bsz_col,
         enable_mixed_quant=args.enable_mixed_quant,
         layers_8bit=args.layers_8bit,
+        gather_block_size=args.gather_block_size,
+        gather_quantize_axis=args.gather_quantize_axis,
     )
     logging.info(f"\nQuantization process took {time.time() - t} seconds")
 

@@ -553,7 +555,19 @@ def main(args):
         "--block_size",
         type=int,
         default=128,
-        help="Block size for AWQ quantization",
+        help="Block size for INT4 quantization of MatMul/Gemm nodes",
+    )
+    parser.add_argument(
+        "--gather_block_size",
+        type=int,
+        default=32,
+        help="Block size for INT4 quantization of Gather nodes",
+    )
+    parser.add_argument(
+        "--gather_quantize_axis",
+        type=int,
+        default=None,
+        help="Quantization axis for INT4 quantization of Gather nodes",
     )
     parser.add_argument(
         "--use_zero_point",
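
The argument definitions in this hunk can be exercised in isolation. The following standalone sketch mirrors the three options shown above (names, types, defaults, and help strings are taken from the diff) and parses a sample command line. Two things are assumptions and not part of the commit: the `build_parser` wrapper, and `choices=[0, 1]`, which is added here only to reflect the README's "choose 0 or 1" note. Reading `None` as "Gather quantization disabled" follows the README wording rather than code shown in this diff.

```python
import argparse


def build_parser():
    """Hypothetical wrapper that reproduces only the options touched by this commit."""
    parser = argparse.ArgumentParser(description="Sketch of the Gather INT4 options")
    parser.add_argument(
        "--block_size",
        type=int,
        default=128,
        help="Block size for INT4 quantization of MatMul/Gemm nodes",
    )
    parser.add_argument(
        "--gather_block_size",
        type=int,
        default=32,
        help="Block size for INT4 quantization of Gather nodes",
    )
    parser.add_argument(
        "--gather_quantize_axis",
        type=int,
        default=None,
        choices=[0, 1],  # assumption: the commit itself does not restrict the values
        help="Quantization axis for INT4 quantization of Gather nodes",
    )
    return parser


# Simulate a command line that turns on Gather quantization along axis 0.
args = build_parser().parse_args(["--gather_quantize_axis", "0"])

# Per the README, leaving the axis at its default (None) keeps the feature off.
gather_enabled = args.gather_quantize_axis is not None
print(gather_enabled, args.gather_quantize_axis, args.gather_block_size)
```

With no flags at all, `gather_quantize_axis` stays `None` and `gather_block_size` keeps its default of 32, so existing invocations of the script are unaffected by the new options.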
