2 changes: 1 addition & 1 deletion examples/quantization/README.md
@@ -67,7 +67,7 @@ Checkpoint saved in `output_dir` can be directly passed to `trtllm-build`.
- int8_wo: No quantization is applied to the weights in the checkpoint; the weights are quantized to INT8 channel-wise when TRT-LLM builds the engine.
- int4_wo: Same as int8_wo but in INT4.
- full_prec: No quantization.
-- autoq_format: Specific quantization algorithms are searched in auto quantization. The algorithm must in ['fp8', 'int4_awq', 'w4a8_awq', 'int8_sq'] and you can use ',' to separate more than one quantization algorithms, such as `--autoq_format fp8,int4_awq,w4a8_awq`. Please attention that using int8_sq and fp8 together is not supported.
+- autoq_format: Specific quantization algorithms are searched in auto quantization. The algorithm must be in ['fp8', 'int4_awq', 'w4a8_awq', 'int8_sq'] and you can use ',' to separate more than one quantization algorithm, such as `--autoq_format fp8,int4_awq,w4a8_awq`. Please note that using int8_sq and fp8 together is not supported.
- auto_quantize_bits: Effective-bits constraint for auto quantization. If not set, regular quantization without the auto quantization search is applied. Note: it must lie within the valid range; otherwise it is clamped to the lowest possible value. For example, LLM weights are 16-bit by default, so setting `auto_quantize_bits` to 9.6 yields a 40% reduction in weight storage (9.6 / 16 = 0.6), meaning the average bit-width of the weights becomes 9.6 rather than 16. However, which format to use is determined by solving an optimization problem, so you need to generate the corresponding checkpoint manually if you want to customize your checkpoint formats. The format of the mixed-precision checkpoint is described in detail below.
- output_dir: Path to save the quantized checkpoint.
- dtype: Specify data type of model when loading from Hugging Face.
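The effective-bits arithmetic behind `auto_quantize_bits` can be sketched as follows. This is an illustrative helper, not part of the example scripts; the function name is hypothetical:

```python
def effective_compression(auto_quantize_bits: float, native_bits: int = 16) -> float:
    """Fraction of the original weight storage kept after auto quantization.

    LLM weights are 16-bit by default, so auto_quantize_bits=9.6 keeps
    9.6 / 16 = 0.6 of the storage, i.e. a 40% reduction.
    """
    return auto_quantize_bits / native_bits


kept = effective_compression(9.6)
reduction = 1.0 - kept
print(f"kept {kept:.0%} of weight storage, saved {reduction:.0%}")
# → kept 60% of weight storage, saved 40%
```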
2 changes: 1 addition & 1 deletion examples/quantization/requirements.txt
@@ -1,7 +1,7 @@
-c ../constraints.txt
tensorrt_llm>=0.0.0.dev0
datasets==3.1.0
-nemo-toolkit[all]==2.0.0rc1
+nemo-toolkit==2.0.0rc1
rouge_score
transformers_stream_generator==0.0.4
tiktoken