2 changes: 1 addition & 1 deletion examples/quantization/README.md
@@ -67,7 +67,7 @@ Checkpoint saved in `output_dir` can be directly passed to `trtllm-build`.
- int8_wo: No quantization is applied to the weights in the checkpoint; the weights are quantized to INT8 channel-wise when TRT-LLM builds the engine.
- int4_wo: Same as int8_wo but in INT4.
- full_prec: No quantization.
-- autoq_format: Specific quantization algorithms are searched in auto quantization. The algorithm must in ['fp8', 'int4_awq', 'w4a8_awq', 'int8_sq'] and you can use ',' to separate more than one quantization algorithms, such as `--autoq_format fp8,int4_awq,w4a8_awq`. Please attention that using int8_sq and fp8 together is not supported.
+- autoq_format: Specific quantization algorithms are searched in auto quantization. The algorithm must be in ['fp8', 'int4_awq', 'w4a8_awq', 'int8_sq'] and you can use ',' to separate more than one quantization algorithm, such as `--autoq_format fp8,int4_awq,w4a8_awq`. Please note that using int8_sq and fp8 together is not supported.
- auto_quantize_bits: Effective-bits constraint for auto quantization. If not set, regular quantization without the auto quantization search is applied. Note: it must lie within the valid range; otherwise it is clamped to the lowest possible value. For example, LLM weights are 16-bit by default, so setting `auto_quantize_bits` to 9.6 yields a 40% reduction in weight storage (9.6 / 16 = 0.6), meaning the average bit-width of the weights becomes 9.6 rather than 16. However, which format to use is determined by solving an optimization problem, so you need to generate the corresponding checkpoint manually if you want to customize your checkpoint formats. The format of the mixed-precision checkpoint is described in detail below.
- output_dir: Path to save the quantized checkpoint.
- dtype: Specify data type of model when loading from Hugging Face.
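The effective-bits arithmetic behind `auto_quantize_bits` can be sketched as follows. This is an illustrative helper, not part of the example scripts; the function name is hypothetical:

```python
def effective_compression(auto_quantize_bits: float, native_bits: int = 16) -> float:
    """Fraction of the original weight storage kept after auto quantization.

    LLM weights are 16-bit by default, so auto_quantize_bits=9.6 keeps
    9.6 / 16 = 0.6 of the storage, i.e. a 40% reduction.
    """
    return auto_quantize_bits / native_bits


kept = effective_compression(9.6)
reduction = 1.0 - kept
print(f"kept {kept:.0%} of weight storage, saved {reduction:.0%}")
# → kept 60% of weight storage, saved 40%
```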
2 changes: 1 addition & 1 deletion examples/quantization/requirements.txt
@@ -1,7 +1,7 @@
-c ../constraints.txt
tensorrt_llm>=0.0.0.dev0
datasets==3.1.0
-nemo-toolkit[all]==2.0.0rc1
+nemo-toolkit==2.0.0rc1
rouge_score
transformers_stream_generator==0.0.4
tiktoken