**`scheme` (str|dict|AutoScheme)**: The predefined quantization scheme, e.g. `W4A16`, `MXFP4`, `NVFP4`, `GGUF:Q4_K_M`. For MXFP4/NVFP4, we recommend exporting to the LLM-Compressor format.

**`bits` (int)**: Number of bits for quantization (default is `None`). If not `None`, it overrides the scheme setting.

**`group_size` (int)**: Size of the quantization group (default is `None`). If not `None`, it overrides the scheme setting.

**`sym` (bool)**: Whether to use symmetric quantization (default is `None`). If not `None`, it overrides the scheme setting.

**`layer_config` (dict)**: Configuration for the layer-wise scheme (default is `None`), mainly for customized mixed schemes.
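As an illustration of the dict forms of `scheme` and `layer_config` above, here is a minimal sketch; the layer names and exact accepted keys are plausible assumptions, not taken from this document:

```python
# Hedged sketch: key names mirror the per-parameter overrides documented
# above (bits / group_size / sym); exact accepted keys may vary by version.

# A dict scheme spells out the global quantization settings explicitly.
scheme = {"bits": 4, "group_size": 128, "sym": True}

# layer_config maps layer names (assumed fully qualified module names)
# to per-layer overrides, which is how a customized mixed scheme is expressed.
layer_config = {
    "model.layers.0.self_attn.k_proj": {"bits": 8, "group_size": 32},
    "lm_head": {"bits": 16},  # keep the output head in higher precision
}

# These would typically be passed as AutoRound(..., scheme=scheme,
# layer_config=layer_config); the AutoRound call itself is omitted here.
```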
##### Algorithm Settings

**`enable_alg_ext` (bool)**: [Experimental Feature] Only effective when `iters>0`. Enables algorithm variants for specific schemes (e.g., MXFP4/W2A16) that can bring notable improvements. Default is `False`.

**`disable_opt_rtn` (bool)**: Use pure RTN mode for specific schemes (e.g., GGUF and WOQ). Default is `False` (i.e., improved RTN is enabled).

### AutoScheme

AutoScheme provides an automatic algorithm for generating mixed bits/data_type quantization recipes. For accuracy results, please refer to this [doc](https://github.com/intel/auto-round/blob/main/docs/auto_scheme_acc.md).
Please refer to the [user guide](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) for more details on AutoScheme.
~~~python
from auto_round import AutoRound, AutoScheme

# Sketch only: the argument names follow the AutoScheme user guide linked
# above and may differ slightly across versions; the model name is illustrative.
scheme = AutoScheme(avg_bits=3.0, options=("W2A16", "W4A16", "W8A16"))
ar = AutoRound(model="Qwen/Qwen3-8B", scheme=scheme)
ar.quantize_and_save()
~~~
### SGLang (Intel GPU/CUDA)

**Please note that support for MoE models and visual language models is currently limited.**