| ✅ Quantized Model Loading | Load HuggingFace models with various quantization techniques (including AWQ, GPTQ, GGUF) in 4-bit or 8-bit precision, featuring customizable settings. |
| ✅ Advanced Dataset Management | Load, preprocess, and split datasets with flexible configurations |
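
As a concrete reference for the quantized-loading feature above, here is a minimal sketch of 4-bit loading via the standard `transformers` + `bitsandbytes` route; the checkpoint name is a placeholder, and this project's own loader may expose different options.

```python
# Illustrative 4-bit load using stock transformers/bitsandbytes; the
# checkpoint is a placeholder and the project's loader may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights (use load_in_8bit=True for 8-bit)
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for matmul compute
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",                   # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
```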
- `percdamp (float)`: Dampening percentage for the Hessian update. Default: 0.01.
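
For context, the dampening step in GPTQ-style solvers conventionally adds a small multiple of the mean Hessian diagonal before inversion; the sketch below shows that standard technique, not necessarily this project's exact code.

```python
# Standard GPTQ-style dampening (sketch): add percdamp * mean(diag(H)) to the
# Hessian diagonal so the inverse/Cholesky step stays numerically stable.
import torch

def dampen_hessian(H: torch.Tensor, percdamp: float = 0.01) -> torch.Tensor:
    damp = percdamp * torch.mean(torch.diagonal(H))
    idx = torch.arange(H.shape[0], device=H.device)
    H = H.clone()
    H[idx, idx] += damp
    return H
```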
AWQ adapts quantization based on activation patterns.
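
As a rough illustration of that idea (a sketch of the general technique, not this project's `AWQQuantizer` internals), per-input-channel scales can be derived from calibration activations and folded into the weights before rounding:

```python
# Illustrative activation-aware scaling in the spirit of AWQ.
import torch

def activation_aware_scales(weight: torch.Tensor,
                            acts: torch.Tensor,
                            alpha: float = 0.5) -> torch.Tensor:
    """weight: (out_features, in_features); acts: (n_samples, in_features)."""
    act_mag = acts.abs().mean(dim=0)   # per-input-channel activation magnitude
    w_mag = weight.abs().mean(dim=0)   # per-input-channel weight magnitude
    # Interpolate between activation- and weight-driven scaling: channels with
    # large activations get up-scaled weights, shrinking their relative
    # quantization error.
    s = act_mag.pow(alpha) / w_mag.pow(1.0 - alpha).clamp(min=1e-8)
    return s.clamp(min=1e-5)

# The scaled weight (weight * s) is what gets quantized; at inference the
# layer input is divided by s, or s is folded into the previous layer.
```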
:inherited-members:
:undoc-members:
**Inference with AWQ Quantized Models:** Models quantized using `AWQQuantizer` (or via the high-level API with the 'awq' method) are returned as standard Hugging Face `PreTrainedModel` instances. The quantization is handled transparently by the custom `QuantizedLinear` layers. Therefore, inference can be performed using the usual methods like `.generate()` or by directly calling the model, with no special steps required for AWQ-quantized layers.
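
A minimal sketch follows; the checkpoint name is a placeholder and `model` stands for whatever the quantizer returned. The point is that no AWQ-specific call is needed:

```python
# `model` is assumed to be the PreTrainedModel returned by AWQQuantizer (or
# the high-level API with method='awq'); it is used like any other HF model.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # placeholder checkpoint
inputs = tokenizer("Quantization lets large models", return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```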
**Key `__init__` Parameters for `AWQQuantizer`:**
- ``group_size (int)``: Size of the quantization group. Default: 128.
- ``zero_point (bool)``: Whether to use zero-point quantization for activations. Default: True.
- ``version (str)``: AWQ algorithm version.
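
Putting the documented defaults together, a hypothetical construction sketch; only the three parameters above are documented, so the import path and any omitted arguments are assumptions:

```python
# Hypothetical sketch: only group_size, zero_point, and version are documented
# above; adjust the import to the actual package layout.
# from <your package> import AWQQuantizer

quantizer = AWQQuantizer(
    group_size=128,    # documented default: quantization group size
    zero_point=True,   # documented default: zero-point for activations
    # `version` is left at its default; its value is not documented above.
)
```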