2 files changed (+24 −18)

src/llmcompressor/transformers/compression
+ # Compression Formats
+
+ The following table outlines the quantization and sparsity compression
+ formats that can be applied to a model during compression. The format is
+ determined by the quantization scheme and the sparsity type. For more
+ details on the quantization schemes, see `guides/compression_schemes.md`.
+
+ | Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
+ |---------------|----------|----------------------|---------------------|
+ | W8A8 - int    | None     | int_quantized        | Dense               |
+ | W8A8 - float  | None     | float_quantized      | Dense               |
+ | W4A16 - float | None     | nvfp4_pack_quantized | Dense               |
+ | W4A4 - float  | None     | nvfp4_pack_quantized | Dense               |
+ | W4A16 - int   | None     | pack_quantized       | Dense               |
+ | W8A16 - int   | None     | pack_quantized       | Dense               |
+ | W8A16 - float | None     | naive_quantized      | Dense               |
+ | W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
+ | W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
+ | W4A16 - int   | 2:4      | marlin_24            | Dense               |
+ | W8A16 - int   | 2:4      | marlin_24            | Dense               |
+ | W8A16 - float | 2:4      | naive_quantized      | Dense               |
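The mapping in the table above can be sketched as a simple lookup. This is a hypothetical illustration only; the function and dictionary names below are not part of the llmcompressor API:

```python
# Hypothetical sketch of the (quantization scheme, sparsity) -> compressor
# format lookup summarized in the table above. Keys and values mirror the
# table; this is NOT the actual llmcompressor implementation.
QUANT_COMPRESSOR = {
    ("W8A8-int", None): "int_quantized",
    ("W8A8-float", None): "float_quantized",
    ("W4A16-float", None): "nvfp4_pack_quantized",
    ("W4A4-float", None): "nvfp4_pack_quantized",
    ("W4A16-int", None): "pack_quantized",
    ("W8A16-int", None): "pack_quantized",
    ("W8A16-float", None): "naive_quantized",
    ("W8A8-int", "2:4"): "int_quantized",
    ("W8A8-float", "2:4"): "float_quantized",
    ("W4A16-int", "2:4"): "marlin_24",
    ("W8A16-int", "2:4"): "marlin_24",
    ("W8A16-float", "2:4"): "naive_quantized",
}

# Per the table, only the W8A8 schemes use the Sparse24 sparsity compressor.
SPARSE24_SCHEMES = {"W8A8-int", "W8A8-float"}


def infer_formats(scheme, sparsity=None):
    """Return (quant compressor, sparsity compressor) for a scheme/sparsity pair."""
    quant = QUANT_COMPRESSOR[(scheme, sparsity)]
    sparse = "Sparse24" if sparsity == "2:4" and scheme in SPARSE24_SCHEMES else "Dense"
    return quant, sparse
```

Note that a 2:4 sparse model does not always get a sparse compressor: for the W4A16/W8A16 int schemes the weights are repacked by `marlin_24`, so the sparsity compressor stays `Dense`.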
@@ -18,24 +18,7 @@ def infer_quantization_format(
     Infers the quantization format for a model based on its state and provided
     compression arguments.

-    The following table outlines the possible quantization and sparsity formats
-    along with their corresponding compressor formats:
-
-    +---------------+----------+----------------------+---------------------+
-    | Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
-    |               |          | Format               | Format              |
-    +---------------+----------+----------------------+---------------------+
-    | W8A8 - int    | None     | int_quantized        | Dense               |
-    | W8A8 - float  | None     | float_quantized      | Dense               |
-    | W4A16 - int   | None     | pack_quantized       | Dense               |
-    | W8A16 - int   | None     | pack_quantized       | Dense               |
-    | W8A16 - float | None     | naive_quantized      | Dense               |
-    | W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
-    | W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
-    | W4A16 - int   | 2:4      | marlin_24            | Dense               |
-    | W8A16 - int   | 2:4      | marlin_24            | Dense               |
-    | W8A16 - float | 2:4      | naive_quantized      | Dense               |
-    +---------------+----------+----------------------+---------------------+
+    For a summary of the formats, see `docs/guides/compression_formats.md`.

     :param model: model to check for quantization, if the model is not quantized no
         quantization format is returned