Commit dce5e81
Implement QuantizationMixin (#1351)
## Purpose ##
* Abstract the functionality that allows modifiers to act as quantization
configs into a mixin called `QuantizationMixin`
* This gives #1279 an interface for properly inferring which pipeline to use
based on the recipe (if a recipe contains modifiers that require
calibration, use the "basic" or "sequential" pipelines); see the sketch
after this list
* This enables future modifiers to act as quantization modifiers (in the
same way that GPTQ does now)
* Related to #1354, where the previous logic would attempt to add a
`QuantizedKVCache` for dynamic kv_quant
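
To make the pipeline-inference point concrete, here is a minimal sketch of how a caller such as #1279 might use the mixin; `infer_pipeline`, the fallback pipeline name, and the import path are illustrative assumptions rather than part of this PR:

```python
from llmcompressor.modifiers.quantization import QuantizationMixin  # assumed import path


def infer_pipeline(modifiers: list) -> str:
    """Hypothetical helper: choose a calibration pipeline from the recipe's modifiers."""
    needs_calibration = any(
        isinstance(mod, QuantizationMixin) and mod.has_config() for mod in modifiers
    )
    # Modifiers carrying a quantization config need calibration forward passes,
    # so they are routed to the "basic" or "sequential" pipelines
    return "sequential" if needs_calibration else "data-free"  # fallback name assumed
```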
## Changes ##
* Implement `QuantizationMixin`, which exposes five public methods (see the
lifecycle sketch after this list)
  * Lifecycle methods
    * `initialize_quantization` is used to apply a config and attach observers
    to a model
      * Quantization is disabled so that modules aren't quantized before
      they're calibrated
    * `start_calibration` is used to initialize calibration hooks and set the
    calibration status
      * Quantization is enabled, since we currently quantize as we calibrate,
      although this decision is somewhat arbitrary
    * `end_calibration` is used to remove calibration hooks and apply the
    frozen status
      * Quantization remains enabled, since we want future forward passes to
      simulate quantization
  * Recipe-related methods
    * `has_config` returns true if a config was specified; used to check for
    duplicate configs in the recipe
    * `resolve_quantization_config` returns the quantization config
    specified by the modifier fields
* `QuantizationModifier` inherits from `QuantizationMixin`
* `GPTQModifier` inherits from `QuantizationMixin`
  * Unlike QMod, GPTQ disables quantization during calibration. As noted
  above, this is a somewhat arbitrary choice, but one which matches the
  current implementation
* Calibration utils
  * Replace `set_unset_kv_cache` with `initialize_quantized_kv_cache` and
  `freeze_module_quantization`
    * Treat the `QuantizedKVCache` as analogous to another observer
  * Pull setting the calibration status out of `update_weight_zp_scale`
    * This better matches the lifecycle detailed in the `QuantizationMixin`
    description
  * Implement `reset_quantization_status`, which removes any existing
  quantization configs before the current config is applied by
  `initialize_quantization`
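
A minimal sketch of the lifecycle described above, with `QuantizationModifier` as the host of the mixin. The method names (`resolve_quantization_config`, `initialize_quantization`, `start_calibration`, `end_calibration`) come from this PR; the toy model, the calibration data, and the exact call signatures are placeholder assumptions:

```python
import torch
from llmcompressor.modifiers.quantization import QuantizationModifier

# Placeholder model and data; a real flow would use an LLM and tokenized batches
model = torch.nn.Sequential(torch.nn.Linear(8, 8))
calibration_data = [torch.randn(1, 8) for _ in range(4)]

modifier = QuantizationModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"])
config = modifier.resolve_quantization_config()  # config built from modifier fields

modifier.initialize_quantization(model)  # apply config and attach observers;
                                         # quantization disabled until calibrated
modifier.start_calibration(model)        # attach calibration hooks; quantization enabled
with torch.no_grad():
    for batch in calibration_data:
        model(batch)                     # observers calibrate scales/zero points
modifier.end_calibration(model)          # remove hooks, apply frozen status;
                                         # quantization stays enabled afterwards
```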
## Remove Support ##
* Remove support for recipes with multiple quantization modifiers
active at the same time (a check for this will be added by #1279)
* Remove `num_calibration_steps`, `quantize`,
`disable_quantization_observer_epoch`, and `min_tokens_per_module`
  * `num_calibration_steps` is already controlled by
  https://github.com/vllm-project/llm-compressor/blob/42b62f5283d0234b26623fe1f1bf02a77c6e4019/src/llmcompressor/datasets/utils.py#L106
  * `quantize` was implemented as a workaround for GPTQ's modifier
  builder. Similar functionality may be required to support SpinQuant +
  GPTQ, but it should exist at a higher level
  * `disable_quantization_observer_epoch` seems to implement functionality
  where a model's observers are removed but quantization remains active.
  This behavior is maintained by setting an "end" epoch for qmod (see the
  sketch after this list)
  * `min_tokens_per_module` requires that the modifier hold references to
  the calibration dataset, which is disallowed by #1279. This information
  is already printed in GPTQ's logs. If research still wants this tool
  specifically for `QuantizationModifier`, it can be reimplemented without
  references to the calibration dataset
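
For the `disable_quantization_observer_epoch` case, a hedged sketch of the replacement: give `QuantizationModifier` an "end" epoch in the recipe. The stage/group names and config keys below follow the conventional recipe layout but are assumptions, not taken from this PR:

```python
# Recipe expressed as a YAML string, as accepted by llmcompressor entrypoints
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            start: 0.0
            end: 2.0  # quantization frozen at epoch 2 instead of a separate
                      # disable_quantization_observer_epoch field
            config_groups:
                group_0:
                    targets: ["Linear"]
                    weights:
                        num_bits: 8
"""
```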
## Testing ##
* Updated tests to reflect the new mixin
* Ran a set of GPTQ and QuantizationModifier examples to completion
* CI tests pass
---------
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
## Files Changed ##
8 files changed: +436 −582 lines

* src/llmcompressor/modifiers/quantization/gptq
* src/llmcompressor/modifiers/quantization/quantization
* tests/llmcompressor/modifiers/calibration
* tests/llmcompressor/modifiers/quantization
* tests/llmcompressor/pytorch/modifiers/pruning/sparsegpt