Background
Right now, quantization configs are serialized through the following lifecycle:
1. `apply_quantization_config` is used to attach `quantization_scheme` attributes to modules
2. The model undergoes calibration and compression
3. The quantization config is regenerated from the model using `QuantizationConfig.from_pretrained`
4. The new config is serialized by `ModelCompressor.update_config`
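The lifecycle above can be simulated with a minimal, self-contained sketch. The real compressed-tensors classes are replaced by stand-ins here; the attribute name `quantization_scheme` comes from this issue, but every signature and config shape below is an illustrative assumption, not the library's actual API:

```python
# Stand-in for a torch module; only the attribute behavior matters here.
class Module:
    def __init__(self, name):
        self.name = name
        self.quantization_scheme = None

def apply_quantization_config(modules, config):
    # Step 1 (sketch): attach per-module scheme attributes from the config.
    for module in modules:
        if module.name in config["ignore"]:
            continue
        for group_name, scheme in config["config_groups"].items():
            module.quantization_scheme = scheme

def config_from_model(modules):
    # Step 3 (sketch): regenerate a config by scanning module attributes.
    # User-chosen group names are gone at this point; groups are re-derived,
    # and the ignore list is rebuilt per-module.
    groups, ignore = {}, []
    for module in modules:
        if module.quantization_scheme is None:
            ignore.append(module.name)
        else:
            groups.setdefault("group_0", module.quantization_scheme)
    return {"config_groups": groups, "ignore": ignore}

modules = [Module("model.layers.0.self_attn"), Module("lm_head")]
user_config = {"config_groups": {"attn_w8a8": {"bits": 8}}, "ignore": ["lm_head"]}
apply_quantization_config(modules, user_config)
regenerated = config_from_model(modules)
print(regenerated)
# → {'config_groups': {'group_0': {'bits': 8}}, 'ignore': ['lm_head']}
```

Note that the regenerated config has lost the user's group name (`attn_w8a8` became `group_0`), which is exactly the first downside listed below.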
This approach has some downsides (see the phi3 example config):
- Any config group names set by the user are discarded
- The generated config groups do not necessarily match the config groups set by the user
- The ignore list becomes very large and hard to read
- The logic for generating a config from a model is difficult to maintain
The scope of this issue is to investigate an approach whereby step (1) attaches the config as a `quantization_config` attribute on the model, which is then read by step (4) without having to go through step (3). This would mitigate all of the above downsides.
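A minimal sketch of that proposed flow, assuming the `quantization_config` attribute name from this issue; the surrounding class and helper names are hypothetical:

```python
# Sketch: step (1) keeps the user's config verbatim on the model, and
# serialization (step 4) reads it back directly, skipping regeneration.
class Model:
    pass

def apply_quantization_config(model, config):
    # Attach schemes to submodules as before (omitted), and additionally
    # store the original config on the model. Hypothetical signature.
    model.quantization_config = config

def update_config(model, output_config):
    # Serialize the attached config directly, preserving the user's
    # group names and original ignore list. Hypothetical signature.
    output_config["quantization_config"] = model.quantization_config

model = Model()
user_config = {"config_groups": {"attn_w8a8": {"bits": 8}}, "ignore": ["lm_head"]}
apply_quantization_config(model, user_config)
out = {}
update_config(model, out)
print(out["quantization_config"])
# → {'config_groups': {'attn_w8a8': {'bits': 8}}, 'ignore': ['lm_head']}
```

Because the serialized config is the user's config, group names and the ignore list round-trip unchanged.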
Note: some things to keep in mind. `apply_quantization_config` may be applied multiple times, which may necessitate some logic to "merge" quantization configs. A draft of this has already been written; feel free to ping @kylesayrs if you would like to leverage it, or use your own.
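Since `apply_quantization_config` may run more than once, the attached configs would need merging. A draft already exists per the note above; the strategy below is only one hypothetical illustration (later groups win on name collisions, ignore lists are unioned):

```python
def merge_quantization_configs(base, new):
    """Hypothetical merge of two quantization configs (dicts here for
    illustration; the real objects are pydantic models in the library)."""
    # Later config groups override earlier ones on name collisions.
    merged_groups = {**base["config_groups"], **new["config_groups"]}
    # Union the ignore lists while preserving first-seen order.
    merged_ignore = list(dict.fromkeys(base["ignore"] + new["ignore"]))
    return {"config_groups": merged_groups, "ignore": merged_ignore}

first = {"config_groups": {"attn_w8a8": {"bits": 8}}, "ignore": ["lm_head"]}
second = {"config_groups": {"mlp_w4a16": {"bits": 4}}, "ignore": ["lm_head", "embed"]}
merged = merge_quantization_configs(first, second)
print(merged)
# → {'config_groups': {'attn_w8a8': {'bits': 8}, 'mlp_w4a16': {'bits': 4}},
#    'ignore': ['lm_head', 'embed']}
```

Whether collisions should override, error, or require schemes to be identical is a design decision the actual implementation would need to settle.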