Save and load models trained with QuantizationAwareTraining() #16357
Unanswered · w2ex asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Hi,
I started playing with the QuantizationAwareTraining() callback, which seems quite useful for deploying lightweight versions of our models on CPU. However, I don't fully understand how I am supposed to use it.
At the moment, I run my pl.Trainer with the ModelCheckpoint() callback to save the best checkpoints, together with QuantizationAwareTraining(). Once training is complete, I usually load the best checkpoint saved by ModelCheckpoint() in order to export it with ONNX.
When I do this, I get an error if I use model.load_from_checkpoint(checkpoint_path=...), because the checkpoint also saves the fake_quant layers:
Unexpected key(s) in state_dict: "input_layer.0.weight_fake_quant.fake_quant_enabled"...
So I use the strict=False flag to ignore these layers, export the non-quantized model to ONNX, and then quantize it with onnxruntime.quantization.quantize_static(...).
Now, that seems sub-optimal and not the proper way to do it: it looks like I lose all the histogram values saved in the checkpoint that could be useful for the conversion. The on_fit_end() hook seems to quantize the model when training stops, but I don't seem to benefit from that (is it because I interrupt the Trainer before all the epochs complete, so the hook is never called?).
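To make the mismatch concrete, here is a minimal sketch of why strict=False is needed. The two-layer model and the injected checkpoint key are hypothetical stand-ins, not the actual architecture from my project:

```python
import torch
import torch.nn as nn

# Hypothetical float model standing in for the LightningModule's network
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Simulate a QAT checkpoint: the float weights plus an extra
# fake_quant buffer like the one named in the error message
ckpt = dict(model.state_dict())
ckpt["0.weight_fake_quant.fake_quant_enabled"] = torch.tensor([1])

# strict=True would raise "Unexpected key(s) in state_dict: ...";
# strict=False loads the matching weights and reports the leftovers
result = model.load_state_dict(ckpt, strict=False)
print(result.unexpected_keys)  # ['0.weight_fake_quant.fake_quant_enabled']
```

This loads the float weights fine, but silently drops every fake_quant buffer, which is exactly the calibration information I would like to keep.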
If I wanted to save the full model, with the fake-quantization layers, so that I can run it on GPUs (and evaluate it on large datasets while getting an idea of the real performance of the quantized model), should I call _prepare_model() from QuantizationAwareTraining() on the model before loading the checkpoint?
I don't know if all of this is clear, but to make it simpler: when training a model with QuantizationAwareTraining(), how can I save and load the model with the fake_quant layers so that I can evaluate it on GPU?
Thank you very much :)