Is your feature request related to a problem? Please describe.
`device` and `dtype` are special kwargs that should be passed along from layers to their quantizers.
Our quantizers currently resolve kwargs based on prefixes, but `device` and `dtype` should be special-cased so that users do not have to manually add `weight_device`, `input_quant_device`, and so on.
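A minimal sketch of what the special-casing could look like (hypothetical helper, not the real Brevitas API): prefixed kwargs are routed to the matching quantizer as today, while `device` and `dtype` are forwarded to every quantizer without requiring a prefix.

```python
# Hypothetical sketch of prefix-based kwarg resolution with device/dtype
# special-cased. Names and structure are assumptions for illustration only.
SPECIAL_KWARGS = ("device", "dtype")

def resolve_quantizer_kwargs(layer_kwargs, prefix):
    resolved = {}
    for name, value in layer_kwargs.items():
        if name in SPECIAL_KWARGS:
            # Special case: forwarded as-is to every quantizer, no prefix needed.
            resolved[name] = value
        elif name.startswith(prefix):
            # Existing behavior: prefix-based routing, prefix stripped.
            resolved[name[len(prefix):]] = value
    return resolved

layer_kwargs = {
    "device": "cuda",
    "dtype": "float16",
    "weight_bit_width": 4,
    "input_quant_bit_width": 8,
}
print(resolve_quantizer_kwargs(layer_kwargs, "weight_"))
# {'device': 'cuda', 'dtype': 'float16', 'bit_width': 4}
```

With this scheme, a single `device="cuda"` at the layer level would reach the weight, input, and output quantizers alike, instead of requiring `weight_device`, `input_quant_device`, etc.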
Even if we were to treat `device` and `dtype` separately, there is another issue related to how we apply quantization and the state dict.
We put everything on the meta device and then rely on loading the state dict to materialize it.
Quantization parameters for which no state dict entry is loaded remain stuck on the meta device unless handled explicitly.
If we can correctly propagate these params, it would also solve some other annoying issues around re-initialization of `quant_tensor`.