-
Notifications
You must be signed in to change notification settings - Fork 248
Implement the AO API in torchchat quantization handlers and unify logic. #1291
Conversation
Implement the AO API in torchchat quantization handlers and unify logic. 1 - implement .quantize() for TC quantization handlers and support args to make consistent with AO 2 - remove special handling for various combinations of parameters and use validate_args before calling with **q_kwargs 3 - remove check probing whether we successfully loaded a8wx and install an error-reporting handler if loading failed which will be called as quant handler and issue an error 4 - unify both tc and ao quantization handler dicts with shared calling logic
Added comment, and a missing self parameter
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1291
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 6926b9c with merge base 95ebcb8 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Fix typo (func -> q.__init__)
Fix arg order (args with default after args w/o default)
| torchao_experimental_quant_api_spec.loader.exec_module(torchao_experimental_quant_api) | ||
| from torchao_experimental_quant_api import Int8DynActIntxWeightQuantizer | ||
| ao_quantizer_class_dict["linear:a8wxdq"] = Int8DynActIntxWeightQuantizer | ||
| quantizer_class_dict["linear:a8wxdq"] = Int8DynActIntxWeightQuantizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we can't put this inside of the quantizer_class_dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems cleaner to try import Int8DynActIntxWeightQuantizer first and fallback to ErrorHandler, then assign this handler to linear:a8wxdq in quantizer_class_dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems cleaner to try import
Int8DynActIntxWeightQuantizerfirst and fallback toErrorHandler, then assign this handler tolinear:a8wxdqinquantizer_class_dict.
This is what the code is doing now. Try to import, and if the import fails, set up the error handler.
Or were you thinking to do a wrapper that imports the class Int8DynActIntxWeightQuantizer and then calls the error if it fails, and the imported method if the import succeeds? I assumed that all the conditional import will go away soonish since we should know what version of AO we pin, and whether it has the new class. (And that enablement is there, it doesn't disappear again, so that we can just do a simple import in the future.)
.... Because init can't return an alternate class, the wrapper would have to redispatch all methods internally, which isn't a big deal per se, but may add readability concerns? I'm happy to go either way, ideally as a follow-on.
LMK what you think the best long-term trajectory is for this functionality, and we'll align the code to that. (I think the current version is preferable to the previous state, b/c we don't have to special case in the dispatch loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if this can go away soon I don't have a strong opinion and can live with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cleanup! Just one comment and please make sure CI jobs are passing
|
My pleasure. Sorry about that error, should be fixed now. |
larryliu0820
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is passing
Implement the AO API in torchchat quantization handlers and unify logic.
1 - implement .quantize() for TC quantization handlers and support args to make consistent with AO
2 - remove special handling for various combinations of parameters and use validate_args before calling with **q_kwargs
3 - remove check probing whether we successfully loaded a8wx and install an error-reporting handler if loading failed which will be called as quant handler and issue an error
4 - unify both tc and ao quantization handler dicts with shared calling logic
5 - provide informational message when quantizer option not supported (via introspection)