Manually apply 4bit weight packing #7274
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7274
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 1be39d1 with merge base de74961. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D67051119
be122f3 to afaf771
afaf771 to 654d8a6
654d8a6 to 187c22d
187c22d to 1be39d1
Summary:
Context
Currently, exporting llama models to Vulkan using 4-bit weight quantization is broken because the behaviour of the `groupwise_affine_quantize_tensor` utility function from `torchao` was recently changed so that the packing of two 4-bit integers into a single 8-bit value no longer occurs. To fix this, have the `VkInt4WeightOnlyQuantizer` perform that step itself.

Differential Revision: D67051119
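
For illustration, here is a minimal sketch of what the manual packing step could look like. The helper name `pack_int4_weights` and the nibble ordering are assumptions for this example, not the actual layout used by `VkInt4WeightOnlyQuantizer`:

```python
import torch


def pack_int4_weights(weight_int4: torch.Tensor) -> torch.Tensor:
    """Pack pairs of 4-bit values (stored one per uint8) into single bytes.

    Hypothetical helper illustrating the packing that
    groupwise_affine_quantize_tensor no longer performs; the real
    quantizer may use a different memory layout or nibble order.
    """
    assert weight_int4.dtype == torch.uint8
    assert weight_int4.shape[-1] % 2 == 0, "need an even number of 4-bit values"

    low = weight_int4[..., ::2] & 0x0F    # even-indexed values -> low nibble
    high = weight_int4[..., 1::2] & 0x0F  # odd-indexed values -> high nibble
    return (high << 4) | low              # two 4-bit values per output byte


# Example: 8 quantized values in [0, 15] become 4 packed bytes per row.
w = torch.randint(0, 16, (4, 8), dtype=torch.uint8)
packed = pack_int4_weights(w)
print(w.shape, "->", packed.shape)  # torch.Size([4, 8]) -> torch.Size([4, 4])
```

The packed tensor halves the last dimension, which is the shape the Vulkan 4-bit weight-only linear path expects when the packing is no longer done inside `torchao`.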