Add xnnpack pass to propagate custom meta field to q/dq nodes #14864
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14864
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV. If your PR is affected, please view it below.
❌ 3 New Failures, 3 Unrelated Failures as of commit db612db with merge base d00279d.
BROKEN TRUNK: the following jobs failed but were already failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from fa97c1b to bc8f182 (Compare)
Force-pushed from bc8f182 to 9ef58fe (Compare)
    return self.converted_graph.forward(*inputs)

class Quantize_(Stage):
    """
@GregoryComer let me know if I should move this into the test class, instead of the test harness, or rename.
FWIW, I like it here.
Force-pushed from 9ef58fe to f7515a9 (Compare)
Thanks for adding the tester stage! I don't really have a better idea on the name, and it's not part of the public API surface, so I'm good with these changes.
exec = tester.get_artifact()
program_buffer = exec.buffer
self.assertEqual(len(exec._tensor_data), 1)
data_buffer = bytes(exec._tensor_data.pop("model"))
Do we want to assert the size of it? Just to make sure it is indeed the quantized weight tensor. Also I would like (somehow) to validate that we also didn't put this in the blob, perhaps by asserting that the blob size is < weight_size (if we have large-ish weights).
Thanks - I'll add a check on size.
Re validating that it's not in the blob, we can check that forward fails when we do not pass in the data buffer.
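The checks being discussed can be sketched in miniature. This is purely illustrative: `weight_nbytes`, `data_buffer`, and `program_buffer` below are hypothetical stand-ins, not the actual tester artifacts from this PR.

```python
# Hypothetical sketch of the two size checks discussed above.
weight_nbytes = 64 * 64 * 1           # e.g. an int8 (64, 64) quantized weight
data_buffer = b"\x01" * weight_nbytes # what the external .ptd data file would hold
program_buffer = b"\x00" * 1024       # a program blob that excludes the weight

# 1) The external data buffer should match the quantized weight's size exactly.
assert len(data_buffer) == weight_nbytes
# 2) The program blob should be smaller than the weight, which suggests the
#    weight was not also serialized into the blob.
assert len(program_buffer) < len(data_buffer)
```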
@digantdesai added a check on size, and check on accuracy.
Verified locally that we're missing the weight if we do not pass in the data buffer. The test segfaults after ~4 runs though, maybe something isn't cleaned up properly in pybindings. Left that as a todo for now.
Thanks @lucylq. I hope we have quantized LoRA working OK with XNNPACK now?
@digantdesai Yes! Thanks a lot for the help. There were a few issues with export_llama and the emitter, but I was able to generate quantized LoRA files. They work well on desktop; I still need to debug load issues on Android.
Force-pushed from 714f81a to 0c08ac7 (Compare)
Force-pushed from 0c08ac7 to db612db (Compare)
Summary
Enable quantization with program-data separation.
To select weights for separation, we tag nodes on the eager model. After quantization, q/dq nodes are generated; these do not carry the external tags that their inputs have. This PR propagates the tags to the q/dq nodes so that quantized weights are moved to the external file and can be shared.
Test plan
```
python -m unittest executorch.backends.xnnpack.test.passes.test_propagate_custom_meta_pass
```
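The core idea of the pass — copying a custom meta field from tagged inputs onto newly created q/dq nodes — can be sketched with torch.fx. This is a minimal illustration, not the actual XNNPACK pass: `quant`/`dequant` are stand-ins for the real quantize/dequantize ops, and the `"custom"` key and `"delegate_constant_tag"` value are assumed names for illustration.

```python
import torch
import torch.fx as fx

@fx.wrap
def quant(x):    # stand-in for a quantize op inserted by the quantizer
    return x

@fx.wrap
def dequant(x):  # stand-in for the matching dequantize op
    return x

def propagate_custom_meta(gm: fx.GraphModule, key: str = "custom") -> fx.GraphModule:
    # Walk nodes in topological order; each q/dq node inherits the tagged
    # meta entry from its first input if it does not already have one.
    # Chains (weight -> quant -> dequant) are handled because quant is
    # tagged before dequant is visited.
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target in (quant, dequant):
            src = node.args[0]
            if isinstance(src, fx.Node) and key in src.meta:
                node.meta.setdefault(key, src.meta[key])
    return gm

class M(torch.nn.Module):
    def forward(self, w):
        return dequant(quant(w))

gm = fx.symbolic_trace(M())
# Tag the placeholder, standing in for a weight selected for separation.
for n in gm.graph.nodes:
    if n.op == "placeholder":
        n.meta["custom"] = {"delegate_constant_tag": "model"}

propagate_custom_meta(gm)
tags = [n.meta.get("custom") for n in gm.graph.nodes if n.op == "call_function"]
print(tags)
```

After the pass, both q/dq nodes carry the same tag as the weight they consume, which is what allows downstream serialization to route the quantized weight into the external data file.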