Update Int4PreshuffledTensor to align with implementation details of the Float8Tensor #2738
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2738
Note: Links to docs will display an error until the doc builds have been completed.
⏳ No Failures, 12 Pending — as of commit 1073ced with merge base 7c13cde. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
    group_zero = weight_tensor.group_zero.contiguous()
    res = torch.ops.fbgemm.bf16i4bf16_shuffled(
        input_tensor, wq, group_scale, group_zero
    )
else:
    # dynamically quantizes activation to fp8
    assert weight_tensor.row_scale is not None
    row_scale = weight_tensor.row_scale.contiguous()
    xq, x_scale = quantize_fp8_row(input_tensor)
```
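For context, the fp8 branch above relies on per-row dynamic quantization of the activation. A minimal pure-Python sketch of the idea behind a row-wise fp8-style quantizer follows; the real `quantize_fp8_row` lives in fbgemm_gpu and operates on CUDA tensors, so the function name `quantize_fp8_row_sketch`, the `FP8_E4M3_MAX` constant usage, and the list-based representation here are illustrative assumptions only:

```python
# Illustrative sketch of row-wise fp8-style quantization
# (not the actual fbgemm quantize_fp8_row implementation).
FP8_E4M3_MAX = 448.0  # max representable magnitude in float8 e4m3

def quantize_fp8_row_sketch(rows):
    """For each row, compute scale = max(|x|) / FP8_E4M3_MAX and
    quantize values as round(x / scale).

    Returns (quantized_rows, per_row_scales)."""
    quantized, scales = [], []
    for row in rows:
        max_abs = max(abs(v) for v in row) or 1.0  # avoid div-by-zero
        scale = max_abs / FP8_E4M3_MAX
        quantized.append([round(v / scale) for v in row])
        scales.append(scale)
    return quantized, scales

xq, x_scale = quantize_fp8_row_sketch([[1.0, -2.0, 4.0], [0.5, 0.25, -0.5]])
```

Each row gets its own scale, so one outlier row does not destroy the precision of the others — which is why the kernel also takes `row_scale` for the weight side.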
btw @vkuzo, we didn't use `_choose_quant_func_and_quantize_tensor` and float8 quant args yet, but we can add them if there are multiple choices of float8 activation quant in the future. Please let me know if you have different thoughts here.
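The comment above is about leaving room for multiple float8 activation-quantization choices instead of hard-coding a single fp8 path. A hypothetical sketch of what a registry-style dispatch could look like; the names `QuantGranularity`, `choose_quant_func`, and both quantizer functions below are illustrative assumptions, not the torchao API:

```python
from enum import Enum, auto

class QuantGranularity(Enum):
    PER_ROW = auto()
    PER_TENSOR = auto()

def scales_per_row(rows):
    # one scale per row: max(|x|) over that row
    return [max(abs(v) for v in row) for row in rows]

def scales_per_tensor(rows):
    # one scale for the whole tensor
    return [max(abs(v) for row in rows for v in row)]

# registry-style dispatch: the quant function is picked from config,
# so adding a new granularity is a table entry, not an if/else chain
_QUANT_FUNCS = {
    QuantGranularity.PER_ROW: scales_per_row,
    QuantGranularity.PER_TENSOR: scales_per_tensor,
}

def choose_quant_func(granularity):
    return _QUANT_FUNCS[granularity]

scales = choose_quant_func(QuantGranularity.PER_ROW)([[1.0, -3.0], [2.0, 0.5]])
```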
Stacked PRs:
- `optional_tensor_names` in TorchAOBaseTensor #2710
- Update Int4PreshuffledTensor to align with implementation details of the Float8Tensor (this PR)
Summary:
Similar to #2687, we updated Int4PreshuffledTensor to align its implementation details, and also used TorchAOBaseTensor to simplify some of the implementation.
Note: this is just a refactor of Int4PreshuffledTensor; there are no BC-related changes in this PR.
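As a rough illustration of the TorchAOBaseTensor pattern being leaned on here: a subclass declares its tensor fields once via class attributes, and the base class can derive shared machinery (flatten/unflatten, repr, etc.) from those declarations. The minimal base class below is a mock for illustration, not the real torchao implementation, though the attribute names `tensor_data_names` and `optional_tensor_names` follow the torchao convention from #2710:

```python
# Mock of the declarative-attribute pattern used by TorchAOBaseTensor.
class BaseTensorMock:
    tensor_data_names = []      # required tensor fields
    optional_tensor_names = []  # fields that may be None

    def fields(self):
        # collect required fields, plus optional fields that are present
        names = list(self.tensor_data_names)
        names += [n for n in self.optional_tensor_names
                  if getattr(self, n) is not None]
        return {n: getattr(self, n) for n in names}

class Int4PreshuffledMock(BaseTensorMock):
    tensor_data_names = ["qdata", "group_scale"]
    optional_tensor_names = ["group_zero", "row_scale"]

    def __init__(self, qdata, group_scale, group_zero=None, row_scale=None):
        self.qdata = qdata
        self.group_scale = group_scale
        self.group_zero = group_zero  # present on the bf16 kernel path
        self.row_scale = row_scale    # present on the fp8 kernel path

# fields() includes row_scale but skips the absent group_zero
t = Int4PreshuffledMock(qdata=[1, 2], group_scale=[0.1], row_scale=[0.5])
```

Declaring the field lists once is what lets the base class absorb boilerplate that each tensor subclass previously re-implemented by hand.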
Test Plan:
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
Reviewers:
Subscribers:
Tasks:
Tags: