SS-JIA (Contributor) commented Oct 14, 2024

Stack from ghstack (oldest at bottom):

## Context

Fix the existing implementation of int4 weight-quantized linear to conform with how the `_weight_int4pack_mm` op works in the ATen library.

For some additional context, the current op implementation does not actually match the behaviour of `_weight_int4pack_mm`. The ATen op expects that the weights have already been packed into a specific format, with `inner_k_tiles` as a packing parameter; the packing is accomplished by calling the `_convert_weight_to_int4pack` operator. Thus the current implementation in Vulkan is equivalent to calling `_convert_weight_to_int4pack` + `_weight_int4pack_mm`. To address this discrepancy, as of this diff the operator implementation is registered under the `linear_weight_int4` custom op.
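For reference, the two-step ATen flow that the Vulkan custom op is compared against looks roughly like the sketch below. It is a minimal illustration only: the sizes, dtypes, and the exact weight layout expected by `_convert_weight_to_int4pack` are assumptions (the expected layout has changed across PyTorch versions), not values taken from this PR's test binary.

```python
import torch

# Illustrative sizes; these are assumptions, not values from the PR's test binary.
M, K, N = 4, 256, 256
group_size = 32
inner_k_tiles = 8

x = torch.randn(M, K, dtype=torch.bfloat16)
# Unpacked int4 weight values in [0, 15]; the dtype/shape expected by
# _convert_weight_to_int4pack varies by PyTorch version, so treat this as a sketch.
weight = torch.randint(0, 16, (N, K), dtype=torch.int32)
# Scales and zeros laid out as (num_groups, N, 2), matching the fix described below.
scales_and_zeros = torch.rand(K // group_size, N, 2, dtype=torch.bfloat16)

# Step 1: pack the weights into ATen's int4 packed format.
packed = torch.ops.aten._convert_weight_to_int4pack(weight, inner_k_tiles)

# Step 2: run the packed int4 weight matmul against the activations.
out = torch.ops.aten._weight_int4pack_mm(x, packed, group_size, scales_and_zeros)
```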

The problems with the existing implementation were as follows:

* The expected size of the scales and zeros tensor was incorrect. Previously, the size was assumed to be `(2, N, num_groups)`, but the correct size is `(num_groups, N, 2)`.
* Previously, when unpacking a `uint8_t` into 2 int4 values, it was assumed that the LSB held the first value and the MSB held the second value. However, this ordering should be flipped (see the sketch after this list).
* The original implementation expected the output tensor to be channels packed, but in practice we want the output tensor to be width packed.

This diff addresses the above issues, and introduces a dedicated test binary that checks the Vulkan implementation against an equivalent reference implementation expressed with ATen functions.
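To make the first two fixes concrete, here is a minimal Python sketch of the corrected nibble ordering and the `(num_groups, N, 2)` scales/zeros layout. The function names and the scale/zero slot assignment are illustrative assumptions; they are not copied from the Vulkan shader.

```python
def unpack_int4_pair(byte: int) -> tuple[int, int]:
    """Unpack one uint8 into two int4 values, high nibble first.

    The old shader assumed (low nibble, high nibble) ordering; per this
    diff, the high nibble holds the first value and the low nibble the second.
    """
    first = (byte >> 4) & 0xF   # high nibble -> first int4 value
    second = byte & 0xF         # low nibble  -> second int4 value
    return first, second


def scale_and_zero(scales_and_zeros, group_idx: int, n: int):
    """The scales/zeros tensor is indexed as (num_groups, N, 2), not (2, N, num_groups).

    Which of the two trailing slots holds the scale vs. the zero point is an
    assumption here, made only to illustrate the layout.
    """
    scale = scales_and_zeros[group_idx][n][0]
    zero = scales_and_zeros[group_idx][n][1]
    return scale, zero
```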

Differential Revision: [D64354773](https://our.internmc.facebook.com/intern/diff/D64354773/)

pytorch-bot bot commented Oct 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6200

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 470ff32 with merge base 8673567:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Oct 14, 2024
SS-JIA added a commit that referenced this pull request Oct 14, 2024
facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D64354773


facebook-github-bot (Contributor) commented:
This pull request has been merged in 841277a.

SS-JIA deleted the gh/SS-JIA/114/head branch January 24, 2025 19:45