Conversation
@navsud navsud commented Aug 11, 2025

Summary: The 8-bit activation qconfig should not use reduce_range=True, which limits the quantization range to [0, 127]. This diff fixes that issue.

Differential Revision: D80007226
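
For reference, a minimal sketch of what the corrected activation config would look like. This is an illustrative snippet, not the patched file itself: it assumes the same FusedMovingAvgObsFakeQuantize-style config as the weight config quoted later in this thread, and the exact argument names in the file may differ.

    import torch
    from torch.ao.quantization import MovingAverageMinMaxObserver
    from torch.ao.quantization.fake_quantize import FusedMovingAvgObsFakeQuantize

    act_fake_quant_ctr = FusedMovingAvgObsFakeQuantize.with_args(
        dtype=torch.uint8,
        quant_min=torch.iinfo(torch.uint8).min,  # 0
        quant_max=torch.iinfo(torch.uint8).max,  # 255
        qscheme=torch.per_tensor_affine,
        reduce_range=False,  # the fix: keep the full 8-bit range [0, 255]
        observer=MovingAverageMinMaxObserver,
    )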

@navsud navsud requested a review from cccclai as a code owner August 11, 2025 16:12

pytorch-bot bot commented Aug 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13284

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 5 Unrelated Failures

As of commit 9f528a0 with merge base 310a05d:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed, likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Aug 11, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80007226

@navsud navsud added the release notes: none label Aug 11, 2025
navsud added a commit to navsud/executorch that referenced this pull request Aug 11, 2025
Summary:

The 8-bit activation qconfig should not use reduce_range=True, which limits the quantization range to [0, 127]. This diff fixes that issue.

Differential Revision: D80007226
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80007226


cccclai commented Aug 12, 2025

The 8-bit activation qconfig should not use reduce_range=True, which limits the quantization range to [0, 127]

What range should it be instead?


navsud commented Aug 12, 2025

The 8-bit activation qconfig should not use reduce_range=True, which limits the quantization range to [0, 127]

What range should it be instead?

Without reduce_range=True, it uses the default range of [0, 255], which is the correct range for uint8. With this change, I was able to recover the training loss during QAT back to that of the fp32 training.
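
To make the effect concrete, here is a standalone snippet (hypothetical, not from this PR) that prints the quant range a uint8 observer picks with and without reduce_range:

    import torch
    from torch.ao.quantization.observer import MinMaxObserver

    x = torch.randn(4, 8)
    for reduce_range in (True, False):
        obs = MinMaxObserver(dtype=torch.quint8, reduce_range=reduce_range)
        obs(x)  # observe some activations
        print(reduce_range, obs.quant_min, obs.quant_max)
    # reduce_range=True  -> quant range [0, 127] (only 7 usable bits)
    # reduce_range=False -> quant range [0, 255] (the full 8 bits)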


@cccclai cccclai left a comment


Thanks for the fix!

navsud added a commit to navsud/executorch that referenced this pull request Aug 13, 2025
Summary:

The 8-bit activation qconfig should not use reduce_range=True, which limits the quantization range to [0, 127]. This diff fixes that issue.

Reviewed By: cccclai

Differential Revision: D80007226
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80007226

@facebook-github-bot facebook-github-bot merged commit e032ca3 into pytorch:main Aug 13, 2025
98 of 106 checks passed
agrima1304 pushed a commit to agrima1304/executorch that referenced this pull request Aug 26, 2025
Differential Revision: D80007226

Pull Request resolved: pytorch#13284
@Novelfor

I found that the weight fake-quant config uses reduce_range too:

    weight_fake_quant_ctr = FusedMovingAvgObsFakeQuantize.with_args(
        dtype=torch.int8,
        quant_min=torch.iinfo(torch.int8).min + 1,
        quant_max=torch.iinfo(torch.int8).max,
        qscheme=torch.per_tensor_symmetric,
        reduce_range=True,
        observer=MovingAverageMinMaxObserver,
    )

I think changing it to False would improve accuracy here too.
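
For concreteness, a sketch of the change being suggested (only the reduce_range flag flips; all other arguments stay as quoted above):

    weight_fake_quant_ctr = FusedMovingAvgObsFakeQuantize.with_args(
        dtype=torch.int8,
        quant_min=torch.iinfo(torch.int8).min + 1,  # -127, symmetric around zero
        quant_max=torch.iinfo(torch.int8).max,      # 127
        qscheme=torch.per_tensor_symmetric,
        reduce_range=False,  # suggested change
        observer=MovingAverageMinMaxObserver,
    )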


cccclai commented Aug 26, 2025

I found that the weight fake-quant config uses reduce_range too:

    weight_fake_quant_ctr = FusedMovingAvgObsFakeQuantize.with_args(
        dtype=torch.int8,
        quant_min=torch.iinfo(torch.int8).min + 1,
        quant_max=torch.iinfo(torch.int8).max,
        qscheme=torch.per_tensor_symmetric,
        reduce_range=True,
        observer=MovingAverageMinMaxObserver,
    )

I think changing it to False would improve accuracy here too.

cc: @haowhsu-quic @winskuo-quic @shewu-quic @DannyYuyang-quic


navsud commented Aug 26, 2025

@haowhsu-quic
I see reduce_range=True in many places in this file. Do we really need to reduce the range by one bit for the Qualcomm backends? If not, we should remove those flags to get that bit of range back, which would improve model accuracy.
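
For the symmetric int8 case, the same kind of standalone check (a hypothetical snippet; behavior with an explicitly customized quant_min/quant_max may vary across torch versions) shows the one-bit loss:

    import torch
    from torch.ao.quantization.observer import MinMaxObserver

    x = torch.randn(4, 8)
    for reduce_range in (True, False):
        obs = MinMaxObserver(
            dtype=torch.qint8,
            qscheme=torch.per_tensor_symmetric,
            reduce_range=reduce_range,
        )
        obs(x)
        print(reduce_range, obs.quant_min, obs.quant_max)
    # reduce_range=True  -> [-64, 63]   (7-bit range)
    # reduce_range=False -> [-128, 127] (full 8-bit range)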
