
Conversation

@martinlsm (Collaborator) commented Jun 24, 2025

Support for per-channel quantization was recently added to the Arm
backend. This patch changes the default setting to use per-channel
quantization for weights in convolutional and linear layers, instead of
per-tensor quantization, which was the previous default.

The reason for this change is that per-channel quantization offers
better numerical accuracy for models containing convolutional and/or
fully connected layers. Unless there is an explicit limitation in the
use case that prevents the use of per-channel quantization, it is
generally preferred.

The option to set quantization granularity can still be manually set
using get_symmetric_quantization_config(is_per_channel=False). This
patch only changes the default.
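The accuracy rationale can be illustrated with a small, self-contained sketch (this is not the Arm backend code): with symmetric int8 quantization, a single per-tensor scale must cover the largest weight in the whole tensor, so small-magnitude output channels lose precision, while per-channel scales adapt to each channel's range.

```python
def quantize(values, scale):
    # Symmetric int8 quantization: q = clamp(round(v / scale), -127, 127).
    return [max(-127, min(127, round(v / scale))) for v in values]

def roundtrip_sse(values, scale):
    # Sum of squared errors after quantize -> dequantize.
    return sum((q * scale - v) ** 2 for q, v in zip(quantize(values, scale), values))

# Toy weight matrix: one large-magnitude and one small-magnitude output channel.
weights = [
    [10.0, -8.0, 6.0],     # channel 0
    [0.02, -0.015, 0.01],  # channel 1
]

# Per-tensor: one scale derived from the global max magnitude.
global_scale = max(abs(v) for row in weights for v in row) / 127.0
per_tensor_sse = sum(roundtrip_sse(row, global_scale) for row in weights)

# Per-channel: one scale per output channel.
per_channel_sse = sum(
    roundtrip_sse(row, max(abs(v) for v in row) / 127.0) for row in weights
)

print(per_channel_sse < per_tensor_sse)  # True: per-channel loses less precision
```

With the per-tensor scale, channel 1's weights (around 0.02) map to quantized values near zero and are almost entirely lost, which is exactly the effect the per-channel default avoids.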

Unit and model tests are affected by this change. Error tolerances for
those tests have not been changed, as model outputs are compared against
a reference that uses the exact same quantization strategy. That is, if
a model output is altered by this patch, the reference it is compared
against would also be altered accordingly.

To verify the impact of this change in terms of top-1 and top-5
accuracy, a manual test was run on MobileNetV2. The results show a
noticeable improvement:

  • Per-tensor quantization Top-1 / Top-5 accuracy: 66.45% / 87.50%
  • Per-channel quantization Top-1 / Top-5 accuracy: 70.85% / 89.50%
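For reference, top-1 / top-5 accuracy as quoted above can be computed with a helper like the following (an illustrative sketch; the PR does not show the evaluation script actually used):

```python
def topk_accuracy(all_scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for scores, label in zip(all_scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

scores = [
    [0.1, 0.7, 0.2],  # sample 0: predicted class 1
    [0.6, 0.3, 0.1],  # sample 1: predicted class 0
]
labels = [1, 2]
print(topk_accuracy(scores, labels, 1))  # 0.5: only sample 0 is a top-1 hit
print(topk_accuracy(scores, labels, 3))  # 1.0: both labels fall within the top 3
```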

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

@pytorch-bot bot commented Jun 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11873


❌ 2 New Failures

As of commit cfaa79f with merge base 2bd96df, two new job failures were reported.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 24, 2025
@martinlsm (Collaborator, Author): @pytorchbot label ciflow/trunk

@martinlsm (Collaborator, Author): @pytorchbot label "partner: arm"

@pytorch-bot pytorch-bot bot added the partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm label Jun 24, 2025
@martinlsm (Collaborator, Author): @pytorchbot label "topic: not user facing"

@zingo zingo added release notes: arm Changes to the ARM backend delegate and removed topic: not user facing labels Jun 24, 2025
@jackzhxng (Contributor): @martinlsm no need to add "topic: not user facing" anymore btw

@digantdesai (Contributor) left a comment:

I assume the rationale is accuracy, but can you please add more details? And linear and conv weights only, right? Perf impact? I also didn't see any ATOL/RTOL change in this diff.

# Create and configure quantizer to use a symmetric quantization config globally on all nodes
quantizer = EthosUQuantizer(compile_spec)
- operator_config = get_symmetric_quantization_config(is_per_channel=False)
+ operator_config = get_symmetric_quantization_config(is_per_channel=True)
@digantdesai (Contributor) commented:

Nit

Suggested change:
- operator_config = get_symmetric_quantization_config(is_per_channel=True)
+ operator_config = get_symmetric_quantization_config()

@martinlsm (Collaborator, Author) replied:

@digantdesai I have resolved your code comment and answered all your questions in the updated commit message.

@digantdesai (Contributor) replied:

Thank you!

@martinlsm martinlsm force-pushed the marlin-per-channel-quant branch from 8175bb3 to c3567b4 Compare June 27, 2025 07:53
@digantdesai (Contributor) left a comment:

I am glad to see we didn't have to mess with the backend code at all when quantizing the weights differently. Thanks @martinlsm.

@oscarandersson8218 (Collaborator) replied, quoting:

> I am glad to see we didn't have to mess with the backend code at all when quantizing the weights differently. Thanks @martinlsm.

@digantdesai we did mess with the backend code but in a separate PR #11752.

@martinlsm martinlsm force-pushed the marlin-per-channel-quant branch from fb40b0a to 4dd805c Compare July 4, 2025 11:13
Martin Lindström added 2 commits July 8, 2025 08:21
(Commit message identical to the PR description above.)

Change-Id: I35d5c62741c7f93b916560874689245db96a588b
Signed-off-by: Martin Lindström <[email protected]>
Previously we were just a few minutes off the 90-minute timeout. With
per-channel quantization enabled by default, it seems that we exceed that
limit consistently. This patch increases the timeout to 120 minutes.

Change-Id: I20f3fb369329dd51e95ffec667617afe93c50aa3
Signed-off-by: Oscar Andersson <[email protected]>
@oscarandersson8218 oscarandersson8218 force-pushed the marlin-per-channel-quant branch from 4dd805c to 07c2ba7 Compare July 8, 2025 10:22
@digantdesai (Contributor) commented Jul 9, 2025:

#11752

LOL That's what I thought :D

Are we ready to merge this one then?

@Sebastian-Larsson (Collaborator) left a comment:

Unrelated CI failures

@Sebastian-Larsson Sebastian-Larsson merged commit e9c11a4 into pytorch:main Jul 15, 2025
203 of 205 checks passed

Labels

- ciflow/trunk
- CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
- partner: arm (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm)
- release notes: arm (Changes to the ARM backend delegate)

7 participants