
Conversation

@martinlsm (Collaborator) commented Jun 24, 2025

Support for per-channel quantization was recently added to the Arm
backend. This patch changes the default setting to use per-channel
quantization for weights in convolutional and linear layers, instead of
per-tensor quantization, which was the previous default.

The reason for this change is that per-channel quantization offers
better numerical accuracy for models containing convolutional and/or
fully connected layers. Unless there is an explicit limitation in the
use case that prevents the use of per-channel quantization, it is
generally preferred.

The option to set quantization granularity can still be manually set
using get_symmetric_quantization_config(is_per_channel=False). This
patch only changes the default.
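The accuracy rationale can be illustrated with a small, self-contained sketch (this is not the Arm backend code): with symmetric int8 quantization, a single per-tensor scale must cover the largest weight in the whole tensor, so small-magnitude output channels lose precision, while per-channel scales adapt to each channel's range.

```python
def quantize(values, scale):
    # Symmetric int8 quantization: q = clamp(round(v / scale), -127, 127).
    return [max(-127, min(127, round(v / scale))) for v in values]

def roundtrip_sse(values, scale):
    # Sum of squared errors after quantize -> dequantize.
    return sum((q * scale - v) ** 2 for q, v in zip(quantize(values, scale), values))

# Toy weight matrix: one large-magnitude and one small-magnitude output channel.
weights = [
    [10.0, -8.0, 6.0],     # channel 0
    [0.02, -0.015, 0.01],  # channel 1
]

# Per-tensor: one scale derived from the global max magnitude.
global_scale = max(abs(v) for row in weights for v in row) / 127.0
per_tensor_sse = sum(roundtrip_sse(row, global_scale) for row in weights)

# Per-channel: one scale per output channel.
per_channel_sse = sum(
    roundtrip_sse(row, max(abs(v) for v in row) / 127.0) for row in weights
)

print(per_channel_sse < per_tensor_sse)  # True: per-channel loses less precision
```

With the per-tensor scale, channel 1's weights (around 0.02) map to quantized values near zero and are almost entirely lost, which is exactly the effect the per-channel default avoids.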

Unit and model tests are affected by this change. Error tolerances for
those tests have not been changed, as model outputs are compared against
a reference that uses the exact same quantization strategy. That is, if
a model output is altered by this patch, the reference it is compared
against would also be altered accordingly.

To verify the impact of this change in terms of top-1 and top-5
accuracy, a manual test was run on MobileNetV2. The results show a
noticeable improvement:

  • Per-tensor quantization Top-1 / Top-5 accuracy: 66.45% / 87.50%
  • Per-channel quantization Top-1 / Top-5 accuracy: 70.85% / 89.50%
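For reference, top-1 / top-5 accuracy as quoted above can be computed with a helper like the following (an illustrative sketch; the PR does not show the evaluation script actually used):

```python
def topk_accuracy(all_scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for scores, label in zip(all_scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

scores = [
    [0.1, 0.7, 0.2],  # sample 0: predicted class 1
    [0.6, 0.3, 0.1],  # sample 1: predicted class 0
]
labels = [1, 2]
print(topk_accuracy(scores, labels, 1))  # 0.5: only sample 0 is a top-1 hit
print(topk_accuracy(scores, labels, 3))  # 1.0: both labels fall within the top 3
```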

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

@pytorch-bot bot commented Jun 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11873


❌ 2 New Failures

As of commit cfaa79f with merge base 2bd96df, two new job failures were reported.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 24, 2025
@martinlsm (Collaborator, Author): @pytorchbot label ciflow/trunk

@martinlsm (Collaborator, Author): @pytorchbot label "partner: arm"

@pytorch-bot pytorch-bot bot added the partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm label Jun 24, 2025
@martinlsm (Collaborator, Author): @pytorchbot label "topic: not user facing"

@zingo zingo added release notes: arm Changes to the ARM backend delegate and removed topic: not user facing labels Jun 24, 2025
@jackzhxng (Contributor): @martinlsm no need to add "topic: not user facing" anymore btw

@digantdesai (Contributor) left a comment:

I assume the rationale is accuracy, but can you please add more details? And linear and conv weights only, right? Perf impact? I also didn't see any ATOL/RTOL change in this diff.

# Create and configure quantizer to use a symmetric quantization config globally on all nodes
quantizer = EthosUQuantizer(compile_spec)
- operator_config = get_symmetric_quantization_config(is_per_channel=False)
+ operator_config = get_symmetric_quantization_config(is_per_channel=True)
@digantdesai (Contributor) commented:

Nit

Suggested change:
- operator_config = get_symmetric_quantization_config(is_per_channel=True)
+ operator_config = get_symmetric_quantization_config()

@martinlsm (Collaborator, Author) replied:

@digantdesai I have resolved your code comment and answered all your questions in the updated commit message.

@digantdesai (Contributor) replied:

Thank you!

@martinlsm martinlsm force-pushed the marlin-per-channel-quant branch from 8175bb3 to c3567b4 Compare June 27, 2025 07:53
@digantdesai (Contributor) left a comment:

I am glad to see we didn't have to mess with the backend code at all when quantizing the weights differently. Thanks @martinlsm.

@oscarandersson8218 (Collaborator) replied, quoting:

> I am glad to see we didn't have to mess with the backend code at all when quantizing the weights differently. Thanks @martinlsm.

@digantdesai we did mess with the backend code but in a separate PR #11752.

@martinlsm martinlsm force-pushed the marlin-per-channel-quant branch from fb40b0a to 4dd805c Compare July 4, 2025 11:13
Martin Lindström added 2 commits July 8, 2025 08:21
(Commit message identical to the PR description above.)

Change-Id: I35d5c62741c7f93b916560874689245db96a588b
Signed-off-by: Martin Lindström <[email protected]>
Previously we were just a few minutes off the 90-minute timeout. With
per-channel quantization enabled by default, it seems that we exceed that
limit consistently. This patch increases the timeout to 120 minutes.

Change-Id: I20f3fb369329dd51e95ffec667617afe93c50aa3
Signed-off-by: Oscar Andersson <[email protected]>
@oscarandersson8218 oscarandersson8218 force-pushed the marlin-per-channel-quant branch from 4dd805c to 07c2ba7 Compare July 8, 2025 10:22
@digantdesai (Contributor) commented Jul 9, 2025:

#11752

LOL That's what I thought :D

Are we ready to merge this one then?

@Sebastian-Larsson (Collaborator) left a comment:

Unrelated CI failures

@Sebastian-Larsson Sebastian-Larsson merged commit e9c11a4 into pytorch:main Jul 15, 2025
203 of 205 checks passed

Labels

- ciflow/trunk
- CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
- partner: arm (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm)
- release notes: arm (Changes to the ARM backend delegate)

7 participants