Add NPU (Ascend) backend support for INT4 weight-only quantization workflow #3172

orangeH25 · 2025-10-14T11:44:22Z

Related to #3044

Summary

This PR adds NPU (Ascend) backend support for the INT4 weight-only quantization workflow.

It introduces a new tensor subclass, Int4PlainInt32TensorNPU, aligned with the existing Int4PlainInt32Tensor for the plain_int32 packing format.

Environment

torchao version: 0.13.0 (main branch, commit: f64daac)
torch version: 2.7.1
torch_npu version: 2.7.1rc1
Ascend Toolkit (CANN): 8.2.RC1
Device: Ascend 910B4
OS: EulerOS 2.10 (Kernel 4.19.90, aarch64)
Python: 3.11

Files changed

Modified

torchao/quantization/__init__.py
torchao/quantization/quant_api.py
torchao/quantization/quantize_/workflows/__init__.py

Added

torchao/quantization/quantize_/workflows/int4/int4_plain_int32_tensor_npu.py
test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor_npu.py

Implementation Overview

Introduces Int4PlainInt32TensorNPU to enable NPU backend support for INT4 weight-only quantization.
Registers the new tensor subclass and integrates it into quant_api.py for dispatch.
Updates __init__.py files to ensure proper import and exposure.
Adds corresponding test cases for NPU workflow.
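
For readers unfamiliar with the workflow, the arithmetic behind group-wise INT4 weight-only quantization can be sketched in plain Python. This is a generic illustration, not the code in this PR: the function names, the asymmetric min/max scheme, and the unsigned [0, 15] range are all assumptions made for the example.

```python
# Illustrative sketch only: hypothetical helpers, not part of torchao or this PR.

def quantize_group_int4(values):
    """Map one group of floats onto 4-bit codes in [0, 15] (assumed asymmetric scheme)."""
    lo, hi = min(values), max(values)
    # 4 bits give 16 levels, so the range is split into 15 steps.
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [max(0, min(15, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group_int4(codes, scale, lo):
    """Recover approximate floats from the 4-bit codes."""
    return [lo + scale * c for c in codes]


if __name__ == "__main__":
    group = [-1.0, 0.0, 0.5, 1.0]
    codes, scale, lo = quantize_group_int4(group)
    print(codes, dequantize_group_int4(codes, scale, lo))
```

Only the weights are stored as 4-bit codes (plus one scale and offset per group); activations stay in floating point, which is what "weight-only" refers to.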

Test Case

test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor_npu.py

pytorch-bot · 2025-10-14T11:44:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3172

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

FFFrog · 2025-10-14T13:47:30Z

test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor_npu.py

+try:
+    import torch_npu
+except ImportError:
+    torch_npu = None
+


PyTorch provides an autoload mechanism, so we do not need to import torch_npu explicitly.
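
For context, a minimal sketch of how one might check which backends the autoload mechanism would pick up. It assumes that out-of-tree backends register under the `torch.backends` entry-point group and that autoloading can be disabled with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`; the helper name is hypothetical.

```python
# Illustrative sketch: lists packages that advertise a PyTorch backend entry point.
# The group name "torch.backends" is an assumption about the autoload mechanism.
from importlib.metadata import entry_points


def discover_backend_entry_points(group="torch.backends"):
    """Return the sorted names of entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ API
        selected = eps.select(group=group)
    else:
        # Older Pythons return a dict keyed by group name
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    # On a machine with torch_npu installed this is expected to include its entry;
    # on other machines the list may simply be empty.
    print(discover_backend_entry_points())
```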

FFFrog · 2025-10-14T13:50:58Z

test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor_npu.py

+@unittest.skipIf(torch_npu is None, "torch_npu is not available")
+@unittest.skipIf(not torch_npu.npu.is_available(), "NPU not available")


Suggested change

-@unittest.skipIf(torch_npu is None, "torch_npu is not available")
-@unittest.skipIf(not torch_npu.npu.is_available(), "NPU not available")
+@unittest.skipIf(torch.accelerator.current_accelerator(True).type == "npu" and torch.accelerator.is_available(), "NPU not available")
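
Note that, read literally, the suggested condition is true when an NPU is present, so it would skip the tests exactly where they should run; it may also fail with an AttributeError if `current_accelerator` returns `None`. A hedged sketch of a guard with the intended polarity follows. The helper `npu_unavailable` and the test class name are hypothetical, and the sketch assumes a PyTorch build that ships `torch.accelerator`.

```python
# Illustrative sketch: skip NPU tests unless an NPU accelerator is actually usable.
import unittest


def npu_unavailable() -> bool:
    """Return True when no usable NPU accelerator is detected."""
    try:
        import torch
    except ImportError:
        return True
    accelerator = getattr(torch, "accelerator", None)
    if accelerator is None or not accelerator.is_available():
        return True
    current = accelerator.current_accelerator()
    # current_accelerator() may return None; guard before reading .type
    return current is None or current.type != "npu"


@unittest.skipIf(npu_unavailable(), "NPU not available")
class TestInt4OnNPU(unittest.TestCase):  # hypothetical test class
    def test_placeholder(self):
        self.assertTrue(True)
```

Wrapping the check in a function keeps the decorator readable and avoids touching `.type` on a missing device at import time.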

FFFrog · 2025-10-14T13:56:02Z

test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor_npu.py

+@unittest.skipIf(
+    version.parse(torch_npu.__version__) < version.parse("2.7.1rc1"),
+    "Need torch_npu 2.7.1rc1+",
+)


We can remove this check because there is a strict version mapping between PyTorch and torch_npu.

FFFrog · 2025-10-14T13:57:07Z

torchao/quantization/quantize_/workflows/int4/int4_plain_int32_tensor_npu.py

+        )
+
+        assert int_data.dtype == torch.int32, (
+            f"torch_npu.npu_convert_weight_to_int4pack expects `int32` dtype"


Suggested change

-f"torch_npu.npu_convert_weight_to_int4pack expects `int32` dtype"
+f"torch.ops.npu.npu_convert_weight_to_int4pack expects `int32` dtype"

FFFrog · 2025-10-14T13:57:22Z

torchao/quantization/quantize_/workflows/int4/int4_plain_int32_tensor_npu.py

+        )
+
+        assert int_data.shape[-1] % 8 == 0, (
+            f"torch_npu.npu_convert_weight_to_int4pack expects last dim must be aligned to 8,but got {int_data.shape[-1]}"


Suggested change

-f"torch_npu.npu_convert_weight_to_int4pack expects last dim must be aligned to 8,but got {int_data.shape[-1]}"
+f"torch.ops.npu.npu_convert_weight_to_int4pack expects last dim must be aligned to 8,but got {int_data.shape[-1]}"
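
The alignment requirement presumably follows from the packing arithmetic: one int32 word holds 32 / 4 = 8 INT4 values. A plain-Python sketch of that idea is below. The helper names and the low-nibble-first ordering are assumptions for illustration and may not match the layout the NPU operator actually produces.

```python
# Illustrative sketch: pack eight signed 4-bit values into one 32-bit word.
# Nibble order (lowest nibble first) is an assumption, not the NPU op's documented layout.
NIBBLES_PER_WORD = 8  # 32 bits / 4 bits per value


def pack_int4_row(values):
    """Pack signed int4 values (-8..7) into 32-bit words, eight per word."""
    if len(values) % NIBBLES_PER_WORD != 0:
        raise ValueError(f"length must be aligned to 8, but got {len(values)}")
    words = []
    for start in range(0, len(values), NIBBLES_PER_WORD):
        word = 0
        for i, v in enumerate(values[start:start + NIBBLES_PER_WORD]):
            if not -8 <= v <= 7:
                raise ValueError(f"value {v} does not fit in int4")
            # Keep the two's-complement low 4 bits and shift into position.
            word |= (v & 0xF) << (4 * i)
        words.append(word)  # stored here as an unsigned Python int
    return words


def unpack_int4_row(words):
    """Inverse of pack_int4_row: recover the signed int4 values."""
    values = []
    for word in words:
        for i in range(NIBBLES_PER_WORD):
            nibble = (word >> (4 * i)) & 0xF
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values
```

A row whose length is not a multiple of 8 would leave a partially filled word, which is why a check like the assertion above rejects it up front.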

orangeH25 · 2025-10-15T06:30:26Z

Hi @jcaip @jerryzh168, could you please review this PR? Thanks!

orangeH25 added 3 commits October 13, 2025 11:07

Add NPU (Ascend) backend support for INT4 weight-only quantization workflow

f3aefca

use torch.ops.npu prefix and drop redundant torch_npu import

68eea61

Merge branch 'pytorch:main' into quant/int4/wo/0

164435e

meta-cla bot added the CLA Signed label Oct 14, 2025

orangeH25 marked this pull request as draft October 14, 2025 11:44

FFFrog reviewed Oct 14, 2025

Modify test file and update comments

06c77d1

orangeH25 marked this pull request as ready for review October 15, 2025 06:28
