Add SCALE_DTYPE and ZP_DTYPE support for quantization shaders #13225
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13225
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Cancelled Jobs, 1 Unrelated Failure as of commit aaad9b6 with merge base f7ddbde:
- NEW FAILURES: two new jobs failed on this PR.
- CANCELLED JOBS: two jobs were cancelled; please retry.
- BROKEN TRUNK: one job failed but was also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D79835267
Summary:
This change adds support for parameterized SCALE_DTYPE and ZP_DTYPE in the quantization and dequantization shaders. This is necessary because, when exporting Llama with "8da4w" quantization, different affine quantization calls may use different scale and zero-point dtypes. I've also added functionality to automatically populate optional parameters.
NOTE: Fusion for linear_qta8a_qga4w is disabled while the bug that prevents it from working with the Llama export is being resolved.
Key Changes:
(1) **YAML Configuration Updates:**
- Added SCALE_DTYPE and ZP_DTYPE parameters to quantize_texture.yaml and dequantize_texture.yaml
- Added generate_variant_forall entries for SCALE_DTYPE (float) and ZP_DTYPE (int8, int32, float)
- This enables shader variants for different scale and zero_point data types
(2) **GLSL Shader Updates:**
- Added SCALE_T and ZP_T type definitions using the new parameters
- Updated tensor declarations to use parameterized types instead of hardcoded "float" and "int"
- Added proper type casting (float() and int()) for all scale and zero_point accesses
- Added required extensions for SCALE_DTYPE and ZP_DTYPE
(3) **C++ Implementation Updates** (a hedged C++ sketch follows this list):
- Added dtype suffixes for scale and zero_point in all quantize/dequantize node functions
- Added comprehensive data type validation in all implementation functions:
  - Scale tensors: fp32 only (for now)
  - Zero point tensors: int32, int8, fp32
- Updated Quantize.cpp, Dequantize.cpp, and ChooseQParams.cpp with consistent validation
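To make item (3) concrete, here is a minimal, self-contained sketch of how the dtype suffixing and validation described above could look. It is not the actual ExecuTorch Vulkan code: the ScalarType enum, the helper names (dtype_suffix, check_quant_param_dtypes, make_variant_name), and the variant-name format are hypothetical stand-ins.

```cpp
// Minimal sketch only; all names below are hypothetical stand-ins,
// not the real ExecuTorch Vulkan graph API.
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the graph's scalar-type enum.
enum class ScalarType { Float, Int8, Int32 };

// Map a scalar type to the suffix used in generated shader variant names.
inline std::string dtype_suffix(ScalarType t) {
  switch (t) {
    case ScalarType::Float:
      return "float";
    case ScalarType::Int8:
      return "int8";
    case ScalarType::Int32:
      return "int32";
  }
  throw std::invalid_argument("unsupported dtype");
}

// Enforce the dtype rules stated in the summary: scale must be fp32 (for
// now); zero_point may be int32, int8, or fp32.
inline void check_quant_param_dtypes(ScalarType scale_dtype, ScalarType zp_dtype) {
  if (scale_dtype != ScalarType::Float) {
    throw std::invalid_argument("scale tensor must be fp32");
  }
  if (zp_dtype != ScalarType::Int32 && zp_dtype != ScalarType::Int8 &&
      zp_dtype != ScalarType::Float) {
    throw std::invalid_argument("zero_point tensor must be int32, int8, or fp32");
  }
}

// Build a shader variant name by appending the scale and zero_point dtype
// suffixes to a base kernel name, e.g. "quantize_per_tensor_texture3d" ->
// "quantize_per_tensor_texture3d_float_int32" (the naming scheme here is
// illustrative only).
inline std::string make_variant_name(
    const std::string& base,
    ScalarType scale_dtype,
    ScalarType zp_dtype) {
  check_quant_param_dtypes(scale_dtype, zp_dtype);
  return base + "_" + dtype_suffix(scale_dtype) + "_" + dtype_suffix(zp_dtype);
}
```

In this sketch, centralizing the suffix and validation logic in shared helpers is one way to keep the quantize, dequantize, and choose_qparams paths consistent with each other.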
This change resolves shader compilation errors and enables more flexible quantization strategies by supporting multiple data types for quantization parameters.
Differential Revision: D79835267
Pull Request resolved: pytorch#13225