Add SCALE_DTYPE and ZP_DTYPE support for quantization shaders #13225
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13225
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Cancelled Jobs, 1 Unrelated Failure as of commit aaad9b6 with merge base f7ddbde:
- NEW FAILURES: two new jobs failed on this PR.
- CANCELLED JOBS: two jobs were cancelled; please retry.
- BROKEN TRUNK: one job failed but was also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D79835267
Summary:
This change adds support for parameterized SCALE_DTYPE and ZP_DTYPE in the quantization and dequantization shaders. This is necessary because, when exporting Llama with "8da4w" quantization, different affine quantization calls may use different scale and zero-point dtypes. I've also added functionality to automatically populate optional parameters.
NOTE: Fusion for linear_qta8a_qga4w is disabled while the bug that prevents it from working with the Llama export is being resolved.
Key Changes:
(1) **YAML Configuration Updates:**
- Added SCALE_DTYPE and ZP_DTYPE parameters to quantize_texture.yaml and dequantize_texture.yaml
- Added generate_variant_forall entries for SCALE_DTYPE (float) and ZP_DTYPE (int8, int32, float)
- This enables shader variants for different scale and zero_point data types
(2) **GLSL Shader Updates:**
- Added SCALE_T and ZP_T type definitions using the new parameters
- Updated tensor declarations to use parameterized types instead of hardcoded "float" and "int"
- Added proper type casting (float() and int()) for all scale and zero_point accesses
- Added required extensions for SCALE_DTYPE and ZP_DTYPE
(3) **C++ Implementation Updates** (a hedged C++ sketch follows this list):
- Added dtype suffixes for scale and zero_point in all quantize/dequantize node functions
- Added comprehensive data type validation in all implementation functions:
  - Scale tensors: fp32 only (for now)
  - Zero point tensors: int32, int8, fp32
- Updated Quantize.cpp, Dequantize.cpp, and ChooseQParams.cpp with consistent validation
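To make item (3) concrete, here is a minimal, self-contained sketch of how the dtype suffixing and validation described above could look. It is not the actual ExecuTorch Vulkan code: the ScalarType enum, the helper names (dtype_suffix, check_quant_param_dtypes, make_variant_name), and the variant-name format are hypothetical stand-ins.

```cpp
// Minimal sketch only; all names below are hypothetical stand-ins,
// not the real ExecuTorch Vulkan graph API.
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the graph's scalar-type enum.
enum class ScalarType { Float, Int8, Int32 };

// Map a scalar type to the suffix used in generated shader variant names.
inline std::string dtype_suffix(ScalarType t) {
  switch (t) {
    case ScalarType::Float:
      return "float";
    case ScalarType::Int8:
      return "int8";
    case ScalarType::Int32:
      return "int32";
  }
  throw std::invalid_argument("unsupported dtype");
}

// Enforce the dtype rules stated in the summary: scale must be fp32 (for
// now); zero_point may be int32, int8, or fp32.
inline void check_quant_param_dtypes(ScalarType scale_dtype, ScalarType zp_dtype) {
  if (scale_dtype != ScalarType::Float) {
    throw std::invalid_argument("scale tensor must be fp32");
  }
  if (zp_dtype != ScalarType::Int32 && zp_dtype != ScalarType::Int8 &&
      zp_dtype != ScalarType::Float) {
    throw std::invalid_argument("zero_point tensor must be int32, int8, or fp32");
  }
}

// Build a shader variant name by appending the scale and zero_point dtype
// suffixes to a base kernel name, e.g. "quantize_per_tensor_texture3d" ->
// "quantize_per_tensor_texture3d_float_int32" (the naming scheme here is
// illustrative only).
inline std::string make_variant_name(
    const std::string& base,
    ScalarType scale_dtype,
    ScalarType zp_dtype) {
  check_quant_param_dtypes(scale_dtype, zp_dtype);
  return base + "_" + dtype_suffix(scale_dtype) + "_" + dtype_suffix(zp_dtype);
}
```

In this sketch, centralizing the suffix and validation logic in shared helpers is one way to keep the quantize, dequantize, and choose_qparams paths consistent with each other.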
This change resolves shader compilation errors and enables more flexible quantization strategies by supporting multiple data types for quantization parameters.
Differential Revision: D79835267
Pull Request resolved: pytorch#13225