[ET-VK] Use performant tiled algorithm for 4 bit weight only quantized linear #10205

SS-JIA · 2025-04-15T16:55:58Z

Stack from ghstack (oldest at bottom):

-> [ET-VK] Use performant tiled algorithm for 4 bit weight only quantized linear #10205
[ET-VK] Add co-op algorithm for 4 bit weight only quantized linear #10204
[ET-VK] Allow int4 linear to execute without 8bit buffer support #10030
[ET-VK][ez] Add support for buffer backed qparams in int4 linear + add checks for physical limits when allocating #9974

Context

As the title says, this updates the default compute shader for weight-only quantized int4 linear to use a tiled algorithm, which should boost performance for gemm cases, i.e. where mat1 is a matrix rather than a single vector.
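For context, here is a minimal CPU reference sketch of what a 4-bit weight-only quantized linear computes. It is not the ExecuTorch shader, and several details are assumptions made only for illustration: group-wise asymmetric quantization, where each group of `group_size` input channels shares one scale and zero point; dequantization as `(q - zero) * scale`; two unsigned 4-bit values packed per byte, low nibble first; and the hypothetical helper names `unpack_int4_row` and `q_4w_linear_ref`. The optional counter shows why gemm is the case worth optimizing. In this naive loop every output element re-dequantizes its whole weight row, so the weights are dequantized M times when mat1 has M rows.

```python
from typing import List, Optional


def unpack_int4_row(packed: List[int], k: int) -> List[int]:
    """Unpack k unsigned 4-bit values from a list of bytes.

    Assumed layout (hypothetical): low nibble holds the even index,
    high nibble holds the odd index.
    """
    vals = []
    for i in range(k):
        byte = packed[i // 2]
        vals.append(byte & 0xF if i % 2 == 0 else (byte >> 4) & 0xF)
    return vals


def q_4w_linear_ref(mat1, packed_weight, scales, zeros, group_size,
                    counter: Optional[List[int]] = None):
    """Naive reference: out[m][n] = sum_k mat1[m][k] * dequant(W[n][k]).

    mat1: M x K floats; packed_weight: N rows of ceil(K / 2) bytes;
    scales / zeros: N x (K // group_size) per-group qparams (assumed layout).
    If counter is given, counter[0] is incremented once per dequantized weight.
    """
    M, K, N = len(mat1), len(mat1[0]), len(packed_weight)
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            q = unpack_int4_row(packed_weight[n], K)
            acc = 0.0
            for k in range(K):
                g = k // group_size
                # Dequantize on the fly; repeated for every output row m.
                w = (q[k] - zeros[n][g]) * scales[n][g]
                if counter is not None:
                    counter[0] += 1
                acc += mat1[m][k] * w
            out[m][n] = acc
    return out


# Usage: weight row 0 holds q = [1, 2, 3, 4], row 1 holds q = [8, 8, 0, 15].
packed = [[0x21, 0x43], [0x88, 0xF0]]
scales = [[1.0, 0.5], [2.0, 1.0]]
zeros = [[0, 0], [8, 8]]
mat1 = [[1, 2, 3, 4], [0.5, 0, 1, 0]]
dequant_count = [0]
result = q_4w_linear_ref(mat1, packed, scales, zeros, group_size=2,
                         counter=dequant_count)
```

With M = 2, N = 2 and K = 4, this loop dequantizes 2 * 2 * 4 = 16 weights, twice as many as there are distinct weights.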

Changes

* Renamed q_4w_linear to q_4w_linear_tiled
* Updated the compute shader to use the tiled algorithm

TILE_ROWS is set to 3 for now; I expect to add variants that switch between different output tile configurations.
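The general idea behind output tiling can be sketched on the CPU as follows. This is an illustration of the technique under stated assumptions, not a transcription of the shader. Each `(m0, n0)` iteration stands in for one invocation that computes a TILE_ROWS x TILE_COLS block of the output. Each weight is dequantized once per tile and then reused across all TILE_ROWS rows of mat1. TILE_ROWS = 3 matches the value mentioned above. TILE_COLS = 4, the nibble packing, the `(q - zero) * scale` dequantization, and the names `unpack_int4` and `q_4w_linear_tiled_ref` are hypothetical.

```python
from typing import List, Optional

TILE_ROWS = 3  # value used in this PR
TILE_COLS = 4  # hypothetical: output columns per tile (e.g. one vec4)


def unpack_int4(packed: List[int], k: int) -> int:
    """Return the k-th unsigned 4-bit value, low nibble first (assumed layout)."""
    byte = packed[k // 2]
    return byte & 0xF if k % 2 == 0 else (byte >> 4) & 0xF


def q_4w_linear_tiled_ref(mat1, packed_weight, scales, zeros, group_size,
                          counter: Optional[List[int]] = None):
    """Tiled reference: same result as a naive loop, fewer dequantizations."""
    M, K, N = len(mat1), len(mat1[0]), len(packed_weight)
    out = [[0.0] * N for _ in range(M)]
    # Each (m0, n0) pair plays the role of one invocation computing an
    # output tile of up to TILE_ROWS x TILE_COLS elements.
    for m0 in range(0, M, TILE_ROWS):
        rows = range(m0, min(m0 + TILE_ROWS, M))
        for n0 in range(0, N, TILE_COLS):
            cols = range(n0, min(n0 + TILE_COLS, N))
            acc = {(m, n): 0.0 for m in rows for n in cols}
            for k in range(K):
                g = k // group_size
                for n in cols:
                    # Dequantize W[n][k] once per tile ...
                    w = (unpack_int4(packed_weight[n], k) - zeros[n][g]) * scales[n][g]
                    if counter is not None:
                        counter[0] += 1
                    # ... and reuse it for every row of mat1 in the tile.
                    for m in rows:
                        acc[(m, n)] += mat1[m][k] * w
            for (m, n), v in acc.items():
                out[m][n] = v
    return out


# Usage: 4 rows of mat1 -> ceil(4 / 3) = 2 row tiles.
packed = [[0x21, 0x43], [0x88, 0xF0]]  # q rows [1, 2, 3, 4] and [8, 8, 0, 15]
scales = [[1.0, 0.5], [2.0, 1.0]]
zeros = [[0, 0], [8, 8]]
mat1 = [[1, 2, 3, 4], [0.5, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
dequant_count = [0]
result = q_4w_linear_tiled_ref(mat1, packed, scales, zeros, group_size=2,
                               counter=dequant_count)
```

In this sketch the dequantization count falls from M * N * K (32 here) to ceil(M / TILE_ROWS) * N * K (16 here). The cost is TILE_ROWS x TILE_COLS accumulators per invocation, which is presumably why different tile configurations would be worth exposing as variants.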

Differential Revision: D73044649


pytorch-bot · 2025-04-15T16:56:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10205

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit ef07365 with merge base 6d1caca ():

NEW FAILURE - The following job has failed:

pull / android / run-emulator (gh)
The process '/usr/bin/sh' failed with exit code 255

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 278225005
Pull Request resolved: #10205

facebook-github-bot · 2025-04-15T16:56:44Z

This pull request was exported from Phabricator. Differential Revision: D73044649

facebook-github-bot added the CLA Signed label Apr 15, 2025

facebook-github-bot added the fb-exported label Apr 15, 2025

trivedivivek added the topic: not user facing label Apr 16, 2025

trivedivivek approved these changes Apr 16, 2025


facebook-github-bot merged commit b9c8c82 into gh/SS-JIA/213/base Apr 16, 2025
82 of 87 checks passed

facebook-github-bot deleted the gh/SS-JIA/213/head branch April 16, 2025 18:51

facebook-github-bot temporarily deployed to cherry-pick-bot April 16, 2025 18:51 with GitHub Actions

pytorchbot mentioned this pull request Apr 16, 2025

[ET-VK] Use performant tiled algorithm for 4 bit weight only quantized linear #10236

Merged

4 participants

[ET-VK] Use performant tiled algorithm for 4 bit weight only quantized linear #10205

[ET-VK] Use performant tiled algorithm for 4 bit weight only quantized linear #10205

Uh oh!

Conversation

SS-JIA commented Apr 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Changes

Uh oh!

pytorch-bot bot commented Apr 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10205

❌ 1 New Failure

Uh oh!

facebook-github-bot commented Apr 15, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

SS-JIA commented Apr 15, 2025 •

edited

Loading

pytorch-bot bot commented Apr 15, 2025 •

edited

Loading