trivedivivek
Contributor

Summary:
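This diff improves the performance of quantized matrix multiplication by devectorizing the shader: accumulators declared as arrays of `VEC4_T` become flat arrays of scalar `T`, and each vector operation becomes a loop over the four components.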

An example modification is shown below:

```glsl
// Before
VEC4_T sums[TILE_ROWS][TILE_TXCOLS];

// After
T sums[TILE_ROWS * TILE_TXCOLS * 4];

// Before
sums[r][${c}] = VEC4_T(0.0);

// After
for (int j = 0; j < 4; j++) {
    sums[r * TILE_TXCOLS * 4 + ${c} * 4 + j] = T(0.0);
}
```
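
For reviewers who want to sanity-check the flattened indexing, here is a minimal Python model of the arithmetic above. It is only a sketch and not part of the diff: the tile sizes are hypothetical values chosen for illustration, and `flat_index`, `init_vectorized`, and `init_devectorized` are helper names invented here. It shows that `r * TILE_TXCOLS * 4 + c * 4 + j` visits every scalar slot exactly once, in the same row-major order as the original `sums[r][c]` vectors.

```python
# Hypothetical tile sizes for illustration only; the real values come from
# the shader template and are not stated in this PR description.
TILE_ROWS = 4
TILE_TXCOLS = 2


def flat_index(r, c, j, tile_txcols=TILE_TXCOLS):
    """Offset of component j of texel column c in row r of the flat array."""
    return r * tile_txcols * 4 + c * 4 + j


def init_vectorized():
    """Model of the 'Before' layout: a 2-D array of 4-component vectors."""
    return [[[0.0] * 4 for _ in range(TILE_TXCOLS)] for _ in range(TILE_ROWS)]


def init_devectorized():
    """Model of the 'After' layout: one flat scalar array, zeroed per component."""
    sums = [None] * (TILE_ROWS * TILE_TXCOLS * 4)
    for r in range(TILE_ROWS):
        for c in range(TILE_TXCOLS):
            for j in range(4):
                sums[flat_index(r, c, j)] = 0.0
    return sums


def flatten(vectorized):
    """Row-major flattening of the 'Before' layout, for comparison."""
    return [x for row in vectorized for vec in row for x in vec]
```

With these definitions, `flatten(init_vectorized()) == init_devectorized()` holds, so the two layouts store the same elements in the same order and only the declared type and the access pattern change.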

Differential Revision: D85023829

@trivedivivek trivedivivek requested a review from SS-JIA as a code owner October 20, 2025 15:46

pytorch-bot bot commented Oct 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15274

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 8 New Failures, 2 Unrelated Failures

As of commit b831b18 with merge base 0a1dfb2 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 20, 2025

meta-codesync bot commented Oct 20, 2025

@trivedivivek has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85023829.

@trivedivivek trivedivivek added the release notes: vulkan Changes to the Vulkan backend delegate label Oct 20, 2025
trivedivivek added a commit to trivedivivek/executorch that referenced this pull request Oct 20, 2025
…rch#15274)

Summary:

This diff improves the performance of quantized matrix multiplication by devectorizing the shader. 

An example modification is shown below:

```glsl
// Before
VEC4_T sums[TILE_ROWS][TILE_TXCOLS];

// After
T sums[TILE_ROWS * TILE_TXCOLS * 4];

// Before
sums[r][${c}] = VEC4_T(0.0);

// After
for (int j = 0; j < 4; j++) {
    sums[r * TILE_TXCOLS * 4 + ${c} * 4 + j] = T(0.0);
}
```

Differential Revision: D85023829
Summary:

The diff includes minor performance improvements to the quantized matrix multiplication shader.

Differential Revision: D84998542
Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported release notes: vulkan Changes to the Vulkan backend delegate