This repository was archived by the owner on Sep 10, 2025. It is now read-only.

@manuelcandales (Contributor) commented May 12, 2025

Performance improvements to the low-bit quantized linear Metal kernels in torchao. See AO PR#2167 for details.
The table below summarizes torchchat's decode speed (tokens/second) on the Metal backend on an M1 Max (64 GB) after this update:

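| # bits | Llama 3.2-1B | Llama 3.2-3B | Llama 3.1-8B |
|--------|--------------|--------------|--------------|
| 1 | 179.96 | 87.01 | 51.10 |
| 2 | 186.91 | 98.13 | 54.37 |
| 3 | 170.62 | 85.69 | 48.05 |
| 4 | 175.15 | 89.54 | 50.51 |
| 5 | 147.10 | 70.19 | 38.58 |
| 6 | 140.51 | 63.62 | 35.48 |
| 7 | 131.27 | 64.19 | 32.69 |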
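
For readers who want to compare bit widths, here is a minimal Python sketch that works only on the numbers in the table above. The names `DECODE_TPS`, `relative_to`, and `fastest_bits` are hypothetical helpers written for this illustration; they are not part of torchchat or torchao.

```python
# Decode speeds (tokens/second) copied verbatim from the table above.
# Keys are bit widths; values map model name -> tokens/second.
DECODE_TPS = {
    1: {"Llama 3.2-1B": 179.96, "Llama 3.2-3B": 87.01, "Llama 3.1-8B": 51.10},
    2: {"Llama 3.2-1B": 186.91, "Llama 3.2-3B": 98.13, "Llama 3.1-8B": 54.37},
    3: {"Llama 3.2-1B": 170.62, "Llama 3.2-3B": 85.69, "Llama 3.1-8B": 48.05},
    4: {"Llama 3.2-1B": 175.15, "Llama 3.2-3B": 89.54, "Llama 3.1-8B": 50.51},
    5: {"Llama 3.2-1B": 147.10, "Llama 3.2-3B": 70.19, "Llama 3.1-8B": 38.58},
    6: {"Llama 3.2-1B": 140.51, "Llama 3.2-3B": 63.62, "Llama 3.1-8B": 35.48},
    7: {"Llama 3.2-1B": 131.27, "Llama 3.2-3B": 64.19, "Llama 3.1-8B": 32.69},
}


def relative_to(baseline_bits, table=DECODE_TPS):
    """Return each entry's speed as a ratio of the baseline bit width's speed."""
    base = table[baseline_bits]
    return {
        bits: {model: tps / base[model] for model, tps in row.items()}
        for bits, row in table.items()
    }


def fastest_bits(table=DECODE_TPS):
    """Return, for each model, the bit width with the highest decode speed."""
    models = next(iter(table.values())).keys()
    return {m: max(table, key=lambda bits: table[bits][m]) for m in models}


if __name__ == "__main__":
    # Print each bit width's speed relative to the 4-bit row.
    for bits, row in relative_to(4).items():
        cells = ", ".join(f"{model}: {ratio:.2f}x" for model, ratio in row.items())
        print(f"{bits}-bit vs 4-bit -> {cells}")
    print(fastest_bits())
```

With these numbers, `fastest_bits()` reports 2-bit as the fastest setting for all three models.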

@pytorch-bot (bot) commented May 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1541

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit 0698a1d with merge base a37b08a:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label May 12, 2025

@Jack-Khuu (Contributor) commented

You can ignore the ET issues; I'm bumping it today.

lmk when you want to land and I'll bypass/force.

@Jack-Khuu changed the title from "bump torchao pin" to "Bump torchao pin to pick up qmv_fast optimization" May 12, 2025
@manuelcandales merged commit fd3059b into main May 13, 2025
68 of 72 checks passed