This repository was archived by the owner on Sep 10, 2025. It is now read-only.

@manuelcandales manuelcandales commented Dec 18, 2024

Updates the torchao pin to pick up optimizations to the experimental MPS low-bit kernels (AO PR #1422).

Llama 3.2 1B (llama3.2-1b-base):
1-bit: 28.0688
2-bit: 31.2422
3-bit: 30.1294
4-bit: 30.7905
5-bit: 28.1504
6-bit: 28.4321
7-bit: 27.3991

Llama 3.1 8B (llama3.1-base):
1-bit: 7.4459
2-bit: 15.6508
3-bit: 15.3086
4-bit: 16.1268
5-bit: 6.7308
6-bit: 6.4887
7-bit: 6.4537

pytorch-bot bot commented Dec 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1428

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Pending, 2 Unrelated Failures

As of commit 3c0e898 with merge base 56be609:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

  • pull / test-gpu-aoti-float16 (cuda, stories15M) / linux-job (gh) (trunk failure)
    RuntimeError: run_func_( container_handle_, input_handles.data(), input_handles.size(), output_handles.data(), output_handles.size(), reinterpret_cast<AOTInductorStreamHandle>(stream_handle), proxy_executor_handle_) API call failed at /pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 107
  • pull / test-gpu-aoti-float32 (cuda, stories15M) / linux-job (gh) (trunk failure)
    RuntimeError: run_func_( container_handle_, input_handles.data(), input_handles.size(), output_handles.data(), output_handles.size(), reinterpret_cast<AOTInductorStreamHandle>(stream_handle), proxy_executor_handle_) API call failed at /pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 107

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the "CLA Signed" label Dec 18, 2024
@manuelcandales manuelcandales changed the title from "update torchao pin: optimized mps experimental shaders" to "update torchao pin: optimized mps lowbit shaders" Dec 18, 2024
@Jack-Khuu Jack-Khuu added the "Quantization" label Dec 18, 2024

@Jack-Khuu Jack-Khuu left a comment

LGTM, if there are no objections from the quant folks

@manuelcandales manuelcandales merged commit 113e40b into main Dec 18, 2024
49 of 53 checks passed
vmpuri pushed a commit that referenced this pull request Feb 4, 2025