Update base for Update on "[Executorch][quant] Optimize per channel dequantize"
When using a quantized KV cache, the dequantization routine takes a significant amount of time. This diff vectorizes per-channel dequantization for the common case.
Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/)
[ghstack-poisoned]
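For context, here is a minimal sketch of what per-channel affine dequantization computes and why it vectorizes well. This is an illustrative assumption, not the kernel from this diff: the function names, the channel-major layout, and the NEON path are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Dequantize one channel: out[i] = (q[i] - zero_point) * scale.
// scale and zero_point are uniform across the channel, so they are hoisted
// out of the element loop and the loop body becomes pure SIMD-friendly work.
static void dequantize_channel(
    const int8_t* in, float* out, size_t n, float scale, int32_t zp) {
  size_t i = 0;
#if defined(__ARM_NEON)
  const int16x8_t vzp = vdupq_n_s16(static_cast<int16_t>(zp));
  for (; i + 8 <= n; i += 8) {
    // Widen 8 int8 values to int16 and subtract the zero point per lane.
    int16x8_t q16 = vsubq_s16(vmovl_s8(vld1_s8(in + i)), vzp);
    // Widen to int32, convert to float, and apply the channel scale.
    float32x4_t lo =
        vmulq_n_f32(vcvtq_f32_s32(vmovl_s16(vget_low_s16(q16))), scale);
    float32x4_t hi =
        vmulq_n_f32(vcvtq_f32_s32(vmovl_s16(vget_high_s16(q16))), scale);
    vst1q_f32(out + i, lo);
    vst1q_f32(out + i + 4, hi);
  }
#endif
  for (; i < n; ++i) {  // scalar tail, and portable fallback
    out[i] = static_cast<float>(in[i] - zp) * scale;
  }
}

// Per-channel dequantization over a [channels x elems_per_channel] tensor
// in channel-major layout (hypothetical layout for illustration).
void dequantize_per_channel(
    const int8_t* in, float* out, const float* scales,
    const int32_t* zero_points, size_t channels, size_t elems_per_channel) {
  for (size_t c = 0; c < channels; ++c) {
    dequantize_channel(in + c * elems_per_channel,
                       out + c * elems_per_channel,
                       elems_per_channel, scales[c], zero_points[c]);
  }
}
```

The property being exploited is that, under per-channel quantization, scale and zero_point are constant within a channel, so the per-element work is a uniform widen-subtract-convert-multiply that maps directly onto SIMD lanes instead of requiring per-element parameter lookups.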
README.md: 3 additions & 0 deletions
@@ -24,6 +24,9 @@ Check out the [Getting Started](https://pytorch.org/executorch/stable/getting-st
 
 Check out the examples of [Llama](./examples/models/llama2/README.md), [Llava](./examples/models/llava/README.md) and [other models](./examples/README.md) running on edge devices using ExecuTorch.
 
+
+**[UPDATE - 09/25]** We have added support for running [Llama 3.2 1B/3B](./examples/models/llama2/README.md) models via ExecuTorch.
+
 ## Feedback
 
 We welcome any feedback, suggestions, and bug reports from the community to help