Commit 6bfb617
static attn export with cpu 4bit embedding
Summary:
buck2 run mode/dev-nosan executorch/examples/models/fb/llama4:export_static_transformer_qnn -- -p manifold://executorch/tree/models/llama/stories_110M/params.json -c manifold://executorch/tree/models/llama/stories_110M/stories110M.pt -t manifold://executorch/tree/models/llama/stories_110M/tokenizer.model -o /tmp/llm/stories_uint8.pte --cache_len 128 --methods prefill,32 -E "4,32"
embedding graph P1856192830
P1856279457
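The `-E "4,32"` argument in the command above is not explained in this commit. Assuming it follows the `<bitwidth>,<groupsize>` convention of the public ExecuTorch llama export flag `--embedding-quantize`, it would request 4-bit embedding weights with one scale per group of 32 values. The sketch below illustrates that idea in plain Python. All function names are hypothetical, and the symmetric scaling and low-nibble-first packing are illustrative assumptions, not the layout this export actually produces.

```python
from typing import List, Tuple


def quantize_row_groupwise(
    row: List[float], bits: int = 4, group_size: int = 32
) -> Tuple[List[int], List[float]]:
    """Symmetric group-wise quantization of one embedding row (illustrative)."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    qmax = 2 ** (bits - 1) - 1      # 7 for 4-bit
    qmin = -(2 ** (bits - 1))       # -8 for 4-bit
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        for v in group:
            q = round(v / scale)
            codes.append(max(qmin, min(qmax, q)))
    return codes, scales


def dequantize_row_groupwise(
    codes: List[int], scales: List[float], group_size: int = 32
) -> List[float]:
    """Recover approximate float values from codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


def pack_int4(codes: List[int]) -> bytes:
    """Pack pairs of signed 4-bit codes into bytes, low nibble first (assumed layout)."""
    assert len(codes) % 2 == 0, "need an even number of codes"
    packed = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)
```

Under these assumptions, a 64-element row yields 64 codes, 2 scales, and 32 packed bytes, and each reconstructed value differs from the original by at most half of its group's scale.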
Differential Revision: D774592771
parent 3ba0466 commit 6bfb617
1 file changed: 4 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 124 | 124 | |
| 125 | 125 | |
| 126 | 126 | |
| | 127 | + |
| | 128 | + |
| | 129 | + |
| | 130 | + |
| 127 | 131 | |
| 128 | 132 | |
| 129 | 133 | |