
Commit 6bfb617

Min Guo authored and facebook-github-bot committed
static attn export with cpu 4bit embedding
Summary:

buck2 run mode/dev-nosan executorch/examples/models/fb/llama4:export_static_transformer_qnn -- -p manifold://executorch/tree/models/llama/stories_110M/params.json -c manifold://executorch/tree/models/llama/stories_110M/stories110M.pt -t manifold://executorch/tree/models/llama/stories_110M/tokenizer.model -o /tmp/llm/stories_uint8.pte --cache_len 128 --methods prefill,32 -E "4,32" embedding graph P1856192830 P1856279457

Differential Revision: D77459277
1 parent 3ba0466 commit 6bfb617

File tree

1 file changed (+4, -0 lines)


backends/qualcomm/_passes/lift_constant_scalar_operands.py

Lines changed: 4 additions & 0 deletions
@@ -124,6 +124,10 @@ def _create_tensor_args(
 ) -> Dict[int, TensorConstant]:
     tensor_args = {}
     for i, arg in enumerate(node.args):
+        if hasattr(node.target, "_schema"):
+            schema = node.target._schema.arguments[i]
+        else:
+            continue
         schema = node.target._schema.arguments[i]
         is_tensor_arg_got_num = isinstance(
             schema.type, torch.TensorType
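
For context (not part of the commit message): in an exported FX graph, not every node.target is an ATen OpOverload; targets such as higher-order operators or plain Python callables carry no _schema attribute, so indexing node.target._schema.arguments[i] unconditionally can raise AttributeError in this pass. The added guard skips such nodes instead. Below is a minimal, self-contained sketch of that guarded pattern; the function name collect_tensor_scalar_args and the isinstance(arg, (int, float)) scalar check are illustrative assumptions, not the pass's actual code.

# Minimal sketch of the guarded schema lookup; not the actual pass implementation.
from typing import Any, Dict

import torch


def collect_tensor_scalar_args(node) -> Dict[int, Any]:
    # Only consult the operator schema when the target exposes one
    # (e.g. torch.ops.* OpOverloads); skip targets without `_schema`
    # such as higher-order ops or plain Python callables.
    tensor_args: Dict[int, Any] = {}
    for i, arg in enumerate(node.args):
        if hasattr(node.target, "_schema"):
            schema = node.target._schema.arguments[i]
        else:
            continue
        # Assumption for illustration: a Python scalar passed where the
        # schema expects a Tensor is a candidate for lifting.
        if isinstance(schema.type, torch.TensorType) and isinstance(arg, (int, float)):
            tensor_args[i] = arg
    return tensor_args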
