Static attention export with CPU 4-bit embedding

Min Guo · facebook-github-bot · commit 6bfb617e751c · 2025-07-01T14:58:59.000-07:00
Summary:
buck2 run mode/dev-nosan executorch/examples/models/fb/llama4:export_static_transformer_qnn -- \
  -p manifold://executorch/tree/models/llama/stories_110M/params.json \
  -c manifold://executorch/tree/models/llama/stories_110M/stories110M.pt \
  -t manifold://executorch/tree/models/llama/stories_110M/tokenizer.model \
  -o /tmp/llm/stories_uint8.pte \
  --cache_len 128 \
  --methods prefill,32 \
  -E "4,32"

embedding graph P1856192830
P1856279457
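
The change to `_create_tensor_args` skips arguments when `node.target` has no `_schema` attribute. A minimal sketch of that guard follows, using stand-in objects instead of real `torch.fx` nodes so it runs without torch. `FakeOp`, `scalar_arg_indices`, and the string type tag `"Tensor"` are hypothetical names for illustration. The numeric check on `arg` is an assumption, since the hunk truncates the actual condition.

```python
from types import SimpleNamespace


class FakeOp:
    """Hypothetical stand-in for an operator that carries a `_schema`."""

    def __init__(self, arg_types):
        self._schema = SimpleNamespace(
            arguments=[SimpleNamespace(type=t) for t in arg_types]
        )


def scalar_arg_indices(target, args):
    """Return indices where the schema expects a tensor but a number was passed."""
    hits = []
    for i, arg in enumerate(args):
        # Guard from the diff: targets without a schema are skipped.
        if not hasattr(target, "_schema"):
            continue
        schema = target._schema.arguments[i]
        # Assumed condition: tensor-typed slot holding a Python number.
        if schema.type == "Tensor" and isinstance(arg, (int, float)):
            hits.append(i)
    return hits


op = FakeOp(["Tensor", "Tensor"])
print(scalar_arg_indices(op, ["x", 2.0]))   # schema-bearing target
print(scalar_arg_indices(len, ["x", 2.0]))  # plain callable, no `_schema`
```

Without the guard, the second call would raise `AttributeError` on `target._schema`; with it, the loop simply produces no entries for that target.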

Differential Revision: D77459277
diff --git a/backends/qualcomm/_passes/lift_constant_scalar_operands.py b/backends/qualcomm/_passes/lift_constant_scalar_operands.py
@@ -124,6 +124,10 @@ def _create_tensor_args(
     ) -> Dict[int, TensorConstant]:
         tensor_args = {}
         for i, arg in enumerate(node.args):
+            if hasattr(node.target, "_schema"):
+                schema = node.target._schema.arguments[i]
+            else:
+                continue
             schema = node.target._schema.arguments[i]
             is_tensor_arg_got_num = isinstance(
                 schema.type, torch.TensorType