Qualcomm AI Engine Direct - GA Static Phi-4-mini #13179
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13179
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Cancelled Job as of commit d6dda6c with merge base 18098a4. The cancelled job can be retried.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed d0fe026 to 76f4d50
Hi @cccclai, Thanks!!
# TODO: Encountered the following error during runtime, so switched behavior for now.
# Error: libc++abi: terminating due to uncaught exception of type std::runtime_error: invert=true is not supported for Split PreTokenizer. Only invert=false is supported.
data["pre_tokenizer"]["pretokenizers"][-2]["invert"] = False
Hi @cccclai,
Here is a temporary workaround for the tokenizer error: invert=true is currently not supported for the Split PreTokenizer, and only invert=false is allowed.
We would appreciate it if this could be fixed.
Thanks!
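For context, a minimal standalone sketch of the same workaround, assuming the exported tokenizer.json keeps the layout referenced in the diff above; the path argument and helper name here are hypothetical:

```python
# Hedged sketch: patch an exported tokenizer.json so the Split pre-tokenizer no longer
# requests invert=true, which the runtime tokenizer currently rejects.
import json

def patch_split_pretokenizer(tokenizer_json_path: str) -> None:
    with open(tokenizer_json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Same index as the diff above: the second-to-last pre-tokenizer entry is the
    # Split pre-tokenizer carrying the unsupported invert=true flag.
    data["pre_tokenizer"]["pretokenizers"][-2]["invert"] = False

    with open(tokenizer_json_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
```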
@DannyYuyang-quic is this specific to the Phi tokenizer?
Thanks for the follow-up!
I’m not sure if tokenizers from other GA models would encounter the same issue, but based on what I’ve seen so far, models like Qwen2, Qwen3, and Gemma3 don’t seem to have the invert kwarg in their tokenizers.
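If it helps with checking other GA models, here is a small hypothetical snippet for spotting whether a tokenizer.json uses the invert kwarg at all (key names follow the diff above):

```python
# Hypothetical check: does any Split pre-tokenizer in this tokenizer.json set invert=true?
import json

def uses_invert(tokenizer_json_path: str) -> bool:
    with open(tokenizer_json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    pretokenizers = (data.get("pre_tokenizer") or {}).get("pretokenizers", [])
    return any(p.get("type") == "Split" and p.get("invert") for p in pretokenizers)
```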
Force-pushed 76f4d50 to a14610e
@pytorchbot label "release notes: qualcomm"
Force-pushed a14610e to 013d222
APPLY_ROPE_EMBEDDING_FUNCTIONS = {
    "phi_4_mini": apply_partial_rotary_emb_single,
Can this stay in this folder, https://github.com/pytorch/executorch/tree/main/examples/models/phi_4_mini/config, or is it QNN-specific?
It's not QNN-specific.
apply_partial_rotary_emb_single (partial RoPE embedding) is required whenever partial_rotary_factor < 1.0.
I've updated the code to explicitly check for partial_rotary_factor < 1.0, so partial RoPE is applied only when that condition is met.
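For reference, a rough sketch of what a partial rotary embedding does, not the exact implementation in this PR; the function name, tensor shapes, and split-half rotation convention are assumptions:

```python
# Hedged sketch of partial RoPE: only the first rotary_dim = head_dim * partial_rotary_factor
# channels are rotated; the remaining channels pass through unchanged.
import torch

def apply_partial_rotary_emb_single_sketch(
    x: torch.Tensor,          # [..., seq_len, head_dim]
    freqs_cos: torch.Tensor,  # broadcastable to [..., seq_len, rotary_dim // 2]
    freqs_sin: torch.Tensor,  # broadcastable to [..., seq_len, rotary_dim // 2]
    partial_rotary_factor: float,
) -> torch.Tensor:
    head_dim = x.shape[-1]
    rotary_dim = int(head_dim * partial_rotary_factor)

    # Rotate only the first rotary_dim channels; leave the rest untouched.
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    x1, x2 = x_rot[..., : rotary_dim // 2], x_rot[..., rotary_dim // 2 :]
    rotated = torch.cat(
        (x1 * freqs_cos - x2 * freqs_sin, x1 * freqs_sin + x2 * freqs_cos), dim=-1
    )
    return torch.cat((rotated, x_pass), dim=-1)

# With partial_rotary_factor == 1.0 this degenerates to the full rotary embedding,
# which matches the partial_rotary_factor < 1.0 condition discussed above.
```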
oh wow...you're still awake...
Can we refer to the config in this folder (#13086) instead of having Phi-specific logic inside static llama?
Force-pushed 013d222 to 371c80a
if config.partial_rotary_factor < 1:
    self.apply_rope_emb = apply_partial_rotary_emb_single
else:
    self.apply_rope_emb = apply_rotary_emb_single
Looks like there is a conflict, can you resolve it?
Summary:
- Support Phi-4-mini-instruct for static llama path
- add P-ROPE for phi-4-mini
- add EOS tok for Phi-4-mini
Force-pushed 46f019a to d6dda6c
cccclai left a comment
Thank you for enabling phi4!
### Summary
- Support Phi-4-mini-instruct for static llama path
- add P-ROPE for phi-4-mini
- add EOS tok for Phi-4-mini

### Test plan
```
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s $DEVICE -m SM8750 --prompt "I would like to learn python, could you teach me with a simple example?" --temperature 0 --model_mode hybrid --prefill_ar_len 32 --max_seq_len 128 --ptq 16a8w --decoder_model phi_4_mini --num_sharding 4
```

cc: @haowhsu-quic, @shewu-quic, @winskuo-quic, @cccclai