Conversation

Contributor

@jackzhxng jackzhxng commented Apr 29, 2025

Add ExecuTorch support for Qwen3 0.6B, 1.7B, and 4B

Qwen3 0.6B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-0_6b --params examples/models/qwen3/0_6b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-0_6b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-0_6b --pte qwen3-0_6b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/0_6b_config.json --max_len 128 -kv --temperature 0.6
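The run command above passes `--temperature 0.6`. As a rough illustration of what temperature sampling does (a minimal plain-Python sketch, not ExecuTorch's actual sampler):

```python
import math
import random

def sample_with_temperature(logits, temperature=0.6, rng=random):
    """Scale logits by 1/temperature, softmax, then sample an index.

    Lower temperature sharpens the distribution toward the argmax;
    higher temperature flattens it toward uniform.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random() * total
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(logits) - 1
```

At 0.6 the distribution still strongly favors high-logit tokens, which keeps the long reasoning traces coherent while allowing some variation between runs.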

>> Okay, let's see. The user is asking about the president of the US, but they wrote "And who is the president of the US?" and "And who is the president of the US?" So maybe they are using the same question but in a different way. They might be referring to the same president. Let me check. ...

# Some rough stats
Prefill time: 0.24 s
Generation speed: 17.15 tok/s
Memory: 826.68 MB
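As a back-of-the-envelope check on the 8da4w number: 4-bit weights take roughly 0.5 bytes per parameter, so raw quantized weights for 0.6B params come to a few hundred MB; the measured ~827 MB also covers per-group quantization scales, any layers kept in higher precision, activations, the KV cache, and runtime overhead. A hypothetical sketch (the group size and fp16 scale dtype here are illustrative assumptions, not necessarily what export_llama uses):

```python
def approx_4bit_weight_mb(num_params, group_size=32, scale_bytes=2):
    """Rough 4-bit weight footprint: 0.5 B/param plus one scale per group.

    group_size=32 and fp16 scales are assumptions for illustration only.
    """
    weight_bytes = num_params * 0.5
    scale_overhead = (num_params / group_size) * scale_bytes
    return (weight_bytes + scale_overhead) / 1e6

print(f"~{approx_4bit_weight_mb(0.6e9):.0f} MB raw 4-bit weights for 0.6B params")
```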

Qwen3 1.7B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-1_7b --params examples/models/qwen3/1_7b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-1_7b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-1_7b --pte qwen3-1_7b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/1_7b_config.json --max_len 128 -kv --temperature 0.6

# Some rough stats
Prefill time: 0.25 s
Generation speed: 16.87 tok/s
Memory: 1.02 GB

Qwen3 4B

Export with xnnpack + 8da4w quantization

python -m examples.models.llama.export_llama --model qwen3-4b --params examples/models/qwen3/4b_config.json -kv --use_sdpa_with_kv_cache -X --xnnpack-extended-ops -d fp32 --output_name="qwen3-4b_x_8da4w.pte" --verbose -qmode 8da4w

Run with pybindings

python -m examples.models.llama.runner.native --model qwen3-4b --pte qwen3-4b_x_8da4w.pte  --tokenizer ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer.json --tokenizer_config ~/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/a9c98e602b9d36d2a2f7ba1eb0f5f31e4e8e5143/tokenizer_config.json --prompt "Who is the president of the US?" --params examples/models/qwen3/4b_config.json --max_len 128 -kv --temperature 0.6

# Some rough stats
Prefill time: 0.44 s
Generation speed: 12.12 tok/s
Memory: 2.5 GB
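For a quick side-by-side of the rough desktop stats above (values transcribed from this description; 1.02 GB is taken as 1020 MB assuming decimal GB; these are desktop numbers, not representative of mobile):

```python
# Rough stats transcribed from the three runs above.
stats = {
    "qwen3-0_6b": {"prefill_s": 0.24, "tok_per_s": 17.15, "mem_mb": 826.68},
    "qwen3-1_7b": {"prefill_s": 0.25, "tok_per_s": 16.87, "mem_mb": 1020.0},
    "qwen3-4b":   {"prefill_s": 0.44, "tok_per_s": 12.12, "mem_mb": 2500.0},
}

for name, s in stats.items():
    print(f"{name:>10}: prefill {s['prefill_s']:.2f} s, "
          f"{s['tok_per_s']:.2f} tok/s, {s['mem_mb']:.0f} MB")
```

Throughput degrades gracefully with model size here, while memory scales roughly with parameter count.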

bypass-github-export-checks

@jackzhxng jackzhxng requested a review from lucylq as a code owner April 29, 2025 03:16

pytorch-bot bot commented Apr 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10539

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 21 Pending

As of commit 626a0f0 with merge base 32dffbc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 29, 2025
@jackzhxng jackzhxng added the release notes: examples Changes to any of our example LLMs integrations, such as Llama3 and Llava label Apr 29, 2025
@jackzhxng jackzhxng changed the title Add Qwen3 0.6B Add Qwen3 0.6B, 1.7B, and 4B Apr 29, 2025

cbilgin commented Apr 29, 2025

Is there a copy-paste issue, or is the response to the prompt somehow wrong? It looks like the prompt is about vibe coding, but the response is about the US president?

Contributor

@tarun292 tarun292 left a comment


Please check with @madhu-fb about the qk_norm changes before landing.

@tarun292
Contributor

> Is there a copy paste issue or the response to the prompt somehow wrong? Looks like what's vibe coding is the prompt but the response is about US president?

Yeah, same question; it seems like you might not have copied over the full response?

@tarun292
Contributor

Please add a README.md for the export flow. Also, I don't see the config JSONs for 1.7B and 4B.

@jackzhxng jackzhxng force-pushed the jz/add-qwen3 branch 2 times, most recently from 8412c37 to 8aac481 Compare April 29, 2025 06:41
@mergennachin
Contributor

Nice job

Add this to the top-level README - https://github.com/pytorch/executorch/blob/main/README.md?plain=1#L54

The perf benchmarks on desktop are not really representative. As a next step, it would be good to have instructions for running on iOS and Android phones and to show those benchmarks. We could then publicize actual screencasts of it running on mobile phones more.

@facebook-github-bot
Contributor

@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@jackzhxng jackzhxng merged commit 7b86bf1 into main Apr 29, 2025
91 of 92 checks passed
@jackzhxng jackzhxng deleted the jz/add-qwen3 branch April 29, 2025 21:27