
Conversation

@neuropilot-captain
Collaborator

Summary

  1. Added AoT support for qwen2, qwen2.5, qwen3, gemma2, gemma3, phi3, phi4, whisper
  2. Added runner support for qwen, gemma2, phi3

TODO

  1. Add runner support for gemma3, phi4 and whisper.

@pytorch-bot

pytorch-bot bot commented Sep 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14110

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 28 Pending

As of commit ed29c7d with merge base 72d50b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Sep 9, 2025
@github-actions

github-actions bot commented Sep 9, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@neuropilot-captain neuropilot-captain force-pushed the support_qwen_phi_gemma_whisper branch from fb1ea7d to fd52664 on September 9, 2025 16:06
@cccclai
Contributor

cccclai commented Sep 9, 2025

Thanks! Can you fix the lint error?

@neuropilot-captain
Collaborator Author

Hi, could you please suggest how to deal with the lint-urls and lint-xrefs errors? For lint-urls, the highlighted URLs are example URLs in the comment section, so they should be ignorable. For lint-xrefs, we are not sure what the error is. Thanks!

@cccclai
Contributor

cccclai commented Sep 10, 2025

Here is the patch

diff --git a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
index 14126e5bc4..f617887b13 100644
--- a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
+++ b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
@@ -1932,7 +1932,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin):
                 Will be removed in v5 of Transformers.
             proxies (`Dict[str, str]`, *optional*):
                 A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
-                'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
+                'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. @lint-ignore
             token (`str` or *bool*, *optional*):
                 The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
                 when running `huggingface-cli login` (stored in `~/.huggingface`).
diff --git a/examples/mediatek/aot_utils/llm_utils/tokenizers_/utils.py b/examples/mediatek/aot_utils/llm_utils/tokenizers_/utils.py
index 8a80d5d6f6..a137e2c982 100644
--- a/examples/mediatek/aot_utils/llm_utils/tokenizers_/utils.py
+++ b/examples/mediatek/aot_utils/llm_utils/tokenizers_/utils.py
@@ -392,7 +392,7 @@ def cached_file(
             Will be removed in v5 of Transformers.
         proxies (`Dict[str, str]`, *optional*):
             A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
-            'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
+            'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request. @lint-ignore
         token (`str` or *bool*, *optional*):
             The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
             when running `huggingface-cli login` (stored in `~/.huggingface`).

You can run ./scripts/lint_urls.sh inside the executorch folder to reproduce the result.
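
For example, to reproduce locally:

cd executorch
./scripts/lint_urls.sh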

@cccclai
Contributor

cccclai commented Sep 12, 2025

I'm not sure why `Lint / link-check / lint-xrefs / linux-job (pull_request)` keeps running forever...

@cccclai
Contributor

cccclai commented Sep 12, 2025

Looks like a bug on our side; can you add the following patch too?

diff --git a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_gemma.py b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_gemma.py
index 69bcd0d99c..cd63b44699 100644
--- a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_gemma.py
+++ b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_gemma.py
@@ -308,7 +308,7 @@ class GemmaTokenizer(PreTrainedTokenizer):
                 Optional second list of IDs for sequence pairs.
 
         Returns:
-            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
+            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s). @lint-ignore
         """
         bos_token_id = [self.bos_token_id] if self.add_bos_token else []
         eos_token_id = [self.eos_token_id] if self.add_eos_token else []
diff --git a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
index f617887b13..e620c6f99c 100644
--- a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
+++ b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
@@ -1318,12 +1318,12 @@ ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING = r"""
                 Whether to return token type IDs. If left to the default, will return the token type IDs according to
                 the specific tokenizer's default, defined by the `return_outputs` attribute.
 
-                [What are token type IDs?](../glossary#token-type-ids)
+                [What are token type IDs?](../glossary#token-type-ids) @lint-ignore
             return_attention_mask (`bool`, *optional*):
                 Whether to return the attention mask. If left to the default, will return the attention mask according
                 to the specific tokenizer's default, defined by the `return_outputs` attribute.
 
-                [What are attention masks?](../glossary#attention-mask)
+                [What are attention masks?](../glossary#attention-mask) @lint-ignore
             return_overflowing_tokens (`bool`, *optional*, defaults to `False`):
                 Whether or not to return overflowing token sequences. If a pair of sequences of input ids (or a batch
                 of pairs) is provided with `truncation_strategy = longest_first` or `True`, an error is raised instead
@@ -1346,17 +1346,17 @@ ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING = r"""
 
             - **input_ids** -- List of token ids to be fed to a model.
 
-              [What are input IDs?](../glossary#input-ids)
+              [What are input IDs?](../glossary#input-ids) @lint-ignore
 
             - **token_type_ids** -- List of token type ids to be fed to a model (when `return_token_type_ids=True` or
               if *"token_type_ids"* is in `self.model_input_names`).
 
-              [What are token type IDs?](../glossary#token-type-ids)
+              [What are token type IDs?](../glossary#token-type-ids) @lint-ignore
 
             - **attention_mask** -- List of indices specifying which tokens should be attended to by the model (when
               `return_attention_mask=True` or if *"attention_mask"* is in `self.model_input_names`).
 
-              [What are attention masks?](../glossary#attention-mask)
+              [What are attention masks?](../glossary#attention-mask) @lint-ignore
 
             - **overflowing_tokens** -- List of overflowing tokens sequences (when a `max_length` is specified and
               `return_overflowing_tokens=True`).
@@ -3495,7 +3495,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin):
                 Whether to return the attention mask. If left to the default, will return the attention mask according
                 to the specific tokenizer's default, defined by the `return_outputs` attribute.
 
-                [What are attention masks?](../glossary#attention-mask)
+                [What are attention masks?](../glossary#attention-mask)  @lint-ignore
             return_tensors (`str` or [`~utils.TensorType`], *optional*):
                 If set, will return tensors instead of list of python integers. Acceptable values are:
 
@@ -3621,7 +3621,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin):
     ) -> List[int]:
         """Create the token type IDs corresponding to the sequences passed.
 
-        [What are token type IDs?](../glossary#token-type-ids)
+        [What are token type IDs?](../glossary#token-type-ids) @lint-ignore
 
         Should be overridden in a subclass if the model has a special way of building those.
 
diff --git a/examples/mediatek/aot_utils/mllm_utils/preprocessor_whisper.py b/examples/mediatek/aot_utils/mllm_utils/preprocessor_whisper.py
index ce90a0b1cd..b9e88a9e8e 100644
--- a/examples/mediatek/aot_utils/mllm_utils/preprocessor_whisper.py
+++ b/examples/mediatek/aot_utils/mllm_utils/preprocessor_whisper.py
@@ -175,7 +175,7 @@ class WhisperAudioProcessor(SequenceFeatureExtractor):
                 Whether to return the attention mask. If left to the default, will return the attention mask according
                 to the specific feature_extractor's default.
 
-                [What are attention masks?](../glossary#attention-mask)
+                [What are attention masks?](../glossary#attention-mask) @lint-ignore
 
                 <Tip>
 

@cccclai cccclai added this to the 1.0.0 milestone Sep 12, 2025
@cccclai
Contributor

cccclai commented Sep 12, 2025

After double-checking the xref errors, it looks like they're due to the large files in the PR. Can you make the following changes:

  1. Remove examples/mediatek/aot_utils/mllm_utils/audio/Jimmy.mp3. It's a bit too large.
  2. Remove the tokenizers stored at examples/mediatek/models/llm_models/weights/whisper-large-v3 (and for the other models as well), and in each repo, ask users to download them from the website (a sketch of what that download step could look like follows this list). Can you also rename weights to tokenizers, as they aren't weights?
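
A hypothetical sketch of what that per-model download step could look like, assuming the tokenizer files are fetched from the Hugging Face Hub via huggingface_hub; the repo ID, file patterns, and destination folder below are illustrative, not something this PR defines:

# Illustrative only: fetch the whisper-large-v3 tokenizer files into the
# folder layout the MediaTek example expects. The repo ID, allow_patterns,
# and local_dir here are assumptions, not part of this PR.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="openai/whisper-large-v3",
    allow_patterns=["tokenizer*", "*.json", "*.txt"],
    local_dir="examples/mediatek/models/llm_models/weights/whisper-large-v3",
)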

With the changes above, I'm able to run ./scripts/lint_xrefs.sh successfully. You should be able to reproduce with:

cd executorch
./scripts/lint_xrefs.sh

@neuropilot-captain
Collaborator Author

We have removed the large files and run ./scripts/lint_xrefs.sh successfully on our side, so the issue should be solved now.
For the naming part, the idea is to let users download the weights and move them into the respective model folder under 'weights', hence the name. Hope that clarifies.
Thanks!

@cccclai
Contributor

cccclai commented Sep 12, 2025

> We have removed the large files and run ./scripts/lint_xrefs.sh successfully on our side, so the issue should be solved now. For the naming part, the idea is to let users download the weights and move them into the respective model folder under 'weights', hence the name. Hope that clarifies. Thanks!

I see, thanks a lot! It passes on my side too. Once the CI jobs finish, I'll merge.

@cccclai cccclai merged commit 95e3b53 into pytorch:main Sep 12, 2025
118 of 121 checks passed
StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
### Summary
1. Added AoT support for qwen2, qwen2.5, qwen3, gemma2, gemma3, phi3,
phi4, whisper
2. Added runner support for qwen, gemma2, phi3

### TODO
1. Add runner support for gemma3, phi4 and whisper.