
Conversation

@neuropilot-captain
Collaborator

Summary

  1. Added AoT support for qwen2, qwen2.5, qwen3, gemma2, gemma3, phi3, phi4, whisper
  2. Added runner support for qwen, gemma2, phi3

TODO

  1. Add runner support for gemma3, phi4 and whisper.

@pytorch-bot

pytorch-bot bot commented Sep 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14110

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 28 Pending

As of commit ed29c7d with merge base 72d50b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Sep 9, 2025
@github-actions

github-actions bot commented Sep 9, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@neuropilot-captain neuropilot-captain force-pushed the support_qwen_phi_gemma_whisper branch from fb1ea7d to fd52664 on September 9, 2025 16:06
@cccclai
Contributor

cccclai commented Sep 9, 2025

Thanks! Can you fix the lint error?

@neuropilot-captain
Collaborator Author

Hi, could you please suggest how to deal with the lint-urls and lint-xrefs errors? For lint-urls, the highlighted URLs are example URLs in the comment section, so they should be ignorable. For lint-xrefs, we are not sure what the error is. Thanks!

@cccclai
Contributor

cccclai commented Sep 10, 2025

Here is the patch

diff --git a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
index 14126e5bc4..f617887b13 100644
--- a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
+++ b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
@@ -1932,7 +1932,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin):
                 Will be removed in v5 of Transformers.
             proxies (`Dict[str, str]`, *optional*):
                 A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
-                'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
+                'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. @lint-ignore
             token (`str` or *bool*, *optional*):
                 The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
                 when running `huggingface-cli login` (stored in `~/.huggingface`).
diff --git a/examples/mediatek/aot_utils/llm_utils/tokenizers_/utils.py b/examples/mediatek/aot_utils/llm_utils/tokenizers_/utils.py
index 8a80d5d6f6..a137e2c982 100644
--- a/examples/mediatek/aot_utils/llm_utils/tokenizers_/utils.py
+++ b/examples/mediatek/aot_utils/llm_utils/tokenizers_/utils.py
@@ -392,7 +392,7 @@ def cached_file(
             Will be removed in v5 of Transformers.
         proxies (`Dict[str, str]`, *optional*):
             A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
-            'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
+            'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request. @lint-ignore
         token (`str` or *bool*, *optional*):
             The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
             when running `huggingface-cli login` (stored in `~/.huggingface`).

You can run ./scripts/lint_urls.sh inside the executorch folder to reproduce the result.
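
For example, to reproduce locally:

cd executorch
./scripts/lint_urls.sh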

@cccclai
Contributor

cccclai commented Sep 12, 2025

I'm not sure why `Lint / link-check / lint-xrefs / linux-job (pull_request)` keeps running forever...

@cccclai
Contributor

cccclai commented Sep 12, 2025

Looks like a bug on our side; can you add the following patch too?

diff --git a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_gemma.py b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_gemma.py
index 69bcd0d99c..cd63b44699 100644
--- a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_gemma.py
+++ b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_gemma.py
@@ -308,7 +308,7 @@ class GemmaTokenizer(PreTrainedTokenizer):
                 Optional second list of IDs for sequence pairs.
 
         Returns:
-            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
+            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s). @lint-ignore
         """
         bos_token_id = [self.bos_token_id] if self.add_bos_token else []
         eos_token_id = [self.eos_token_id] if self.add_eos_token else []
diff --git a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
index f617887b13..e620c6f99c 100644
--- a/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
+++ b/examples/mediatek/aot_utils/llm_utils/tokenizers_/tokenization_utils_base.py
@@ -1318,12 +1318,12 @@ ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING = r"""
                 Whether to return token type IDs. If left to the default, will return the token type IDs according to
                 the specific tokenizer's default, defined by the `return_outputs` attribute.
 
-                [What are token type IDs?](../glossary#token-type-ids)
+                [What are token type IDs?](../glossary#token-type-ids) @lint-ignore
             return_attention_mask (`bool`, *optional*):
                 Whether to return the attention mask. If left to the default, will return the attention mask according
                 to the specific tokenizer's default, defined by the `return_outputs` attribute.
 
-                [What are attention masks?](../glossary#attention-mask)
+                [What are attention masks?](../glossary#attention-mask) @lint-ignore
             return_overflowing_tokens (`bool`, *optional*, defaults to `False`):
                 Whether or not to return overflowing token sequences. If a pair of sequences of input ids (or a batch
                 of pairs) is provided with `truncation_strategy = longest_first` or `True`, an error is raised instead
@@ -1346,17 +1346,17 @@ ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING = r"""
 
             - **input_ids** -- List of token ids to be fed to a model.
 
-              [What are input IDs?](../glossary#input-ids)
+              [What are input IDs?](../glossary#input-ids) @lint-ignore
 
             - **token_type_ids** -- List of token type ids to be fed to a model (when `return_token_type_ids=True` or
               if *"token_type_ids"* is in `self.model_input_names`).
 
-              [What are token type IDs?](../glossary#token-type-ids)
+              [What are token type IDs?](../glossary#token-type-ids) @lint-ignore
 
             - **attention_mask** -- List of indices specifying which tokens should be attended to by the model (when
               `return_attention_mask=True` or if *"attention_mask"* is in `self.model_input_names`).
 
-              [What are attention masks?](../glossary#attention-mask)
+              [What are attention masks?](../glossary#attention-mask) @lint-ignore
 
             - **overflowing_tokens** -- List of overflowing tokens sequences (when a `max_length` is specified and
               `return_overflowing_tokens=True`).
@@ -3495,7 +3495,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin):
                 Whether to return the attention mask. If left to the default, will return the attention mask according
                 to the specific tokenizer's default, defined by the `return_outputs` attribute.
 
-                [What are attention masks?](../glossary#attention-mask)
+                [What are attention masks?](../glossary#attention-mask)  @lint-ignore
             return_tensors (`str` or [`~utils.TensorType`], *optional*):
                 If set, will return tensors instead of list of python integers. Acceptable values are:
 
@@ -3621,7 +3621,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin):
     ) -> List[int]:
         """Create the token type IDs corresponding to the sequences passed.
 
-        [What are token type IDs?](../glossary#token-type-ids)
+        [What are token type IDs?](../glossary#token-type-ids) @lint-ignore
 
         Should be overridden in a subclass if the model has a special way of building those.
 
diff --git a/examples/mediatek/aot_utils/mllm_utils/preprocessor_whisper.py b/examples/mediatek/aot_utils/mllm_utils/preprocessor_whisper.py
index ce90a0b1cd..b9e88a9e8e 100644
--- a/examples/mediatek/aot_utils/mllm_utils/preprocessor_whisper.py
+++ b/examples/mediatek/aot_utils/mllm_utils/preprocessor_whisper.py
@@ -175,7 +175,7 @@ class WhisperAudioProcessor(SequenceFeatureExtractor):
                 Whether to return the attention mask. If left to the default, will return the attention mask according
                 to the specific feature_extractor's default.
 
-                [What are attention masks?](../glossary#attention-mask)
+                [What are attention masks?](../glossary#attention-mask) @lint-ignore
 
                 <Tip>
 

@cccclai cccclai added this to the 1.0.0 milestone Sep 12, 2025
@cccclai
Contributor

cccclai commented Sep 12, 2025

After double-checking the xref errors, it looks like they're due to the large files in the PR. Can you make the following changes:

  1. Remove examples/mediatek/aot_utils/mllm_utils/audio/Jimmy.mp3. It's a bit too large.
  2. Remove the tokenizers stored at examples/mediatek/models/llm_models/weights/whisper-large-v3 (and for the other models as well), and in each repo, ask users to download them from the website (a sketch of what that download step could look like follows this list). Can you also rename weights to tokenizers, as they aren't weights?
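
A hypothetical sketch of what that per-model download step could look like, assuming the tokenizer files are fetched from the Hugging Face Hub via huggingface_hub; the repo ID, file patterns, and destination folder below are illustrative, not something this PR defines:

# Illustrative only: fetch the whisper-large-v3 tokenizer files into the
# folder layout the MediaTek example expects. The repo ID, allow_patterns,
# and local_dir here are assumptions, not part of this PR.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="openai/whisper-large-v3",
    allow_patterns=["tokenizer*", "*.json", "*.txt"],
    local_dir="examples/mediatek/models/llm_models/weights/whisper-large-v3",
)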

With the changes above, I'm able to run ./scripts/lint_xrefs.sh successfully. You should be able to reproduce with:

cd executorch
./scripts/lint_xrefs.sh

@neuropilot-captain
Collaborator Author

We have removed the large files and run ./scripts/lint_xrefs.sh successfully on our side, so the issue should be solved now.
For the naming part, the idea is to let users download the weights and move them into the respective model folder under 'weights', hence the name. Hope that clarifies.
Thanks!

@cccclai
Contributor

cccclai commented Sep 12, 2025

> We have removed the large files and run ./scripts/lint_xrefs.sh successfully on our side, so the issue should be solved now. For the naming part, the idea is to let users download the weights and move them into the respective model folder under 'weights', hence the name. Hope that clarifies. Thanks!

I see, thanks a lot! It passes on my side too. Once the CI jobs finish, I'll merge.

@cccclai cccclai merged commit 95e3b53 into pytorch:main Sep 12, 2025
118 of 121 checks passed
StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
### Summary
1. Added AoT support for qwen2, qwen2.5, qwen3, gemma2, gemma3, phi3,
phi4, whisper
2. Added runner support for qwen, gemma2, phi3

### TODO
1. Add runner support for gemma3, phi4 and whisper.