Commit bf1412e

Add support for MobileLLM (#1003)
1 parent f41416c commit bf1412e

5 files changed: +14 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -352,6 +352,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
 1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
 1. **MobileCLIP** (from Apple) released with the paper [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://arxiv.org/abs/2311.17049) by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.
+1. **MobileLLM** (from Meta) released with the paper [MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases](https://arxiv.org/abs/2402.14905) by Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra.
 1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
 1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
 1. **MobileNetV3** (from Google Inc.) released with the paper [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) by Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam.

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions
@@ -67,6 +67,7 @@
 1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
 1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
 1. **MobileCLIP** (from Apple) released with the paper [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://arxiv.org/abs/2311.17049) by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.
+1. **MobileLLM** (from Meta) released with the paper [MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases](https://arxiv.org/abs/2402.14905) by Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra.
 1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
 1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
 1. **MobileNetV3** (from Google Inc.) released with the paper [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) by Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam.

scripts/convert.py

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@
     'starcoder2',
     'openelm',
     'gemma',
+    'mobilellm',
 
     # Encoder-decoder models
     'whisper',

src/configs.js

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@ function getNormalizedConfig(config) {
             mapping['hidden_size'] = 'hidden_size';
             break;
         case 'llama':
+        case 'mobilellm':
         case 'granite':
         case 'cohere':
         case 'mistral':
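
Note on the hunk above: the new `case 'mobilellm':` label carries no body of its own, so it falls through to whatever body the neighbouring labels (`'llama'`, `'granite'`, and so on) share. The following is a minimal sketch of that fall-through pattern, not the library's actual `getNormalizedConfig`. The function name `normalizeConfigSketch`, the mapping keys, and the config values are all assumptions made up for illustration, since the shared case body is not visible in this diff.

```javascript
// Hypothetical, simplified stand-in for a config normalizer.
// It is NOT the code from src/configs.js; key names are assumed.
function normalizeConfigSketch(config) {
    const mapping = {};
    switch (config.model_type) {
        // Stacked case labels with no body fall through to the next body.
        case 'llama':
        case 'mobilellm':
        case 'granite':
            // Assumed mapping keys, chosen only to show the mechanism.
            mapping['num_layers'] = 'num_hidden_layers';
            mapping['hidden_size'] = 'hidden_size';
            break;
        default:
            throw new Error(`Unsupported model type: ${config.model_type}`);
    }

    // Copy each source key from the raw config to its normalized name.
    const normalized = {};
    for (const [key, sourceKey] of Object.entries(mapping)) {
        normalized[key] = config[sourceKey];
    }
    return normalized;
}

// Made-up config values for the demonstration.
const llamaLike = { num_hidden_layers: 30, hidden_size: 576 };
const a = normalizeConfigSketch({ model_type: 'llama', ...llamaLike });
const b = normalizeConfigSketch({ model_type: 'mobilellm', ...llamaLike });
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```

Under these assumptions, both model types yield the same normalized object, which is the effect a one-line case addition is meant to have.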

src/models.js

Lines changed: 10 additions & 0 deletions
@@ -3810,6 +3810,14 @@ export class LlamaForCausalLM extends LlamaPreTrainedModel { }
 //////////////////////////////////////////////////
 
 
+//////////////////////////////////////////////////
+// MobileLLM models
+export class MobileLLMPreTrainedModel extends PreTrainedModel { }
+export class MobileLLMModel extends MobileLLMPreTrainedModel { }
+export class MobileLLMForCausalLM extends MobileLLMPreTrainedModel { }
+//////////////////////////////////////////////////
+
+
 //////////////////////////////////////////////////
 // Granite models
 export class GranitePreTrainedModel extends PreTrainedModel { }
@@ -6125,6 +6133,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['gpt_neox', ['GPTNeoXModel', GPTNeoXModel]],
     ['codegen', ['CodeGenModel', CodeGenModel]],
     ['llama', ['LlamaModel', LlamaModel]],
+    ['mobilellm', ['MobileLLMModel', MobileLLMModel]],
     ['granite', ['GraniteModel', GraniteModel]],
     ['cohere', ['CohereModel', CohereModel]],
     ['gemma', ['GemmaModel', GemmaModel]],
@@ -6214,6 +6223,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['gpt_neox', ['GPTNeoXForCausalLM', GPTNeoXForCausalLM]],
     ['codegen', ['CodeGenForCausalLM', CodeGenForCausalLM]],
     ['llama', ['LlamaForCausalLM', LlamaForCausalLM]],
+    ['mobilellm', ['MobileLLMForCausalLM', MobileLLMForCausalLM]],
     ['granite', ['GraniteForCausalLM', GraniteForCausalLM]],
     ['cohere', ['CohereForCausalLM', CohereForCausalLM]],
     ['gemma', ['GemmaForCausalLM', GemmaForCausalLM]],
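
Note on the two mapping hunks above: each entry pairs a `model_type` string with a `[className, class]` tuple, which suggests a lookup keyed on the config's model type. Here is a rough sketch of how such a registry might be consulted. The `...Sketch` class names and the `resolveModelClass` helper are hypothetical stand-ins, not the library's loader, which this diff does not show.

```javascript
// Hypothetical stand-ins for the real classes in src/models.js.
class PreTrainedModelSketch { }
class MobileLLMPreTrainedModelSketch extends PreTrainedModelSketch { }
class MobileLLMForCausalLMSketch extends MobileLLMPreTrainedModelSketch { }

// Same entry shape as in the diff: model_type -> [class name, class].
const CAUSAL_LM_MAPPING_SKETCH = new Map([
    ['mobilellm', ['MobileLLMForCausalLM', MobileLLMForCausalLMSketch]],
]);

// Assumed helper: look up the tuple for a model type, or fail loudly.
function resolveModelClass(modelType, registry) {
    const entry = registry.get(modelType);
    if (!entry) {
        throw new Error(`Unsupported model type: ${modelType}`);
    }
    const [name, cls] = entry;
    return { name, cls };
}

const resolved = resolveModelClass('mobilellm', CAUSAL_LM_MAPPING_SKETCH);
console.log(resolved.name); // MobileLLMForCausalLM
console.log(new resolved.cls() instanceof PreTrainedModelSketch); // true
```

In this sketch, a model type missing from the map is rejected, which is presumably why the commit registers `'mobilellm'` in both the base-model and causal-LM maps.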