
Commit 214bb78

Add support for OLMo models

1 parent 6bd45ac · commit 214bb78

File tree: 6 files changed (+66 −0 lines)


README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -365,6 +365,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
 1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
 1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
+1. **[OLMo](https://huggingface.co/docs/transformers/master/model_doc/olmo)** (from AI2) released with the paper [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838) by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi.
 1. **OpenELM** (from Apple) released with the paper [OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework](https://arxiv.org/abs/2404.14619) by Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari.
 1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
 1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
```

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions
```diff
@@ -80,6 +80,7 @@
 1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
 1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
 1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
+1. **[OLMo](https://huggingface.co/docs/transformers/master/model_doc/olmo)** (from AI2) released with the paper [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838) by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi.
 1. **OpenELM** (from Apple) released with the paper [OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework](https://arxiv.org/abs/2404.14619) by Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari.
 1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
 1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
```

scripts/convert.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -41,6 +41,7 @@
     'starcoder2',
     'openelm',
     'mobilellm',
+    'olmo',
 
     # Encoder-decoder models
     'whisper',
```
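The `'olmo'` entry above is appended to the list of decoder-only architectures the conversion script is willing to export. A minimal sketch of that kind of allowlist gate (the variable and function names here are illustrative assumptions, not the script's actual identifiers):

```python
# Illustrative allowlist gate like the one 'olmo' is added to in
# scripts/convert.py; names below are assumptions, not the real script's.
SUPPORTED_DECODER_ONLY = [
    'starcoder2',
    'openelm',
    'mobilellm',
    'olmo',  # newly supported by this commit
]

def check_supported(model_type: str) -> bool:
    """Return True if this model type can be converted to ONNX."""
    return model_type in SUPPORTED_DECODER_ONLY

print(check_supported('olmo'))   # True
print(check_supported('mamba'))  # False
```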

src/configs.js

Lines changed: 1 addition & 0 deletions
```diff
@@ -91,6 +91,7 @@ function getNormalizedConfig(config) {
             mapping['hidden_size'] = 'hidden_size';
             break;
         case 'llama':
+        case 'olmo':
         case 'mobilellm':
         case 'granite':
         case 'cohere':
```
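Adding `case 'olmo':` into this stack relies on switch fall-through: every listed architecture shares the same LLaMA-style config-key mapping. A self-contained sketch of the pattern (the helper function is illustrative, not the library's actual `getNormalizedConfig`):

```javascript
// Illustrative: stacked `case` labels sharing one config-key mapping,
// as in src/configs.js. Not the library's actual implementation.
function normalizedKeys(modelType) {
  const mapping = {};
  switch (modelType) {
    case 'llama':
    case 'olmo':      // falls through: OLMo reuses the LLaMA-style keys
    case 'mobilellm':
    case 'granite':
    case 'cohere':
      mapping.num_heads = 'num_attention_heads';
      mapping.num_layers = 'num_hidden_layers';
      mapping.hidden_size = 'hidden_size';
      break;
    default:
      throw new Error(`Unsupported model type: ${modelType}`);
  }
  return mapping;
}

console.log(normalizedKeys('olmo').num_heads); // 'num_attention_heads'
```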

src/models.js

Lines changed: 10 additions & 0 deletions
```diff
@@ -3818,6 +3818,14 @@ export class MobileLLMForCausalLM extends MobileLLMPreTrainedModel { }
 //////////////////////////////////////////////////
 
 
+//////////////////////////////////////////////////
+// OLMo models
+export class OlmoPreTrainedModel extends PreTrainedModel { }
+export class OlmoModel extends OlmoPreTrainedModel { }
+export class OlmoForCausalLM extends OlmoPreTrainedModel { }
+//////////////////////////////////////////////////
+
+
 //////////////////////////////////////////////////
 // Granite models
 export class GranitePreTrainedModel extends PreTrainedModel { }
@@ -6133,6 +6141,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['gpt_neox', ['GPTNeoXModel', GPTNeoXModel]],
     ['codegen', ['CodeGenModel', CodeGenModel]],
     ['llama', ['LlamaModel', LlamaModel]],
+    ['olmo', ['OlmoModel', OlmoModel]],
     ['mobilellm', ['MobileLLMModel', MobileLLMModel]],
     ['granite', ['GraniteModel', GraniteModel]],
     ['cohere', ['CohereModel', CohereModel]],
@@ -6223,6 +6232,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['gpt_neox', ['GPTNeoXForCausalLM', GPTNeoXForCausalLM]],
     ['codegen', ['CodeGenForCausalLM', CodeGenForCausalLM]],
     ['llama', ['LlamaForCausalLM', LlamaForCausalLM]],
+    ['olmo', ['OlmoForCausalLM', OlmoForCausalLM]],
    ['mobilellm', ['MobileLLMForCausalLM', MobileLLMForCausalLM]],
     ['granite', ['GraniteForCausalLM', GraniteForCausalLM]],
     ['cohere', ['CohereForCausalLM', CohereForCausalLM]],
```

tests/tiny_random.test.js

Lines changed: 52 additions & 0 deletions
```diff
@@ -23,6 +23,7 @@ import {
 
   // Models
   LlamaForCausalLM,
+  OlmoForCausalLM,
   GraniteForCausalLM,
   CohereModel,
   CohereForCausalLM,
@@ -1033,6 +1034,57 @@ describe("Tiny random models", () => {
     });
   });
 
+  describe("olmo", () => {
+    describe("OlmoForCausalLM", () => {
+      const model_id = "onnx-community/tiny-random-olmo-hf";
+      /** @type {OlmoForCausalLM} */
+      let model;
+      /** @type {GPTNeoXTokenizer} */
+      let tokenizer;
+      beforeAll(async () => {
+        model = await OlmoForCausalLM.from_pretrained(model_id, {
+          // TODO move to config
+          ...DEFAULT_MODEL_OPTIONS,
+        });
+        tokenizer = await GPTNeoXTokenizer.from_pretrained(model_id);
+        tokenizer.padding_side = "left";
+      }, MAX_MODEL_LOAD_TIME);
+
+      it(
+        "batch_size=1",
+        async () => {
+          const inputs = tokenizer("hello");
+          const outputs = await model.generate({
+            ...inputs,
+            max_length: 10,
+          });
+          expect(outputs.tolist()).toEqual([[25521n, 10886n, 44936n, 38777n, 33038n, 18557n, 1810n, 33853n, 9517n, 28892n]]);
+        },
+        MAX_TEST_EXECUTION_TIME,
+      );
+
+      it(
+        "batch_size>1",
+        async () => {
+          const inputs = tokenizer(["hello", "hello world"], { padding: true });
+          const outputs = await model.generate({
+            ...inputs,
+            max_length: 10,
+          });
+          expect(outputs.tolist()).toEqual([
+            [1n, 25521n, 10886n, 44936n, 38777n, 33038n, 18557n, 1810n, 33853n, 9517n],
+            [25521n, 1533n, 37199n, 27362n, 30594n, 39261n, 8824n, 19175n, 8545n, 29335n],
+          ]);
+        },
+        MAX_TEST_EXECUTION_TIME,
+      );
+
+      afterAll(async () => {
+        await model?.dispose();
+      }, MAX_MODEL_DISPOSE_TIME);
+    });
+  });
+
 describe("granite", () => {
   describe("GraniteForCausalLM", () => {
     const model_id = "hf-internal-testing/tiny-random-GraniteForCausalLM";
```
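The test sets `tokenizer.padding_side = "left"` before batched generation: decoder-only models must pad on the left so the last position of every row holds a real token. A small sketch of left padding over plain token-id arrays (not the tokenizer's actual API; pad id `1n` is an assumption, though it matches the leading `1n` in the batched expected output above):

```javascript
// Illustrative left-padding of BigInt token-id sequences for batched
// decoding; shorter rows get pad ids prepended, never appended.
function padLeft(sequences, padId) {
  const maxLen = Math.max(...sequences.map((s) => s.length));
  return sequences.map((s) => [
    ...Array(maxLen - s.length).fill(padId),
    ...s,
  ]);
}

const batch = padLeft([[25521n], [25521n, 1533n]], 1n);
console.log(batch); // [ [ 1n, 25521n ], [ 25521n, 1533n ] ]
```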

0 commit comments