
Commit da03a0a

Add support for granite models
1 parent 7a0f77c commit da03a0a

File tree: 5 files changed, +76 −0 lines


README.md

Lines changed: 1 addition & 0 deletions
@@ -311,6 +311,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
 1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
 1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
+1. **[Granite](https://huggingface.co/docs/transformers/main/model_doc/granite)** (from IBM) released with the paper [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](https://arxiv.org/abs/2408.13359) by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
 1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
 1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
 1. **[Hiera](https://huggingface.co/docs/transformers/model_doc/hiera)** (from Meta) released with the paper [Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://arxiv.org/pdf/2306.00989) by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer.

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@
 1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
 1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
 1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
+1. **[Granite](https://huggingface.co/docs/transformers/main/model_doc/granite)** (from IBM) released with the paper [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](https://arxiv.org/abs/2408.13359) by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
 1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
 1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
 1. **[Hiera](https://huggingface.co/docs/transformers/model_doc/hiera)** (from Meta) released with the paper [Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://arxiv.org/pdf/2306.00989) by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer.

src/configs.js

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@ function getNormalizedConfig(config) {
             mapping['hidden_size'] = 'hidden_size';
             break;
         case 'llama':
+        case 'granite':
         case 'cohere':
         case 'mistral':
         case 'starcoder2':
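
Because `granite` is added to the same fall-through branch as `llama`, Granite configs are normalized with the llama-family key mapping. A minimal standalone sketch of what that branch accomplishes (the source key names below are assumptions for illustration, not copied from this file):

    // Sketch only: the llama-style config normalization that the shared
    // `case` fall-through applies to 'granite'. Key names are assumed.
    function normalizeLlamaFamilyConfig(config) {
        return {
            num_heads: config['num_key_value_heads'],
            num_layers: config['num_hidden_layers'],
            hidden_size: config['hidden_size'],
        };
    }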

src/models.js

Lines changed: 22 additions & 0 deletions
@@ -3952,6 +3952,26 @@ export class LlamaModel extends LlamaPreTrainedModel { }
 export class LlamaForCausalLM extends LlamaPreTrainedModel { }
 //////////////////////////////////////////////////
 
+
+//////////////////////////////////////////////////
+// Granite models
+export class GranitePreTrainedModel extends PreTrainedModel {
+    /**
+     * Creates a new instance of the `GranitePreTrainedModel` class.
+     * @param {Object} config The model configuration.
+     * @param {Record<string, any>} sessions The inference sessions for the model.
+     * @param {GenerationConfig} generation_config The generation configuration.
+     */
+    constructor(config, sessions, generation_config) {
+        super(config, sessions);
+        this.generation_config = generation_config;
+    }
+}
+export class GraniteModel extends GranitePreTrainedModel { }
+export class GraniteForCausalLM extends GranitePreTrainedModel { }
+//////////////////////////////////////////////////
+
+
 //////////////////////////////////////////////////
 // Cohere models
 
@@ -6471,6 +6491,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['gpt_neox', ['GPTNeoXModel', GPTNeoXModel]],
     ['codegen', ['CodeGenModel', CodeGenModel]],
     ['llama', ['LlamaModel', LlamaModel]],
+    ['granite', ['GraniteModel', GraniteModel]],
     ['cohere', ['CohereModel', CohereModel]],
     ['gemma', ['GemmaModel', GemmaModel]],
     ['gemma2', ['Gemma2Model', Gemma2Model]],
@@ -6559,6 +6580,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['gpt_neox', ['GPTNeoXForCausalLM', GPTNeoXForCausalLM]],
     ['codegen', ['CodeGenForCausalLM', CodeGenForCausalLM]],
     ['llama', ['LlamaForCausalLM', LlamaForCausalLM]],
+    ['granite', ['GraniteForCausalLM', GraniteForCausalLM]],
     ['cohere', ['CohereForCausalLM', CohereForCausalLM]],
     ['gemma', ['GemmaForCausalLM', GemmaForCausalLM]],
     ['gemma2', ['Gemma2ForCausalLM', Gemma2ForCausalLM]],

tests/tiny_random.test.js

Lines changed: 51 additions & 0 deletions
@@ -23,6 +23,7 @@ import {
 
   // Models
   LlamaForCausalLM,
+  GraniteForCausalLM,
   CohereModel,
   CohereForCausalLM,
   GemmaForCausalLM,
@@ -945,6 +946,56 @@ describe("Tiny random models", () => {
     });
   });
 
+  describe("granite", () => {
+    describe("GraniteForCausalLM", () => {
+      const model_id = "hf-internal-testing/tiny-random-GraniteForCausalLM";
+      /** @type {GraniteForCausalLM} */
+      let model;
+      /** @type {GPT2Tokenizer} */
+      let tokenizer;
+      beforeAll(async () => {
+        model = await GraniteForCausalLM.from_pretrained(model_id, {
+          // TODO move to config
+          ...DEFAULT_MODEL_OPTIONS,
+        });
+        tokenizer = await GPT2Tokenizer.from_pretrained(model_id);
+      }, MAX_MODEL_LOAD_TIME);
+
+      it(
+        "batch_size=1",
+        async () => {
+          const inputs = tokenizer("hello");
+          const outputs = await model.generate({
+            ...inputs,
+            max_length: 10,
+          });
+          expect(outputs.tolist()).toEqual([[7656n, 23147n, 31291n, 1011n, 8768n, 30904n, 9256n, 28368n, 16199n, 26560n]]);
+        },
+        MAX_TEST_EXECUTION_TIME,
+      );
+
+      it(
+        "batch_size>1",
+        async () => {
+          const inputs = tokenizer(["hello", "hello world"], { padding: true });
+          const outputs = await model.generate({
+            ...inputs,
+            max_length: 10,
+          });
+          expect(outputs.tolist()).toEqual([
+            [0n, 7656n, 23147n, 31291n, 1011n, 8768n, 30904n, 9256n, 28368n, 16199n],
+            [7656n, 5788n, 9477n, 14490n, 18374n, 28650n, 10907n, 2989n, 14096n, 27403n],
+          ]);
+        },
+        MAX_TEST_EXECUTION_TIME,
+      );
+
+      afterAll(async () => {
+        await model?.dispose();
+      }, MAX_MODEL_DISPOSE_TIME);
+    });
+  });
+
   describe("cohere", () => {
     describe("CohereModel", () => {
       const model_id = "hf-internal-testing/tiny-random-CohereModel";
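
Note that the expected values in these tests are BigInt token ids (the `n` suffix). When debugging a generation test like the granite one above, it can help to decode the ids back to text. A sketch reusing the test's `tokenizer` and `outputs` variables (a tiny random model produces gibberish, but the round trip confirms decoding works):

    // Decode generated token ids to strings for inspection.
    const decoded = tokenizer.batch_decode(outputs, { skip_special_tokens: true });
    console.log(decoded);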
