
Commit c91248c

Add support for RoFormer models (#464)
* Add `RoFormerTokenizer`
* Use `clean_text` in bert normalizer config
* Add control characters test
* Add support for RoFormer models
* Use default label if id2label is not specified
* Update requirements.txt
* Skip roformer tokenizer tests
1 parent 7636a1c commit c91248c
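For context, a minimal sketch of what this commit enables from the user's side, assuming a RoFormer checkpoint that has been converted to ONNX; `your-org/roformer-base-onnx` below is a placeholder repo id, not a real model:

```js
// Hedged sketch: the repo id is a placeholder for any ONNX-converted RoFormer checkpoint.
import { pipeline } from '@xenova/transformers';

const unmasker = await pipeline('fill-mask', 'your-org/roformer-base-onnx');
const predictions = await unmasker('Paris is the [MASK] of France.');
console.log(predictions);
// => [{ token_str: '...', score: ..., sequence: '...' }, ...] (top candidates for the mask)
```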

File tree: 8 files changed (+183 −6 lines)

README.md

Lines changed: 1 addition & 0 deletions
@@ -324,6 +324,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
 1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
 1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
+1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
 1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
 1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
 1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@
 1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
 1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
 1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
+1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
 1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
 1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
 1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.

scripts/supported_models.py

Lines changed: 40 additions & 0 deletions
@@ -650,6 +650,44 @@
             'microsoft/resnet-152',
         ],
     },
+    'roformer': {
+        # Feature extraction
+        'feature-extraction': [
+            'hf-tiny-model-private/tiny-random-RoFormerModel',
+        ],
+
+        # Text classification
+        'text-classification': [
+            'hf-tiny-model-private/tiny-random-RoFormerForSequenceClassification',
+        ],
+
+        # Token classification
+        'token-classification': [
+            'hf-tiny-model-private/tiny-random-RoFormerForTokenClassification',
+        ],
+
+        # TODO
+        # # Text generation
+        # 'text-generation': [
+        #     'hf-tiny-model-private/tiny-random-RoFormerForCausalLM',
+        # ],
+
+        # Masked language modelling
+        'fill-mask': [
+            'alchemab/antiberta2',
+            'hf-tiny-model-private/tiny-random-RoFormerForMaskedLM',
+        ],
+
+        # Question answering
+        'question-answering': [
+            'hf-tiny-model-private/tiny-random-RoFormerForQuestionAnswering',
+        ],
+
+        # Multiple choice
+        'multiple-choice': [
+            'hf-tiny-model-private/tiny-random-RoFormerForMultipleChoice',
+        ],
+    },
     'phi': {
         # Text generation
         'text-generation': [
@@ -747,6 +785,8 @@
             'MBZUAI/LaMini-T5-61M',
             'MBZUAI/LaMini-T5-223M',
             'MBZUAI/LaMini-T5-738M',
+            'declare-lab/flan-alpaca-base',
+            'declare-lab/flan-alpaca-large',
         ],

         # Feature extraction
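These entries register which checkpoints the conversion and test scripts exercise for each newly supported RoFormer task. A hedged sketch of running one of those tasks from JavaScript, with a placeholder repo id standing in for a converted checkpoint (the tiny-random models listed above are test-only):

```js
// Hedged sketch: 'your-org/roformer-qa-onnx' is a placeholder for a converted
// RoFormer question-answering checkpoint.
import { pipeline } from '@xenova/transformers';

const answerer = await pipeline('question-answering', 'your-org/roformer-qa-onnx');
const result = await answerer(
    'Who proposed rotary position embeddings?',                                            // question
    'Rotary position embeddings were proposed in the RoFormer paper by Jianlin Su et al.', // context
);
console.log(result); // => { answer: '...', score: ... }
```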

src/models.js

Lines changed: 77 additions & 0 deletions
@@ -1464,6 +1464,78 @@ export class BertForQuestionAnswering extends BertPreTrainedModel {
 }
 //////////////////////////////////////////////////

+//////////////////////////////////////////////////
+// RoFormer models
+export class RoFormerPreTrainedModel extends PreTrainedModel { }
+
+/**
+ * The bare RoFormer Model transformer outputting raw hidden-states without any specific head on top.
+ */
+export class RoFormerModel extends RoFormerPreTrainedModel { }
+
+/**
+ * RoFormer Model with a `language modeling` head on top.
+ */
+export class RoFormerForMaskedLM extends RoFormerPreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<MaskedLMOutput>} An object containing the model's output logits for masked language modeling.
+     */
+    async _call(model_inputs) {
+        return new MaskedLMOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * RoFormer Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output)
+ */
+export class RoFormerForSequenceClassification extends RoFormerPreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<SequenceClassifierOutput>} An object containing the model's output logits for sequence classification.
+     */
+    async _call(model_inputs) {
+        return new SequenceClassifierOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * RoFormer Model with a token classification head on top (a linear layer on top of the hidden-states output)
+ * e.g. for Named-Entity-Recognition (NER) tasks.
+ */
+export class RoFormerForTokenClassification extends RoFormerPreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<TokenClassifierOutput>} An object containing the model's output logits for token classification.
+     */
+    async _call(model_inputs) {
+        return new TokenClassifierOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * RoFormer Model with a span classification head on top for extractive question-answering tasks like SQuAD
+ * (a linear layers on top of the hidden-states output to compute `span start logits` and `span end logits`).
+ */
+export class RoFormerForQuestionAnswering extends RoFormerPreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<QuestionAnsweringModelOutput>} An object containing the model's output logits for question answering.
+     */
+    async _call(model_inputs) {
+        return new QuestionAnsweringModelOutput(await super._call(model_inputs));
+    }
+}
+// TODO: Add RoFormerForCausalLM and RoFormerForMultipleChoice
+//////////////////////////////////////////////////

 //////////////////////////////////////////////////
 // ConvBert models
@@ -4671,6 +4743,7 @@ export class PretrainedMixin {

 const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
     ['bert', ['BertModel', BertModel]],
+    ['roformer', ['RoFormerModel', RoFormerModel]],
     ['electra', ['ElectraModel', ElectraModel]],
     ['esm', ['EsmModel', EsmModel]],
     ['convbert', ['ConvBertModel', ConvBertModel]],
@@ -4756,6 +4829,7 @@ const MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING_NAMES = new Map([

 const MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = new Map([
     ['bert', ['BertForSequenceClassification', BertForSequenceClassification]],
+    ['roformer', ['RoFormerForSequenceClassification', RoFormerForSequenceClassification]],
     ['electra', ['ElectraForSequenceClassification', ElectraForSequenceClassification]],
     ['esm', ['EsmForSequenceClassification', EsmForSequenceClassification]],
     ['convbert', ['ConvBertForSequenceClassification', ConvBertForSequenceClassification]],
@@ -4776,6 +4850,7 @@ const MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = new Map([

 const MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = new Map([
     ['bert', ['BertForTokenClassification', BertForTokenClassification]],
+    ['roformer', ['RoFormerForTokenClassification', RoFormerForTokenClassification]],
     ['electra', ['ElectraForTokenClassification', ElectraForTokenClassification]],
     ['esm', ['EsmForTokenClassification', EsmForTokenClassification]],
     ['convbert', ['ConvBertForTokenClassification', ConvBertForTokenClassification]],
@@ -4821,6 +4896,7 @@ const MODEL_WITH_LM_HEAD_MAPPING_NAMES = new Map([

 const MODEL_FOR_MASKED_LM_MAPPING_NAMES = new Map([
     ['bert', ['BertForMaskedLM', BertForMaskedLM]],
+    ['roformer', ['RoFormerForMaskedLM', RoFormerForMaskedLM]],
     ['electra', ['ElectraForMaskedLM', ElectraForMaskedLM]],
     ['esm', ['EsmForMaskedLM', EsmForMaskedLM]],
     ['convbert', ['ConvBertForMaskedLM', ConvBertForMaskedLM]],
@@ -4839,6 +4915,7 @@ const MODEL_FOR_MASKED_LM_MAPPING_NAMES = new Map([

 const MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = new Map([
     ['bert', ['BertForQuestionAnswering', BertForQuestionAnswering]],
+    ['roformer', ['RoFormerForQuestionAnswering', RoFormerForQuestionAnswering]],
     ['electra', ['ElectraForQuestionAnswering', ElectraForQuestionAnswering]],
     ['convbert', ['ConvBertForQuestionAnswering', ConvBertForQuestionAnswering]],
     ['camembert', ['CamembertForQuestionAnswering', CamembertForQuestionAnswering]],
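The new classes plug into these mapping tables, which dispatch on `config.model_type`. A rough sketch of how that resolution might look in practice, assuming a hypothetical ONNX-converted RoFormer masked-LM checkpoint (`your-org/roformer-base-onnx` is a placeholder):

```js
// Hedged sketch of the model_type → class dispatch.
import { AutoTokenizer, AutoModelForMaskedLM } from '@xenova/transformers';

const model_id = 'your-org/roformer-base-onnx';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
// config.model_type === 'roformer', so the masked-LM mapping selects RoFormerForMaskedLM.
const model = await AutoModelForMaskedLM.from_pretrained(model_id);

const inputs = await tokenizer('The capital of France is [MASK].');
const { logits } = await model(inputs); // wrapped in a MaskedLMOutput
console.log(logits.dims);               // e.g. [1, num_tokens, vocab_size]
```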

src/pipelines.js

Lines changed: 1 addition & 1 deletion
@@ -280,7 +280,7 @@ export class TokenClassificationPipeline extends Pipeline {
                 let tokenData = batch[j];
                 let topScoreIndex = max(tokenData.data)[1];

-                let entity = id2label[topScoreIndex];
+                let entity = id2label ? id2label[topScoreIndex] : `LABEL_${topScoreIndex}`;
                 if (ignore_labels.includes(entity)) {
                     // We predicted a token that should be ignored. So, we skip it.
                     continue;
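A sketch of what this fallback means for users, with a hypothetical token-classification checkpoint whose `config.json` defines no `id2label` mapping:

```js
// Hedged sketch: the repo id is a placeholder for a token-classification model
// whose config does not define id2label.
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('token-classification', 'your-org/ner-without-id2label');
const output = await classifier('My name is Sarah and I live in London');
// With no id2label in the config, entities now fall back to index-based names
// instead of failing the lookup:
// [{ entity: 'LABEL_0', ... }, { entity: 'LABEL_3', ... }, ...]
```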

src/tokenizers.js

Lines changed: 56 additions & 5 deletions
@@ -1182,17 +1182,61 @@ class BertNormalizer extends Normalizer {
         return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
     }

+
+    /**
+     * Checks whether `char` is a control character.
+     * @param {string} char The character to check.
+     * @returns {boolean} Whether `char` is a control character.
+     * @private
+     */
+    _is_control(char) {
+        switch (char) {
+            case '\t':
+            case '\n':
+            case '\r':
+                // These are technically control characters but we count them as whitespace characters.
+                return false;
+
+            default:
+                // Check if unicode category starts with C:
+                // Cc - Control
+                // Cf - Format
+                // Co - Private Use
+                // Cs - Surrogate
+                return /^\p{Cc}|\p{Cf}|\p{Co}|\p{Cs}$/u.test(char);
+        }
+    }
+
+    /**
+     * Performs invalid character removal and whitespace cleanup on text.
+     * @param {string} text The text to clean.
+     * @returns {string} The cleaned text.
+     * @private
+     */
+    _clean_text(text) {
+        const output = [];
+        for (const char of text) {
+            const cp = char.charCodeAt(0);
+            if (cp === 0 || cp === 0xFFFD || this._is_control(char)) {
+                continue;
+            }
+            if (/^\s$/.test(char)) { // is whitespace
+                output.push(" ");
+            } else {
+                output.push(char);
+            }
+        }
+        return output.join("");
+    }
     /**
      * Normalizes the given text based on the configuration.
      * @param {string} text The text to normalize.
      * @returns {string} The normalized text.
      */
     normalize(text) {
-        // TODO use rest of config
-        // config.clean_text,
-        // config.handle_chinese_chars,
-        // config.strip_accents,
-        // config.lowercase,
+        if (this.config.clean_text) {
+            text = this._clean_text(text);
+        }

         if (this.config.handle_chinese_chars) {
             text = this._tokenize_chinese_chars(text);
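A standalone illustration of the cleanup rules implemented above (not the library API itself, which keeps this logic private inside `BertNormalizer`): NUL and U+FFFD are dropped, other control/format characters are removed, and remaining whitespace is normalized to a single space.

```js
// Re-implementation of the same rules, for illustration only.
function cleanText(text) {
    const output = [];
    for (const char of text) {
        const cp = char.charCodeAt(0);
        // '\t', '\n' and '\r' are technically control characters, but are kept as whitespace.
        const isControl = !'\t\n\r'.includes(char) && /^[\p{Cc}\p{Cf}\p{Co}\p{Cs}]$/u.test(char);
        if (cp === 0 || cp === 0xFFFD || isControl) continue; // drop NUL, U+FFFD and control chars
        output.push(/^\s$/.test(char) ? ' ' : char);          // map any whitespace to a plain space
    }
    return output.join('');
}

console.log(cleanText('1\u00002\uFFFD3')); // '123'   (matches the new control-characters test string)
console.log(cleanText('a\tb\nc'));         // 'a b c'
```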
@@ -2944,6 +2988,12 @@ export class ConvBertTokenizer extends PreTrainedTokenizer {
         return add_token_types(inputs);
     }
 }
+export class RoFormerTokenizer extends PreTrainedTokenizer {
+    /** @type {add_token_types} */
+    prepare_model_inputs(inputs) {
+        return add_token_types(inputs);
+    }
+}
 export class DistilBertTokenizer extends PreTrainedTokenizer { }
 export class CamembertTokenizer extends PreTrainedTokenizer { }
 export class XLMTokenizer extends PreTrainedTokenizer {
@@ -4136,6 +4186,7 @@ export class AutoTokenizer {
         BertTokenizer,
         HerbertTokenizer,
         ConvBertTokenizer,
+        RoFormerTokenizer,
         XLMTokenizer,
         ElectraTokenizer,
         MobileBertTokenizer,
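A sketch of loading the new tokenizer through `AutoTokenizer`, assuming a hypothetical converted RoFormer repo; like the other BERT-style tokenizers, `prepare_model_inputs` adds `token_type_ids`:

```js
// Hedged sketch: the repo id is a placeholder for a converted RoFormer repo.
import { AutoTokenizer } from '@xenova/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('your-org/roformer-base-onnx');
const inputs = await tokenizer('RoFormer uses rotary position embeddings.');
console.log(Object.keys(inputs)); // includes 'token_type_ids' alongside 'input_ids' and 'attention_mask'
```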

tests/generate_tests.py

Lines changed: 6 additions & 0 deletions
@@ -39,6 +39,9 @@

     # TODO: remove when https://github.com/huggingface/transformers/issues/26547 is fixed
     'speecht5',
+
+    # TODO: remove when https://github.com/huggingface/transformers/issues/28164 is fixed
+    'roformer',
 ]

 TOKENIZERS_TO_IGNORE = [
@@ -80,6 +83,9 @@
         "<s>\n",
         " </s> test </s> ",
         "</s>test</s>",
+
+        # Control characters
+        "1\u00002\uFFFD3",
     ],
     "custom_by_model_type": {
         "llama": [

tests/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@ transformers[torch]@git+https://github.com/huggingface/transformers
 sacremoses==0.0.53
 sentencepiece==0.1.99
 protobuf==4.24.3
+rjieba==0.1.11
