
Commit 1e157ba

xenova and Aschen authored
Add support for Deberta models (#244)
* add documentation for zero shot classification
* add multi_label example
* review comments
* edit examples data
* Add deberta and deberta-v2 model definitions
* Update model mapping
* Implement missing `Strip` normalizer
* Add deberta and deberta-v2 tokenizers
* Add fast path to `Strip` normalizer
* Add token types to deberta tokenizer output
* Update supported_models.py
* Fix default Precompiled normalization
* Update supported models list
* Update JSDoc
* Support `not_entailment` label
* Update multi-label example JSDoc

---------

Co-authored-by: Aschen <[email protected]>
1 parent db7d0f0 commit 1e157ba
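
The `Strip` normalizer work listed above ("Implement missing `Strip` normalizer", "Add fast path to `Strip` normalizer") lands in files not expanded in this view. A minimal sketch of what such a normalizer could look like, assuming a `Normalizer` base class with a `this.config` field and the `strip_left`/`strip_right` options used by the tokenizers library (illustrative only, not the commit's exact code):

```javascript
// Illustrative sketch, not the commit's exact code.
class Strip extends Normalizer {
    normalize(text) {
        if (this.config.strip_left && this.config.strip_right) {
            // Fast path: strip both ends with a single built-in call.
            text = text.trim();
        } else {
            if (this.config.strip_left) {
                text = text.trimStart();
            }
            if (this.config.strip_right) {
                text = text.trimEnd();
            }
        }
        return text;
    }
}
```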

File tree

7 files changed: +282 additions, -13 deletions


README.md

Lines changed: 2 additions & 0 deletions
@@ -258,6 +258,8 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
 1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
 1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
+1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
+1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
 1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
 1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei

docs/snippets/6_supported-models.snippet

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@
 1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
 1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
 1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
+1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
+1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
 1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
 1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei

scripts/supported_models.py

Lines changed: 19 additions & 1 deletion
@@ -82,6 +82,24 @@
         'Salesforce/codegen-350M-multi',
         'Salesforce/codegen-350M-nl',
     ],
+    'deberta': [
+        'cross-encoder/nli-deberta-base',
+        'Narsil/deberta-large-mnli-zero-cls',
+    ],
+    'deberta-v2': [
+        'cross-encoder/nli-deberta-v3-xsmall',
+        'cross-encoder/nli-deberta-v3-small',
+        'cross-encoder/nli-deberta-v3-base',
+        'cross-encoder/nli-deberta-v3-large',
+        'MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary',
+        'MoritzLaurer/DeBERTa-v3-base-mnli',
+        'MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli',
+        'MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli',
+        'MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7',
+        'navteca/nli-deberta-v3-xsmall',
+        'sileod/deberta-v3-base-tasksource-nli',
+        'sileod/deberta-v3-large-tasksource-nli',
+    ],
     'detr': [
         'facebook/detr-resnet-50',
         'facebook/detr-resnet-101',
@@ -133,7 +151,7 @@
         # https://github.com/huggingface/optimum/issues/1027
         # 'google/mobilebert-uncased',
     ],
-    'mobilevit':[
+    'mobilevit': [
         'apple/mobilevit-small',
         'apple/mobilevit-x-small',
         'apple/mobilevit-xx-small',
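
The commit message also mentions adding token types to the deberta tokenizer output; those tokenizer changes are in files not shown in this view. A hedged usage sketch (model name taken from the pipeline examples below; the `text_pair` option is assumed to behave as it does for the library's BERT-style tokenizers):

```javascript
import { AutoTokenizer } from '@xenova/transformers';

// Sketch: DeBERTa tokenizer output should now include token_type_ids
// alongside input_ids and attention_mask.
let tokenizer = await AutoTokenizer.from_pretrained('Xenova/nli-deberta-v3-xsmall');
let { input_ids, token_type_ids } = tokenizer(
    'A man is eating food.',                   // premise
    { text_pair: 'A man is eating a meal.' },  // hypothesis
);
```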

src/models.js

Lines changed: 152 additions & 0 deletions
@@ -1283,6 +1283,148 @@ export class BertForQuestionAnswering extends BertPreTrainedModel {
 }
 //////////////////////////////////////////////////
 
+//////////////////////////////////////////////////
+// DeBERTa models
+export class DebertaPreTrainedModel extends PreTrainedModel { }
+
+/**
+ * The bare DeBERTa Model transformer outputting raw hidden-states without any specific head on top.
+ */
+export class DebertaModel extends DebertaPreTrainedModel { }
+
+/**
+ * DeBERTa Model with a `language modeling` head on top.
+ */
+export class DebertaForMaskedLM extends DebertaPreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<MaskedLMOutput>} An object containing the model's output logits for masked language modeling.
+     */
+    async _call(model_inputs) {
+        return new MaskedLMOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * DeBERTa Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output)
+ */
+export class DebertaForSequenceClassification extends DebertaPreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<SequenceClassifierOutput>} An object containing the model's output logits for sequence classification.
+     */
+    async _call(model_inputs) {
+        return new SequenceClassifierOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * DeBERTa Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
+ */
+export class DebertaForTokenClassification extends DebertaPreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<TokenClassifierOutput>} An object containing the model's output logits for token classification.
+     */
+    async _call(model_inputs) {
+        return new TokenClassifierOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * DeBERTa Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
+ * layer on top of the hidden-states output to compute `span start logits` and `span end logits`).
+ */
+export class DebertaForQuestionAnswering extends DebertaPreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<QuestionAnsweringModelOutput>} An object containing the model's output logits for question answering.
+     */
+    async _call(model_inputs) {
+        return new QuestionAnsweringModelOutput(await super._call(model_inputs));
+    }
+}
+//////////////////////////////////////////////////
+
+//////////////////////////////////////////////////
+// DeBERTa-v2 models
+export class DebertaV2PreTrainedModel extends PreTrainedModel { }
+
+/**
+ * The bare DeBERTa-V2 Model transformer outputting raw hidden-states without any specific head on top.
+ */
+export class DebertaV2Model extends DebertaV2PreTrainedModel { }
+
+/**
+ * DeBERTa-V2 Model with a `language modeling` head on top.
+ */
+export class DebertaV2ForMaskedLM extends DebertaV2PreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<MaskedLMOutput>} An object containing the model's output logits for masked language modeling.
+     */
+    async _call(model_inputs) {
+        return new MaskedLMOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * DeBERTa-V2 Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output)
+ */
+export class DebertaV2ForSequenceClassification extends DebertaV2PreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<SequenceClassifierOutput>} An object containing the model's output logits for sequence classification.
+     */
+    async _call(model_inputs) {
+        return new SequenceClassifierOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * DeBERTa-V2 Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
+ */
+export class DebertaV2ForTokenClassification extends DebertaV2PreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<TokenClassifierOutput>} An object containing the model's output logits for token classification.
+     */
+    async _call(model_inputs) {
+        return new TokenClassifierOutput(await super._call(model_inputs));
+    }
+}
+
+/**
+ * DeBERTa-V2 Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
+ * layer on top of the hidden-states output to compute `span start logits` and `span end logits`).
+ */
+export class DebertaV2ForQuestionAnswering extends DebertaV2PreTrainedModel {
+    /**
+     * Calls the model on new inputs.
+     *
+     * @param {Object} model_inputs The inputs to the model.
+     * @returns {Promise<QuestionAnsweringModelOutput>} An object containing the model's output logits for question answering.
+     */
+    async _call(model_inputs) {
+        return new QuestionAnsweringModelOutput(await super._call(model_inputs));
+    }
+}
+//////////////////////////////////////////////////
+
 //////////////////////////////////////////////////
 // DistilBert models
 export class DistilBertPreTrainedModel extends PreTrainedModel { }
@@ -3089,6 +3231,8 @@ export class PretrainedMixin {
 
 const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
     ['bert', BertModel],
+    ['deberta', DebertaModel],
+    ['deberta-v2', DebertaV2Model],
     ['mpnet', MPNetModel],
     ['albert', AlbertModel],
     ['distilbert', DistilBertModel],
@@ -3120,6 +3264,8 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
 
 const MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = new Map([
     ['bert', BertForSequenceClassification],
+    ['deberta', DebertaForSequenceClassification],
+    ['deberta-v2', DebertaV2ForSequenceClassification],
     ['mpnet', MPNetForSequenceClassification],
     ['albert', AlbertForSequenceClassification],
     ['distilbert', DistilBertForSequenceClassification],
@@ -3132,6 +3278,8 @@ const MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = new Map([
 
 const MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = new Map([
     ['bert', BertForTokenClassification],
+    ['deberta', DebertaForTokenClassification],
+    ['deberta-v2', DebertaV2ForTokenClassification],
     ['mpnet', MPNetForTokenClassification],
     ['distilbert', DistilBertForTokenClassification],
     ['roberta', RobertaForTokenClassification],
@@ -3156,6 +3304,8 @@ const MODEL_WITH_LM_HEAD_MAPPING_NAMES = new Map([
 
 const MODEL_FOR_MASKED_LM_MAPPING_NAMES = new Map([
     ['bert', BertForMaskedLM],
+    ['deberta', DebertaForMaskedLM],
+    ['deberta-v2', DebertaV2ForMaskedLM],
     ['mpnet', MPNetForMaskedLM],
     ['albert', AlbertForMaskedLM],
     ['distilbert', DistilBertForMaskedLM],
@@ -3167,6 +3317,8 @@ const MODEL_FOR_MASKED_LM_MAPPING_NAMES = new Map([
 
 const MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = new Map([
     ['bert', BertForQuestionAnswering],
+    ['deberta', DebertaForQuestionAnswering],
+    ['deberta-v2', DebertaV2ForQuestionAnswering],
     ['mpnet', MPNetForQuestionAnswering],
    ['albert', AlbertForQuestionAnswering],
     ['distilbert', DistilBertForQuestionAnswering],
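
With these mappings in place, a `config.json` whose `model_type` is `deberta` or `deberta-v2` resolves to the new classes through the `Auto*` factories. A minimal usage sketch, assuming the `Xenova/nli-deberta-v3-xsmall` conversion referenced in the pipeline docs below:

```javascript
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

// Sketch: model_type 'deberta-v2' now maps to DebertaV2ForSequenceClassification.
let tokenizer = await AutoTokenizer.from_pretrained('Xenova/nli-deberta-v3-xsmall');
let model = await AutoModelForSequenceClassification.from_pretrained('Xenova/nli-deberta-v3-xsmall');

let inputs = tokenizer('I love transformers!');
let { logits } = await model(inputs);  // SequenceClassifierOutput
```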

src/pipelines.js

Lines changed: 36 additions & 7 deletions
@@ -460,17 +460,17 @@ export class TranslationPipeline extends Text2TextGenerationPipeline {
  * **Example:** Text generation with `Xenova/distilgpt2` (default settings).
  * ```javascript
  * let text = 'I enjoy walking with my cute dog,';
- * let generator = await pipeline('text-generation', 'Xenova/distilgpt2');
- * let output = await generator(text);
+ * let classifier = await pipeline('text-generation', 'Xenova/distilgpt2');
+ * let output = await classifier(text);
  * console.log(output);
  * // [{ generated_text: "I enjoy walking with my cute dog, and I love to play with the other dogs." }]
  * ```
  *
  * **Example:** Text generation with `Xenova/distilgpt2` (custom settings).
  * ```javascript
  * let text = 'Once upon a time, there was';
- * let generator = await pipeline('text-generation', 'Xenova/distilgpt2');
- * let output = await generator(text, {
+ * let classifier = await pipeline('text-generation', 'Xenova/distilgpt2');
+ * let output = await classifier(text, {
  *     temperature: 2,
  *     max_new_tokens: 10,
  *     repetition_penalty: 1.5,
@@ -489,8 +489,8 @@ export class TranslationPipeline extends Text2TextGenerationPipeline {
  * **Example:** Run code generation with `Xenova/codegen-350M-mono`.
  * ```javascript
  * let text = 'def fib(n):';
- * let generator = await pipeline('text-generation', 'Xenova/codegen-350M-mono');
- * let output = await generator(text, {
+ * let classifier = await pipeline('text-generation', 'Xenova/codegen-350M-mono');
+ * let output = await classifier(text, {
  *     max_new_tokens: 40,
  * });
  * console.log(output[0].generated_text);
@@ -550,6 +550,35 @@ export class TextGenerationPipeline extends Pipeline {
  * trained on NLI (natural language inference) tasks. Equivalent of `text-classification`
  * pipelines, but these models don't require a hardcoded number of potential classes, they
  * can be chosen at runtime. It usually means it's slower but it is **much** more flexible.
+ *
+ * **Example:** Zero shot classification with `Xenova/mobilebert-uncased-mnli`.
+ * ```javascript
+ * let text = 'Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.';
+ * let labels = [ 'mobile', 'billing', 'website', 'account access' ];
+ * let classifier = await pipeline('zero-shot-classification', 'Xenova/mobilebert-uncased-mnli');
+ * let output = await classifier(text, labels);
+ * console.log(output);
+ * // {
+ * //   sequence: 'Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.',
+ * //   labels: [ 'mobile', 'website', 'billing', 'account access' ],
+ * //   scores: [ 0.5562091040482018, 0.1843621307860853, 0.13942646639336376, 0.12000229877234923 ]
+ * // }
+ * ```
+ *
+ * **Example:** Zero shot classification with `Xenova/nli-deberta-v3-xsmall` (multi-label).
+ * ```javascript
+ * let text = 'I have a problem with my iphone that needs to be resolved asap!';
+ * let labels = [ 'urgent', 'not urgent', 'phone', 'tablet', 'computer' ];
+ * let classifier = await pipeline('zero-shot-classification', 'Xenova/nli-deberta-v3-xsmall');
+ * let output = await classifier(text, labels, { multi_label: true });
+ * console.log(output);
+ * // {
+ * //   sequence: 'I have a problem with my iphone that needs to be resolved asap!',
+ * //   labels: [ 'urgent', 'phone', 'computer', 'tablet', 'not urgent' ],
+ * //   scores: [ 0.9958870956360275, 0.9923963400697035, 0.002333537946160235, 0.0015134138567598765, 0.0010699384208377163 ]
+ * // }
+ * ```
+ *
  * @extends Pipeline
  */
 export class ZeroShotClassificationPipeline extends Pipeline {
@@ -576,7 +605,7 @@ export class ZeroShotClassificationPipeline extends Pipeline {
             this.entailment_id = 2;
         }
 
-        this.contradiction_id = this.label2id['contradiction'];
+        this.contradiction_id = this.label2id['contradiction'] ?? this.label2id['not_entailment'];
         if (this.contradiction_id === undefined) {
             console.warn("Could not find 'contradiction' in label2id mapping. Using 0 as contradiction_id.");
             this.contradiction_id = 0;
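
The `?? this.label2id['not_entailment']` fallback is what the "Support `not_entailment` label" bullet refers to: binary NLI models emit only entailment/not_entailment, so `not_entailment` stands in for the missing contradiction label. A sketch of the scoring it feeds (assumed helper, not the pipeline's exact code): in multi-label mode, each candidate label is scored by a softmax over just these two logits.

```javascript
// Assumed helper, not the pipeline's exact code: probability that the
// hypothesis "This example is {label}." holds, given the model's NLI logits.
function entailmentScore(logits, entailment_id, contradiction_id) {
    const e = Math.exp(logits[entailment_id]);
    const c = Math.exp(logits[contradiction_id]);  // contradiction or not_entailment
    return e / (e + c);  // two-class softmax, as used for multi_label
}
```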
