Commit 64b8242

Merge branch 'main' into bugfix/fsdp-pytree-memory-regression

2 parents 16e9c5c + 1cea763

25 files changed: +355 -562 lines

docs/source/en/internal/generation_utils.md

Lines changed: 0 additions & 9 deletions
@@ -426,15 +426,6 @@ A [`Constraint`] can be used to force the generation to include specific tokens
     - to_legacy_cache
     - from_legacy_cache

-[[autodoc]] MambaCache
-    - update_conv_state
-    - update_ssm_state
-    - reset
-
-[[autodoc]] CacheConfig
-
-[[autodoc]] QuantizedCacheConfig
-

 ## Watermark Utils

docs/source/en/model_doc/deberta.md

Lines changed: 64 additions & 47 deletions
@@ -14,72 +14,89 @@ rendered properly in your Markdown viewer.

 -->

+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
+    </div>
+</div>
+</div>
+
 # DeBERTa

-<div class="flex flex-wrap space-x-1">
-<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
-<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
-</div>
+[DeBERTa](https://huggingface.co/papers/2006.03654) improves the pretraining efficiency of BERT and RoBERTa with two key ideas, disentangled attention and an enhanced mask decoder. Instead of mixing everything together like BERT, DeBERTa separates a word's *content* from its *position* and processes them independently. This gives it a clearer sense of what's being said and where in the sentence it's happening.
+
+The enhanced mask decoder replaces the traditional softmax decoder to make better predictions.
+
+Even with less training data than RoBERTa, DeBERTa manages to outperform it on several benchmarks.
+
+You can find all the original DeBERTa checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=deberta) organization.
+

-## Overview
+> [!TIP]
+> Click on the DeBERTa models in the right sidebar for more examples of how to apply DeBERTa to different language tasks.

-The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://huggingface.co/papers/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen It is based on Google's
-BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
+The example below demonstrates how to classify text with [`Pipeline`], [`AutoModel`], and from the command line.

-It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in
-RoBERTa.
+<hfoptions id="usage">
+<hfoption id="Pipeline">

-The abstract from the paper is the following:
+```py
+import torch
+from transformers import pipeline

-*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
-language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
-disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
-disentangled attention mechanism, where each word is represented using two vectors that encode its content and
-position, respectively, and the attention weights among words are computed using disentangled matrices on their
-contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
-predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
-of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
-the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
-(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
-pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
+classifier = pipeline(
+    task="text-classification",
+    model="microsoft/deberta-base-mnli",
+    device=0,
+)

+classifier({
+    "text": "A soccer game with multiple people playing.",
+    "text_pair": "Some people are playing a sport."
+})
+```

-This model was contributed by [DeBERTa](https://huggingface.co/DeBERTa). This model TF 2.0 implementation was
-contributed by [kamalkraj](https://huggingface.co/kamalkraj) . The original code can be found [here](https://github.com/microsoft/DeBERTa).
+</hfoption>
+<hfoption id="AutoModel">

-## Resources
+```py
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer

-A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DeBERTa. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+model_name = "microsoft/deberta-base-mnli"
+tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base-mnli")
+model = AutoModelForSequenceClassification.from_pretrained("microsoft/deberta-base-mnli", device_map="auto")

-<PipelineTag pipeline="text-classification"/>
+inputs = tokenizer(
+    "A soccer game with multiple people playing.",
+    "Some people are playing a sport.",
+    return_tensors="pt"
+).to("cuda")

-- A blog post on how to [Accelerate Large Model Training using DeepSpeed](https://huggingface.co/blog/accelerate-deepspeed) with DeBERTa.
-- A blog post on [Supercharged Customer Service with Machine Learning](https://huggingface.co/blog/supercharge-customer-service-with-machine-learning) with DeBERTa.
-- [`DebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).
-- [`TFDebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
-- [Text classification task guide](../tasks/sequence_classification)
+with torch.no_grad():
+    logits = model(**inputs).logits
+    predicted_class = logits.argmax().item()

-<PipelineTag pipeline="token-classification" />
+labels = ["contradiction", "neutral", "entailment"]
+print(f"The predicted relation is: {labels[predicted_class]}")

-- [`DebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb).
-- [`TFDebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
-- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Byte-Pair Encoding tokenization](https://huggingface.co/course/chapter6/5?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Token classification task guide](../tasks/token_classification)
+```

-<PipelineTag pipeline="fill-mask"/>
+</hfoption>
+<hfoption id="transformers CLI">

-- [`DebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
-- [`TFDebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
-- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Masked language modeling task guide](../tasks/masked_language_modeling)
+```bash
+echo -e '{"text": "A soccer game with multiple people playing.", "text_pair": "Some people are playing a sport."}' | transformers run --task text-classification --model microsoft/deberta-base-mnli --device 0
+```

-<PipelineTag pipeline="question-answering"/>
+</hfoption>
+</hfoptions>

-- [`DebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).
-- [`TFDebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
-- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Question answering task guide](../tasks/question_answering)
+## Notes
+- DeBERTa uses **relative position embeddings**, so it does not require **right-padding** like BERT.
+- For best results, use DeBERTa on sentence-level or sentence-pair classification tasks like MNLI, RTE, or SST-2.
+- If you're using DeBERTa for token-level tasks like masked language modeling, make sure to load a checkpoint specifically pretrained or fine-tuned for token-level tasks.

 ## DebertaConfig
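The new usage examples treat `microsoft/deberta-base-mnli` as an entailment classifier. The same checkpoint also works with the `zero-shot-classification` pipeline, which scores arbitrary candidate labels through the NLI head. The sketch below is illustrative and not part of the commit; the labels and input text are assumptions.

```py
from transformers import pipeline

# Zero-shot classification reuses the NLI head: each candidate label is folded
# into a hypothesis sentence and scored by entailment against the input.
zero_shot = pipeline(
    task="zero-shot-classification",
    model="microsoft/deberta-base-mnli",
)

result = zero_shot(
    "A soccer game with multiple people playing.",
    candidate_labels=["sports", "politics", "cooking"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```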

docs/source/en/model_doc/textnet.md

Lines changed: 5 additions & 0 deletions
@@ -47,6 +47,11 @@ TextNet is the backbone for Fast, but can also be used as an efficient text/imag
 [[autodoc]] TextNetImageProcessor
     - preprocess

+## TextNetImageProcessorFast
+
+[[autodoc]] TextNetImageProcessorFast
+    - preprocess
+
 ## TextNetModel

 [[autodoc]] TextNetModel
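`TextNetImageProcessorFast` exposes the same `preprocess` interface as the slow processor documented above. A minimal sketch of selecting the fast variant through `AutoImageProcessor`; the checkpoint name and the dummy image are assumptions, not from the commit.

```py
from PIL import Image
from transformers import AutoImageProcessor

# use_fast=True selects the torchvision-backed fast processor when one exists
# for the architecture; the checkpoint name below is illustrative.
processor = AutoImageProcessor.from_pretrained("czczup/textnet-base", use_fast=True)

image = Image.new("RGB", (640, 480), color="white")  # placeholder input image
inputs = processor(images=image, return_tensors="pt")
print(type(processor).__name__, inputs["pixel_values"].shape)
```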

docs/source/ko/internal/generation_utils.md

Lines changed: 0 additions & 9 deletions
@@ -405,15 +405,6 @@ generation_output[:2]
    - to_legacy_cache
    - from_legacy_cache

-[[autodoc]] MambaCache
-    - update_conv_state
-    - update_ssm_state
-    - reset
-
-[[autodoc]] CacheConfig
-
-[[autodoc]] QuantizedCacheConfig
-
 ## 워터마크 유틸리티 (Watermark Utils) [[transformers.WatermarkDetector]]

 [[autodoc]] WatermarkDetector

src/transformers/__init__.py

Lines changed: 0 additions & 4 deletions
@@ -380,7 +380,6 @@
         "QuantoQuantizedLayer",
         "HQQQuantizedLayer",
         "Cache",
-        "CacheConfig",
         "DynamicCache",
         "EncoderDecoderCache",
         "HQQQuantizedCache",
@@ -389,7 +388,6 @@
         "OffloadedCache",
         "OffloadedStaticCache",
         "QuantizedCache",
-        "QuantizedCacheConfig",
         "QuantoQuantizedCache",
         "SinkCache",
         "SlidingWindowCache",
@@ -580,7 +578,6 @@
 if TYPE_CHECKING:
     # All modeling imports
     from .cache_utils import Cache as Cache
-    from .cache_utils import CacheConfig as CacheConfig
     from .cache_utils import ChunkedSlidingLayer as ChunkedSlidingLayer
     from .cache_utils import DynamicCache as DynamicCache
     from .cache_utils import DynamicLayer as DynamicLayer
@@ -592,7 +589,6 @@
     from .cache_utils import OffloadedCache as OffloadedCache
     from .cache_utils import OffloadedStaticCache as OffloadedStaticCache
     from .cache_utils import QuantizedCache as QuantizedCache
-    from .cache_utils import QuantizedCacheConfig as QuantizedCacheConfig
     from .cache_utils import QuantoQuantizedCache as QuantoQuantizedCache
     from .cache_utils import QuantoQuantizedLayer as QuantoQuantizedLayer
     from .cache_utils import SinkCache as SinkCache
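With `CacheConfig` and `QuantizedCacheConfig` dropped from the top-level exports, code that imported them from `transformers` needs updating; cache options are typically passed to `generate` as a plain dict instead. A minimal sketch, assuming the dict form of `cache_config` is still accepted and that a quanto backend is installed; the model name and settings are illustrative.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

inputs = tokenizer("Disentangled attention separates content and position", return_tensors="pt")

# Previously: `from transformers import QuantizedCacheConfig` and
# `generate(cache_config=QuantizedCacheConfig(...))`. Assumed replacement:
# pass the same options as a dict alongside cache_implementation.
out = model.generate(
    **inputs,
    max_new_tokens=20,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```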

0 commit comments
