Commit 64b8242

Merge branch 'main' into bugfix/fsdp-pytree-memory-regression

2 parents 16e9c5c + 1cea763

25 files changed: +355 -562 lines

docs/source/en/internal/generation_utils.md

Lines changed: 0 additions & 9 deletions
@@ -426,15 +426,6 @@ A [`Constraint`] can be used to force the generation to include specific tokens
     - to_legacy_cache
     - from_legacy_cache

-[[autodoc]] MambaCache
-    - update_conv_state
-    - update_ssm_state
-    - reset
-
-[[autodoc]] CacheConfig
-
-[[autodoc]] QuantizedCacheConfig
-

 ## Watermark Utils

docs/source/en/model_doc/deberta.md

Lines changed: 64 additions & 47 deletions
@@ -14,72 +14,89 @@ rendered properly in your Markdown viewer.

 -->

+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
+    </div>
+</div>
+</div>
+
 # DeBERTa

-<div class="flex flex-wrap space-x-1">
-<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
-<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
-</div>
+[DeBERTa](https://huggingface.co/papers/2006.03654) improves the pretraining efficiency of BERT and RoBERTa with two key ideas, disentangled attention and an enhanced mask decoder. Instead of mixing everything together like BERT, DeBERTa separates a word's *content* from its *position* and processes them independently. This gives it a clearer sense of what's being said and where in the sentence it's happening.
+
+The enhanced mask decoder replaces the traditional softmax decoder to make better predictions.
+
+Even with less training data than RoBERTa, DeBERTa manages to outperform it on several benchmarks.
+
+You can find all the original DeBERTa checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=deberta) organization.
+

-## Overview
+> [!TIP]
+> Click on the DeBERTa models in the right sidebar for more examples of how to apply DeBERTa to different language tasks.

-The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://huggingface.co/papers/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen It is based on Google's
-BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
+The example below demonstrates how to classify text with [`Pipeline`], [`AutoModel`], and from the command line.

-It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in
-RoBERTa.
+<hfoptions id="usage">
+<hfoption id="Pipeline">

-The abstract from the paper is the following:
+```py
+import torch
+from transformers import pipeline

-*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
-language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
-disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
-disentangled attention mechanism, where each word is represented using two vectors that encode its content and
-position, respectively, and the attention weights among words are computed using disentangled matrices on their
-contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
-predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
-of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
-the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
-(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
-pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
+classifier = pipeline(
+    task="text-classification",
+    model="microsoft/deberta-base-mnli",
+    device=0,
+)

+classifier({
+    "text": "A soccer game with multiple people playing.",
+    "text_pair": "Some people are playing a sport."
+})
+```

-This model was contributed by [DeBERTa](https://huggingface.co/DeBERTa). This model TF 2.0 implementation was
-contributed by [kamalkraj](https://huggingface.co/kamalkraj) . The original code can be found [here](https://github.com/microsoft/DeBERTa).
+</hfoption>
+<hfoption id="AutoModel">

-## Resources
+```py
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer

-A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DeBERTa. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+model_name = "microsoft/deberta-base-mnli"
+tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base-mnli")
+model = AutoModelForSequenceClassification.from_pretrained("microsoft/deberta-base-mnli", device_map="auto")

-<PipelineTag pipeline="text-classification"/>
+inputs = tokenizer(
+    "A soccer game with multiple people playing.",
+    "Some people are playing a sport.",
+    return_tensors="pt"
+).to("cuda")

-- A blog post on how to [Accelerate Large Model Training using DeepSpeed](https://huggingface.co/blog/accelerate-deepspeed) with DeBERTa.
-- A blog post on [Supercharged Customer Service with Machine Learning](https://huggingface.co/blog/supercharge-customer-service-with-machine-learning) with DeBERTa.
-- [`DebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).
-- [`TFDebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
-- [Text classification task guide](../tasks/sequence_classification)
+with torch.no_grad():
+    logits = model(**inputs).logits
+    predicted_class = logits.argmax().item()

-<PipelineTag pipeline="token-classification" />
+labels = ["contradiction", "neutral", "entailment"]
+print(f"The predicted relation is: {labels[predicted_class]}")

-- [`DebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb).
-- [`TFDebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
-- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Byte-Pair Encoding tokenization](https://huggingface.co/course/chapter6/5?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Token classification task guide](../tasks/token_classification)
+```

-<PipelineTag pipeline="fill-mask"/>
+</hfoption>
+<hfoption id="transformers CLI">

-- [`DebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
-- [`TFDebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
-- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Masked language modeling task guide](../tasks/masked_language_modeling)
+```bash
+echo -e '{"text": "A soccer game with multiple people playing.", "text_pair": "Some people are playing a sport."}' | transformers run --task text-classification --model microsoft/deberta-base-mnli --device 0
+```

-<PipelineTag pipeline="question-answering"/>
+</hfoption>
+</hfoptions>

-- [`DebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).
-- [`TFDebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
-- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Question answering task guide](../tasks/question_answering)
+## Notes
+- DeBERTa uses **relative position embeddings**, so it does not require **right-padding** like BERT.
+- For best results, use DeBERTa on sentence-level or sentence-pair classification tasks like MNLI, RTE, or SST-2.
+- If you're using DeBERTa for token-level tasks like masked language modeling, make sure to load a checkpoint specifically pretrained or fine-tuned for token-level tasks.

 ## DebertaConfig
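The new usage examples treat `microsoft/deberta-base-mnli` as an entailment classifier. The same checkpoint also works with the `zero-shot-classification` pipeline, which scores arbitrary candidate labels through the NLI head. The sketch below is illustrative and not part of the commit; the labels and input text are assumptions.

```py
from transformers import pipeline

# Zero-shot classification reuses the NLI head: each candidate label is folded
# into a hypothesis sentence and scored by entailment against the input.
zero_shot = pipeline(
    task="zero-shot-classification",
    model="microsoft/deberta-base-mnli",
)

result = zero_shot(
    "A soccer game with multiple people playing.",
    candidate_labels=["sports", "politics", "cooking"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```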

docs/source/en/model_doc/textnet.md

Lines changed: 5 additions & 0 deletions
@@ -47,6 +47,11 @@ TextNet is the backbone for Fast, but can also be used as an efficient text/imag
 [[autodoc]] TextNetImageProcessor
     - preprocess

+## TextNetImageProcessorFast
+
+[[autodoc]] TextNetImageProcessorFast
+    - preprocess
+
 ## TextNetModel

 [[autodoc]] TextNetModel
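`TextNetImageProcessorFast` exposes the same `preprocess` interface as the slow processor documented above. A minimal sketch of selecting the fast variant through `AutoImageProcessor`; the checkpoint name and the dummy image are assumptions, not from the commit.

```py
from PIL import Image
from transformers import AutoImageProcessor

# use_fast=True selects the torchvision-backed fast processor when one exists
# for the architecture; the checkpoint name below is illustrative.
processor = AutoImageProcessor.from_pretrained("czczup/textnet-base", use_fast=True)

image = Image.new("RGB", (640, 480), color="white")  # placeholder input image
inputs = processor(images=image, return_tensors="pt")
print(type(processor).__name__, inputs["pixel_values"].shape)
```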

docs/source/ko/internal/generation_utils.md

Lines changed: 0 additions & 9 deletions
@@ -405,15 +405,6 @@ generation_output[:2]
    - to_legacy_cache
    - from_legacy_cache

-[[autodoc]] MambaCache
-    - update_conv_state
-    - update_ssm_state
-    - reset
-
-[[autodoc]] CacheConfig
-
-[[autodoc]] QuantizedCacheConfig
-
 ## 워터마크 유틸리티 (Watermark Utils) [[transformers.WatermarkDetector]]

 [[autodoc]] WatermarkDetector

src/transformers/__init__.py

Lines changed: 0 additions & 4 deletions
@@ -380,7 +380,6 @@
         "QuantoQuantizedLayer",
         "HQQQuantizedLayer",
         "Cache",
-        "CacheConfig",
         "DynamicCache",
         "EncoderDecoderCache",
         "HQQQuantizedCache",
@@ -389,7 +388,6 @@
         "OffloadedCache",
         "OffloadedStaticCache",
         "QuantizedCache",
-        "QuantizedCacheConfig",
         "QuantoQuantizedCache",
         "SinkCache",
         "SlidingWindowCache",
@@ -580,7 +578,6 @@
 if TYPE_CHECKING:
     # All modeling imports
     from .cache_utils import Cache as Cache
-    from .cache_utils import CacheConfig as CacheConfig
     from .cache_utils import ChunkedSlidingLayer as ChunkedSlidingLayer
     from .cache_utils import DynamicCache as DynamicCache
     from .cache_utils import DynamicLayer as DynamicLayer
@@ -592,7 +589,6 @@
     from .cache_utils import OffloadedCache as OffloadedCache
     from .cache_utils import OffloadedStaticCache as OffloadedStaticCache
     from .cache_utils import QuantizedCache as QuantizedCache
-    from .cache_utils import QuantizedCacheConfig as QuantizedCacheConfig
     from .cache_utils import QuantoQuantizedCache as QuantoQuantizedCache
     from .cache_utils import QuantoQuantizedLayer as QuantoQuantizedLayer
     from .cache_utils import SinkCache as SinkCache
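With `CacheConfig` and `QuantizedCacheConfig` dropped from the top-level exports, code that imported them from `transformers` needs updating; cache options are typically passed to `generate` as a plain dict instead. A minimal sketch, assuming the dict form of `cache_config` is still accepted and that a quanto backend is installed; the model name and settings are illustrative.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

inputs = tokenizer("Disentangled attention separates content and position", return_tensors="pt")

# Previously: `from transformers import QuantizedCacheConfig` and
# `generate(cache_config=QuantizedCacheConfig(...))`. Assumed replacement:
# pass the same options as a dict alongside cache_implementation.
out = model.generate(
    **inputs,
    max_new_tokens=20,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```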

0 commit comments
