Commit d73780a

sahil-kabir (Sahil Kabir) and stevhliu authored and committed
Model card for NLLB (huggingface#40074)
* initializing branch and draft PR
* updated model card .md file
* minor
* minor
* Update docs/source/en/model_doc/nllb.md (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* resolving comments + adding visuals
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md, suggestion (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/model_doc/nllb.md (Co-authored-by: Steven Liu <[email protected]>)
* NllbTokenizerFast and NllbTokenizer added
* endline
* minor
* Update nllb.md

---------

Co-authored-by: Sahil Kabir <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
1 parent 3511e6a commit d73780a

File tree: 1 file changed (+100 / -157 lines)

docs/source/en/model_doc/nllb.md

Lines changed: 100 additions & 157 deletions
@@ -13,136 +13,140 @@ specific language governing permissions and limitations under the License.
 rendered properly in your Markdown viewer.

 -->
-*This model was released on 2022-07-11 and added to Hugging Face Transformers on 2022-07-18.*
-
-# NLLB

-<div class="flex flex-wrap space-x-1">
-<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
-<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
-<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+<div style="float: right;">
+<div class="flex flex-wrap space-x-1">
+<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
+<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+</div>
 </div>

-## Updated tokenizer behavior
+*This model was released on 2022-07-11 and added to Hugging Face Transformers on 2022-07-18.*

-**DISCLAIMER:** The default behaviour for the tokenizer was fixed and thus changed in April 2023.
-The previous version adds `[self.eos_token_id, self.cur_lang_code]` at the end of the token sequence for both target and source tokenization. This is wrong as the NLLB paper mentions (page 48, 6.1.1. Model Architecture) :

-*Note that we prefix the source sequence with the source language, as opposed to the target
-language as previously done in several works (Arivazhagan et al., 2019; Johnson et al.,
-2017). This is primarily because we prioritize optimizing zero-shot performance of our
-model on any pair of 200 languages at a minor cost to supervised performance.*
+# NLLB

-Previous behaviour:
+[NLLB: No Language Left Behind](https://huggingface.co/papers/2207.04672) is a multilingual translation model. It's trained on data using data mining techniques tailored for low-resource languages and supports over 200 languages. NLLB features a conditional compute architecture using a Sparsely Gated Mixture of Experts.

-```python
->>> from transformers import NllbTokenizer

->>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
->>> tokenizer("How was your day?").input_ids
-[13374, 1398, 4260, 4039, 248130, 2, 256047]
+You can find all the original NLLB checkpoints under the [AI at Meta](https://huggingface.co/facebook/models?search=nllb) organization.

->>> # 2: '</s>'
->>> # 256047 : 'eng_Latn'
-```
-New behaviour
+> [!TIP]
+> This model was contributed by [Lysandre](https://huggingface.co/lysandre).
+> Click on the NLLB models in the right sidebar for more examples of how to apply NLLB to different translation tasks.

-```python
->>> from transformers import NllbTokenizer
+The example below demonstrates how to translate text with [`Pipeline`] or the [`AutoModel`] class.

->>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
->>> tokenizer("How was your day?").input_ids
-[256047, 13374, 1398, 4260, 4039, 248130, 2]
-```
+<hfoptions id="usage">
+<hfoption id="Pipeline">

-Enabling the old behaviour can be done as follows:
 ```python
->>> from transformers import NllbTokenizer
+import torch
+from transformers import pipeline

->>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", legacy_behaviour=True)
+pipeline = pipeline(task="translation", model="facebook/nllb-200-distilled-600M", src_lang="eng_Latn", tgt_lang="fra_Latn", torch_dtype=torch.float16, device=0)
+pipeline("UN Chief says there is no military solution in Syria")
 ```

-For more details, feel free to check the linked [PR](https://github.com/huggingface/transformers/pull/22313) and [Issue](https://github.com/huggingface/transformers/issues/19943).
-
-## Overview
+</hfoption>
+<hfoption id="AutoModel">

-The NLLB model was presented in [No Language Left Behind: Scaling Human-Centered Machine Translation](https://huggingface.co/papers/2207.04672) by Marta R. Costa-jussà, James Cross, Onur Çelebi,
-Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula,
-Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews,
-Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers,
-Safiyyah Saleem, Holger Schwenk, and Jeff Wang.
+```python
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

-The abstract of the paper is the following:
+tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
+model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M", torch_dtype="auto", attn_implementation="sdpa")

-*Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today.
-However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the
-200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by
-first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed
-at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of
-Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training
-improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using
-a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety.
-Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system.*
+article = "UN Chief says there is no military solution in Syria"
+inputs = tokenizer(article, return_tensors="pt")

-This implementation contains the dense models available on release.
+translated_tokens = model.generate(
+**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"), max_length=30
+)
+print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0])
+```

-**The sparse model NLLB-MoE (Mixture of Expert) is now available! More details [here](nllb-moe)**
+</hfoption>
+<hfoption id="transformers CLI">

-This model was contributed by [Lysandre](https://huggingface.co/lysandre). The authors' code can be found [here](https://github.com/facebookresearch/fairseq/tree/nllb).
+```bash
+echo -e "UN Chief says there is no military solution in Syria" | transformers run --task "translation_en_to_fr" --model facebook/nllb-200-distilled-600M --device 0
+```

-## Generating with NLLB
+</hfoption>
+</hfoptions>

-While generating the target text set the `forced_bos_token_id` to the target language id. The following
-example shows how to translate English to French using the *facebook/nllb-200-distilled-600M* model.
+Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.

-Note that we're using the BCP-47 code for French `fra_Latn`. See [here](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)
-for the list of all BCP-47 in the Flores 200 dataset.
+The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 8-bits.

 ```python
->>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
-
->>> tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
->>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
-
->>> article = "UN Chief says there is no military solution in Syria"
->>> inputs = tokenizer(article, return_tensors="pt")
-
->>> translated_tokens = model.generate(
-... **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"), max_length=30
-... )
->>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
-Le chef de l'ONU dit qu'il n'y a pas de solution militaire en Syrie
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
+
+bnb_config = BitsAndBytesConfig(load_in_8bit=True)
+model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B", quantization_config=bnb_config)
+tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
+
+article = "UN Chief says there is no military solution in Syria"
+inputs = tokenizer(article, return_tensors="pt").to("cuda")
+translated_tokens = model.generate(
+**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"), max_length=30,
+)
+print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0])
 ```

-### Generating from any other language than English
-
-English (`eng_Latn`) is set as the default language from which to translate. In order to specify that you'd like to translate from a different language,
-you should specify the BCP-47 code in the `src_lang` keyword argument of the tokenizer initialization.
-
-See example below for a translation from romanian to german:
+Use the [AttentionMaskVisualizer](https://github.com/huggingface/transformers/blob/main/src/transformers/utils/attention_visualizer.py#L139) to better understand what tokens the model can and cannot attend to.

-```py
->>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
-
->>> tokenizer = AutoTokenizer.from_pretrained(
-... "facebook/nllb-200-distilled-600M", token=True, src_lang="ron_Latn"
-... )
->>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M", token=True)
-
->>> article = "Şeful ONU spune că nu există o soluţie militară în Siria"
->>> inputs = tokenizer(article, return_tensors="pt")
+```python
+from transformers.utils.attention_visualizer import AttentionMaskVisualizer

->>> translated_tokens = model.generate(
-... **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"), max_length=30
-... )
->>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
-UN-Chef sagt, es gibt keine militärische Lösung in Syrien
+visualizer = AttentionMaskVisualizer("facebook/nllb-200-distilled-600M")
+visualizer("UN Chief says there is no military solution in Syria")
 ```

-## Resources
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/NLLB-Attn-Mask.png"/>
+</div>

-- [Translation task guide](../tasks/translation)
-- [Summarization task guide](../tasks/summarization)
+## Notes
+
+- The tokenizer was updated in April 2023 to prefix the source sequence with the source language rather than the target language. This prioritizes zero-shot performance at a minor cost to supervised performance.
+
+```python
+>>> from transformers import NllbTokenizer
+
+>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
+>>> tokenizer("How was your day?").input_ids
+[256047, 13374, 1398, 4260, 4039, 248130, 2]
+```
+
+To revert to the legacy behavior, use the code example below.
+
+```python
+>>> from transformers import NllbTokenizer
+
+>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", legacy_behaviour=True)
+```
+
+- For non-English source languages, specify the language's [BCP-47](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200) code with the `src_lang` keyword argument, as shown below for a translation from Romanian to German.
+
+```python
+>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="ron_Latn")
+>>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
+
+>>> article = "Şeful ONU spune că nu există o soluţie militară în Siria"
+>>> inputs = tokenizer(article, return_tensors="pt")
+
+>>> translated_tokens = model.generate(
+... **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"), max_length=30
+... )
+>>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
+UN-Chef sagt, es gibt keine militärische Lösung in Syrien
+```

 ## NllbTokenizer

@@ -152,64 +156,3 @@ UN-Chef sagt, es gibt keine militärische Lösung in Syrien
 ## NllbTokenizerFast

 [[autodoc]] NllbTokenizerFast
-
-## Using Flash Attention 2
-
-Flash Attention 2 is a faster, optimized version of the attention scores computation which relies on `cuda` kernels.
-
-### Installation
-
-First, check whether your hardware is compatible with Flash Attention 2. The latest list of compatible hardware can be found in the [official documentation](https://github.com/Dao-AILab/flash-attention#installation-and-features).
-
-Next, [install](https://github.com/Dao-AILab/flash-attention#installation-and-features) the latest version of Flash Attention 2:
-
-```bash
-pip install -U flash-attn --no-build-isolation
-```
-
-### Usage
-
-To load a model using Flash Attention 2, we can pass the argument `attn_implementation="flash_attention_2"` to [`.from_pretrained`](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained). You can use either `torch.float16` or `torch.bfloat16` precision.
-
-```python
->>> import torch
->>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
-
->>> model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M", torch_dtype=torch.float16, attn_implementation="flash_attention_2").to("cuda").eval()
->>> tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
-
->>> article = "Şeful ONU spune că nu există o soluţie militară în Siria"
->>> inputs = tokenizer(article, return_tensors="pt").to("cuda")
-
->>> translated_tokens = model.generate(
-... **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"), max_length=30
-... )
->>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
-"UN-Chef sagt, es gibt keine militärische Lösung in Syrien"
-```
-
-### Expected speedups
-
-Below is an expected speedup diagram that compares pure inference time between the native implementation and the Flash Attention 2.
-
-<div style="text-align: center">
-<img src="https://huggingface.co/datasets/visheratin/documentation-images/resolve/main/nllb-speedup.webp">
-</div>
-
-## Using Scaled Dot Product Attention (SDPA)
-PyTorch includes a native scaled dot-product attention (SDPA) operator as part of `torch.nn.functional`. This function
-encompasses several implementations that can be applied depending on the inputs and the hardware in use. See the
-[official documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)
-or the [GPU Inference](https://huggingface.co/docs/transformers/main/en/perf_infer_gpu_one#pytorch-scaled-dot-product-attention)
-page for more information.
-
-SDPA is used by default for `torch>=2.1.1` when an implementation is available, but you may also set
-`attn_implementation="sdpa"` in `from_pretrained()` to explicitly request SDPA to be used.
-
-```python
-from transformers import AutoModelForSeq2SeqLM
-model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M", torch_dtype=torch.float16, attn_implementation="sdpa")
-...
-```
-
-For the best speedups, we recommend loading the model in half-precision (e.g. `torch.float16` or `torch.bfloat16`).
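Note that the removed SDPA snippet above uses `torch.float16` without importing `torch`. A self-contained version (an editorial sketch, not part of this commit) would read:

```python
# Self-contained variant of the removed SDPA snippet (illustrative only).
import torch
from transformers import AutoModelForSeq2SeqLM

# Half precision plus explicit SDPA, as recommended in the removed section.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
)
```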
