This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Commit 1020fae

Authored by Roman Shraga (rshraga) and a co-author
Add DistilRoberta Model to OSS (cherry picked commit) (#1998)
Summary: This diff adds DistilRoberta to torchtext OSS. The model is a distilled version of the full RoBERTa Base model. Weights for this model are taken from Hugging Face (https://huggingface.co/distilroberta-base). The state dict is loaded and modified to work with the internal RoBERTa implementation here: https://www.internalfb.com/intern/anp/view/?id=2794739

Comparison of DistilRoBERTa to RoBERTa Base on the GLUE benchmark (as reported in https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md): {F806809901}

DistilRoBERTa reaches 95% of RoBERTa Base's performance on GLUE while being twice as fast and 35% smaller.

Reviewed By: Nayef211

Differential Revision: D41590601

fbshipit-source-id: 394d10c45bbee5d2e71e14e30edf9b1a9d9380e6

Co-authored-by: Roman Shraga <[email protected]>
1 parent b1d9447 · commit 1020fae
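The summary notes that the Hugging Face state dict is loaded and modified to fit torchtext's RoBERTa implementation, with the actual conversion done in the internal notebook linked above. Purely as a hedged sketch of what such a conversion can look like (the key-prefix mapping and target names below are illustrative assumptions, not the mapping used in this commit):

# Hedged illustration only: download the Hugging Face DistilRoBERTa weights and
# rename state-dict keys so they match a torchtext-style RoBERTa encoder layout.
# The prefix mapping is a placeholder, NOT the mapping used in this commit
# (that conversion lives in the internal notebook linked in the summary).
import torch
from transformers import AutoModel

hf_state = AutoModel.from_pretrained("distilroberta-base").state_dict()

# Hypothetical old-prefix -> new-prefix renames.
PREFIX_MAP = {
    "embeddings.": "encoder.transformer.embeddings.",
    "encoder.layer.": "encoder.transformer.layers.",
}

def remap_keys(state_dict, prefix_map):
    """Rename each key's leading prefix according to prefix_map; copy tensors as-is."""
    out = {}
    for key, tensor in state_dict.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        out[new_key] = tensor
    return out

torch.save(remap_keys(hf_state, PREFIX_MAP), "roberta.distilled.encoder.pt")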

File tree

5 files changed: +43 −7 lines

README.rst

Lines changed: 1 addition & 0 deletions

@@ -114,6 +114,7 @@ Models
 The library currently consist of following pre-trained models:

 * RoBERTa: `Base and Large Architecture <https://github.com/pytorch/fairseq/tree/main/examples/roberta#pre-trained-models>`_
+* `DistilRoBERTa <https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md>`_
 * XLM-RoBERTa: `Base and Large Architure <https://github.com/pytorch/fairseq/tree/main/examples/xlmr#pre-trained-models>`_

 Tokenizers

test/integration_tests/test_models.py

Lines changed: 3 additions & 7 deletions

@@ -4,6 +4,7 @@
 from torchtext.models import (
     ROBERTA_BASE_ENCODER,
     ROBERTA_LARGE_ENCODER,
+    ROBERTA_DISTILLED_ENCODER,
     XLMR_BASE_ENCODER,
     XLMR_LARGE_ENCODER,
 )
@@ -15,13 +16,7 @@
     "xlmr_large": XLMR_LARGE_ENCODER,
     "roberta_base": ROBERTA_BASE_ENCODER,
     "roberta_large": ROBERTA_LARGE_ENCODER,
-}
-
-BUNDLERS = {
-    "xlmr_base": XLMR_BASE_ENCODER,
-    "xlmr_large": XLMR_LARGE_ENCODER,
-    "roberta_base": ROBERTA_BASE_ENCODER,
-    "roberta_large": ROBERTA_LARGE_ENCODER,
+    "roberta_distilled": ROBERTA_DISTILLED_ENCODER,
 }


@@ -32,6 +27,7 @@
         ("xlmr_large",),
         ("roberta_base",),
         ("roberta_large",),
+        ("roberta_distilled",),
     ],
 )
 class TestRobertaEncoders(TorchtextTestCase):
Binary file (21.8 KB) not shown.

torchtext/models/roberta/__init__.py

Lines changed: 2 additions & 0 deletions

@@ -1,6 +1,7 @@
 from .bundler import (
     ROBERTA_BASE_ENCODER,
     ROBERTA_LARGE_ENCODER,
+    ROBERTA_DISTILLED_ENCODER,
     RobertaBundle,
     XLMR_BASE_ENCODER,
     XLMR_LARGE_ENCODER,
@@ -16,4 +17,5 @@
     "XLMR_LARGE_ENCODER",
     "ROBERTA_BASE_ENCODER",
     "ROBERTA_LARGE_ENCODER",
+    "ROBERTA_DISTILLED_ENCODER",
 ]

torchtext/models/roberta/bundler.py

Lines changed: 37 additions & 0 deletions

@@ -294,3 +294,40 @@ def encoderConf(self) -> RobertaEncoderConf:

     Please refer to :func:`torchtext.models.RobertaBundle` for the usage.
     """
+
+
+ROBERTA_DISTILLED_ENCODER = RobertaBundle(
+    _path=urljoin(_TEXT_BUCKET, "roberta.distilled.encoder.pt"),
+    _encoder_conf=RobertaEncoderConf(
+        num_encoder_layers=6,
+        padding_idx=1,
+    ),
+    transform=lambda: T.Sequential(
+        T.GPT2BPETokenizer(
+            encoder_json_path=urljoin(_TEXT_BUCKET, "gpt2_bpe_encoder.json"),
+            vocab_bpe_path=urljoin(_TEXT_BUCKET, "gpt2_bpe_vocab.bpe"),
+        ),
+        T.VocabTransform(load_state_dict_from_url(urljoin(_TEXT_BUCKET, "roberta.vocab.pt"))),
+        T.Truncate(510),
+        T.AddToken(token=0, begin=True),
+        T.AddToken(token=2, begin=False),
+    ),
+)
+
+ROBERTA_DISTILLED_ENCODER.__doc__ = """
+    Roberta Encoder with Distilled Weights
+
+    DistilRoBERTa is trained using knowledge distillation, a technique to compress a large
+    model called the teacher into a smaller model called the student. By distillating RoBERTa,
+    a smaller and faster Transformer model is obtained while maintaining most of the performance.
+
+    DistilRoBERTa was pretrained solely on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset.
+    On average DistilRoBERTa is twice as fast as RoBERTa Base.
+
+    Originally published by Hugging Face under the Apache 2.0 License
+    and redistributed with the same license.
+    [`License <https://www.apache.org/licenses/LICENSE-2.0>`__,
+    `Source <https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation>`__]
+
+    Please refer to :func:`torchtext.models.RobertaBundle` for the usage.
+    """
