Commit 3351ab0

Upgrade Roberta tokenizer (#1821)
* update roberta
* update roberta tokenizer
* update roberta tokenizer
* update
* update
1 parent 46cbe60 commit 3351ab0

File tree

6 files changed: +269 −494 lines


paddlenlp/transformers/auto/tokenizer.py

Lines changed: 2 additions & 1 deletion
@@ -48,7 +48,8 @@
     ("MBartTokenizer", "mbart"),
     ("MPNetTokenizer", "mpnet"),
     ("NeZhaTokenizer", "nezha"),
-    ("RobertaTokenizer", "roberta"),
+    ("RobertaChineseTokenizer", "roberta"),
+    ("RobertaBPETokenizer", "roberta"),
     ("RoFormerTokenizer", "roformer"),
     ("ReformerTokenizer", "reformer"),
     ("SqueezeBertTokenizer", "squeezebert"),
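The tokenizer diff splits the single `RobertaTokenizer` entry into two classes registered under the same `roberta` module. A minimal sketch of how such a (class name, module name) registry can be queried; the `classes_for_module` helper is hypothetical and not PaddleNLP's actual `AutoTokenizer` API:

```python
# Sketch of a tokenizer name registry, modeled on the mapping in the diff.
# The lookup helper is illustrative only, not PaddleNLP's real dispatch code.
TOKENIZER_MAPPING_NAMES = [
    ("MBartTokenizer", "mbart"),
    ("MPNetTokenizer", "mpnet"),
    ("NeZhaTokenizer", "nezha"),
    # One model directory ("roberta") may now expose two tokenizer classes:
    ("RobertaChineseTokenizer", "roberta"),
    ("RobertaBPETokenizer", "roberta"),
    ("RoFormerTokenizer", "roformer"),
]


def classes_for_module(module_name):
    """Return every tokenizer class name registered for a module."""
    return [cls for cls, mod in TOKENIZER_MAPPING_NAMES if mod == module_name]
```

With this layout, a lookup for `"roberta"` yields both new classes, so downstream auto-loading code must decide between them (e.g. from the saved tokenizer config) rather than assume a one-to-one mapping.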

paddlenlp/transformers/bert/modeling.py

Lines changed: 3 additions & 1 deletion
@@ -499,7 +499,9 @@ def forward(self,
         else:
             if attention_mask.ndim == 2:
                 # attention_mask [batch_size, sequence_length] -> [batch_size, 1, 1, sequence_length]
-                attention_mask = attention_mask.unsqueeze(axis=[1, 2])
+                attention_mask = attention_mask.unsqueeze(
+                    axis=[1, 2]).astype(paddle.get_default_dtype())
+                attention_mask = (1.0 - attention_mask) * -1e4
 
         embedding_output = self.embeddings(
             input_ids=input_ids,
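The added lines convert a 2-D 0/1 padding mask into the additive bias that scaled-dot-product attention expects: kept positions contribute 0.0, padded positions contribute -1e4, which drives their softmax weight to roughly zero. A sketch of the same transformation, with NumPy standing in for Paddle:

```python
import numpy as np


def expand_attention_mask(mask_2d, dtype=np.float32):
    """[batch, seq] 0/1 padding mask -> [batch, 1, 1, seq] additive bias.

    Mirrors the diff above: unsqueeze axes 1 and 2, cast to the float
    dtype, then map kept positions (1) to 0.0 and padded positions (0)
    to -1e4. NumPy is used here in place of Paddle for illustration.
    """
    mask = mask_2d[:, None, None, :].astype(dtype)  # insert axes 1 and 2
    return (1.0 - mask) * -1e4


mask = np.array([[1, 1, 0]])      # last token is padding
bias = expand_attention_mask(mask)  # shape (1, 1, 1, 3)
```

The extra `[1, 1]` axes let the bias broadcast across attention heads and query positions when it is added to the raw attention scores.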
