
Commit 1096726

yuanlehome and liuyuanle authored
[BugFix] fix decode_token (#2544)
Co-authored-by: liuyuanle <[email protected]>
1 parent b54902c commit 1096726


1 file changed (+1, -1 lines)

paddleformers/transformers/legacy/tokenizer_utils_base.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -3491,7 +3491,7 @@ def decode_token(
     all_input_ids[prefix_offset:], skip_special_tokens=skip_special_tokens, clean_up_tokenization_spaces=False
 )
 
-if len(new_text) > len(prefix_text) and not prefix_text.endswith("�") and not new_text.endswith("�"):
+if len(new_text) > len(prefix_text) and "�" not in prefix_text and "�" not in new_text:
     # utf-8 char at the end means it's a potential unfinished byte sequence
     # from byte fallback tokenization.
     # If it's in the middle, it's probably a real invalid id generated
```
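The one-line change widens the guard from "U+FFFD (`�`) at the end of the text" to "U+FFFD anywhere in the text". The sketch below contrasts the two conditions on a few inputs. It is a minimal illustration under stated assumptions: `toy_decode`, `old_guard` and `new_guard` are hypothetical helpers, not PaddleFormers APIs. Raw byte "tokens" decoded with Python's `errors="replace"` stand in for a byte-fallback tokenizer's `decode`. The real `decode_token` decodes token ids with `skip_special_tokens` and `clean_up_tokenization_spaces`, and that step is not modeled here.

```python
# Hypothetical sketch: compares the pre- and post-commit guard conditions.
# Byte "tokens" stand in for a byte-fallback tokenizer; not the PaddleFormers API.

REPLACEMENT = "\ufffd"  # U+FFFD, the "�" produced for invalid or partial UTF-8


def toy_decode(token_bytes):
    """Concatenate raw byte tokens and decode as UTF-8, replacing bad bytes."""
    return b"".join(token_bytes).decode("utf-8", errors="replace")


def old_guard(prefix_text, new_text):
    # Pre-commit: only a *trailing* U+FFFD suppresses emission.
    return (
        len(new_text) > len(prefix_text)
        and not prefix_text.endswith(REPLACEMENT)
        and not new_text.endswith(REPLACEMENT)
    )


def new_guard(prefix_text, new_text):
    # Post-commit: a U+FFFD *anywhere* suppresses emission.
    return (
        len(new_text) > len(prefix_text)
        and REPLACEMENT not in prefix_text
        and REPLACEMENT not in new_text
    )


# "你" is the 3-byte UTF-8 sequence E4 BD A0.
complete = (toy_decode([b"a"]), toy_decode([b"a", b"\xe4", b"\xbd", b"\xa0"]))
trailing = (toy_decode([b"a"]), toy_decode([b"a", b"\xe4"]))
middle = (toy_decode([b"a", b"\xe4", b"b"]), toy_decode([b"a", b"\xe4", b"b", b"c"]))

for name, (prefix_text, new_text) in [
    ("complete", complete),
    ("trailing", trailing),
    ("middle", middle),
]:
    print(name, old_guard(prefix_text, new_text), new_guard(prefix_text, new_text))
# complete True True
# trailing False False
# middle True False
```

Both guards agree when the character is complete and when the replacement character is trailing. They diverge only when `�` sits in the middle of the text (here `"a\ufffdb"` followed by `"a\ufffdbc"`). The old check would emit that text, while the new check withholds it.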
