Commit 17502c1

Fix plato-2 tokenization (#770)
* fix unified transformer dtype problem
* fix win dtype bug
* Fix plato-2 and plato-mini dtype bug
* Fix plato-2 tokenization

Co-authored-by: Jiaqi Liu <[email protected]>
1 parent b86d5b7 commit 17502c1

1 file changed: +1 -1 lines changed

examples/dialogue/plato-2/utils/tokenization.py

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ def convert_to_unicode(text):
 def load_vocab(vocab_file):
     """Loads a vocabulary file into a dictionary."""
     vocab = collections.OrderedDict()
-    fin = open(vocab_file)
+    fin = open(vocab_file, 'r', encoding="UTF-8")
     for num, line in enumerate(fin):
         items = convert_to_unicode(line.rstrip()).split("\t")
         if len(items) > 2:
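
The one-line change above matters because `open()` without an `encoding` argument uses the locale's preferred encoding, which is often not UTF-8 on Windows, so a UTF-8 vocabulary file can fail to decode or decode incorrectly. The sketch below shows the same idea in isolation. It is not the repository's `load_vocab`: the helper names `load_vocab_sketch` and `write_demo_vocab`, the demo tokens, and the simplified "first tab-separated field is the token" parsing are all assumptions made for illustration.

```python
import collections
import os
import tempfile


def load_vocab_sketch(vocab_file):
    """Hypothetical, simplified loader: map each token to its line index."""
    vocab = collections.OrderedDict()
    # Pass the encoding explicitly so the result does not depend on the
    # platform's locale. The context manager also closes the file.
    with open(vocab_file, "r", encoding="utf-8") as fin:
        for num, line in enumerate(fin):
            # Assumption: the token is the first tab-separated field.
            items = line.rstrip("\n").split("\t")
            vocab[items[0]] = num
    return vocab


def write_demo_vocab():
    """Write a small UTF-8 vocabulary file with non-ASCII tokens (demo data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as fout:
        fout.write("[PAD]\t0\n你好\t1\ncafé\t2\n")
    return path


demo_path = write_demo_vocab()
demo_vocab = load_vocab_sketch(demo_path)
os.remove(demo_path)
```

Reading the file back with the same explicit encoding used to write it keeps the non-ASCII tokens intact on any platform. Using `with` is a separate, optional tidy-up that the commit itself does not make.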
