Open
Description
If I run prepare_vocab.py on a German text corpus, I get the following error:
Traceback (most recent call last):
  File "prepare_vocab.py", line 41, in <module>
    for index, line in enumerate(source_file):
  File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 111: ordinal not in range(128)
The command I ran is:
python3 prepare_vocab.py /docker_files/german_ds/text_corpus/German_sentences_8mil_filtered_maryfied.txt /docker_files/german_ds/output/clean_vocab.txt
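A note on the likely cause, with a small reproduction. Byte 0xc3 is the first byte of the UTF-8 encoding of characters such as ä, ö, ü and ß, so the corpus is probably UTF-8. The traceback shows Python decoding it with the ascii codec, which usually happens when the file is opened without an explicit encoding and the environment's locale is C/POSIX (common in Docker images, though that is an assumption here). The sketch below is not the code from prepare_vocab.py, which is not shown in this issue; `read_lines` and `demo` are hypothetical names used only to illustrate passing `encoding="utf-8"` to `open()`.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    # Hypothetical helper: opens the file with an explicit encoding instead of
    # relying on the locale default, then iterates over it line by line.
    with open(path, encoding=encoding) as source_file:
        return [line.rstrip("\n") for index, line in enumerate(source_file)]


def demo():
    # Write a small UTF-8 file containing German umlauts.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write("Grüße aus München\n")
        path = tmp.name
    try:
        # Decoding as ASCII reproduces the error from the traceback.
        try:
            read_lines(path, encoding="ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True
        # Decoding as UTF-8 reads the same file without error.
        return ascii_failed, read_lines(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If the script cannot be changed, setting a UTF-8 locale in the container before running it (for example `LC_ALL=C.UTF-8`, where the image provides that locale) may have the same effect, since the default encoding of `open()` follows the locale.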