
prepare_vocab.py throwing UnicodeDecodeError #9

@rpratesh


When I run prepare_vocab.py on a German text corpus, I get the following error:

Traceback (most recent call last):
  File "prepare_vocab.py", line 41, in <module>
    for index, line in enumerate(source_file):
  File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 111: ordinal not in range(128)
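The byte in the error, 0xc3, is the lead byte UTF-8 uses for German characters such as ä, ö, ü and ß, which suggests the corpus is UTF-8 encoded while the script decodes it as ASCII. The snippet below reproduces the same error in isolation; the sample word is arbitrary and not taken from the corpus.

```python
# German umlauts and "ß" are encoded in UTF-8 as two-byte
# sequences whose first byte is 0xc3.
encoded = "Grüße".encode("utf-8")
print(encoded)  # b'Gr\xc3\xbc\xc3\x9fe'

try:
    # Decoding these bytes as ASCII fails on the first non-ASCII byte.
    encoded.decode("ascii")
except UnicodeDecodeError as err:
    message = str(err)
    print(message)
    # 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

# Decoding with the matching codec recovers the original text.
decoded = encoded.decode("utf-8")
print(decoded)  # Grüße
```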

The command I've run is:

python3 prepare_vocab.py /docker_files/german_ds/text_corpus/German_sentences_8mil_filtered_maryfied.txt /docker_files/german_ds/output/clean_vocab.txt
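A likely cause (an assumption, since the source of prepare_vocab.py is not shown here) is that the corpus is opened without an explicit encoding. In that case Python 3.5 falls back to `locale.getpreferredencoding(False)`, which is ASCII under the C/POSIX locale that many Docker images default to. A minimal sketch of the fix follows; `build_vocab` is a hypothetical helper, not a function from the script.

```python
import locale
import os
import tempfile


def build_vocab(path):
    """Hypothetical helper: collect whitespace-separated tokens from a corpus."""
    vocab = set()
    # An explicit encoding="utf-8" makes decoding independent of the
    # locale the script runs under (e.g. the C/POSIX locale in Docker).
    with open(path, encoding="utf-8") as source_file:
        for line in source_file:
            vocab.update(line.split())
    return vocab


# The encoding open() falls back to when none is passed;
# under the C/POSIX locale this is typically 'ANSI_X3.4-1968' (ASCII).
print(locale.getpreferredencoding(False))

# Usage: write a small UTF-8 sample file and read it back.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("Grüße aus Köln\nÄpfel\n")
    sample_path = tmp.name
vocab = build_vocab(sample_path)
os.remove(sample_path)
print(sorted(vocab))
```

If editing the script is not an option, running it under a UTF-8 locale, for example by setting `LANG=C.UTF-8` in the container (assuming the image provides that locale and `LC_ALL` does not override it), should have the same effect on `open()` calls that rely on the default encoding.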
