MRPC and CoLA Dataset UnicodeDecodeError

Error message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 147: invalid continuation byte

I can't train after loading these two datasets. The error is still reported after I switch the encoding to "ISO-8859-1" or "latin-1".
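One thing that may be worth checking: the message names the 'utf-8' codec, which suggests the read that fails is still decoding as UTF-8 rather than with the encoding that was passed. Latin-1 maps every byte value to a character, so a read that really uses it cannot raise this error. A minimal sketch of the difference follows; the sample bytes and the `try_decode` helper are made up for illustration and are not taken from the dataset or the training code.

```python
# Hypothetical sample: a short byte string containing 0xd5, the byte
# named in the error message, followed by a plain ASCII space.
raw = b"caf\xd5 example"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed at byte {exc.start}: {exc.reason}"


# UTF-8 treats 0xd5 as the start of a two-byte sequence and rejects it
# because the next byte is not a continuation byte.
print(try_decode(raw, "utf-8"))

# Latin-1 assigns a character to each of the 256 byte values, so this
# call decodes without raising.
print(try_decode(raw, "latin-1"))
```

If the second call succeeds on the real data while training still fails, the encoding argument is probably not reaching the `open` call that reads the files.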

After checking the train.txt file of the MRPC dataset, I found that the offending byte corresponds to the character "é". I edited train.txt and test.txt and ran the preprocessing again to regenerate train.tsv and test.tsv, and I checked that the regenerated files do not contain the character "é". Training still reports an error.
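Since the regenerated files still fail, it may help to confirm which file and which byte actually trigger the error. The sketch below assumes the files are small enough to read into memory; `first_bad_byte` is a hypothetical helper, and the file names in the loop are simply the ones mentioned above.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_bad_byte(
    path: str, encoding: str = "utf-8", context: int = 20
) -> Optional[Tuple[int, bytes, bytes]]:
    """Return (offset, offending bytes, surrounding bytes) for the first
    sequence that fails to decode, or None if the whole file decodes."""
    data = Path(path).read_bytes()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Keep a small window around the failure so the text can be located.
        window = data[max(0, exc.start - context): exc.end + context]
        return exc.start, data[exc.start:exc.end], window
    return None


if __name__ == "__main__":
    # File names assumed from the description above; missing files are skipped.
    for name in ("train.txt", "test.txt", "train.tsv", "test.tsv"):
        if Path(name).exists():
            print(name, first_bad_byte(name))
```

A result of `None` for all four files would suggest that training is reading a different file, for example a cached or previously generated copy, rather than the ones that were edited.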

MRPC and CoLA Dataset UnicodeDecodeError #1405
