
Encoding issue in reported.tsv #16

@somerandomguyontheweb

Description


As discussed here, the free-text comments that contributors entered for Belarusian sentences, which appear in the reason column of reported.tsv, are not displayed correctly: every Cyrillic character has been replaced with a question mark. This is most likely an encoding problem at some stage of the data pipeline.
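The report does not say at which stage the conversion happens. As a hedged illustration only, here is one common way to get exactly this symptom in Python: encoding text with a codec that has no Cyrillic letters (Latin-1 here, an assumption) and `errors="replace"` substitutes `?` for every character it cannot represent. The function name `lossy_roundtrip` and the sample comment are hypothetical.

```python
def lossy_roundtrip(text: str, codec: str = "latin-1") -> str:
    """Encode with a codec that cannot represent Cyrillic, replacing each
    unencodable character with '?', then decode the bytes back to str."""
    return text.encode(codec, errors="replace").decode(codec)


# Hypothetical Belarusian comment, for illustration only.
original = "Памылка ў слове"

print(lossy_roundtrip(original))  # ??????? ? ?????

# A UTF-8 round trip, by contrast, preserves the text exactly.
print(original.encode("utf-8").decode("utf-8") == original)  # True
```

Because every Cyrillic character collapses to the same `?`, this kind of damage is lossy: the original comments cannot be reconstructed from the TSV alone.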

Steps to reproduce:

  • Download the Belarusian dataset, unpack it and open cv-corpus-7.0-2021-07-21/be/reported.tsv.
  • Filter on the reason column, hiding all rows whose reason is grammar-or-spelling.
  • Observe that most of the remaining reasons (the contributors' free-text comments) are shown as runs of question marks instead of Cyrillic text.
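The steps above can also be scripted. This is a minimal sketch, assuming reported.tsv is UTF-8, tab-separated, and has a header row with a `reason` column (as the filtering step implies). The "mostly question marks and no Cyrillic" test in the hypothetical `looks_damaged` helper is a heuristic, not part of any dataset tooling, and may misfire on short comments that legitimately contain `?`.

```python
import csv
import re

# Path taken from the reproduction steps; adjust to where the archive was unpacked.
DEFAULT_PATH = "cv-corpus-7.0-2021-07-21/be/reported.tsv"

# Unicode Cyrillic block (U+0400..U+04FF), which includes Belarusian letters like "ў".
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def looks_damaged(text: str) -> bool:
    """Heuristic: no Cyrillic at all, and at least half of the
    non-whitespace characters are question marks."""
    chars = [c for c in text if not c.isspace()]
    if not chars or CYRILLIC.search(text):
        return False
    return chars.count("?") / len(chars) >= 0.5


def summarize(rows):
    """Drop rows with the predefined 'grammar-or-spelling' reason and count
    how many of the remaining reasons look damaged."""
    remaining = [r for r in rows if r.get("reason") != "grammar-or-spelling"]
    damaged = [r for r in remaining if looks_damaged(r.get("reason") or "")]
    return len(remaining), len(damaged)


def main(path: str = DEFAULT_PATH) -> None:
    # QUOTE_NONE reads any quote characters in sentences literally.
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        total, damaged = summarize(reader)
    print(f"{damaged} of {total} remaining reasons look damaged")
```

Calling `main()` after unpacking the archive (or `main("path/to/reported.tsv")` for another location) prints how many of the non-grammar-or-spelling reasons consist mostly of question marks.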

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
