UTF-8 with emojis detected as pure ascii with 100% confidence #161

@piranna

I think there are two bugs here:

  1. A pure ASCII string (bytes 0x00-0x7F) is also a valid UTF-8 string, so both encodings should be detected. If not both with 100% confidence, then maybe 99% for the UTF-8 case, to give priority to the ASCII one.
  2. If the text has emojis, or any other byte sequence outside the pure ASCII range, it is definitely NOT a pure ASCII string.
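
To illustrate the behaviour I would expect, here is a minimal sketch in Python. It is not this library's code or API: the function names and the `(encoding, confidence)` return shape are hypothetical. The 100/99 split for pure ASCII input is the suggestion from point 1, and the 100 for non-ASCII UTF-8 is only a placeholder value.

```python
from typing import List, Tuple


def is_pure_ascii(data: bytes) -> bool:
    # Every byte must be in the 0x00-0x7F range.
    return all(b < 0x80 for b in data)


def is_valid_utf8(data: bytes) -> bool:
    # Rely on the built-in strict UTF-8 decoder for validation.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def candidate_encodings(data: bytes) -> List[Tuple[str, int]]:
    # Hypothetical helper: returns (encoding, confidence) pairs, best first.
    if is_pure_ascii(data):
        # Pure ASCII is also valid UTF-8, so report both (point 1).
        return [("ascii", 100), ("utf-8", 99)]
    if is_valid_utf8(data):
        # Bytes >= 0x80 are present, so ASCII must not be reported (point 2).
        return [("utf-8", 100)]  # placeholder confidence, an assumption
    return []
```

With this sketch, `candidate_encodings(b"hello")` reports both ASCII and UTF-8, while `candidate_encodings("hi 😀".encode("utf-8"))` reports only UTF-8.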
