UTF-8 is detected with inconsistent case #32

@shaicoleman

The detected encoding is usually reported as 'utf-8', but when the text starts with a BOM it is reported as 'UTF-8':

CharDet.detect("UTF-8¡¡¡") # "encoding"=>"utf-8"
CharDet.detect("\xEF\xBB\xBF BOM¡¡¡") # "encoding"=>"UTF-8"

If this is intentional behaviour, it should be documented.
If not, the encoding name should probably be returned in a consistent case, although that might be a breaking change.
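
Until the behaviour is settled, callers could normalize the name on their side. The following is only a minimal sketch: `normalize_encoding` is a hypothetical helper, not part of the library, and it assumes the result is a hash with a string `"encoding"` key, as in the output quoted above. It works on plain hashes, so it does not call `CharDet.detect` itself.

```ruby
# Hypothetical helper (not part of the library): returns a copy of a
# detection result with the "encoding" value lowercased.
# Assumes the result is a Hash with a string "encoding" key.
def normalize_encoding(result)
  name = result && result["encoding"]
  # Leave nil results and missing or non-string names untouched.
  return result unless name.is_a?(String)

  result.merge("encoding" => name.downcase)
end

# Hashes shaped like the two results reported above.
plain    = { "encoding" => "utf-8" }
with_bom = { "encoding" => "UTF-8" }

puts normalize_encoding(plain)["encoding"]
puts normalize_encoding(with_bom)["encoding"]
```

Lowercasing is an arbitrary choice here; comparing names with `casecmp?` instead would avoid committing to either spelling.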
