`detect_encoding` modifies the string

Hi so firstly really nice work. In some simple testing I was shocked to see this running 140 times faster than charlock_holmes on a 36MB file. I also love the idea of being able to remove a build step and external dependencies from my docker container. :)

Looking at the code I noticed that `detect_encoding` will actually modify the string in two ways which is probably not what people would expect.

It uses force encoding but then does not set the encoding back to the original state. This makes sense for the `punch_encoding` method, but I would expect `detect_encoding` to record the original encoding and then set it back to that before exiting.

Secondly it removes any UTF8 BOM from the string. Again this makes sense for the punch method but probably doesn't for detect. Is it necessary to remove this BOM before running the `force_encoding().valid_encoding?` checks? I get that you wouldn't want to clone the string so as to avoid an excess memory overhead, but maybe detect encoding should add it back if it has been removed.

Great work btw, and I'm surprised this doesn't have more stars.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`detect_encoding` modifies the string #1

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

detect_encoding modifies the string #1

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

`detect_encoding` modifies the string #1