-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Hi so firstly really nice work. In some simple testing I was shocked to see this running 140 times faster than charlock_holmes on a 36MB file. I also love the idea of being able to remove a build step and external dependencies from my docker container. :)
Looking at the code I noticed that detect_encoding will actually modify the string in two ways which is probably not what people would expect.
It uses force encoding but then does not set the encoding back to the original state. This makes sense for the punch_encoding method, but I would expect detect_encoding to record the original encoding and then set it back to that before exiting.
Secondly it removes any UTF8 BOM from the string. Again this makes sense for the punch method but probably doesn't for detect. Is it necessary to remove this BOM before running the force_encoding().valid_encoding? checks? I get that you wouldn't want to clone the string so as to avoid an excess memory overhead, but maybe detect encoding should add it back if it has been removed.
Great work btw, and I'm surprised this doesn't have more stars.