fix: Change JSON loader to be able to handle UTF-8-BOM files (#138)
The current parser fails to ingest files that were encoded with the BOM
bytes at the start. This is common for files saved on Windows and has
been an issue for datasets where I can't ensure the default Unix
encoding of UTF-8 without a BOM.
As far as I'm aware, decoding with utf-8-sig has no downsides when used
on plain UTF-8 beyond a small per-file processing overhead to check for
the 3 BOM bytes (EF BB BF) at the start, but it enables the code to
correctly open files that have the BOM prefix.
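
A minimal sketch of the behaviour described above, not the loader code
changed in this commit. The helper names `load_json_bytes` and
`fails_with_plain_utf8` are hypothetical and exist only for illustration;
the sketch uses only the standard library `json` and `codecs` modules.

```python
import codecs
import json


def load_json_bytes(raw: bytes):
    """Hypothetical helper: decode with utf-8-sig, then parse as JSON.

    utf-8-sig strips a leading BOM if present and otherwise behaves
    like plain utf-8.
    """
    return json.loads(raw.decode("utf-8-sig"))


def fails_with_plain_utf8(raw: bytes) -> bool:
    """Hypothetical helper: report whether plain utf-8 decoding breaks parsing."""
    try:
        # Plain utf-8 keeps the BOM as U+FEFF, which json.loads rejects.
        json.loads(raw.decode("utf-8"))
    except json.JSONDecodeError:
        return True
    return False


plain = '{"key": "value"}'.encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain  # BOM bytes EF BB BF prepended

# Both inputs parse to the same object when decoded with utf-8-sig.
assert load_json_bytes(plain) == {"key": "value"}
assert load_json_bytes(with_bom) == {"key": "value"}

# Only the BOM-prefixed input fails under plain utf-8 decoding.
assert fails_with_plain_utf8(with_bom)
assert not fails_with_plain_utf8(plain)
```

When reading from disk, the same effect should be available by passing
`encoding="utf-8-sig"` to `open()`.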
---------
Co-authored-by: Eugene Yurtsev <[email protected]>