While using unstructured-ingest fully locally to partition .odt/.doc/.docx files, I frequently get this error during the upload stage:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 244474: character maps to <undefined>
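The same error can be reproduced without unstructured-ingest. This is a minimal sketch that assumes the Windows locale encoding is cp1252. The payload and the helper `decodes_as` are made up for illustration. A UTF-8 file containing a character such as "Á" (bytes 0xC3 0x81) cannot be decoded with cp1252, because cp1252 has no character assigned to byte 0x81.

```python
import json


def decodes_as(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: return True if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Prints the same kind of message as in the report above.
        print(f"{encoding}: {exc}")
        return False
    return True


# Made-up JSON payload written as UTF-8; "Á" (U+00C1) becomes the bytes 0xC3 0x81.
payload = json.dumps({"text": "Á"}, ensure_ascii=False).encode("utf-8")

# cp1252 (assumed here to be the Windows locale code page) is implemented with
# the 'charmap' codec and leaves byte 0x81 unassigned, so this decode fails.
decodes_as(payload, "cp1252")

# Decoding with the encoding the file was actually written in succeeds.
decodes_as(payload, "utf-8")
```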
I think this is caused by the default encoding on Windows not being UTF-8. When no encoding is passed, `Path.open()` uses the locale's preferred encoding, which on many Windows systems is a legacy code page such as cp1252. The issue goes away when I change `with path.open() as f:` to `with path.open(encoding="utf-8") as f:` in the function `get_json_data` in utils/data_prep.py.
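This is a simplified sketch of the proposed change. `get_json_data_utf8` is a hypothetical stand-in, not the actual code in utils/data_prep.py. It assumes the file holds a single JSON document. The only relevant difference from the current behaviour is the explicit `encoding="utf-8"`, which makes the read independent of the platform's locale encoding.

```python
import json
from pathlib import Path
from typing import Any


def get_json_data_utf8(path: Path) -> Any:
    """Hypothetical stand-in for get_json_data: load one JSON document as UTF-8."""
    # Without encoding=..., Path.open() falls back to the locale encoding
    # (e.g. cp1252 on many Windows installs) and can raise UnicodeDecodeError.
    with path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        p = Path(tmp) / "elements.json"  # made-up file name
        # Write non-ASCII text as UTF-8, as the partitioning output would be.
        p.write_text(json.dumps([{"text": "Á"}], ensure_ascii=False), encoding="utf-8")
        print(get_json_data_utf8(p))
```

Because the encoding is fixed explicitly, the same file reads identically on Windows, macOS, and Linux.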
It would be nice if this were fixed in a future release.