Mimetype based detection in SimpleDirectoryReader #15436
Blackskyliner
started this conversation in
Ideas
Replies: 0 comments
Hi there,
Is there interest in a patch that extends SimpleDirectoryReader so it can also read file magic values and guess possible extensions for lookup in the file extractor dict?
I implemented this for myself by blatantly copying the current SimpleDirectoryReader code and adding detection through the packages python-magic and mimetypes.
Background on why I think this is useful: if you use S3 storage and/or implement or use your own deduplicating filesystem based on hashed file names, a file may simply have no extension. But the magic number within the file will always exist. (That's the case I needed it for.)
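To make the idea concrete, here is a minimal sketch of the detection step, not the actual patch. It assumes python-magic (and the libmagic system library) is installed for real sniffing; the names `sniff_mime_with_libmagic`, `guess_extension`, and `head_size` are hypothetical and do not exist in SimpleDirectoryReader.

```python
import mimetypes
from pathlib import Path
from typing import Callable, Optional

try:
    import magic  # python-magic; needs the libmagic system library
except ImportError:  # package or libmagic not available
    magic = None


def sniff_mime_with_libmagic(head: bytes) -> Optional[str]:
    """Return a MIME type for the leading bytes, or None without python-magic."""
    if magic is None:
        return None
    return magic.from_buffer(head, mime=True)


def guess_extension(
    path: Path,
    sniff: Callable[[bytes], Optional[str]] = sniff_mime_with_libmagic,
    head_size: int = 2048,  # assumption: the first 2 KiB are enough to sniff
) -> Optional[str]:
    """Hypothetical helper: keep the real suffix, else derive one from content."""
    suffix = path.suffix.lower()
    if suffix:
        return suffix
    with path.open("rb") as fh:
        head = fh.read(head_size)
    mime_type = sniff(head)
    if not mime_type:
        return None
    # Map the MIME type back to an extension such as ".pdf".
    return mimetypes.guess_extension(mime_type)
```

The result could then be used as the key into the file extractor dict, for example `file_extractor.get(guess_extension(path))`, so a hashed name like `3f2a9c` would still reach the reader registered for `.pdf`. The sniffer is passed in as a parameter so another detection backend can be swapped in.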
But before I go through the cleanup needed to bring this up to PR quality, I would like to know whether the general idea would be accepted and whether there is any need for it (or maybe I am uninformed and there is already MIME-type-based detection hidden somewhere in another reader component).