Conversation
Encord Agents test report90 tests 89 ✅ 3m 18s ⏱️ Results for commit 17d4bb9. ♻️ This comment has been updated with latest results. |
| import pymupdf | ||
|
|
||
|
|
||
| def extract_page( |
There was a problem hiding this comment.
Can't really be done by like other method as doc.close() apparently divorces pix so best to cast to file immediately
There was a problem hiding this comment.
I think that we should fix this straight away. The pumupdf.Pixmap has a pil_image method so we can make the optional dependencies [mymupdf, PILLOW] and then use that. Then, no need to store on disk.
|
On a high level, this would be improved if instead we didn't write the intermediary crops to a file. As is, I'm just emulating the behaviour for videos but this could be improved for sure. Ticket already made around: Serverlessness |
c5f658e to
e472680
Compare
pyproject.toml
Outdated
| numpy = ">=1.26.4" | ||
| opencv-python = ">=4.1" | ||
|
|
||
| PyMuPdf = ">1.25.0" |
There was a problem hiding this comment.
This is a big no no IMO. I agree with your comment that this should be made optional.
So you do pip install "encord-agents[pdf]" if you need it. encord-agents` has to be light weight in terms of dependencies. Cause what happens when we need to support DICOM or some exotic geospatial data type.
| import pymupdf | ||
|
|
||
|
|
||
| def extract_page( |
There was a problem hiding this comment.
I think that we should fix this straight away. The pumupdf.Pixmap has a pil_image method so we can make the optional dependencies [mymupdf, PILLOW] and then use that. Then, no need to store on disk.
Gives support for pdf package and provides method for interfacing with pdfs. Makes this package optional and slightly modify CI to ensure testing With pdf support
Feature, Tests, Docs not yet.
Currently adds dependency. Could technically become optional dependency