Description
Currently, ImageRefMode only reasons about a single representation of the document (https://github.com/docling-project/docling-core/blob/main/docling_core/types/doc/base.py#L9-L14): it can embed the image, reference it, or set a placeholder.
Consider a data pipeline where the same document should be stored in several representations (for example raw JSON plus HTML):
document.save_as_json(
Path(f"{mypath}.json"), image_mode=ImageRefMode.REFERENCED
)
document.save_as_html(
Path(f"{mypath}.html"), image_mode=ImageRefMode.REFERENCED
)
document.save_as_markdown(
Path(f"{mypath}.md"), image_mode=ImageRefMode.REFERENCED
)
Setting ImageRefMode.REFERENCED on all three calls incurs 3x the image I/O. Setting only the first call to REFERENCED and the others to a new NONE mode would save resources and be faster:
document.save_as_json(
Path(f"{mypath}.json"), image_mode=ImageRefMode.REFERENCED
)
document.save_as_html(
Path(f"{mypath}.html"), image_mode=ImageRefMode.NONE
)
document.save_as_markdown(
Path(f"{mypath}.md"), image_mode=ImageRefMode.NONE
)
Would this be a sensible addition?
Alternatively, _with_pictures_refs could be modified to generate the image output only if it does not already exist.
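To illustrate the "only write if it does not already exist" idea, here is a minimal standalone sketch. It does not use or reproduce docling-core's actual code: `write_image_once`, the `image_<hash>` naming scheme, and the shared `artifacts` directory are all hypothetical, and it assumes every export points at the same image directory.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def write_image_once(
    image_bytes: bytes, image_dir: Path, suffix: str = ".png"
) -> Tuple[Path, bool]:
    """Hypothetical helper: write the image unless an identical one is already on disk.

    The file name is derived from a SHA-256 hash of the content, so the same
    image always maps to the same path. Returns the path and a flag telling
    whether a write actually happened.
    """
    image_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(image_bytes).hexdigest()
    target = image_dir / f"image_{digest}{suffix}"
    if target.exists():
        # A previous export already produced this file; skip the I/O.
        return target, False
    target.write_bytes(image_bytes)
    return target, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifacts = Path(tmp) / "artifacts"  # assumed to be shared by all exports
        payload = b"stand-in for encoded image data"
        for export in ("json", "html", "md"):
            path, written = write_image_once(payload, artifacts)
            print(export, path.name[:18], "written" if written else "skipped")
```

With something along these lines, the three `save_as_*` calls could all keep `ImageRefMode.REFERENCED`, and only the first would pay the cost of writing the images. The trade-off is that a stale file with the same name is never refreshed, which is why the sketch keys the name on the content hash.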