Extract PDF image text using a custom LLM and place image docs in the proper PDF order #1049
Unanswered
Navanit-git asked this question in Q&A
Replies: 2 comments, 6 replies
-
I think what you are doing is what the picture description option allows you to do. See https://ds4sd.github.io/docling/examples/pictures_description/. You will be able to define the vision model you prefer.
6 replies
-
Is #1085 similar? I am also facing the issue of not getting the images in the location where I would expect to see them.
0 replies
-
Hi,
I am working on extracting the text and images from a PDF.
This is the PDF I am using: link
I have used the code below for the conversion,
and it runs on my 4 GB GPU.
In parallel, I am also running a VLM on the images,
and that takes around 16 GB of GPU memory. Is there a way to combine the two so that the image details appear in place of the image placeholder in the Markdown file?
Also, if you look at page three of the PDF, I am getting a Markdown response like this
How can I make image A appear directly below document A? And finally, instead of the image itself, I want the image description from the VLM, without overusing the GPU.
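For reference, here is a minimal sketch of the "description in place of the placeholder" step, independent of any particular library. It assumes the Markdown export marks each picture with the literal text `<!-- image -->` and that the VLM descriptions are available as a list in the same order as the images appear in the document. The function name `fill_image_placeholders` and the `> Image description:` prefix are hypothetical and chosen only for illustration.

```python
import re

# Assumption: the Markdown export marks each picture with this literal text.
PLACEHOLDER = "<!-- image -->"


def fill_image_placeholders(markdown, descriptions, placeholder=PLACEHOLDER):
    """Replace the i-th placeholder with the i-th description, in document order."""
    remaining = iter(descriptions)

    def substitute(match):
        text = next(remaining, None)
        # If there are more images than descriptions, keep the placeholder as-is.
        if text is None:
            return match.group(0)
        return f"> Image description: {text}"

    return re.sub(re.escape(placeholder), substitute, markdown)


if __name__ == "__main__":
    md = "# Page 3\n\n<!-- image -->\n\nDoc A text\n\n<!-- image -->\n"
    print(fill_image_placeholders(md, ["Photo of item A", "Photo of item B"]))
```

Because the replacement works on the finished Markdown string, the conversion and the VLM pass do not have to run at the same time; running them one after the other may help keep peak GPU memory lower, though that depends on the setup.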