PDF to Markdown conversion does not preserve source hyperlinks in the output, and list hierarchy is flattened #1953
Unanswered
abdulFarooqui asked this question in Q&A
Replies: 1 comment
-
Yes, I am facing the same issue with hyperlink extraction from PDFs. The Markdown export does not extract hidden URLs. Is there a way to extract the URLs and export each one right after the link text where it appears?
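One possible workaround for the request above is to post-process the exported Markdown. The sketch below assumes the (anchor text, URL) pairs have already been pulled out of the PDF's link annotations by some other tool; that extraction step is not shown. The helper name `append_urls_after_link_text` is hypothetical and not part of Docling. It only handles the simple case where the anchor text appears verbatim in the Markdown.

```python
import re


def append_urls_after_link_text(markdown: str, links: list[tuple[str, str]]) -> str:
    """Turn the first plain occurrence of each anchor text into [text](url).

    `links` is assumed to come from a separate PDF annotation extraction step.
    """
    for text, url in links:
        # Skip occurrences that already look like the text of a Markdown link.
        pattern = re.compile(r"(?<!\[)" + re.escape(text) + r"(?!\]\()")
        # A function replacement avoids backslash-escape surprises in the URL.
        markdown = pattern.sub(lambda m: f"[{text}]({url})", markdown, count=1)
    return markdown


sample = "See the Docling docs for details."
pairs = [("Docling docs", "https://example.com/docs")]  # illustrative data only
linked = append_urls_after_link_text(sample, pairs)
```

Matching on the anchor text is fragile: it can pick the wrong occurrence when the same phrase appears more than once, and it will miss text that the converter split or re-wrapped differently from the PDF.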
-
Thank you for your hard work and generosity.
I am converting a PDF that contains hyperlinks to external URLs, and I found that the generated Markdown does not retain the links.
I also noticed that if a list in the PDF has multiple levels, the output Markdown flattens it to a single level, although any bullet that appeared at a lower level is preserved in the output.
I am using:
docling==2.41.0
docling-core==2.42.0
docling-ibm-models==3.8.1
docling-parse==4.1.0
I am using the defaults for DocumentConverter, with no extra pipeline options.
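The default usage described here would look roughly like the following sketch. The function name `pdf_to_markdown` and the file name in the comment are illustrative and not taken from the original post.

```python
def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to Markdown using Docling's default settings."""
    # Imported inside the function so this module loads even where
    # docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()  # defaults, no extra pipeline options
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()


# Example call (hypothetical file name):
# print(pdf_to_markdown("sample.pdf"))
```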
I am not sure whether this is an issue with my usage, such as not choosing the right Markdown options or pipeline options.
I would appreciate any help.