PDF to Markdown conversion does not preserve source hyperlinks and flattens list hierarchy #1953

abdulFarooqui · 2025-07-16T18:46:01Z

Thank you for your hard work and generosity.

I am converting a PDF that contains hyperlinks to external URLs, and the generated Markdown does not retain the links.
I also noticed that when a list in the PDF has multiple levels, the output Markdown flattens it to a single level, although any bullet characters from the lower levels are preserved in the output.
I am using:
docling==2.41.0
docling-core==2.42.0
docling-ibm-models==3.8.1
docling-parse==4.1.0

I am using the defaults for DocumentConverter, with no extra pipeline options.

I am not sure whether this is an issue with my usage, such as not choosing the right Markdown options or pipeline options.

I would appreciate any help.
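
For reference, the default setup described in this post can be sketched roughly as follows. This is a minimal sketch, assuming the standard docling entry point; the helper name `pdf_to_markdown` and the file name in the usage note are hypothetical and not taken from the original post.

```python
def pdf_to_markdown(source: str) -> str:
    """Convert a PDF (local path or URL) to Markdown with docling's defaults."""
    # Imported inside the function so that defining this helper does not
    # import docling until a conversion is actually requested.
    from docling.document_converter import DocumentConverter

    # Default converter: no extra pipeline options, as described in the post.
    converter = DocumentConverter()
    result = converter.convert(source)

    # Default Markdown export; the post reports that hyperlinks and list
    # nesting are missing from this output.
    return result.document.export_to_markdown()
```

Calling `pdf_to_markdown("sample.pdf")` (a placeholder path) would return the Markdown text that the post is describing.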

gawaliakshata2-tech · 2025-09-04T20:42:56Z

Yes, I am facing the same issue with hyperlink extraction from PDFs. The Markdown export does not extract the URLs hidden behind link text. Is there a way to extract the URLs and export them right after the link text that appears in the document?
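
One possible workaround, sketched here under stated assumptions, is to post-process the exported Markdown. The sketch assumes the (link text, URL) pairs have already been collected separately, for example from the PDF's link annotations with a PDF library of your choice; that step is not shown. The function name `inline_links` and the sample data are hypothetical and not part of docling.

```python
import re


def inline_links(markdown: str, links: list[tuple[str, str]]) -> str:
    """Rewrite the first plain occurrence of each link text as [text](url)."""
    for text, url in links:
        # Skip occurrences that already sit inside a Markdown link, i.e.
        # preceded by "[" or followed by "](".
        pattern = re.compile(r"(?<!\[)" + re.escape(text) + r"(?!\]\()")
        # Use a function as the replacement so that backslashes in the URL
        # are not interpreted as regex escapes.
        markdown = pattern.sub(
            lambda m, u=url: f"[{m.group(0)}]({u})", markdown, count=1
        )
    return markdown


sample = "See the docs for details."
linked = inline_links(sample, [("the docs", "https://example.com/docs")])
print(linked)
```

This only matches on literal text, so it can pick the wrong occurrence when the same phrase appears more than once; treat it as a starting point rather than a fix for the underlying export behavior.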
