Conversation

@fabianofranz
Contributor

RHELAI-4141

Adds a tutorial exposing a set of flags and options that are effective fixes for some of the most common issues faced in document parsing with Docling.

Contributor

@iamemilio iamemilio left a comment

Aside from a nitpick, this is great!

@fabianofranz fabianofranz force-pushed the docling-conversion-tutorials branch from b02e195 to 7938ce2 Compare June 2, 2025 14:47
@fabianofranz
Contributor Author

@JustinXHale Let me know if you have any comments about the folder structure, or anything else. Thank you!

Member

@JustinXHale JustinXHale left a comment

The folder structure under docs/docling-conversion/ looks clean and intuitive. I like that each .py script corresponds directly to a documented conversion technique. Consider a better name for "mostly default settings", something like baseline, default, or quickstart, so that the naming isn't as vague.

@fabianofranz fabianofranz force-pushed the docling-conversion-tutorials branch 5 times, most recently from 3b42046 to d84b0f1 Compare June 5, 2025 15:50
@fabianofranz fabianofranz requested a review from alinaryan as a code owner June 5, 2025 15:50
@fabianofranz fabianofranz force-pushed the docling-conversion-tutorials branch from d84b0f1 to a934487 Compare June 5, 2025 15:52
@fabianofranz
Contributor Author

@JustinXHale I renamed it to "standard settings". Thank you for the review!

Contributor

@alimaredia alimaredia left a comment

Great job getting this off the ground. I love how concise this is.

The only thing I can think of adding is examples of using the wrong document conversion pipeline on a document: what the results look like, and how users should adjust the settings to see better results. Something like this could always be added as a follow-up.

@fabianofranz fabianofranz force-pushed the docling-conversion-tutorials branch from a934487 to 5dcd845 Compare June 11, 2025 14:48
@alimaredia alimaredia merged commit 0f9ec2c into instructlab:main Jun 11, 2025
1 check passed
4 participants