How to extract both text content and graphs (charts) from an XLSX file using Docling? #2019

HashirSaikhani · 2025-07-30T10:32:51Z

HashirSaikhani
Jul 30, 2025

Hello, I am currently using Docling to parse an .xlsx file. It successfully extracts the tables and cell data; however, I noticed it does not extract:

Simple text content (such as plain text)
Graphs or charts embedded within the Excel file
Is there a recommended way to also extract this non-tabular text content and visual elements (graphs/charts) when processing Excel files with Docling?
If Docling does not support this directly, are there suggested workarounds or integrations to achieve this?
Thank you!

@HashirSaikhani · 2025-07-30T10:37:04Z

dosubot[bot]
bot Jul 30, 2025

Hi @HashirSaikhani! I'm Dosu and I’m helping the docling team.

Docling doesn't natively extract non-tabular text (like text boxes, comments, or shapes) or embedded charts/graphs from XLSX files—it focuses on tables and cell data only. Its internal document model can represent text and charts, but XLSX parsing for these elements isn't implemented in Docling itself (source).

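To extract non-tabular text and charts from XLSX files, use an external Python library such as openpyxl (for cell comments and, to some extent, charts and images). Note that xlrd 2.0 and later read only legacy .xls files, so it is not an option for .xlsx. openpyxl's documentation also notes that it does not read every item in a workbook, so text boxes and other shapes may need to be read directly from the drawing XML inside the file (an .xlsx file is a ZIP archive). For charts, you may need to extract the embedded chart images or XML definitions, then process them further (e.g., convert images to text using OCR if needed). Once extracted, you can integrate this content into Docling's document model by programmatically creating TextItem or PictureItem objects.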
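As a rough illustration of reading the XML directly, here is a minimal sketch that uses only the Python standard library. It assumes the usual OOXML package layout (drawings under xl/drawings/, charts as xl/charts/chartN.xml, images under xl/media/). The function name `extract_non_tabular_parts` and the keys of the returned dictionary are made up for this example and are not part of Docling or openpyxl.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace: <a:t> elements hold the text runs of shapes,
# text boxes, and chart titles.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def extract_non_tabular_parts(xlsx_file):
    """Collect shape text, chart XML, and media names from an .xlsx archive.

    `xlsx_file` may be a path or a binary file-like object.
    """
    parts = {"drawing_text": [], "chart_text": {}, "chart_xml": {}, "media": []}
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in sorted(archive.namelist()):
            if name.startswith("xl/drawings/") and name.endswith(".xml"):
                # Text boxes and shapes anchored on a worksheet.
                root = ET.fromstring(archive.read(name))
                parts["drawing_text"] += [
                    t.text for t in root.iter(A_NS + "t") if t.text
                ]
            elif name.startswith("xl/charts/chart") and name.endswith(".xml"):
                # Keep the raw chart definition and any literal text in it
                # (for example a title typed into the chart).
                raw = archive.read(name)
                parts["chart_xml"][name] = raw
                root = ET.fromstring(raw)
                parts["chart_text"][name] = [
                    t.text for t in root.iter(A_NS + "t") if t.text
                ]
            elif name.startswith("xl/media/"):
                # Embedded images; read them with archive.read(name) if needed.
                parts["media"].append(name)
    return parts
```

Calling `extract_non_tabular_parts("report.xlsx")` (the file name is a placeholder) returns plain Python data that can then be passed to OCR or to whatever integration step you choose. Chart series that reference cell ranges are not resolved here; only text stored literally in the XML is picked up.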

This workflow—extracting images or charts with external tools, then running OCR or further processing before integrating into Docling—is similar to Docling's recommended approach for other formats (source). There are no built-in utilities or examples for this in Docling, so external extraction and integration is currently the best path.

To reply, just mention @dosu.


2 replies

HashirSaikhani Jul 30, 2025
Author

@dosu
Thank you very much for the detailed explanation!
Just to clarify:
Do you know if native extraction of non-tabular text (text boxes, comments) and embedded charts/graphs from XLSX files is planned for implementation in Docling in the near future?
Would be great to know if this feature is on the roadmap.
Thanks again!

dosubot[bot] bot Jul 30, 2025

There's no public evidence that native extraction of non-tabular text (like text boxes or comments) or embedded charts/graphs from XLSX files is planned for Docling or on the roadmap. Recent issues, discussions, and pull requests focus on table and cell data extraction, with no mention of expanding XLSX support to include these features. For now, the recommended approach is to use external tools (such as openpyxl) to extract this content and then integrate it into Docling if needed. If this functionality becomes a priority, keep an eye on the repositories for updates or changes in direction.

