How to extract both text content and graphs (charts) from an XLSX file using Docling? #2019
-
Hi @HashirSaikhani! I'm Dosu and I'm helping the docling team.

Docling doesn't natively extract non-tabular text (like text boxes, comments, or shapes) or embedded charts/graphs from XLSX files; it handles tables and cell data only. Its internal document model can represent text and charts, but XLSX parsing for these elements isn't implemented in Docling itself (source).

To extract non-tabular text and charts from XLSX files, use external Python libraries. This workflow (extracting images or charts with external tools, then running OCR or further processing before integrating the results into Docling) is similar to Docling's recommended approach for other formats (source). There are no built-in utilities or examples for this in Docling, so external extraction and integration is currently the best path.
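As a rough illustration of the external-extraction step, here is a minimal sketch that uses only the Python standard library. It is not part of Docling's API. It relies on the fact that an XLSX file is a ZIP archive, and it assumes the conventional part names that Excel writes (`xl/charts/chartN.xml`, `xl/drawings/drawingN.xml`, `xl/commentsN.xml`, `xl/media/`); files produced by other tools may use a different layout. The function name `extract_non_tabular` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace: <a:t> elements hold text runs in charts and shapes.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# SpreadsheetML namespace: <t> elements hold the text of cell comments.
S_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def _texts(xml_bytes, tag):
    """Return the non-empty text of every element named `tag` in the XML."""
    root = ET.fromstring(xml_bytes)
    return [el.text for el in root.iter(tag) if el.text]


def extract_non_tabular(xlsx):
    """Collect chart, shape, and comment text plus embedded images.

    `xlsx` is a file path or a binary file-like object.
    Part names below are the conventional ones (an assumption, see above).
    """
    result = {"charts": {}, "drawings": {}, "comments": {}, "images": {}}
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            is_xml = name.endswith(".xml")
            if name.startswith("xl/charts/chart") and is_xml:
                # Rich-text runs in the chart, e.g. titles and axis labels.
                result["charts"][name] = _texts(zf.read(name), A_NS + "t")
            elif name.startswith("xl/drawings/drawing") and is_xml:
                # Text boxes and shapes anchored on a sheet.
                result["drawings"][name] = _texts(zf.read(name), A_NS + "t")
            elif name.startswith("xl/comments") and is_xml:
                # Legacy cell comments (notes).
                result["comments"][name] = _texts(zf.read(name), S_NS + "t")
            elif name.startswith("xl/media/"):
                # Embedded pictures as raw bytes, ready for OCR or saving.
                result["images"][name] = zf.read(name)
    return result
```

Calling `extract_non_tabular("report.xlsx")` returns a dictionary keyed by archive part name. Charts are stored as XML definitions rather than pictures, so this only recovers their text; rendering a chart to an image for OCR would need a separate tool. The extracted strings and images can then be processed further and combined with the tables Docling already produces.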
-
Hello, I am currently using Docling to parse an .xlsx file. It successfully extracts the tables and cell data; however, I noticed it does not extract non-tabular text content or the graphs/charts embedded in the workbook.
Is there a recommended way to also extract this non-tabular text content and visual elements (graphs/charts) when processing Excel files with Docling?
If Docling does not support this directly, are there suggested workarounds or integrations to achieve this?
Thank you!