How do I parse Word documents (.doc and .docx) with MinerU? #3746

Sgzmust · 2025-10-16T04:28:09Z

Sgzmust
Oct 16, 2025

How do I parse Word documents (.doc and .docx) with MinerU?

cuihangrui · 2025-10-16T04:30:14Z

dosubot[bot]
bot Oct 16, 2025

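MinerU does not currently support parsing Word documents (.doc and .docx) directly; uploading a DOCX file produces an unsupported-format error. The officially recommended approach is to first convert the Word document to PDF with a tool such as LibreOffice, then parse the PDF with MinerU. See the official discussion for details.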
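
The convert-then-parse workflow could be scripted roughly as follows. This is a minimal sketch, not an official recipe: it assumes LibreOffice's `soffice` binary is on `PATH` and that MinerU's command-line entry point has the form `mineru -p <input> -o <output>`. The helper names (`libreoffice_cmd`, `mineru_cmd`, `word_to_mineru`) and the timeout value are made up for illustration.

```python
import subprocess
from pathlib import Path


def libreoffice_cmd(src: Path, out_dir: Path) -> list[str]:
    """Build the headless LibreOffice command that converts src to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(src)]


def mineru_cmd(pdf: Path, out_dir: Path) -> list[str]:
    """Build the MinerU command (assumed CLI form: mineru -p IN -o OUT)."""
    return ["mineru", "-p", str(pdf), "-o", str(out_dir)]


def word_to_mineru(src: Path, out_dir: Path) -> Path:
    """Convert a .doc/.docx file to PDF, then run MinerU on the PDF."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # The timeout guards against a hung LibreOffice process (value is arbitrary).
    subprocess.run(libreoffice_cmd(src, out_dir), check=True, timeout=300)
    pdf = out_dir / (src.stem + ".pdf")
    if not pdf.exists():
        raise RuntimeError(f"LibreOffice produced no PDF for {src}")
    subprocess.run(mineru_cmd(pdf, out_dir), check=True)
    return pdf
```

Calling `word_to_mineru(Path("your.docx"), Path("out"))` would write `out/your.pdf` and then hand it to MinerU; both external tools must be installed for that call to succeed.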

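If you need to extract the content and styles of a Word document directly (fonts, font sizes, heading levels, indentation, and so on), use a dedicated parsing tool such as python-docx (Python), Apache POI (Java), Aspose.Words (commercial), or docx4j (Java). For example, python-docx can extract paragraph and style information: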

from docx import Document  # third-party package: python-docx

doc = Document('your.docx')
for para in doc.paragraphs:
    # Only the first run is inspected; None means the value is inherited from the style.
    font = para.runs[0].font if para.runs else None
    indent = para.paragraph_format.left_indent
    print({
        "text": para.text,
        "style": para.style.name,
        "font": font.name if font is not None else None,
        "size": font.size.pt if font is not None and font.size is not None else None,
        "indent": indent.pt if indent is not None else None
    })

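To convert Word content to Markdown or JSON, you can build a custom export on top of the tools above. See the related discussion for more details and tool recommendations.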
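
One possible shape for such a custom export is sketched below. It assumes the document uses Word's default English style names ("Title", "Heading 1", "List Bullet", "List Number"); documents with localized or custom styles would need a different mapping. The function names are illustrative, and tables are not covered because `Document.paragraphs` only yields body paragraphs.

```python
import json
import re


def paragraph_to_markdown(text: str, style_name: str) -> str:
    """Map one paragraph to a Markdown line based on its Word style name."""
    text = text.strip()
    if style_name == "Title":
        return f"# {text}"
    match = re.fullmatch(r"Heading (\d)", style_name)
    if match:
        return "#" * int(match.group(1)) + f" {text}"
    if style_name.startswith("List Bullet"):
        return f"- {text}"
    if style_name.startswith("List Number"):
        return f"1. {text}"
    return text


def docx_to_markdown_and_json(path: str) -> tuple[str, str]:
    """Read a .docx with python-docx and return (markdown, json) strings."""
    from docx import Document  # third-party: pip install python-docx

    # Skip empty paragraphs so they do not produce blank Markdown blocks.
    records = [
        {"text": p.text, "style": p.style.name}
        for p in Document(path).paragraphs
        if p.text.strip()
    ]
    markdown = "\n\n".join(
        paragraph_to_markdown(r["text"], r["style"]) for r in records
    )
    return markdown, json.dumps(records, ensure_ascii=False, indent=2)
```

`ensure_ascii=False` keeps Chinese text readable in the JSON output instead of escaping it.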


1 reply

cuihangrui Mar 26, 2026

@dosu Does the latest version support .docx and .doc files?

bulatovv · 2025-10-17T09:55:57Z

bulatovv
Oct 17, 2025

Converting DOCX to PDF discards the structural information in the original document (heading levels, lists, table structure), which the MinerU pipeline must then reconstruct with computationally expensive methods. That round trip is inefficient. I suggest using a specialized library such as markitdown, which converts DOCX directly to Markdown.
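
A direct DOCX-to-Markdown conversion with markitdown might look like the sketch below. It relies on the library's `MarkItDown().convert(path)` call and the `text_content` attribute of its result; the wrapper name `docx_to_markdown` and its optional `converter` parameter are illustrative additions that make the function easy to exercise without the library installed.

```python
def docx_to_markdown(path, converter=None):
    """Convert a DOCX file to Markdown text with markitdown."""
    if converter is None:
        # Imported lazily so the module loads even without markitdown.
        from markitdown import MarkItDown  # pip install 'markitdown[all]'
        converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content
```

For example, `print(docx_to_markdown("your.docx"))` would print the Markdown rendering of the document, with no intermediate PDF.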

0 replies

myhloli · 2026-03-28T20:26:56Z

myhloli
Mar 28, 2026
Maintainer

Upgrade to version 3.0 and MinerU can parse docx documents.
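
For pipelines that must run against both older and newer installs, the choice between direct parsing and PDF conversion could be made per file, as in this rough sketch. It assumes the 3.0 threshold stated in this reply, treats .doc conservatively (only docx is mentioned here), and assumes the installed distribution is named `mineru`. All function names are hypothetical.

```python
from pathlib import Path


def parse_major(version: str) -> int:
    """Return the leading integer of a version string such as '3.0.1'."""
    return int(version.split(".")[0])


def plan_for(path: str, mineru_version: str) -> str:
    """Decide how to feed a file to MinerU: 'direct' or 'convert_to_pdf'."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "direct"
    if suffix == ".docx" and parse_major(mineru_version) >= 3:
        return "direct"  # docx parsing is available from 3.0, per this thread
    if suffix in {".doc", ".docx"}:
        return "convert_to_pdf"  # e.g. via LibreOffice or WPS
    raise ValueError(f"unsupported file type: {suffix}")


def installed_mineru_version() -> str:
    """Look up the installed version (assumes the distribution name 'mineru')."""
    from importlib.metadata import version
    return version("mineru")
```

A batch job could call `plan_for(path, installed_mineru_version())` once per file and only start the converter for files that need it.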

3 replies

finch-xu Mar 30, 2026

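Can 3.0 parse docx documents that contain Visio diagrams? Our daily parsing jobs include a large number of such documents. Previously, LibreOffice often crashed when converting Word to PDF, and the problem only went away after we switched to WPS.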

myhloli Mar 30, 2026
Maintainer

That is probably not supported; only regular content is.
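
To find affected files before a batch run, one option is to look inside the .docx archive for embedded objects. The sketch below relies on the fact that a .docx file is a ZIP archive in which Word stores embedded objects under `word/embeddings/`. The name-based Visio check is only a heuristic and an assumption: embedded drawings are often stored under generic names such as `oleObject1.bin`, so any embedded part may deserve a manual look.

```python
import zipfile


def embedded_objects(docx_path: str) -> list[str]:
    """List the embedded OLE/package parts stored inside a .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return [
            name for name in archive.namelist()
            if name.startswith("word/embeddings/")
        ]


def looks_like_visio(part_name: str) -> bool:
    """Heuristic: match 'visio' in the part name or a .vsd/.vsdx extension."""
    lower = part_name.lower()
    return "visio" in lower or lower.endswith((".vsd", ".vsdx"))
```

Files for which `embedded_objects` returns a non-empty list could be routed to a separate queue instead of being sent straight to the parser.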

finch-xu Mar 30, 2026

Got it, thanks!

Replies: 3 comments 4 replies
