v6.0.1 - 02.01.2026 #72
harshankur announced in Announcements
v6.0.1 - 02.01.2026
Changes: v5.2.2..v6.0.1
We are thrilled to announce the release of officeParser v6.0.1, a major overhaul that transforms the library from a simple text extractor into a powerful, format-agnostic document analysis engine.
🌟 Key Highlights (v6.0.0+)
🌳 Abstract Syntax Tree (AST) Output
The core parsing engine now produces a rich, hierarchical Abstract Syntax Tree. This allows you to traverse documents structurally—accessing paragraphs, headings, tables, and lists with their original nesting and metadata preserved.
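As a rough illustration of what structural traversal can look like, here is a minimal sketch in TypeScript. The `AstNode` interface, its `type`, `text`, and `children` fields, and the helper names `walk` and `collectText` are assumptions made for this example; they are not taken from the officeParser typings, so check the library's own type definitions for the real node shape.

```typescript
// Hypothetical node shape for illustration only; the real officeParser AST
// may use different field names and node types.
interface AstNode {
  type: string;          // e.g. "heading", "paragraph", "table", "list"
  text?: string;         // text carried by this node, if any
  children?: AstNode[];  // nested nodes, in document order
}

// Depth-first, pre-order walk that calls `visit` on every node with its depth.
function walk(
  node: AstNode,
  visit: (n: AstNode, depth: number) => void,
  depth = 0,
): void {
  visit(node, depth);
  for (const child of node.children ?? []) {
    walk(child, visit, depth + 1);
  }
}

// Collect the text of every node of the given type, in document order.
function collectText(root: AstNode, type: string): string[] {
  const out: string[] = [];
  walk(root, (n) => {
    if (n.type === type && n.text !== undefined) {
      out.push(n.text);
    }
  });
  return out;
}

// A small hand-built tree standing in for a parsed document.
const sampleDoc: AstNode = {
  type: "document",
  children: [
    { type: "heading", text: "Introduction" },
    { type: "paragraph", text: "First paragraph." },
    {
      type: "list",
      children: [
        { type: "paragraph", text: "Item one" },
        { type: "paragraph", text: "Item two" },
      ],
    },
    { type: "heading", text: "Summary" },
  ],
};

const headings = collectText(sampleDoc, "heading");
const paragraphs = collectText(sampleDoc, "paragraph");
```

Because the walk is pre-order, results come back in document order, and paragraphs nested inside the list are found without any special handling. The same pattern extends to tables or any other node type by changing the `type` argument.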
🖼️ OCR & Attachment Extraction
📄 New Format Support & Improvements
[…] .rtf) files, including complex nested tables and lists. (Fixes "Add support for *.rtf files", #54)
[…] page nodes, matching the structure of slides and sheets.
[…] slide and sheet delimiter nodes for cleaner visualization and processing. (Fixes "Feature Request: Add slide delimiter support for PowerPoint files", #64)
🔗 Enhanced Hyperlinks
[…] <a> tags.
🛠️ Bug Fixes & Refinements
[…] .docx parsing. (Fixes "Numbered elements aren't preserved as they show up in .docx files", #29)
🎨 Interactive AST Visualizer (v6.0.1 Fix)
The Live Visualizer has been revamped and fixed for stable deployment:
[…] /docs folder for standard GitHub Pages hosting at the repository's root. (Fixed in v6.0.1)
[…] OfficeParserAST object instead of a raw string.
[…] ast.toText() on the returned object.
This discussion was created from the release v6.0.1 - 02.01.2026.