A comprehensive set of tools for analyzing and dumping Microsoft Office file formats.
MSO-Dumper is a package for analyzing and dumping various Microsoft Office file formats, including binary formats such as DOC, XLS, and PPT, and graphics formats such as EMF and WMF. It provides detailed structural analysis and can extract content from these files.
- Authors: See https://github.com/LibreOffice/mso-dumper/graphs/contributors
- Email: libreoffice@lists.freedesktop.org
- License: Mozilla Public License 2.0
python setup.py install

Analyzes and dumps PowerPoint (.ppt) binary format files.
./ppt-dump.py [options] [ppt file]
Options:
- --help - displays help message
- --no-struct-output - suppress normal structure analysis output
- --dump-text - extract and print textual content
- --no-raw-dumps - suppress raw hex dumps of uninterpreted areas
- --id-select=id1[,id2 ...] - limit output to selected record IDs
Example:
./ppt-dump.py presentation.ppt
./ppt-dump.py --dump-text --no-raw-dumps slides.ppt

Analyzes and dumps Word (.doc) binary format files.
./doc-dump.py [doc file]
Example:
./doc-dump.py document.doc

Analyzes and dumps Excel (.xls) binary format files with extensive options.
./xls-dump.py [options] [xls file]
Options:
- -d, --debug - turn on debug mode
- --show-sector-chain - show sector chain information at start of output
- --show-stream-pos - show position of each record relative to the stream
- --dump-mode MODE - specify dump mode: 'flat' (default), 'xml', or 'canonical-xml'
- --catch - catch exceptions and try to continue
- --utf-8 - output strings as UTF-8
Examples:
./xls-dump.py spreadsheet.xls
./xls-dump.py --dump-mode xml --debug workbook.xls
./xls-dump.py --show-stream-pos --utf-8 data.xls

Analyzes and dumps Visio (.vsd) format files.
./vsd-dump.py [vsd file]
Example:
./vsd-dump.py diagram.vsd

Analyzes and dumps Enhanced Metafile (.emf) format files.
./emf-dump.py [emf file]
Example:
./emf-dump.py image.emf

Analyzes and dumps Windows Metafile (.wmf) format files.
./wmf-dump.py [wmf file]
Example:
./wmf-dump.py graphic.wmf

Dumps OLE1 embedded objects according to [MS-OLEDS] 2.2.5 specification.
./ole1-dump.py [ole1 file]
Example:
./ole1-dump.py embedded_object.ole1

Dumps OLE2 preview streams according to [MS-OLEDS] 2.3.4 specification.
./ole2preview-dump.py [ole2 file]
Example:
./ole2preview-dump.py preview_stream.ole2

Extracts and analyzes VBA (Visual Basic for Applications) code from Office documents.
./vbadump.py [office file with VBA]
Example:
./vbadump.py macro_document.xls

Dumps Star Writer binary layout cache format.
./swlaycache-dump.py [cache file]
Example:
./swlaycache-dump.py layout.cache

Compresses VBA streams using Microsoft's compression algorithm.
./compress.py [offset]
Takes input from stdin and outputs compressed stream to stdout. Optional offset parameter.

Decompresses VBA streams.
./decompress.py [offset]
Takes compressed input from stdin and outputs decompressed stream to stdout. Optional offset parameter.
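
Because compress.py and decompress.py read stdin and write stdout, they can be driven from another Python program. The following is a minimal sketch, not part of the package: `run_filter` and `filter_argv` are hypothetical helper names, and it assumes the scripts run under the same interpreter as the caller and accept the offset as a plain command-line argument.

```python
import subprocess
import sys


def filter_argv(script, offset=None):
    """Build the command line for a stdin/stdout filter script.

    Assumption: the optional offset is passed as a single positional
    argument, as suggested by the usage line `./compress.py [offset]`.
    """
    argv = [sys.executable, script]
    if offset is not None:
        argv.append(str(offset))
    return argv


def run_filter(argv, data):
    """Feed `data` (bytes) to the command's stdin and return its stdout bytes.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

With these helpers, a call such as `run_filter(filter_argv("./decompress.py"), compressed_bytes)` would return the decompressed stream as bytes, provided the script is present in the working directory.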
Replaces UUIDs in PowerPoint XML streams with sequential integers for easier analysis.
cat ppt/diagrams/data1.xml | ./pptx-kill-uuid.py

Utility script for converting enumerations (see source for specific usage).
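
The idea behind pptx-kill-uuid.py, replacing UUIDs with sequential integers, can be illustrated with a short sketch. This is not the script's actual code: the function name `replace_uuids`, the numbering from 0, the case-insensitive matching, and the choice to map repeated UUIDs to the same integer are all assumptions made for illustration.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def replace_uuids(text):
    """Replace each distinct UUID in `text` with a sequential integer.

    The first UUID seen becomes 0, the next new one 1, and so on.
    A UUID that occurs more than once always maps to the same integer,
    so cross-references inside the XML stay recognizable.
    """
    seen = {}

    def substitute(match):
        key = match.group(0).lower()
        if key not in seen:
            seen[key] = str(len(seen))
        return seen[key]

    return UUID_RE.sub(substitute, text)
```

Applied to the text of a diagram XML stream, this yields output in which two files that differ only in their generated UUIDs compare as equal, which is what makes diffing easier.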
Most dump tools output XML-formatted analysis data that includes:
- File structure information
- Record-by-record analysis
- Raw hex dumps of binary data
- Extracted text content (where applicable)
- Stream hierarchies for compound document formats
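
When a tool emits XML, the output can be summarized with Python's standard library. The sketch below counts how often each element name occurs in a dump. The helper name `count_tags` is hypothetical, and the element names used in the usage note are invented placeholders, since the actual schema depends on the tool and is not described here.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_text):
    """Return a Counter mapping each element name to its number of occurrences."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return Counter(elem.tag for elem in root.iter())
```

For example, given a made-up document such as `<stream><record/><record><hex>00</hex></record></stream>`, the function reports two `record` elements and one each of `stream` and `hex`.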
The core parsing logic is contained in the msodumper/ package with specialized modules for each format:
- docstream.py, docrecord.py - Word document parsing
- xlsstream.py, xlsrecord.py, xlsmodel.py - Excel parsing
- pptstream.py, pptrecord.py - PowerPoint parsing
- emfrecord.py, wmfrecord.py - Graphics format parsing
- ole.py, olestream.py - OLE compound document parsing
- vbahelper.py - VBA macro analysis
- etc.
Submit Patches to LibreOffice Gerrit:
This project is licensed under the Mozilla Public License 2.0 - see the license header in each source file for details.