LibreOffice/mso-dumper

MSO-Dumper

A comprehensive set of tools for analyzing and dumping Microsoft Office file formats.

Description

MSO-Dumper is a package for analyzing and dumping Microsoft Office file formats: binary document formats such as DOC, XLS, and PPT, and graphics formats such as EMF and WMF. It reports the detailed structure of these files and can extract their content.

Author Information

Installation

python setup.py install

Tools and Usage

Document Format Dumpers

ppt-dump.py - PowerPoint File Dumper

Analyzes and dumps PowerPoint (.ppt) binary format files.

./ppt-dump.py [options] [ppt file]

Options:

  • --help - displays help message
  • --no-struct-output - suppress normal structure analysis output
  • --dump-text - extract and print textual content
  • --no-raw-dumps - suppress raw hex dumps of uninterpreted areas
  • --id-select=id1[,id2 ...] - limit output to selected record IDs

Examples:

./ppt-dump.py presentation.ppt
./ppt-dump.py --dump-text --no-raw-dumps slides.ppt
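
To run the dumper over many files, the options above can be assembled from Python. The following is a minimal sketch, not part of MSO-Dumper: the helper names build_ppt_dump_command and dump_text_from_directory are hypothetical, the script is assumed to be in the current directory, and the record IDs are passed through in whatever form the caller supplies, since the expected ID notation is not stated here.

```python
import subprocess
from pathlib import Path


def build_ppt_dump_command(ppt_file, dump_text=False, raw_dumps=True,
                           struct_output=True, record_ids=None,
                           script="./ppt-dump.py"):
    """Return the argument list for one ppt-dump.py run (hypothetical helper)."""
    cmd = [script]
    if not struct_output:
        cmd.append("--no-struct-output")
    if dump_text:
        cmd.append("--dump-text")
    if not raw_dumps:
        cmd.append("--no-raw-dumps")
    if record_ids:
        # IDs are joined as given; their notation is an assumption of the caller.
        cmd.append("--id-select=" + ",".join(str(i) for i in record_ids))
    cmd.append(str(ppt_file))
    return cmd


def dump_text_from_directory(directory):
    """Run ppt-dump.py in text mode on every .ppt file in a directory."""
    results = {}
    for ppt in sorted(Path(directory).glob("*.ppt")):
        cmd = build_ppt_dump_command(ppt, dump_text=True, raw_dumps=False)
        done = subprocess.run(cmd, capture_output=True, check=False)
        results[ppt.name] = done.stdout  # raw bytes written by the dumper
    return results
```

Keeping the command construction separate from the subprocess call makes the option handling easy to check without any .ppt files at hand.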

doc-dump.py - Word Document Dumper

Analyzes and dumps Word (.doc) binary format files.

./doc-dump.py [doc file]

Example:

./doc-dump.py document.doc

xls-dump.py - Excel Spreadsheet Dumper

Analyzes and dumps Excel (.xls) binary format files with extensive options.

./xls-dump.py [options] [xls file]

Options:

  • -d, --debug - turn on debug mode
  • --show-sector-chain - show sector chain information at start of output
  • --show-stream-pos - show position of each record relative to the stream
  • --dump-mode MODE - specify dump mode: 'flat' (default), 'xml', or 'canonical-xml'
  • --catch - catch exceptions and try to continue
  • --utf-8 - output strings as UTF-8

Examples:

./xls-dump.py spreadsheet.xls
./xls-dump.py --dump-mode xml --debug workbook.xls
./xls-dump.py --show-stream-pos --utf-8 data.xls
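
Because --dump-mode accepts only three values, a wrapper script can reject a mistyped mode before the dumper starts. This is a sketch under assumptions: build_xls_dump_command and dump_to_file are hypothetical helpers, the script is assumed to be in the current directory, and only options listed above are used.

```python
import subprocess

# Dump modes listed in the options above.
DUMP_MODES = ("flat", "xml", "canonical-xml")


def build_xls_dump_command(xls_file, mode="flat", debug=False, catch=False,
                           utf8=False, script="./xls-dump.py"):
    """Return the argument list for one xls-dump.py run (hypothetical helper)."""
    if mode not in DUMP_MODES:
        raise ValueError("unknown dump mode: %r" % (mode,))
    cmd = [script, "--dump-mode", mode]
    if debug:
        cmd.append("--debug")
    if catch:
        cmd.append("--catch")
    if utf8:
        cmd.append("--utf-8")
    cmd.append(str(xls_file))
    return cmd


def dump_to_file(xls_file, out_path, **options):
    """Run xls-dump.py and write its standard output to out_path."""
    cmd = build_xls_dump_command(xls_file, **options)
    with open(out_path, "wb") as out:
        return subprocess.run(cmd, stdout=out, check=False).returncode
```

For example, dump_to_file("workbook.xls", "workbook.xml", mode="xml", catch=True) would store the XML dump next to the workbook and return the exit status of the dumper.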

vsd-dump.py - Visio Document Dumper

Analyzes and dumps Visio (.vsd) format files.

./vsd-dump.py [vsd file]

Example:

./vsd-dump.py diagram.vsd

Graphics Format Dumpers

emf-dump.py - Enhanced Metafile Dumper

Analyzes and dumps Enhanced Metafile (.emf) format files.

./emf-dump.py [emf file]

Example:

./emf-dump.py image.emf

wmf-dump.py - Windows Metafile Dumper

Analyzes and dumps Windows Metafile (.wmf) format files.

./wmf-dump.py [wmf file]

Example:

./wmf-dump.py graphic.wmf

OLE Format Dumpers

ole1-dump.py - OLE1 Embedded Object Dumper

Dumps OLE1 embedded objects as described in section 2.2.5 of the [MS-OLEDS] specification.

./ole1-dump.py [ole1 file]

Example:

./ole1-dump.py embedded_object.ole1

ole2preview-dump.py - OLE2 Preview Stream Dumper

Dumps OLE2 preview streams as described in section 2.3.4 of the [MS-OLEDS] specification.

./ole2preview-dump.py [ole2 file]

Example:

./ole2preview-dump.py preview_stream.ole2

VBA and Macro Analysis

vbadump.py - VBA Project Dumper

Extracts and analyzes VBA (Visual Basic for Applications) code from Office documents.

./vbadump.py [office file with VBA]

Example:

./vbadump.py macro_document.xls

Special Format Tools

swlaycache-dump.py - StarWriter Layout Cache Dumper

Dumps the StarWriter binary layout cache format.

./swlaycache-dump.py [cache file]

Example:

./swlaycache-dump.py layout.cache

Utility Scripts

compress.py - VBA Stream Compressor

Compresses VBA streams using Microsoft's compression algorithm.

./compress.py [offset]

Reads input from stdin and writes the compressed stream to stdout. The offset argument is optional.

decompress.py - VBA Stream Decompressor

Decompresses VBA streams.

./decompress.py [offset]

Reads compressed input from stdin and writes the decompressed stream to stdout. The offset argument is optional.
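
Since both scripts are stdin-to-stdout filters, they can be driven from Python by piping bytes through a subprocess. The sketch below assumes the scripts are in the current directory; run_filter and round_trip are hypothetical helper names, and whether a round trip reproduces the input byte for byte is not stated here, so the sketch does not assert it.

```python
import subprocess


def run_filter(command, data):
    """Feed bytes to a command's stdin and return the bytes it writes to stdout."""
    done = subprocess.run(command, input=data, stdout=subprocess.PIPE, check=True)
    return done.stdout


def round_trip(data, compress_cmd=("./compress.py",),
               decompress_cmd=("./decompress.py",)):
    """Compress and then decompress data using the two utility scripts."""
    compressed = run_filter(list(compress_cmd), data)
    return run_filter(list(decompress_cmd), compressed)
```

With check=True, a non-zero exit status from either script raises subprocess.CalledProcessError instead of silently returning partial output.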

pptx-kill-uuid.py - PowerPoint UUID Replacement Tool

Replaces UUIDs in PowerPoint XML streams with sequential integers for easier analysis.

cat ppt/diagrams/data1.xml | ./pptx-kill-uuid.py
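
The idea behind the tool can be illustrated in a few lines: each distinct UUID is mapped to the next integer, so repeated references stay recognizable while the noise disappears. This is an independent sketch of the technique, not the code of pptx-kill-uuid.py; the function name replace_uuids, the numbering from 1, and the case-insensitive matching are all assumptions.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID form; surrounding braces are left alone.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def replace_uuids(text):
    """Replace every UUID in text with a small integer, stable per UUID."""
    seen = {}

    def substitute(match):
        key = match.group(0).lower()  # treat upper and lower case as the same UUID
        if key not in seen:
            seen[key] = len(seen) + 1
        return str(seen[key])

    return UUID_RE.sub(substitute, text)
```

Two dumps processed this way can then be compared with an ordinary diff, because identical structure yields identical numbers even when the original UUIDs differ.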

convert-enum.py

Utility script for converting enumerations (see source for specific usage).

Output Formats

Most dump tools output XML-formatted analysis data that includes:

  • File structure information
  • Record-by-record analysis
  • Raw hex dumps of binary data
  • Extracted text content (where applicable)
  • Stream hierarchies for compound document formats
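
When a tool emits XML, the output can be post-processed with the Python standard library. The sketch below counts how often each element name occurs, which gives a quick overview of a large dump. It assumes the captured output is a single well-formed XML document; count_elements is a hypothetical helper, and the element names in the usage example are made up for illustration, not taken from any dumper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_text):
    """Return a Counter of element names found in an XML dump."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    return Counter(element.tag for element in root.iter())
```

For instance, count_elements("<dump><record/><record/></dump>") reports two record elements and one dump element.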

Development

The core parsing logic is contained in the msodumper/ package with specialized modules for each format:

  • docstream.py, docrecord.py - Word document parsing
  • xlsstream.py, xlsrecord.py, xlsmodel.py - Excel parsing
  • pptstream.py, pptrecord.py - PowerPoint parsing
  • emfrecord.py, wmfrecord.py - Graphics format parsing
  • ole.py, olestream.py - OLE compound document parsing
  • vbahelper.py - VBA macro analysis
  • etc.

Submit Patches to LibreOffice Gerrit:

License

This project is licensed under the Mozilla Public License 2.0 - see the license header in each source file for details.

About

READ ONLY MIRROR
