Markitdown is a local Ruby document-to-Markdown converter for AI ingestion. It is inspired by Microsoft MarkItDown, but implemented as a Ruby gem.
The rule is strict: this gem converts files locally first. AI services should receive the resulting Markdown, not raw document pages, for analysis.
Supported input formats:

- Plain text and Markdown
- HTML
- CSV, JSON, and XML
- PDFs with embedded text
- DOCX files
- Images through the bundled `rtesseract` Ruby wrapper, when the native `tesseract` executable is installed
Scanned PDFs and images require local OCR tooling. The gem installs the Ruby
OCR wrapper, but your machine or deployment image still needs the native
Tesseract executable, for example `brew install tesseract` on macOS. If OCR is
unavailable or a file cannot be converted locally, the result includes warnings
and empty Markdown instead of calling an AI fallback.
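
Because a failed conversion comes back as empty Markdown plus warnings rather than an error, callers may want to check for the native executable up front and treat an empty result as "nothing to send". The following is a minimal sketch of that idea, not part of the gem: `tesseract_available?` and `usable_markdown` are hypothetical helper names, and the sketch assumes that `result.warnings` is an array of strings.

```ruby
# Sketch only: both helpers are hypothetical and not provided by the gem.

# Returns true when an executable named "tesseract" is found in one of the
# directories listed in path_env (defaults to the PATH environment variable).
def tesseract_available?(path_env = ENV.fetch("PATH", ""))
  # On Windows, executables carry one of the extensions listed in PATHEXT.
  extensions = ENV.fetch("PATHEXT", "").split(File::PATH_SEPARATOR)
  extensions = [""] if extensions.empty?

  path_env.split(File::PATH_SEPARATOR).reject(&:empty?).any? do |dir|
    extensions.any? do |ext|
      candidate = File.join(dir, "tesseract#{ext}")
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

# Returns the converted Markdown, or nil when the conversion produced nothing.
# Assumes `result` responds to #markdown and #warnings, and that warnings
# is an array of strings (an assumption, not confirmed by the gem).
def usable_markdown(result)
  markdown = result.markdown.to_s
  return markdown unless markdown.strip.empty?

  # Surface the reasons on stderr instead of falling back to an AI service.
  Array(result.warnings).each { |warning| warn "markitdown: #{warning}" }
  nil
end
```

A caller could log a notice when `tesseract_available?` is false before converting scanned input, then pass the object returned by `Markitdown.convert_pages` to `usable_markdown` and skip the AI request whenever it returns `nil`.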
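Convert pages by passing base64-encoded payloads to `Markitdown.convert_pages`:

    require "base64"
    require "markitdown"

    # Each page is a hash with the base64-encoded bytes, MIME type, and file name.
    result = Markitdown.convert_pages([
      {
        "base64" => Base64.strict_encode64(File.binread("document.pdf")),
        "mime_type" => "application/pdf",
        "name" => "document.pdf"
      }
    ])

    result.markdown  # converted Markdown (empty when local conversion fails)
    result.warnings  # warnings collected during conversion
    result.metadata  # metadata returned alongside the Markdown

Run the gem test suite from this directory: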

    bundle exec rake test

For local app development, the gem can be used as a path dependency:

    gem "markitdown", path: "path/to/markitdown"

License: MIT