aliumairdev/markitdown

Markitdown

Markitdown is a local Ruby document-to-Markdown converter for AI ingestion. It is inspired by Microsoft MarkItDown, but implemented as a Ruby gem.

The rule is strict: this gem converts files locally first. AI services should receive the resulting Markdown, not raw document pages, for analysis.

Supported Inputs

  • Plain text and Markdown
  • HTML
  • CSV, JSON, and XML
  • PDFs with embedded text
  • DOCX files
  • Images through the bundled rtesseract Ruby wrapper, when the native tesseract executable is installed

Scanned PDFs and images require local OCR tooling. The gem installs the Ruby OCR wrapper, but your machine or deployment image still needs the native Tesseract executable (for example, brew install tesseract on macOS). If OCR is unavailable or a file cannot be converted locally, the result contains warnings and empty Markdown; the gem never falls back to an AI service.
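Because OCR depends on a native executable that the gem does not install, it can help to check for it at boot or in a deployment health check. The following is a minimal sketch, not part of the gem: `find_executable` and `tesseract_available?` are hypothetical helper names, and the check only assumes that the binary is called `tesseract` and is found through `PATH`.

```ruby
# Hypothetical helper (not part of the gem): returns the full path of the
# first executable called `name` found in `path_env`, or nil if none exists.
def find_executable(name, path_env = ENV.fetch("PATH", ""))
  # On Windows, PATHEXT lists executable extensions; elsewhere try the bare name.
  exts = ENV["PATHEXT"] ? ENV["PATHEXT"].split(";") : [""]

  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    exts.each do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end

  nil
end

# Hypothetical helper: true when a `tesseract` binary is reachable on PATH.
def tesseract_available?
  !find_executable("tesseract").nil?
end
```

A caller could log a message such as "tesseract not found; scanned PDFs and images will produce warnings and empty Markdown" when `tesseract_available?` returns false, which makes the documented fallback easier to diagnose.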

Usage

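require "base64"
require "markitdown"

# Each page is a hash with the Base64-encoded file bytes, the MIME type, and the file name.
result = Markitdown.convert_pages([
  {
    "base64" => Base64.strict_encode64(File.binread("document.pdf")),
    "mime_type" => "application/pdf",
    "name" => "document.pdf"
  }
])

result.markdown # converted Markdown (empty if the file could not be converted locally)
result.warnings # conversion warnings, for example when OCR is unavailable
result.metadata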
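The page hash and the "empty Markdown plus warnings" fallback can be wrapped in two small helpers. This is a sketch under stated assumptions rather than the gem's API: `build_page` and `markdown_for_ai` are hypothetical names, and the only things taken from the README are the three hash keys and the fact that the result responds to `markdown`.

```ruby
require "base64"

# Hypothetical helper: builds one page hash in the shape shown under Usage.
# The caller supplies the MIME type; nothing is guessed from the extension.
def build_page(path, mime_type)
  {
    "base64" => Base64.strict_encode64(File.binread(path)),
    "mime_type" => mime_type,
    "name" => File.basename(path)
  }
end

# Hypothetical guard: returns the Markdown only when conversion produced text,
# so that an empty result is never forwarded to an AI service.
def markdown_for_ai(result)
  text = result.markdown.to_s
  return nil if text.strip.empty?

  text
end
```

With these in place, a call might read `result = Markitdown.convert_pages([build_page("document.pdf", "application/pdf")])`, followed by `markdown_for_ai(result)`; a `nil` return is the cue to inspect `result.warnings` instead of sending anything on.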

Development

Run the gem test suite from this directory:

bundle exec rake test

For local app development, the gem can be used as a path dependency:

gem "markitdown", path: "path/to/markitdown"

License

MIT
