# CodebaseMD

Convert documents, images, audio, and more to Markdown through a web interface and VS Code extension, powered by [MarkItDown](https://github.com/microsoft/markitdown).

## Overview

CodebaseMD is a web interface and VS Code extension built on Microsoft's MarkItDown library, which converts a wide range of file formats to clean, structured Markdown. The output is well suited to documentation, knowledge bases, and preparing content for LLMs.

## Features

- **Convert Any File**: Transform PDFs, Office documents, images, audio, and more into clean Markdown
- **Preserve Structure**: Keep headings, lists, tables, and other formatting elements intact
- **AI Enhancements**: Generate rich image descriptions with the Gemini API
- **VS Code Integration**: Convert files directly in your editor
- **Web Interface**: Upload and convert files from anywhere

## Supported File Types

- 📄 PDF
- 📊 Excel (.xlsx, .xls)
- 📝 Word (.docx)
- 🖼️ PowerPoint (.pptx)
- 📷 Images (with EXIF metadata and OCR)
- 🎵 Audio (with metadata and speech transcription)
- 📰 HTML
- 🗄️ Text formats (CSV, JSON, XML)
- 📚 EPUB
- 📦 ZIP (converts each file in the archive)
- 🎬 YouTube URLs (transcription)
- ...and more!

## Coming Soon

Our web application is currently in development, built with:

- **Frontend**: Next.js, Tailwind CSS, shadcn/ui, and Lucide React icons
- **Backend**: A FastAPI (Python) server
- **Deployment**: Docker containers, with the frontend on Vercel and the backend on a Linux VPS
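
At its core, the backend accepts an upload, runs MarkItDown on it, and returns Markdown. Below is a minimal sketch of that flow with FastAPI. It is an illustration, not the project's actual code: the `/convert` route, the 25 MB limit, and the `convert_upload`, `create_app`, and `UploadTooLarge` names are all hypothetical. FastAPI's file uploads also require the `python-multipart` package.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical upload cap; the project does not specify one.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


class UploadTooLarge(Exception):
    """Raised when an upload exceeds MAX_UPLOAD_BYTES."""


def convert_upload(filename: str, data: bytes, convert: Callable[[str], str]) -> dict:
    """Save the upload to a temp file, convert it, and return a JSON-ready dict."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise UploadTooLarge(f"{filename} is larger than {MAX_UPLOAD_BYTES} bytes")
    # Keep the original extension so the converter can pick the right format.
    fd, tmp_path = tempfile.mkstemp(suffix=Path(filename).suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        markdown = convert(tmp_path)
    finally:
        os.remove(tmp_path)  # always clean up, even if conversion fails
    return {"filename": filename, "markdown": markdown}


def create_app():
    """Build the FastAPI app. Imports are lazy so the helper above works standalone."""
    from fastapi import FastAPI, File, HTTPException, UploadFile
    from markitdown import MarkItDown

    app = FastAPI()
    md = MarkItDown()

    @app.post("/convert")
    def convert_endpoint(file: UploadFile = File(...)):
        # A plain `def` endpoint runs in FastAPI's threadpool, so the blocking
        # MarkItDown call does not stall the event loop.
        data = file.file.read()
        try:
            return convert_upload(
                file.filename or "upload",
                data,
                lambda path: md.convert(path).text_content,
            )
        except UploadTooLarge as exc:
            raise HTTPException(status_code=413, detail=str(exc)) from exc

    return app
```

The conversion step lives in a plain function that takes an injected `convert` callable, so you can test it without FastAPI or MarkItDown installed. If the code is saved as `main.py`, Uvicorn can serve it with `uvicorn main:create_app --factory`.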

## Why Use CodebaseMD?

### LLM-Ready Output

Markdown is well suited to LLMs, which are trained on large amounts of Markdown-formatted text. Its lightweight syntax preserves document semantics (headings, lists, tables) with far less markup than formats such as HTML, so the output stays token-efficient.
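
To give a rough sense of the token-efficiency point, the snippet below writes the same small table as HTML and as a Markdown pipe table, then compares their lengths. The table contents are made up, and character count is only a proxy: real token counts depend on the model's tokenizer.

```python
# The same two-row table, once as HTML and once as a Markdown pipe table.
# The contents are made up purely for illustration.
html_table = (
    "<table><thead><tr><th>Name</th><th>Role</th></tr></thead>"
    "<tbody><tr><td>Ada</td><td>Engineer</td></tr></tbody></table>"
)
markdown_table = (
    "| Name | Role |\n"
    "| --- | --- |\n"
    "| Ada | Engineer |"
)

# Character count is only a rough proxy; token counts depend on the tokenizer.
ratio = len(markdown_table) / len(html_table)
print(f"HTML: {len(html_table)} chars, Markdown: {len(markdown_table)} chars ({ratio:.0%})")
```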

### VS Code Integration

Our VS Code extension (coming soon) will let you:

- Convert files without leaving your editor
- Preview the Markdown output side by side
- Batch convert multiple files
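
Batch conversion boils down to four steps: walk a folder, select files with a supported extension, convert each one, and write the result next to the source. The sketch below shows this logic in Python rather than the extension's own code. The function names, the `SUPPORTED_SUFFIXES` subset, and the `<name>.<ext>.md` output naming are illustrative assumptions. The converter is passed in as a callable, so the folder-walking logic can be exercised without MarkItDown installed.

```python
from pathlib import Path
from typing import Callable, List

# Illustrative subset of extensions covering the supported types listed above.
SUPPORTED_SUFFIXES = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".html", ".csv",
    ".json", ".xml", ".epub", ".zip", ".jpg", ".jpeg", ".png", ".mp3", ".wav",
}


def find_convertible(root: Path) -> List[Path]:
    """Recursively list files under `root` whose extension is in SUPPORTED_SUFFIXES."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def batch_convert(root: Path, convert: Callable[[str], str]) -> List[Path]:
    """Convert every supported file under `root`, writing a `.md` file beside each."""
    written = []
    for src in find_convertible(root):
        # report.pdf -> report.pdf.md, so report.pdf and report.docx don't collide.
        dst = src.with_name(src.name + ".md")
        dst.write_text(convert(str(src)), encoding="utf-8")
        written.append(dst)
    return written
```

With MarkItDown installed, you would pass in a real converter: create `md = MarkItDown()` and call `batch_convert(Path("docs"), lambda p: md.convert(p).text_content)`. The sketch appends `.md` to the full file name instead of replacing the extension. That way, `report.pdf` and `report.docx` don't overwrite each other's output.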

### Advanced AI Features

We use AI to improve conversion quality:

- Smart image captioning via the Gemini API
- Improved OCR for scanned documents
- Structure preservation through intelligent formatting
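
MarkItDown can describe images with an LLM if you pass an OpenAI-style client as `llm_client` along with an `llm_model` name. Its docs currently limit this to image and PowerPoint files. One way to route those calls to Gemini is Google's OpenAI-compatible endpoint. The sketch below assumes that route. The `GEMINI_API_KEY` variable name, the `gemini-2.0-flash` model, and the `build_gemini_converter` helper are illustrative choices, not necessarily what CodebaseMD uses.

```python
import os
from typing import Optional

# Google's OpenAI-compatible endpoint for the Gemini API.
GEMINI_OPENAI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
# Hypothetical default; substitute whichever Gemini model you use.
DEFAULT_MODEL = "gemini-2.0-flash"


def build_gemini_converter(api_key: Optional[str] = None, model: str = DEFAULT_MODEL):
    """Return a MarkItDown instance that captions images via Gemini (sketch)."""
    # Resolve the key first so a missing key fails fast, before any imports.
    key = api_key or os.environ.get("GEMINI_API_KEY")  # env var name is an assumption
    if not key:
        raise RuntimeError("Set GEMINI_API_KEY or pass api_key explicitly.")

    # Imported lazily; requires `pip install 'markitdown[all]' openai`.
    from markitdown import MarkItDown
    from openai import OpenAI

    client = OpenAI(api_key=key, base_url=GEMINI_OPENAI_BASE_URL)
    return MarkItDown(llm_client=client, llm_model=model)
```

Usage would look like `build_gemini_converter().convert("photo.jpg").text_content`, where `photo.jpg` is a placeholder path.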

## Get Started

Stay tuned for our web application launch! In the meantime:

1. Install our VS Code extension (coming soon)
2. Try [MarkItDown](https://github.com/microsoft/markitdown) directly via pip:
   ```bash
   pip install 'markitdown[all]'
   markitdown your-file.pdf > output.md
   ```
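
MarkItDown can also be used from Python. Its `MarkItDown` class has a `convert()` method whose result exposes the Markdown as `text_content`. The `convert_file` wrapper below is a hypothetical helper, not part of MarkItDown or CodebaseMD.

```python
from pathlib import Path


def convert_file(src: str, dst: str) -> str:
    """Convert the file at `src` to Markdown, save it to `dst`, and return it."""
    # Imported inside the function so this module can load even before
    # `pip install 'markitdown[all]'` has been run.
    from markitdown import MarkItDown

    result = MarkItDown().convert(src)
    Path(dst).write_text(result.text_content, encoding="utf-8")
    return result.text_content
```

Calling `convert_file("your-file.pdf", "output.md")` does roughly what the `markitdown` command above does.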

## Contributing

We welcome contributions to CodebaseMD! Check back soon for our contributing guidelines.

## License

This project is licensed under the MIT License. See the LICENSE file for details.