Commit e447e45

docs: add README.md for CodebaseMD web interface
- Introduce CodebaseMD and its features
- Outline supported file types and upcoming developments
- Highlight VS Code integration and AI enhancements
- Provide installation instructions and contribution guidelines
1 parent 4d8209f commit e447e45

1 file changed: +77 -0 lines changed

CodebaseMD-Web/README.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# CodebaseMD

Convert files to Markdown through a web interface powered by [MarkItDown](https://github.com/microsoft/markitdown).

## Overview

CodebaseMD is a web interface and VS Code extension that uses the MarkItDown library to convert a range of file formats to clean, structured Markdown. It is intended for documentation, knowledge bases, and preparing content for LLMs.

## Features

- **Convert Any File**: Transform PDFs, Office documents, images, audio, and more to clean Markdown
- **Preserve Structure**: Maintain headings, lists, tables, and other formatting elements
- **AI Enhancements**: Use the Gemini API for rich image descriptions
- **VS Code Integration**: Convert files directly in your editor
- **Web Interface**: Upload and convert files from anywhere

## Supported File Types

- 📄 PDF
- 📊 Excel (.xlsx, .xls)
- 📝 Word (.docx)
- 🖼️ PowerPoint (.pptx)
- 📷 Images (with EXIF metadata and OCR)
- 🎵 Audio (with metadata and speech transcription)
- 📰 HTML
- 🗄️ Text formats (CSV, JSON, XML)
- 📚 EPUB
- 📦 ZIP (iterates through contents)
- 🎬 YouTube URLs (transcription)
- ...and more!

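To illustrate what "iterates through contents" can mean for a ZIP archive, here is a minimal Python sketch that uses only the standard library. It is not MarkItDown's implementation: the function name `zip_to_markdown` and the one-heading-per-file layout are assumptions made for this example, and a real converter would hand each member to a format-specific converter instead of decoding it as text.

```python
import io
import zipfile


def zip_to_markdown(data: bytes) -> str:
    """Hypothetical helper: emit one Markdown section per file in a ZIP archive."""
    sections = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            raw = archive.read(info)
            try:
                body = raw.decode("utf-8")
            except UnicodeDecodeError:
                # Assumption: non-text members are summarized, not converted.
                body = f"(binary file, {len(raw)} bytes)"
            sections.append(f"## {info.filename}\n\n{body.strip()}")
    return "\n\n".join(sections)


# Build a small archive in memory and convert it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("notes.txt", "hello\n")
    demo.writestr("docs/readme.md", "Some *Markdown* text.\n")

print(zip_to_markdown(buffer.getvalue()))
```

Each archive member becomes its own section headed by its path, so the source of every piece of text stays traceable in the combined output.
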
## Coming Soon

Our web application is currently in development using:

- **Frontend**: Next.js, Tailwind CSS, shadcn/ui, Lucide React icons
- **Backend**: FastAPI Python server
- **Deployment**: Docker containerization, with the frontend on Vercel and the backend on a Linux VPS

## Why Use CodebaseMD?

### LLM-Ready Output

Markdown suits LLMs well: their training data includes large amounts of Markdown-formatted text, and its lightweight syntax preserves document structure (headings, lists, tables) with less markup overhead than formats such as HTML, which keeps token counts low.

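As a rough illustration of the token-efficiency point, the Python sketch below encodes the same small table once as HTML and once as Markdown and compares their sizes. The `rough_token_count` function is a deliberately crude stand-in for a real tokenizer (it counts words and punctuation marks), and the sample table is invented for this example, so treat the result as indicative only.

```python
import re

# The same two-row table, once as HTML and once as Markdown (sample data).
HTML_TABLE = (
    "<table><thead><tr><th>Format</th><th>Extension</th></tr></thead>"
    "<tbody><tr><td>Word</td><td>.docx</td></tr>"
    "<tr><td>PowerPoint</td><td>.pptx</td></tr></tbody></table>"
)

MARKDOWN_TABLE = (
    "| Format | Extension |\n"
    "| --- | --- |\n"
    "| Word | .docx |\n"
    "| PowerPoint | .pptx |"
)


def rough_token_count(text: str) -> int:
    """Crude proxy for a tokenizer: count words and single punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


html_tokens = rough_token_count(HTML_TABLE)
markdown_tokens = rough_token_count(MARKDOWN_TABLE)
print(f"HTML: {html_tokens} rough tokens")
print(f"Markdown: {markdown_tokens} rough tokens")
```

Real tokenizers split text differently, but the opening and closing tags that HTML needs around every cell are the main source of the gap in either case.
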
### VS Code Integration

Our VS Code extension allows you to:
- Convert files without leaving your editor
- Preview Markdown output side-by-side
- Batch convert multiple files

### Advanced AI Features

We leverage AI to enhance conversion quality:
- Smart image captioning via Gemini API
- Improved OCR for scanned documents
- Structure preservation with intelligent formatting

## Get Started

Stay tuned for our web application launch! In the meantime:

1. Install our VS Code extension (coming soon)
2. Try [MarkItDown](https://github.com/microsoft/markitdown) directly via pip:
```
pip install 'markitdown[all]'
markitdown your-file.pdf > output.md
```

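The same conversion can also be run from Python. The sketch below wraps MarkItDown's `MarkItDown().convert(...)` call, whose result exposes the Markdown as `text_content`. The wrapper name `convert_to_markdown` and its optional `converter` parameter are assumptions added for this example (the parameter makes it possible to substitute another converter object); they are not part of MarkItDown or CodebaseMD.

```python
from pathlib import Path


def convert_to_markdown(source: str, output: str, converter=None) -> str:
    """Hypothetical wrapper: convert `source` to Markdown and write it to `output`."""
    if converter is None:
        # Imported lazily so this module loads even when markitdown is absent.
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(source)
    # `text_content` holds the converted Markdown as a string.
    Path(output).write_text(result.text_content, encoding="utf-8")
    return result.text_content
```

With MarkItDown installed, calling `convert_to_markdown("your-file.pdf", "output.md")` is intended to mirror the `markitdown your-file.pdf > output.md` command shown above.
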
## Contributing

We welcome contributions to CodebaseMD! Check back soon for our contributing guidelines.

## License

This project is licensed under the MIT License - see the LICENSE file for details.
