Awesome-markdown-ebooks

Your GitHub PDFs, Now AI-Ready.

Project Introduction

This project collects high-quality e-book repositories from GitHub and uses MinerU 2.0 to convert their PDF content into Markdown.

Each directory represents a repository originally hosted on GitHub.

If you know of high-quality e-book resources that need conversion, please submit the links in an issue, and we will help with the PDF-to-Markdown extraction.

Our goal is to turn more high-quality knowledge into AI-ready data.

Converted Repositories List

Repository	Download
ChinaTextbook	opendatalab/awesome-markdown-ebooks/ChinaTextbook

Output File Structure (MinerU 2.0, VLM Backend)

1. Markdown Files and Images

File Type: .md file + images/ folder
Description: Final result of the PDF-to-Markdown conversion
Content: Document text and references to the extracted images
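To illustrate how the .md file and the images/ folder fit together, here is a minimal Python sketch that lists the image references in a converted Markdown file and reports any that are missing on disk. It assumes images are referenced with standard Markdown image syntax and relative paths such as images/<name>.jpg; the function names are hypothetical helpers, not part of MinerU.

```python
import re
from pathlib import Path

# Matches Markdown image syntax ![alt text](path) and captures the path.
# Assumption: paths are relative and contain no spaces or parentheses.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def find_image_refs(markdown_text: str) -> list[str]:
    """Return the image paths referenced in a Markdown document, in order."""
    return IMAGE_REF.findall(markdown_text)


def missing_images(md_path: Path) -> list[str]:
    """List referenced images that do not exist relative to the .md file's folder."""
    text = md_path.read_text(encoding="utf-8")
    return [ref for ref in find_image_refs(text)
            if not (md_path.parent / ref).exists()]


if __name__ == "__main__":
    sample = "Intro\n![](images/fig1.jpg)\nText ![chart](images/chart.png)\n"
    print(find_image_refs(sample))  # ['images/fig1.jpg', 'images/chart.png']
```

Running missing_images on each converted .md file is a quick sanity check that the folder was copied together with the Markdown.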

2. Model Output File

Filename: model_output.txt
Description: Intermediate inference output from the VLM
Content: The model's visual understanding of each page

3. Intermediate Processing File

Filename: middle.json
Description: Post-processed result of model_output.txt
Content: Position information for text, images, formulas, tables, and other elements in the PDF
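The exact schema of middle.json is not described here, so the following is only a hedged sketch: it assumes element positions are stored under a "bbox" key (a list of coordinates) and walks the JSON tree to collect them. The key names in the sample data (pdf_info, para_blocks, type) are illustrative assumptions; check them against a real middle.json before relying on this.

```python
import json
from typing import Any, Iterator


# Assumption: positions are stored as "bbox": [x0, y0, x1, y1] on element dicts.
def iter_bboxes(node: Any) -> Iterator[tuple[str, list]]:
    """Recursively yield (element type, bbox) for every dict that has a "bbox" key."""
    if isinstance(node, dict):
        if "bbox" in node:
            yield node.get("type", "unknown"), node["bbox"]
        for value in node.values():
            yield from iter_bboxes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_bboxes(item)


def load_bboxes(path: str) -> list[tuple[str, list]]:
    """Load a middle.json file and collect all bounding boxes it contains."""
    with open(path, encoding="utf-8") as f:
        return list(iter_bboxes(json.load(f)))


if __name__ == "__main__":
    # Illustrative structure only; real key names may differ.
    sample = {"pdf_info": [{"para_blocks": [
        {"type": "text", "bbox": [10, 20, 200, 40]},
        {"type": "image", "bbox": [10, 60, 200, 180]},
    ]}]}
    print(list(iter_bboxes(sample)))
    # [('text', [10, 20, 200, 40]), ('image', [10, 60, 200, 180])]
```

Walking the tree generically, rather than hard-coding a path, keeps the sketch usable even if the nesting differs from the assumed sample.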

4. Content List File

Filename: content_list.json
Description: Final result derived from middle.json
Content: Conversion results segmented by element, including page information
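Because content_list.json carries page information for each element, a common first step is to regroup elements by page. This sketch assumes the file is a JSON array of objects with "type", "text", and "page_idx" keys; those field names are assumptions to verify against an actual output file, and the helpers are hypothetical.

```python
import json
from collections import defaultdict


# Assumed element shape: {"type": "text", "text": "...", "page_idx": 0}
def load_content_list(path: str) -> list[dict]:
    """Read content_list.json, assumed to be a JSON array of element objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def group_by_page(elements: list[dict]) -> dict[int, list[dict]]:
    """Group elements by their page index, preserving file order within each page."""
    pages: dict[int, list[dict]] = defaultdict(list)
    for element in elements:
        pages[element.get("page_idx", -1)].append(element)  # -1 marks missing page info
    return dict(pages)


def page_text(elements: list[dict], page: int) -> str:
    """Join the text of all "text" elements on one page."""
    on_page = group_by_page(elements).get(page, [])
    return "\n".join(e.get("text", "") for e in on_page if e.get("type") == "text")


if __name__ == "__main__":
    sample = [
        {"type": "text", "text": "Chapter 1", "page_idx": 0},
        {"type": "image", "img_path": "images/a.jpg", "page_idx": 0},
        {"type": "text", "text": "Body text", "page_idx": 1},
    ]
    print({page: len(items) for page, items in group_by_page(sample).items()})
    # {0: 2, 1: 1}
    print(page_text(sample, 0))  # Chapter 1
```

Elements are appended in file order, so each page's list keeps the reading order produced by the conversion.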
