
opendatalab/awesome-markdown-ebooks


Awesome-markdown-ebooks


Your GitHub PDFs, Now AI-Ready.


Project Introduction

This project collects high-quality e-book repositories from GitHub and uses MinerU 2.0 to convert their PDF content into Markdown.

Each directory represents a repository originally hosted on GitHub.

If you know of high-quality e-book resources that need conversion, please submit the links in an issue and we will help with the PDF-to-Markdown extraction.

Our goal is to convert more high-quality knowledge data into AI-ready data.

Converted Repositories List

| Repository | Download |
|---|---|
| ChinaTextbook | opendatalab/awesome-markdown-ebooks/ChinaTextbook |

Output File Structure (based on the MinerU 2 VLM backend)

1. Markdown Files and Images

  • File Type: .md file + images/ folder
  • Description: Final result of the PDF-to-Markdown conversion
  • Content: Document text content and image references
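To check that a converted .md file and its images/ folder are consistent, a minimal Python sketch could scan the Markdown for image references and report any that are missing on disk. This assumes the .md file uses standard Markdown image syntax (`![alt](path)`) with paths relative to the .md file; the helper names `find_image_refs` and `missing_images` are hypothetical, not part of MinerU.

```python
import re
from pathlib import Path

# Matches standard Markdown image syntax ![alt](path) and captures the path.
# Assumption: the converted .md references images this way, with relative paths.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def find_image_refs(markdown_text: str) -> list[str]:
    """Return every image path referenced in the Markdown text, in order."""
    return IMAGE_REF.findall(markdown_text)


def missing_images(md_path: Path) -> list[str]:
    """List referenced image paths that do not exist relative to the .md file."""
    text = md_path.read_text(encoding="utf-8")
    base = md_path.parent
    return [ref for ref in find_image_refs(text) if not (base / ref).exists()]


if __name__ == "__main__":
    sample = "Intro\n![](images/fig1.jpg)\nText ![chart](images/chart.png)"
    print(find_image_refs(sample))
```

Run against a real output folder, `missing_images(Path("book.md"))` returning an empty list suggests every image reference resolves.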

2. Model Output File

  • Filename: model_output.txt
  • Description: Raw intermediate inference output from the VLM
  • Content: The model's visual understanding results for each page

3. Intermediate Processing File

  • Filename: middle.json
  • Description: Result of post-processing model_output.txt
  • Content: Position information for text, images, formulas, tables, and other elements in the PDF
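Because middle.json carries element positions, a common first step is to flatten it into (page, type, bounding box) records. The sketch below assumes a layout of a top-level `pdf_info` list of pages, each with `page_idx` and a `para_blocks` list whose blocks have `type` and `bbox` (`[x0, y0, x1, y1]`) keys. This layout is an assumption and should be checked against your own files; `block_positions` is a hypothetical helper.

```python
import json
from pathlib import Path


def block_positions(middle: dict) -> list[tuple[int, str, list[float]]]:
    """Flatten middle.json into (page_idx, block_type, bbox) tuples.

    Assumed layout (verify against real output):
    {"pdf_info": [{"page_idx": 0,
                   "para_blocks": [{"type": "text", "bbox": [x0, y0, x1, y1]}]}]}
    """
    positions = []
    for page in middle.get("pdf_info", []):
        page_idx = page.get("page_idx")
        for block in page.get("para_blocks", []):
            positions.append((page_idx, block.get("type"), block.get("bbox")))
    return positions


def load_middle(path: Path) -> dict:
    """Read a middle.json file from disk."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Using `.get()` keeps the sketch tolerant of pages that lack blocks; if the real keys differ, only the key names need to change.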

4. Content List File

  • Filename: content_list.json
  • Description: Final result derived from middle.json
  • Content: Conversion results split into individual elements, each with page information
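Since content_list.json is segmented by element with page information, it can be regrouped page by page, for example to rebuild per-page text chunks for downstream AI use. This Python sketch assumes the file is a JSON list of objects, each with a `type` key (such as `text` or `image`) and a zero-based `page_idx` key; treat those key names as assumptions. `group_by_page` and `type_counts` are hypothetical helpers.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path


def load_content_list(path: Path) -> list[dict]:
    """Read content_list.json (assumed to be a JSON list of element objects)."""
    return json.loads(path.read_text(encoding="utf-8"))


def group_by_page(items: list[dict]) -> dict[int, list[dict]]:
    """Group elements by their assumed "page_idx" key, preserving order."""
    pages: dict[int, list[dict]] = defaultdict(list)
    for item in items:
        pages[item["page_idx"]].append(item)
    return dict(pages)


def type_counts(items: list[dict]) -> Counter:
    """Count elements per assumed "type" value (e.g. text, image, table)."""
    return Counter(item["type"] for item in items)


if __name__ == "__main__":
    sample = [
        {"type": "text", "text": "Chapter 1", "page_idx": 0},
        {"type": "image", "img_path": "images/a.jpg", "page_idx": 0},
        {"type": "text", "text": "Body text", "page_idx": 1},
    ]
    print({page: len(elems) for page, elems in group_by_page(sample).items()})
    print(type_counts(sample))
```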

