3 changes: 3 additions & 0 deletions user/apps/mineru-mcp-dragonos/.gitignore
@@ -0,0 +1,3 @@
/target
Cargo.lock
/install/
2 changes: 2 additions & 0 deletions user/apps/mineru-mcp-dragonos/.mineru.env
@@ -0,0 +1,2 @@
export MCP_PORT=8080
export MINERU_API_KEY=<redacted-api-key>
30 changes: 30 additions & 0 deletions user/apps/mineru-mcp-dragonos/Cargo.toml
@@ -0,0 +1,30 @@
[package]
name = "mineru-mcp-dragonos"
version = "0.1.0"
edition = "2021"
description = "no"
authors = [ "yuming <mingjiangyu1@qq.com>" ]

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
axum = "0.7"
bytes = "1.10.1"
chrono = { version = "0.4", features = ["serde"] }
reqwest = { version = "0.12.12", default-features = false, features = ["json", "rustls-tls"] }
rmcp = { version = "0.14.0", features = ["transport-streamable-http-server"] }
schemars = "1.2.0"
serde = { version = "1.0.226", features = ["derive"] }
serde_json = "1.0.145"
thiserror = "1.0.69"
tokio = { version = "1.47.1", features = ["fs", "macros", "rt-multi-thread", "time", "sync"] }
tokio-util = { version = "0.7", features = ["rt"] }
tracing = "0.1.41"
tracing-subscriber = { version = "0.3.20", features = ["env-filter", "fmt"] }
uuid = { version = "1.18.1", features = ["v4"] }
walkdir = "2.5.0"
zip = "0.6.6"

[dev-dependencies]
tempfile = "3.23.0"
wiremock = "0.6.5"
56 changes: 56 additions & 0 deletions user/apps/mineru-mcp-dragonos/Makefile
@@ -0,0 +1,56 @@
TOOLCHAIN=
RUSTFLAGS=

ifdef DADK_CURRENT_BUILD_DIR
# When building under dadk, install into dadk's install directory
INSTALL_DIR = $(DADK_CURRENT_BUILD_DIR)
else
# When building locally, install into the ./install directory
INSTALL_DIR = ./install
endif

ifeq ($(ARCH), x86_64)
export RUST_TARGET=x86_64-unknown-linux-musl
else ifeq ($(ARCH), riscv64)
export RUST_TARGET=riscv64gc-unknown-linux-gnu
else
# Default to x86_64, used for local builds
export RUST_TARGET=x86_64-unknown-linux-musl
endif

run:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) run --target $(RUST_TARGET)

build:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) build --target $(RUST_TARGET)

clean:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) clean --target $(RUST_TARGET)

test:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) test --target $(RUST_TARGET)

doc:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) doc --target $(RUST_TARGET)

fmt:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) fmt

fmt-check:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) fmt --check

run-release:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) run --target $(RUST_TARGET) --release

build-release:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) build --target $(RUST_TARGET) --release

clean-release:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) clean --target $(RUST_TARGET) --release

test-release:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) test --target $(RUST_TARGET) --release

.PHONY: install
install:
	RUSTFLAGS=$(RUSTFLAGS) cargo $(TOOLCHAIN) install --target $(RUST_TARGET) --path . --no-track --root $(INSTALL_DIR) --force
90 changes: 90 additions & 0 deletions user/apps/mineru-mcp-dragonos/Prompt.md
@@ -0,0 +1,90 @@
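You are a senior Rust engineer. In a brand-new Cargo project, implement an MCP stdio server (for the DragonOS environment) that is functionally equivalent to "mineru-mcp".

Hard requirements
- Language: Rust (tokio async)
- MCP SDK: use the official Rust SDK rmcp, with the stdio transport
- Provide two tools:
  1) parse_documents
  2) get_ocr_languages
- The code must compile, run, and be testable: provide unit/integration tests (which must not depend on the real MinerU online service or a real API key)

MCP tool specification
1) parse_documents
- Input parameters (exposed to MCP as a JSON schema):
  - file_sources: string (one or more sources, separated by commas/spaces/newlines; each source is either a URL or a local file path)
  - enable_ocr: bool = false
  - language: string = "ch"
  - page_ranges: string? (supported only in remote URL / remote upload mode)
- Behavior:
  - Parse file_sources and split it into urls and local_paths
  - If USE_LOCAL_API=true: ignore urls and process only local_paths (behavior must match mineru-mcp)
  - If USE_LOCAL_API=false: process both urls and local_paths
  - For each source, run the MinerU parsing pipeline (see "MinerU API specification")
  - Download the zip at full_zip_url and extract it into its own directory under OUTPUT_DIR
  - Recursively search the extracted directory for a Markdown file (prefer one with the same name as the input file; otherwise take the first .md) and read its contents
- Return value (structured JSON, for ease of testing):
    {
      "results": [
        {
          "source": "...",
          "mode": "remote_url|remote_upload|local_api",
          "task_id": "...?" ,
          "batch_id": "...?" ,
          "markdown": "...",
          "output_dir": "...",
          "assets": ["images/xxx.jpg", ...]
        }
      ]
    }
Return value requirement (strict compatibility mode): parse_documents must return JSON, and the JSON field structure must match the official Python mineru-mcp. First read the reference implementation to find the return structure of parse_documents (field names/nesting/types), define the corresponding structs in Rust with serde, and serialize them to exactly the same shape; test cases must assert that the field structure of the returned JSON matches the sample.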
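
The first behavior step above (splitting file_sources into urls and local_paths) could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `FileSources` and `split_file_sources` are hypothetical, and treating anything that starts with `http://` or `https://` as a URL is an assumed rule, not one confirmed by the reference implementation.

```rust
/// Sources grouped by kind. The type and field names are illustrative only.
#[derive(Debug, Default, PartialEq)]
pub struct FileSources {
    pub urls: Vec<String>,
    pub local_paths: Vec<String>,
}

/// Split a `file_sources` string on commas and any whitespace (spaces, tabs,
/// newlines), then classify each non-empty entry as a URL or a local path.
pub fn split_file_sources(raw: &str) -> FileSources {
    let mut out = FileSources::default();
    for item in raw.split(|c: char| c == ',' || c.is_whitespace()) {
        if item.is_empty() {
            // Consecutive separators such as ", " produce empty pieces.
            continue;
        }
        // Assumption: only http(s) prefixes count as URLs.
        let lower = item.to_ascii_lowercase();
        if lower.starts_with("http://") || lower.starts_with("https://") {
            out.urls.push(item.to_string());
        } else {
            out.local_paths.push(item.to_string());
        }
    }
    out
}
```

Because spaces act as separators, a local path that itself contains a space would be split into two entries; a real implementation would need quoting or a different separator rule to support such paths.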

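2) get_ocr_languages
- Return the list of OCR languages supported by MinerU (including at least common entries such as ch/en), together with a link to the PaddleOCR multi-language list:
https://www.paddleocr.ai/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html

Environment variables (reading and default values must be implemented)
- MINERU_API_BASE defaults to https://mineru.net
- MINERU_API_KEY required (in remote mode)
- OUTPUT_DIR defaults to ./downloads
- USE_LOCAL_API defaults to false
- LOCAL_MINERU_API_BASE defaults to http://localhost:8080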

MinerU API specification (remote)
A) URL mode
- POST {MINERU_API_BASE}/api/v4/extract/task
  body: { url, model_version, is_ocr?, enable_formula?, enable_table?, language?, page_ranges? }
- GET {MINERU_API_BASE}/api/v4/extract/task/{task_id}
  Poll until state=done, then take data.full_zip_url

B) Local file upload mode
- POST {MINERU_API_BASE}/api/v4/file-urls/batch
  body: { files:[{name,data_id?,is_ocr?,page_ranges?}], model_version, enable_formula?, enable_table?, language? }
  -> returns batch_id + file_urls[]
- PUT file_urls[i] to upload the file bytes
- GET {MINERU_API_BASE}/api/v4/extract-results/batch/{batch_id}
  Poll each file until state=done, then take full_zip_url
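
Both modes above end in the same "poll until state=done" step. A minimal sketch of that loop follows, under several assumptions: it is a blocking, std-only version rather than the tokio-based loop a real server would use; `TaskState`, `PollError`, and `poll_until_done` are hypothetical names; and the `Failed` variant is an assumed way to represent an error state, which the specification above does not describe. The HTTP request is abstracted behind a closure so the loop itself can be tested without a network.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// State reported by one status request. Only "running" and "done" appear in
/// the specification; `Failed` is an assumption for illustration.
#[derive(Debug, PartialEq)]
pub enum TaskState {
    Running,
    Done { full_zip_url: String },
    Failed(String),
}

#[derive(Debug, PartialEq)]
pub enum PollError {
    Failed(String),
    TimedOut,
}

/// Call `fetch_state` repeatedly until it reports `Done`, sleeping `interval`
/// between attempts and giving up once `max_wait` has elapsed.
pub fn poll_until_done<F>(
    mut fetch_state: F,
    interval: Duration,
    max_wait: Duration,
) -> Result<String, PollError>
where
    F: FnMut() -> TaskState,
{
    let deadline = Instant::now() + max_wait;
    loop {
        match fetch_state() {
            TaskState::Done { full_zip_url } => return Ok(full_zip_url),
            TaskState::Failed(message) => return Err(PollError::Failed(message)),
            TaskState::Running => {}
        }
        // Check the deadline after each attempt so at least one request is made.
        if Instant::now() >= deadline {
            return Err(PollError::TimedOut);
        }
        sleep(interval);
    }
}
```

In an async server the same structure would presumably use `tokio::time::sleep` and an async closure; the control flow (attempt, check deadline, sleep) stays the same.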

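Implementation suggestions
- Use reqwest as the HTTP client; add timeouts and retries (for 5xx/network errors); make the polling interval configurable (default 2s), with a maximum wait of 10min
- Use the zip crate for extraction; use std::fs / walkdir for directory operations
- Use tracing for logging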
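
The Markdown lookup rule from the parse_documents behavior (prefer a file named like the input, otherwise the first .md) could look roughly like the sketch below. It is std-only and assumes the caller has already collected the extracted file paths, for example with walkdir; `pick_markdown` is a hypothetical name, and "first" here simply means first in the order the caller supplies.

```rust
use std::path::{Path, PathBuf};

/// Pick the Markdown file for `source` from the extracted `candidates`.
/// Prefers a `.md` file whose stem equals the source's file stem, and falls
/// back to the first `.md` in the given order. Returns `None` if no
/// Markdown file is present.
pub fn pick_markdown(candidates: &[PathBuf], source: &str) -> Option<PathBuf> {
    // For "/data/report.pdf" or "https://example.com/report.pdf" this is "report".
    let wanted_stem = Path::new(source).file_stem();

    let markdown: Vec<&PathBuf> = candidates
        .iter()
        .filter(|p| p.extension().map_or(false, |ext| ext.eq_ignore_ascii_case("md")))
        .collect();

    let same_name = markdown
        .iter()
        .copied()
        .find(|p| wanted_stem.is_some() && p.file_stem() == wanted_stem);

    same_name.or_else(|| markdown.first().copied()).cloned()
}
```

Deriving the stem with `Path::new` works for plain URLs but not for ones carrying a query string (the query would become part of the stem), so a real implementation would likely strip the query first.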

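Testing requirements (critical)
- Use wiremock/httpmock to mock the MinerU API in tests:
  - mock POST /api/v4/extract/task -> returns task_id
  - mock GET /api/v4/extract/task/{id} -> returns running first, then done + full_zip_url
  - mock GET full_zip_url -> returns a zip generated dynamically in the test (containing 1 markdown file and an images/ directory)
- Test that parse_documents correctly:
  - parses multiple file_sources
  - polls and downloads the zip
  - extracts it and finds the md file
  - returns structured results whose markdown field matches the expected value
- Provide a README: how to set the env, how to cargo run, and how to connect an MCP client

Deliverables
- Cargo.toml / src/main.rs (or split into modules)
- tests/… (runnable directly with cargo test)
- README.md

-----
Reference repository: [mineru-mcp](https://github.com/linxule/mineru-mcp?tab=readme-ov-file)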
71 changes: 71 additions & 0 deletions user/apps/mineru-mcp-dragonos/README.md
@@ -0,0 +1,71 @@
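# mineru-mcp (Rust)

This project implements the MinerU MCP stdio server on **Rust + Tokio**, matching the `parse_documents` and `get_ocr_languages` tools of the official `mineru-mcp`.

## Features

- MCP stdio server (based on `rmcp`)
- Parses both URLs and local files
- Can use either the remote MinerU API or a locally deployed API
- Automatically downloads the result zip, extracts it, and reads the Markdown

## Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| `MINERU_API_BASE` | `https://mineru.net` | Base URL of the remote MinerU API |
| `MINERU_API_KEY` | (required in remote mode) | Remote API key |
| `OUTPUT_DIR` | `./downloads` | Output directory for extracted results |
| `USE_LOCAL_API` | `false` | Whether to use the local API |
| `LOCAL_MINERU_API_BASE` | `http://localhost:8080` | Base URL of the local API |
| `MINERU_POLL_INTERVAL_SECS` | `2` | Polling interval (seconds) |
| `MINERU_MAX_WAIT_SECS` | `600` | Maximum wait time (seconds) |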
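
As an illustration of how the variables and defaults in the table might be loaded, here is a minimal std-only sketch. The `Config` struct, its field names, and the `from_lookup` helper are hypothetical and not taken from this project's source. Two parsing rules are assumptions as well: `USE_LOCAL_API` is enabled only by the value `true` (case-insensitive), and an unparsable number falls back to the default.

```rust
use std::env;
use std::path::PathBuf;
use std::time::Duration;

/// Runtime settings mirroring the table above. Names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub api_base: String,
    pub api_key: Option<String>,
    pub output_dir: PathBuf,
    pub use_local_api: bool,
    pub local_api_base: String,
    pub poll_interval: Duration,
    pub max_wait: Duration,
}

impl Config {
    /// Build a config from any key lookup, so tests need not touch the
    /// real process environment.
    pub fn from_lookup<F>(get: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let string_or = |key: &str, default: &str| get(key).unwrap_or_else(|| default.to_string());
        let secs_or = |key: &str, default: u64| {
            // Assumption: invalid numbers silently fall back to the default.
            let secs = get(key).and_then(|v| v.trim().parse::<u64>().ok()).unwrap_or(default);
            Duration::from_secs(secs)
        };
        Config {
            api_base: string_or("MINERU_API_BASE", "https://mineru.net"),
            // No default: the key is only required in remote mode.
            api_key: get("MINERU_API_KEY").filter(|v| !v.is_empty()),
            output_dir: PathBuf::from(string_or("OUTPUT_DIR", "./downloads")),
            use_local_api: get("USE_LOCAL_API")
                .map(|v| v.trim().eq_ignore_ascii_case("true"))
                .unwrap_or(false),
            local_api_base: string_or("LOCAL_MINERU_API_BASE", "http://localhost:8080"),
            poll_interval: secs_or("MINERU_POLL_INTERVAL_SECS", 2),
            max_wait: secs_or("MINERU_MAX_WAIT_SECS", 600),
        }
    }

    /// Read the settings from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Keeping the lookup injectable means the defaults can be checked in a unit test with a closure, while the server itself would call `Config::from_env()` once at startup.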

## Running

```bash
cd Availiable_Mcp/mineru-mcp
export MINERU_API_KEY=your-api-key
cargo run
```

By default the MCP service is exposed over the stdio transport, so it can be launched and managed directly by an MCP client.

### MCP client integration example (Claude Desktop)

```json
{
  "mcpServers": {
    "mineru": {
      "command": "cargo",
      "args": ["run", "--manifest-path", "Availiable_Mcp/mineru-mcp/Cargo.toml"],
      "env": {
        "MINERU_API_KEY": "your-api-key"
      }
    }
  }
}
```

## Tools

### parse_documents

Input parameters:

- `file_sources`: URLs or local paths separated by commas/spaces/newlines
- `enable_ocr`: whether to enable OCR (default `false`)
- `language`: language (default `ch`)
- `page_ranges`: page ranges (optional)

Returns: a JSON structure consistent with the official Python `mineru-mcp` (a single result or batch results).

### get_ocr_languages

Returns the list of OCR languages, along with a link to the PaddleOCR multi-language support page.

## Testing

```bash
cargo test
```