paddleocr_vl: high-concurrency PDF parsing with a locally deployed model #17511

liguoyu666 · 2026-01-16T10:26:00Z


The code I am using is as follows:
from pathlib import Path
from paddleocr import PaddleOCRVL

input_file = "./test_02.pdf"
output_path = Path("./output")

# NVIDIA GPU
# pipeline = PaddleOCRVL()
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server", vl_rec_server_url="http://127.0.0.1:8118/v1")
# Kunlunxin XPU
# pipeline = PaddleOCRVL(device="xpu")
# Hygon DCU
# pipeline = PaddleOCRVL(device="dcu")
# MetaX GPU
# pipeline = PaddleOCRVL(device="metax_gpu")

output = pipeline.predict(input=input_file)

# Collect the per-page Markdown results and their embedded images
markdown_list = []
markdown_images = []

for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

# Merge all pages into a single Markdown document
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)

# print(markdown_texts)

mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)

with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

# Save the images referenced by the Markdown next to the output file
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)

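When parsing a 187-page PDF, the response time is rather long: it takes about 8 to 9 minutes. When parsing a single PDF, how can I process it with high concurrency, for example parsing several dozen pages at the same time and then merging the results at the end?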
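To make the question concrete, the split-then-merge pattern described above could look roughly like the sketch below. It is only an illustration of the fan-out and ordered merge, not PaddleOCR API: `parse_page_range` is a hypothetical placeholder standing in for whatever call actually parses a range of pages, and `chunk_size` and `max_workers` are made-up tuning knobs. Threads are used on the assumption that the work is mostly waiting on the inference server; whether this actually shortens the total time would depend on how well the server handles concurrent requests.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(total_pages, chunk_size):
    """Return (start, end) page ranges, end exclusive, covering all pages."""
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def parse_page_range(page_range):
    """Hypothetical placeholder: parse pages [start, end) of the PDF.

    A real implementation would send these pages to the OCR service and
    return one Markdown string per page. Here it returns dummy strings.
    """
    start, end = page_range
    return [f"<markdown for page {page + 1}>" for page in range(start, end)]


def parse_concurrently(total_pages, chunk_size=20, max_workers=8):
    """Parse page chunks in parallel and merge them back in page order."""
    chunks = split_into_chunks(total_pages, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `chunks`, so the page
        # order is preserved even if chunks finish out of order.
        per_chunk = list(pool.map(parse_page_range, chunks))
    # Flatten the per-chunk lists into one list of per-page results.
    return [page_md for chunk in per_chunk for page_md in chunk]


pages = parse_concurrently(total_pages=187)
```

With the 187-page file from this post and a chunk size of 20, this would produce ten chunks (the last one covering pages 181 to 187), and the merged list would then be passed to whatever step concatenates the pages into one Markdown file.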

Replies: 0 comments