paddleocr_vl: calling a locally deployed model for high-concurrency PDF parsing #17511
Unanswered
liguoyu666 asked this question in Q&A
The code I'm using:
```python
from pathlib import Path
from paddleocr import PaddleOCRVL

input_file = "./test_02.pdf"
output_path = Path("./output")

# NVIDIA GPU
# pipeline = PaddleOCRVL()
# Locally deployed VL recognition model served by vLLM
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server", vl_rec_server_url="http://127.0.0.1:8118/v1")
# Kunlunxin XPU
# pipeline = PaddleOCRVL(device="xpu")
# Hygon DCU
# pipeline = PaddleOCRVL(device="dcu")
# MetaX GPU
# pipeline = PaddleOCRVL(device="metax_gpu")

output = pipeline.predict(input=input_file)

markdown_list = []
markdown_images = []

# Collect the Markdown result and embedded images of every page
for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

# Merge the per-page results into one Markdown document
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)
print(markdown_texts)

mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)

with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

# Save the images referenced by the Markdown
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
```
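Parsing a 187-page PDF takes quite a long time, roughly 8 to 9 minutes. When parsing a single PDF, how can I parse it with high concurrency, for example processing dozens of pages at the same time and then merging the results at the end?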
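A minimal sketch of the split-and-merge approach described in the question, assuming the slow part is the VL recognition step served by the local vLLM server, which can batch concurrent requests, so a thread pool on the client side is enough to keep several page ranges in flight. `split_pages`, `parse_concurrently` and `parse_chunk` are hypothetical helpers, not part of the PaddleOCR API. A real `parse_chunk(start, end)` would have to extract pages `start` to `end - 1` (for example by rendering them to images with a PDF library), run them through a `PaddleOCRVL` pipeline and return the per-page `res.markdown` objects. Whether one pipeline instance can safely be shared across threads is not verified here, so creating one pipeline per worker is the more cautious assumption. The demo uses a stand-in parser so the sketch runs without PaddleOCR.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_pages(n_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split page indices [0, n_pages) into half-open ranges of at most chunk_size pages."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_pages))
            for start in range(0, n_pages, chunk_size)]


def parse_concurrently(
    n_pages: int,
    chunk_size: int,
    parse_chunk: Callable[[int, int], List[T]],
    max_workers: int = 4,
) -> List[T]:
    """Parse all page ranges in parallel and return per-page results in page order.

    parse_chunk(start, end) is a hypothetical, user-supplied function that
    parses pages start..end-1 and returns one result per page.
    """
    ranges = split_pages(n_pages, chunk_size)
    merged: List[T] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, not completion order,
        # so concatenating them preserves the original page sequence.
        for pages in pool.map(lambda r: parse_chunk(r[0], r[1]), ranges):
            merged.extend(pages)
    return merged


if __name__ == "__main__":
    # Stand-in parser so the sketch runs without PaddleOCR:
    # returns one placeholder string per page.
    def fake_parse_chunk(start: int, end: int) -> List[str]:
        return [f"page {i + 1}" for i in range(start, end)]

    pages = parse_concurrently(187, 20, fake_parse_chunk, max_workers=8)
    print(len(pages), pages[0], pages[-1])  # 187 page 1 page 187
```

Because `Executor.map` returns results in submission order, a merged list of per-page `res.markdown` objects should be usable with `pipeline.concatenate_markdown_pages` in the same way as `markdown_list` in the code above. Whether this actually beats the 8 to 9 minutes depends on how much spare capacity the vLLM server has; if it is already saturated, adding client threads will not help.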