Replies: 2 comments 14 replies
-
MinerU's VLM backend offers several ways to get concurrent inference on a single GPU:

1. Use an async backend (recommended). Switch to `aio_doc_analyze`:

```python
import asyncio

from mineru.backend.vlm.vlm_analyze import aio_doc_analyze


async def process_multiple_pdfs(pdf_bytes_list):
    # One analysis task per PDF; the async engine interleaves the
    # requests on the same GPU.
    tasks = [
        aio_doc_analyze(
            pdf_bytes=pdf_bytes,
            backend="vllm-async-engine",  # or "lmdeploy-engine"
            model_path=model_path,
        )
        for pdf_bytes in pdf_bytes_list
    ]
    return await asyncio.gather(*tasks)


results = asyncio.run(process_multiple_pdfs([pdf1, pdf2, pdf3]))
```

2. Tune the batch size. For the `transformers` backend, `batch_size` is configured automatically from GPU memory (8 for ≥16 GB, 4 for ≥8 GB), but it can also be set manually:

```python
from mineru.backend.vlm.vlm_analyze import doc_analyze

result = doc_analyze(
    pdf_bytes=pdf_bytes,
    backend="transformers",
    batch_size=8,
)
```

3. Concurrent HTTP clients. If you use the `http-client` backend against a separately launched inference server, concurrency is handled on the server side, so the client only needs to issue requests in parallel (see the sketch below).
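A minimal sketch of that client-side fan-out, assuming `doc_analyze` accepts `backend="http-client"` together with a `server_url` parameter (verify against your MinerU version's signature); the worker count and URL here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

from mineru.backend.vlm.vlm_analyze import doc_analyze


def analyze_one(pdf_bytes):
    # Each call is an independent HTTP request; batching and
    # scheduling happen on the server.
    return doc_analyze(
        pdf_bytes=pdf_bytes,
        backend="http-client",
        server_url="http://127.0.0.1:30000",  # assumed local server address
    )


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyze_one, [pdf1, pdf2, pdf3]))
```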
Note: doc_analyze currently handles a single PDF by packing all of its pages into one batch and running inference in a single pass; there is no page-level concurrency. Page-level concurrency would require changing the underlying implementation, or splitting the PDF yourself as sketched below.
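One workaround that leaves MinerU's internals untouched is to split the PDF into single-page documents and fan those out through the async path. A hedged sketch, assuming pypdf for the splitting (the `split_pages` helper is not part of MinerU, and per-page calls may be less efficient than one batched pass):

```python
import asyncio
import io

from pypdf import PdfReader, PdfWriter

from mineru.backend.vlm.vlm_analyze import aio_doc_analyze


def split_pages(pdf_bytes):
    # Turn one PDF into a list of single-page PDFs (as bytes).
    reader = PdfReader(io.BytesIO(pdf_bytes))
    pages = []
    for page in reader.pages:
        writer = PdfWriter()
        writer.add_page(page)
        buf = io.BytesIO()
        writer.write(buf)
        pages.append(buf.getvalue())
    return pages


async def analyze_pages(pdf_bytes):
    # One inference task per page; gather() preserves page order.
    tasks = [
        aio_doc_analyze(
            pdf_bytes=page_bytes,
            backend="vllm-async-engine",
            model_path=model_path,
        )
        for page_bytes in split_pages(pdf_bytes)
    ]
    return await asyncio.gather(*tasks)


page_results = asyncio.run(analyze_pages(pdf_bytes))
```

Merging the per-page results back into one document structure is left out here and depends on the shape of what doc_analyze returns.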
-
With the http-client setup, I run mineru-openai-server --port 30000 on an RTX 4090. Two concurrent requests finish about 10 s faster than one, but three concurrent requests simply hang. How is this supposed to support 100 concurrent requests?
-
Looking at the code, when vlm is used as the backend, the doc_analyze call does not appear to run concurrently. If I want concurrent inference on a single GPU, is there a parameter I can set, or how should the code be modified?