Deployed as a service with fastdeploy (triton) in docker, replicating ppstructure + ppocrV4; the GPU memory is never released, and 160 GB of VRAM filled up in 10 minutes! #12312
Unanswered
zhuxiaobin
asked this question in Q&A
Replies: 2 comments 1 reply
-
Hello, we are aware of this issue and are following up on it.
1 reply
-
I have seen many issues like this one and no longer know how I am supposed to deploy the model.
0 replies
-
Deployed as a service with fastdeploy (triton) in docker, replicating ppstructure + paddleocr. High-concurrency responses are fast, but the GPU memory is never released: 160 GB of VRAM filled up within 10 minutes!
Container image: registry.baidubce.com/paddlepaddle/fastdeploy:1.0.7-gpu-cuda11.4-trt8.5-21.10
Models: PPYOLOE + PPOCRV4 (det server + cls + rec server)
Request volume: 800 scanned pages, 50 pages per batch, ~2 MB per page; estimated request counts: PPYOLOE receives 16, det server receives 6000+, and cls and rec server each receive 200k-300k+.
Error: W0314 04:50:46.438977 62225 memory.cc:135] Failed to allocate CUDA memory with byte size 79027200 on GPU 1: CNMEM_STATUS_OUT_OF_MEMORY, falling back to pinned system memory
0314 05:01:17.338640 62420 pb_stub.cc:402] Failed to process the request(s) for model 'det_postprocess_0_0', message: TritonModelException: in ensemble 'rec_pp', softmax_2.tmp_0: failed to perform CUDA copy: an illegal memory access was encountered
As far as I can tell, fastdeploy + triton is the officially recommended high-concurrency server, and it handles concurrency and related concerns well; the only problem is that there is no way to release the GPU memory. The advice given in the fastdeploy issues is to launch the model in a subprocess and release the memory when the subprocess exits, but that approach only suits batch jobs and is a poor fit for a long-running, high-concurrency server.
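For reference, a minimal sketch of the subprocess workaround described above: run each inference batch in a short-lived child process so the CUDA allocations are returned to the driver when the child exits. The function names here (`run_ocr_batch`, `process_in_subprocess`) are hypothetical placeholders; a real implementation would load the FastDeploy/PP-OCR model inside the child.

```python
import multiprocessing as mp

def run_ocr_batch(pages, result_queue):
    # Hypothetical stand-in for loading the model and running inference.
    # With a real model, every CUDA allocation made here is freed when
    # this child process exits, which is why the workaround releases VRAM.
    result_queue.put([f"result:{p}" for p in pages])

def process_in_subprocess(pages):
    # "fork" keeps this sketch self-contained; with actual CUDA code you
    # would use mp.get_context("spawn") so the child starts with a clean
    # GPU context instead of inheriting the parent's.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=run_ocr_batch, args=(pages, queue))
    proc.start()
    results = queue.get()  # read before join() to avoid a full-pipe deadlock
    proc.join()
    return results
```

This is exactly why the workaround suits batch jobs but not a resident server: each batch pays the full model-load cost in a fresh process, which is unacceptable at high request rates.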