Deployed as a service with fastdeploy (triton) in docker, replicating ppstructure + ppocrV4; the GPU memory is never released, and 160 GB of VRAM filled up in 10 minutes! #12312
Unanswered
zhuxiaobin
asked this question in Q&A
Replies: 2 comments 1 reply
-
Hello, we are aware of this issue and are following up on it.
1 reply
-
I have seen many issues like this one and no longer know how I am supposed to deploy the model.
0 replies
-
Deployed as a service with fastdeploy (triton) in docker, replicating ppstructure + paddleocr. High-concurrency responses are fast, but the GPU memory is never released: 160 GB of VRAM filled up within 10 minutes!
Container image: registry.baidubce.com/paddlepaddle/fastdeploy:1.0.7-gpu-cuda11.4-trt8.5-21.10
Models: PPYOLOE + PPOCRV4 (det server + cls + rec server)
Request volume: 800 scanned pages, 50 pages per batch, ~2 MB per page; estimated request counts: PPYOLOE receives 16, det server receives 6000+, and cls and rec server each receive 200k-300k+.
Error: W0314 04:50:46.438977 62225 memory.cc:135] Failed to allocate CUDA memory with byte size 79027200 on GPU 1: CNMEM_STATUS_OUT_OF_MEMORY, falling back to pinned system memory
0314 05:01:17.338640 62420 pb_stub.cc:402] Failed to process the request(s) for model 'det_postprocess_0_0', message: TritonModelException: in ensemble 'rec_pp', softmax_2.tmp_0: failed to perform CUDA copy: an illegal memory access was encountered
As far as I can tell, fastdeploy + triton is the officially recommended high-concurrency server, and it handles concurrency and related concerns well; the only problem is that there is no way to release the GPU memory. The advice given in the fastdeploy issues is to launch the model in a subprocess and release the memory when the subprocess exits, but that approach only suits batch jobs and is a poor fit for a long-running, high-concurrency server.
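For reference, a minimal sketch of the subprocess workaround described above: run each inference batch in a short-lived child process so the CUDA allocations are returned to the driver when the child exits. The function names here (`run_ocr_batch`, `process_in_subprocess`) are hypothetical placeholders; a real implementation would load the FastDeploy/PP-OCR model inside the child.

```python
import multiprocessing as mp

def run_ocr_batch(pages, result_queue):
    # Hypothetical stand-in for loading the model and running inference.
    # With a real model, every CUDA allocation made here is freed when
    # this child process exits, which is why the workaround releases VRAM.
    result_queue.put([f"result:{p}" for p in pages])

def process_in_subprocess(pages):
    # "fork" keeps this sketch self-contained; with actual CUDA code you
    # would use mp.get_context("spawn") so the child starts with a clean
    # GPU context instead of inheriting the parent's.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=run_ocr_batch, args=(pages, queue))
    proc.start()
    results = queue.get()  # read before join() to avoid a full-pipe deadlock
    proc.join()
    return results
```

This is exactly why the workaround suits batch jobs but not a resident server: each batch pays the full model-load cost in a fresh process, which is unacceptable at high request rates.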