1. Problem

When running inference on the PaddleOCR models with pdserving, GPU memory gets exhausted. Specifically: the first call to the model drives GPU memory usage up to 15 GB, and the second inference call fails immediately with out of memory. Which settings are wrong?
Question 1: When the service is first started, it already occupies 5 GB of GPU memory. Which parameters control how much GPU memory is taken at startup?
Question 2: The first inference request pushes GPU memory usage to 15 GB; the next request reports out of memory, and from then on the GPU memory stays occupied and is never released. Which parameters should be set to support high concurrency without running out of memory? (See the flag sketch below.)
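For reference, these are the memory-related environment flags I have been looking at (the flag names are standard PaddlePaddle flags, but the values below are only my guesses, not a verified fix):
export FLAGS_allocator_strategy=auto_growth        # allocate GPU memory on demand instead of pre-reserving a large pool
export FLAGS_fraction_of_gpu_memory_to_use=0.3     # cap the fraction of GPU memory the initial allocation may take
python web_service.py --config=config.yml          # then start the service as in step 4 below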

2. System environment: Ubuntu 22.04, NVIDIA A5000 (16 GB)
paddle-bfloat 0.1.7
paddle-serving-app 0.8.3
paddle-serving-client 0.8.3
paddle-serving-server-gpu 0.8.3.post112
paddleocr 2.7.0.0
paddlepaddle-gpu 2.3.2.post112
Deployment followed the tutorial at https://github.com/PaddlePaddle/PaddleOCR/blob/main/deploy/pdserving/README_CN.md#%E9%83%A8%E7%BD%B2.
Deployment steps:
# 1. Model format conversion
python -m paddle_serving_client.convert --dirname ./ch_PP-OCRv4_det_server_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --serving_server ./ppocr_det_v4_serving/ --serving_client ./ppocr_det_v4_client/
python -m paddle_serving_client.convert --dirname ./ch_PP-OCRv4_rec_server_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --serving_server ./ppocr_rec_v4_serving/ --serving_client ./ppocr_rec_v4_client/
# 2. Move the ppocr_det_v4_client, ppocr_det_v4_serving, ppocr_rec_v4_client, and ppocr_rec_v4_serving folders into ocr-PaddleOCR-2.7/deploy/pdserving
# 3. Modify the config file
# RPC port. rpc_port and http_port must not both be empty. When rpc_port is empty and http_port is not, rpc_port is automatically set to http_port + 1.
rpc_port: 18091
# HTTP port. rpc_port and http_port must not both be empty. When rpc_port is set and http_port is empty, no http_port is generated automatically.
http_port: 9998
# worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own gRPC server and DAG.
## When build_dag_each_worker=False, the framework sets max_workers=worker_num for the main thread's gRPC thread pool.
worker_num: 10
# build_dag_each_worker: False, the framework creates a single DAG within the process; True, the framework creates independent DAGs in each worker process.
build_dag_each_worker: False
dag:
    # Op resource type: True for the thread model, False for the process model
    is_thread_op: False
op:
    det:
        # Concurrency: with is_thread_op=True this is thread-level concurrency, otherwise process-level concurrency
        concurrency: 8
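For reference, the GPU settings for each op live under its local_service_conf block in the same config.yml. A minimal sketch of that section (field names follow the PaddleOCR pdserving sample config; the concurrency values are only an illustration, not something I have verified to fix the problem):
op:
    det:
        concurrency: 2
        local_service_conf:
            client_type: local_predictor
            model_config: ./ppocr_det_v4_serving
            device_type: 1        # 1 = GPU
            devices: "0"          # GPU card id
    rec:
        concurrency: 2
        local_service_conf:
            client_type: local_predictor
            model_config: ./ppocr_rec_v4_serving
            device_type: 1
            devices: "0"
With is_thread_op: False, each concurrency instance is a separate process that loads its own predictor onto the GPU, so det concurrency: 8 plus the rec workers can plausibly fill a 16 GB card on its own; lowering concurrency and restarting the service is usually the first thing to try.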
# 4. Start the service: python web_service.py --config=config.yml
The service starts successfully.

# 5. Run python pipeline_http_client.py
The first image is processed successfully; from the second image onward the GPU memory is reported as full.
The output is recorded below:
../../doc/imgs/00006737.jpg
erro_no:0, err_msg:
('www.997788.c0m中国收藏热线', 0.93801534), [[2.0, 7.0], [329.0, 4.0], [329.0, 27.0], [2.0, 29.0]]
('BOARDING', 0.9911648), [[422.0, 23.0], [657.0, 18.0], [658.0, 58.0], [423.0, 62.0]]
('登机牌', 0.9993414), [[153.0, 24.0], [355.0, 21.0], [356.0, 70.0], [154.0, 74.0]]
('PASS', 0.9975725), [[703.0, 15.0], [820.0, 13.0], [821.0, 56.0], [704.0, 58.0]]
('座位号', 0.99840087), [[676.0, 99.0], [738.0, 97.0], [739.0, 118.0], [677.0, 120.0]]
('序号', 0.99921644), [[490.0, 103.0], [534.0, 103.0], [534.0, 122.0], [490.0, 122.0]]
('SERIALNO', 0.9913897), [[545.0, 103.0], [647.0, 101.0], [647.0, 118.0], [545.0, 121.0]]
('舱位', 0.9949764), [[340.0, 105.0], [385.0, 103.0], [385.0, 126.0], [341.0, 127.0]]
('CLASS', 0.98982906), [[399.0, 105.0], [456.0, 105.0], [456.0, 123.0], [399.0, 123.0]]
('日期DATE', 0.99608606), [[214.0, 107.0], [317.0, 107.0], [317.0, 130.0], [214.0, 130.0]]
('SEATNO', 0.9924663), [[753.0, 99.0], [833.0, 96.0], [833.0, 114.0], [754.0, 117.0]]
('航班', 0.99966276), [[63.0, 111.0], [108.0, 111.0], [108.0, 132.0], [63.0, 132.0]]
('FLIGHT', 0.9781428), [[119.0, 111.0], [189.0, 109.0], [189.0, 127.0], [119.0, 129.0]]
('W', 0.7744087), [[406.0, 133.0], [430.0, 133.0], [430.0, 155.0], [406.0, 155.0]]
('03DEC', 0.98559016), [[234.0, 136.0], [327.0, 135.0], [327.0, 157.0], [234.0, 159.0]]
('MU2379', 0.99838823), [[81.0, 140.0], [210.0, 137.0], [210.0, 160.0], [81.0, 162.0]]
('035', 0.99874926), [[509.0, 131.0], [567.0, 129.0], [567.0, 153.0], [510.0, 155.0]]
('始发地', 0.9990115), [[343.0, 174.0], [406.0, 173.0], [406.0, 194.0], [343.0, 196.0]]
('FROM', 0.98515224), [[420.0, 174.0], [468.0, 174.0], [468.0, 193.0], [420.0, 193.0]]
('登机口', 0.99886125), [[490.0, 174.0], [556.0, 174.0], [556.0, 194.0], [490.0, 194.0]]
('GATE', 0.9974543), [[566.0, 174.0], [613.0, 172.0], [614.0, 190.0], [567.0, 192.0]]
('目的地T0', 0.918258), [[66.0, 179.0], [169.0, 178.0], [169.0, 201.0], [66.0, 202.0]]
('登机时间BDT', 0.99218565), [[677.0, 170.0], [812.0, 167.0], [812.0, 188.0], [677.0, 191.0]]
('福州', 0.9998597), [[98.0, 206.0], [169.0, 206.0], [169.0, 229.0], [98.0, 229.0]]
('TAIYUAN', 0.9712927), [[336.0, 218.0], [474.0, 216.0], [474.0, 236.0], [336.0, 239.0]]
('C11', 0.8256312), [[508.0, 215.0], [553.0, 215.0], [553.0, 234.0], [508.0, 234.0]]
('FUZHOU', 0.9928904), [[89.0, 228.0], [203.0, 226.0], [203.0, 248.0], [89.0, 250.0]]
('身份识别IDNO.', 0.92804295), [[344.0, 238.0], [483.0, 235.0], [483.0, 255.0], [344.0, 258.0]]
('姓名NAME', 0.99572617), [[65.0, 249.0], [172.0, 247.0], [172.0, 270.0], [65.0, 272.0]]
('ZHANGQIWET', 0.9452583), [[77.0, 276.0], [263.0, 272.0], [263.0, 294.0], [77.0, 298.0]]
('票号TKTNO', 0.9782185), [[462.0, 297.0], [578.0, 294.0], [578.0, 314.0], [462.0, 316.0]]
('张祺伟', 0.95761776), [[103.0, 312.0], [209.0, 311.0], [209.0, 334.0], [103.0, 335.0]]
('票价FARE', 0.99265593), [[70.0, 344.0], [163.0, 342.0], [163.0, 362.0], [70.0, 363.0]]
('ETKT7813699238489/1', 0.9963156), [[346.0, 348.0], [661.0, 345.0], [661.0, 365.0], [346.0, 368.0]]
('登机口于起飞前10分钟关闭', 0.994543), [[100.0, 457.0], [343.0, 453.0], [343.0, 473.0], [100.0, 477.0]]
('GATES CLOSE 1O MINUTES BEFORE DEPARTURE TIME', 0.9434192), [[360.0, 452.0], [830.0, 443.0], [830.0, 462.0], [360.0, 471.0]]
../../doc/imgs/00009282.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=1 log_id=1) [det|3] Failed to process(batch: [1]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 5.062500GB memory on GPU 0, 13.277405GB memory has been allocated and available memory is only 2.459412GB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/00015504.jpg
erro_no:10000, err_msg:[rec] failed to predict. (data_id=2 log_id=2) [rec|1] Failed to process(batch: [2]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 429.468750MB memory on GPU 0, 15.416077GB memory has been allocated and available memory is only 328.437500MB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/00018069.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=3 log_id=3) [det|7] Failed to process(batch: [3]): (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED.
[Hint: 'CUDNN_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons. To correct, check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed. Otherwise, this may indicate an internal error/bug in the library. ] (at /paddle/paddle/fluid/operators/fused/conv_fusion_op.cu:393)
[operator < conv2d_fusion > error]. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/00056221.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=4 log_id=4) [det|1] Failed to process(batch: [4]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 9.000000MB memory on GPU 0, 15.707092GB memory has been allocated and available memory is only 30.437500MB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/00057937.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=5 log_id=5) [det|4] Failed to process(batch: [5]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 9.000000MB memory on GPU 0, 15.707092GB memory has been allocated and available memory is only 30.437500MB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/00059985.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=6 log_id=6) [det|2] Failed to process(batch: [6]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 10.500000MB memory on GPU 0, 15.707092GB memory has been allocated and available memory is only 30.437500MB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/00077949.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=7 log_id=7) [det|2] Failed to process(batch: [7]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 9.000000MB memory on GPU 0, 15.707092GB memory has been allocated and available memory is only 30.437500MB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/00111002.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=8 log_id=8) [det|4] Failed to process(batch: [8]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 9.000000MB memory on GPU 0, 15.707092GB memory has been allocated and available memory is only 30.437500MB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/00207393.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=9 log_id=9) [det|3] Failed to process(batch: [9]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 60.750000MB memory on GPU 0, 15.707092GB memory has been allocated and available memory is only 30.437500MB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log
../../doc/imgs/1.jpg
erro_no:10000, err_msg:[det] failed to predict. (data_id=10 log_id=10) [det|1] Failed to process(batch: [10]): ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 7.500000MB memory on GPU 0, 15.707092GB memory has been allocated and available memory is only 30.437500MB.
Please check whether there is any other process using GPU 0.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is
export FLAGS_use_cuda_managed_memory=false
.(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:87)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
For details about error message, see PipelineServingLogs/pipeline.log