Cannot reproduce the officially published latency when running inference with the PP-OCRv5_mobile_det model #15689
Unanswered
cocoakeith asked this question in Q&A
Replies: 1 comment
- Hi, the code you provided measures end-to-end inference time, i.e. it includes pre- and post-processing, whereas the official latency numbers record only the pure model inference time, excluding pre/post-processing. I suspect this is the main source of the discrepancy. For a lightweight model, the pure model inference time can be very short, so pre/post-processing may account for a large share of the total elapsed time.
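A related issue with the posted script is that it starts timing before the first call, so the average also absorbs one-time warm-up costs (the logged `trt_collect_shape_range_info: True` suggests TensorRT shape collection happens during the first runs). A minimal, hypothetical timing harness that discards warm-up iterations and averages only steady-state calls (the `bench` helper and its parameters are illustrative, not a PaddleOCR API):

```python
import time

def bench(fn, warmup=10, iters=100):
    """Average steady-state latency of fn in seconds, excluding warm-up runs."""
    for _ in range(warmup):
        fn()  # first calls may pay for TensorRT engine build / dynamic-shape collection
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

# Illustrative usage with the script from the question:
# avg_s = bench(lambda: ocr_engine.predict(input="./general_ocr_002.png", batch_size=1))
# print(f"steady-state end-to-end latency: {avg_s * 1000:.2f} ms")
```

Even measured this way, the result is still end-to-end latency; it only removes the warm-up bias, not the pre/post-processing time discussed above.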
- Hardware environment:
  1. T4 GPU
  2. Paddle inference image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6
  Test script:

  ```python
  import time

  from paddleocr import TextDetection  # PaddleOCR 3.x

  ocr_engine = TextDetection(model_name="PP-OCRv5_mobile_det", enable_hpi=True, precision="fp16", device="gpu")
  cnt = 100
  time1 = time.time()
  for i in range(cnt):
      result = ocr_engine.predict(input="./general_ocr_002.png", batch_size=1)
  print(f"Detect time per pic is {(time.time() - time1) / cnt}s")
  ```
  Test result:

  `Detect time per pic is 0.02564406156539917s`, which is far from the officially published per-inference latency of 8.79 / 3.13 ms. The backend automatically selected by the code is the Paddle Inference backend, with configuration: Paddle predictor option: device_type: gpu, device_id: None, run_mode: trt_fp32, trt_dynamic_shapes: {'x': [[1, 3, 32, 32], [1, 3, 736, 736], [1, 3, 4000, 4000]]}, cpu_threads: 8, delete_pass: [], enable_new_ir: True, enable_cinn: False, trt_cfg_setting: {'precision_mode': <PrecisionMode.FP32: 'FP32'>}, trt_use_dynamic_shapes: True, trt_collect_shape_range_info: True, trt_discard_cached_shape_range_info: False, trt_dynamic_shape_input_data: None, trt_shape_range_info_path: None, trt_allow_rebuild_at_runtime: True

  In the same environment I also tested the PP-OCRv5_server_det model. With high-performance inference, the average time per image was about 0.07182648420333862 s, which is close to the official figure. The backend automatically selected by the code was `tensorrt`, with configuration: Inference backend config: precision='fp32' use_dynamic_shapes=True dynamic_shapes={'x': [[1, 3, 32, 32], [1, 3, 736, 736], [1, 3, 4000, 4000]]}