How should cpu_threads be set for the 3 models in Triton? Current tests show almost no parallelism #12992

tigflanker · 2023-11-20T11:42:40Z


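A question for the PaddleOCR experts: I have spent some time testing the different ways of deploying PaddleOCR and found that FastDeploy (Triton) works quite well, so I plan to take it to production.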

Our machine runs Ubuntu (8 cores, 16 GB RAM, CPU only), and peak business concurrency in production is 500.

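How should the runtimes of the 3 models be configured?
I have done very little engineering work. Below are my current settings; if they are silly, feel free to laugh :)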

models	instance_group_count	cpu_threads
det_runtime	1	8
cls_runtime	1	4
rec_runtime	1	4

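With these settings, recognizing a single image (about 25 KB) takes roughly 400 ms, but under a load test such as "ab -n 1000 -c 200" everything becomes extremely slow. I don't know where the bottleneck is.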
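To see where time goes under a test like "ab -n 1000 -c 200", it can help to record per-request latency as well as total throughput: if latency climbs while throughput stays flat as concurrency rises, requests are most likely queuing on the server. Below is a minimal standard-library sketch. `send_request` is a hypothetical placeholder for one call to the Triton `pp_ocr` service; here it is replaced by a short sleep so the sketch runs on its own.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(send_request, total_requests, concurrency):
    """Call send_request total_requests times with at most `concurrency`
    calls in flight, similar in spirit to `ab -n N -c C`."""

    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    wall_time = time.perf_counter() - wall_start

    latencies.sort()
    return {
        "requests": total_requests,
        "wall_time_s": wall_time,
        "throughput_rps": total_requests / wall_time,
        "p50_s": statistics.median(latencies),
        # nearest-rank style 95th percentile from the sorted samples
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }


def fake_request():
    # Hypothetical stand-in for one OCR request to the Triton server
    time.sleep(0.01)


if __name__ == "__main__":
    print(load_test(fake_request, total_requests=100, concurrency=10))
```

Running the same test at several concurrency levels (for example 1, 4, 16) against the real service is one way to find the point where throughput stops increasing.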

tigflanker · 2023-11-22T06:38:17Z


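I hope the experts can take a look. If I can't get this working this week I will have to switch to a different approach, so I'm a bit anxious.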

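I am currently testing on a 4-core CPU machine, testing only the Triton service, and concurrency has almost no effect.
With the same sample image, 1000 serial requests take 446 seconds and the parallel run takes about 404 seconds. My current modified config is as follows: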
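The reported timings translate into throughput and speedup as follows. This small sketch uses only the numbers given above (1000 requests, 446 s serial, about 404 s parallel).

```python
REQUESTS = 1000
serial_s = 446.0    # 1000 requests sent one at a time
parallel_s = 404.0  # 1000 requests sent by several client workers


def throughput(requests, seconds):
    """Requests completed per second."""
    return requests / seconds


serial_rps = throughput(REQUESTS, serial_s)      # about 2.24 requests/s
parallel_rps = throughput(REQUESTS, parallel_s)  # about 2.48 requests/s
speedup = serial_s / parallel_s                  # about 1.10x

print(f"serial:   {serial_rps:.2f} req/s, {serial_s / REQUESTS * 1000:.0f} ms per request")
print(f"parallel: {parallel_rps:.2f} req/s")
print(f"speedup:  {speedup:.2f}x")
```

The serial figure of about 446 ms per request is close to the ~400 ms single-image latency reported earlier. A speedup of only about 1.1x with several concurrent clients may indicate that one request stream already keeps the server's CPU nearly busy.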

model	max_batch_size	instance_group	cpu_threads
pp_ocr	1	none (ensemble scheduling)
det_preprocess	1	2
det_runtime	1	2	8
det_postprocess	128	2
cls_pp	128	none (ensemble scheduling)
cls_runtime	128	2	8
cls_postprocess	128	2
rec_pp	128	none (ensemble scheduling)
rec_runtime	128	2	8
rec_postprocess	128	2
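A quantity worth checking for both configurations is the total number of inference threads the runtime models can start, computed as instance_group count times cpu_threads and summed over the models, compared with the number of CPU cores. The sketch below assumes that each runtime instance may use up to `cpu_threads` threads at once. It ignores the Python pre- and post-processing instances, which also need CPU time. Keeping the total at or below the core count is a common rule of thumb, not a value taken from the FastDeploy documentation.

```python
from typing import Dict, Tuple


def thread_budget(runtimes: Dict[str, Tuple[int, int]], cores: int):
    """runtimes maps model name -> (instance_count, cpu_threads).
    Returns (total inference threads, threads per core)."""
    total = sum(count * threads for count, threads in runtimes.values())
    return total, total / cores


# First configuration (8-core machine)
config_8core = {
    "det_runtime": (1, 8),
    "cls_runtime": (1, 4),
    "rec_runtime": (1, 4),
}

# Second configuration (4-core machine)
config_4core = {
    "det_runtime": (2, 8),
    "cls_runtime": (2, 8),
    "rec_runtime": (2, 8),
}

for name, cfg, cores in [("8-core", config_8core, 8), ("4-core", config_4core, 4)]:
    total, ratio = thread_budget(cfg, cores)
    print(f"{name}: {total} inference threads on {cores} cores ({ratio:.0f}x)")
```

Under that assumption, the 4-core setup can run up to 48 inference threads on 4 cores. Those threads then compete for the same cores, which would limit any gain from added client concurrency. Whether a smaller split, such as 1 instance with 2 threads per runtime model, performs better has to be measured.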

The test script was adapted from client.py. The core code is as follows:

import os
from datetime import datetime

import cv2
import numpy as np
from joblib import Parallel, delayed

# SyncGRPCTritonRunner is the gRPC helper class from client.py,
# which this script was adapted from
from client import SyncGRPCTritonRunner


def tmp(p=True):
    model_name = "pp_ocr"
    model_version = "1"
    url = "88.88.16.157:8001"
    # A new gRPC runner is created on every call
    runner = SyncGRPCTritonRunner(url, model_name, model_version)

    im = cv2.imread("/data/images/sample2.jpg")
    im = np.array([im, ])  # add a batch dimension: shape (1, H, W, 3)
    result = runner.Run([im, ])
    batch_texts = result['rec_texts']
    batch_scores = result['rec_scores']
    batch_bboxes = result['det_bboxes']

    # Results for the single image in the batch
    texts = batch_texts[0]
    scores = batch_scores[0]
    bboxes = batch_bboxes[0]

    if p:
        for i_box in range(len(texts)):
            print('text=', texts[i_box].decode('utf-8'), '  score=', scores[i_box], '  bbox=', bboxes[i_box])

    return texts


# rc = tmp()
test_rounds = 1000
correct = 0

print(datetime.now())

# Serial test: one request at a time
start_time = datetime.now()
for i in range(test_rounds):
    rc = tmp(p=False)
    if i % 10 == 0:
        print(i, (datetime.now() - start_time).total_seconds())
    if rc[0].decode('utf-8')[:2] == '11':
        correct += 1
print('serial test', (datetime.now() - start_time).total_seconds(), 'accuracy', correct / test_rounds)  # serial test 446 accuracy 1.0

# Concurrent test: os.cpu_count() client workers send requests at the same time
start_time = datetime.now()
coll_list = Parallel(n_jobs=os.cpu_count())(delayed(tmp)(p=False) for i in range(test_rounds))
print('concurrent test', (datetime.now() - start_time).total_seconds(), 'accuracy', sum([x[0].decode('utf-8')[:2] == '11' for x in coll_list]) / test_rounds)  # concurrent test 404 accuracy 1.0

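The full code is in the attachment. I don't know which part I am using incorrectly.

Could the experts please point me in the right direction? Thank you very much.
@andyjiang1116 @tink2123 @D-DanielYang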
