Commit 39c3b75

add sdk profiler docs (#2189)
* add sdk profiler docs
* fix typos
1 parent a5de119 commit 39c3b75

4 files changed, 222 additions and 0 deletions

docs/en/sdk_usage/index.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ Setup & Usage
 :maxdepth: 1

 quick_start
+profiler


 API Reference

docs/en/sdk_usage/profiler.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# profiler

The SDK can record the time spent in each module of a pipeline. Profiling is disabled by default. Using it takes two steps:

- Generate profiler data
- Analyze profiler data

## Generate profiler data

Taking the C interface and the classification pipeline as an example: when creating the pipeline, use the create API that accepts a context, and add a profiler handle to that context. The full code is shown below. Running the demo normally generates the profiler data file `profiler_data.txt` in the current directory.

```c++
#include <cstdio>
#include <fstream>
#include <opencv2/imgcodecs/imgcodecs.hpp>
#include <string>

#include "mmdeploy/classifier.h"

int main(int argc, char* argv[]) {
  if (argc != 4) {
    fprintf(stderr, "usage:\n  image_classification device_name dump_model_directory image_path\n");
    return 1;
  }
  auto device_name = argv[1];
  auto model_path = argv[2];
  auto image_path = argv[3];
  cv::Mat img = cv::imread(image_path);
  if (!img.data) {
    fprintf(stderr, "failed to load image: %s\n", image_path);
    return 1;
  }

  mmdeploy_model_t model{};
  mmdeploy_model_create_by_path(model_path, &model);

  // create profiler and add it to context
  // profiler data will be saved to profiler_data.txt
  mmdeploy_profiler_t profiler{};
  mmdeploy_profiler_create("profiler_data.txt", &profiler);

  mmdeploy_context_t context{};
  mmdeploy_context_create_by_device(device_name, 0, &context);
  mmdeploy_context_add(context, MMDEPLOY_TYPE_PROFILER, nullptr, profiler);

  mmdeploy_classifier_t classifier{};
  int status{};
  status = mmdeploy_classifier_create_v2(model, context, &classifier);
  if (status != MMDEPLOY_SUCCESS) {
    fprintf(stderr, "failed to create classifier, code: %d\n", (int)status);
    return 1;
  }

  mmdeploy_mat_t mat{
      img.data, img.rows, img.cols, 3, MMDEPLOY_PIXEL_FORMAT_BGR, MMDEPLOY_DATA_TYPE_UINT8};

  // inference loop (100 runs)
  for (int i = 0; i < 100; i++) {
    mmdeploy_classification_t* res{};
    int* res_count{};
    status = mmdeploy_classifier_apply(classifier, &mat, 1, &res, &res_count);
    // stop on the first failed run; there is no result to release in that case
    if (status != MMDEPLOY_SUCCESS) {
      fprintf(stderr, "failed to apply classifier, code: %d\n", (int)status);
      break;
    }
    mmdeploy_classifier_release_result(res, res_count, 1);
  }

  mmdeploy_classifier_destroy(classifier);

  mmdeploy_model_destroy(model);
  mmdeploy_profiler_destroy(profiler);
  mmdeploy_context_destroy(context);

  return status == MMDEPLOY_SUCCESS ? 0 : 1;
}
```

## Analyze profiler data

The profiler data can be parsed and summarized with a script.

```bash
python tools/sdk_analyze.py profiler_data.txt
```

In the parsed results, `name` is the name of the node, `n_call` is the number of calls, `t_mean` is the mean time consumption, and `t_50%` and `t_90%` are the 50th and 90th percentiles of the time consumption.
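
As a rough illustration of what these columns summarize, the sketch below computes a call count, a mean, and two percentiles from a list of per-call durations. It is not the implementation of `tools/sdk_analyze.py`: the function name `summarize`, the nearest-rank percentile rule, and the sample timings are all assumptions made for this example.

```python
def summarize(durations):
    """Return (n_call, t_mean, t_50, t_90) for a list of per-call durations.

    Hypothetical helper for illustration only; the real script may use a
    different percentile definition.
    """
    ordered = sorted(durations)
    n_call = len(ordered)
    t_mean = sum(ordered) / n_call

    def percentile(p):
        # Nearest-rank rule: rank = ceil(p * n / 100), done in integer arithmetic.
        rank = max(1, (p * n_call + 99) // 100)
        return ordered[rank - 1]

    return n_call, t_mean, percentile(50), percentile(90)


# Made-up timings: one slow first call followed by 99 steady calls.
durations = [292.0] + [1.9] * 99
n_call, t_mean, t_50, t_90 = summarize(durations)
print(n_call, round(t_mean, 3), t_50, t_90)
```

With a single slow call among otherwise steady ones, the mean lands well above both percentiles. That is one plausible reason a row can report a `t_mean` larger than its `t_90%`, although the source does not state the cause.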

```bash
+---------------------------+--------+-------+--------+--------+-------+-------+
| name                      | occupy | usage | n_call | t_mean | t_50% | t_90% |
+===========================+========+=======+========+========+=======+=======+
| ./Pipeline                | -      | -     | 100    | 4.831  | 1.913 | 1.946 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Preprocess/Compose        | -      | -     | 100    | 0.125  | 0.118 | 0.144 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| LoadImageFromFile         | 0.017  | 0.017 | 100    | 0.081  | 0.077 | 0.098 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Resize                    | 0.003  | 0.003 | 100    | 0.012  | 0.012 | 0.013 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| CenterCrop                | 0.002  | 0.002 | 100    | 0.008  | 0.008 | 0.008 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Normalize                 | 0.002  | 0.002 | 100    | 0.009  | 0.009 | 0.009 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| ImageToTensor             | 0.002  | 0.002 | 100    | 0.008  | 0.007 | 0.007 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Collect                   | 0.001  | 0.001 | 100    | 0.005  | 0.005 | 0.005 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| resnet                    | 0.968  | 0.968 | 100    | 4.678  | 1.767 | 1.774 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| postprocess               | 0.003  | 0.003 | 100    | 0.015  | 0.015 | 0.017 |
+---------------------------+--------+-------+--------+--------+-------+-------+
```
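
The text does not define the `occupy` and `usage` columns. One reading that fits the numbers in the table is that they give a node's share of the pipeline's mean time. The sketch below checks that reading against three rows. It is an inference from the printed values, not documented behavior of `tools/sdk_analyze.py`, and the helper name `share_of_pipeline` is hypothetical.

```python
# t_mean values copied from the table above.
pipeline_mean = 4.831
node_means = {
    "LoadImageFromFile": 0.081,
    "resnet": 4.678,
    "postprocess": 0.015,
}


def share_of_pipeline(node_mean, total_mean):
    """Fraction of the pipeline's mean time spent in one node (assumed meaning)."""
    return node_mean / total_mean


shares = {
    name: round(share_of_pipeline(mean, pipeline_mean), 3)
    for name, mean in node_means.items()
}
for name, share in shares.items():
    print(f"{name}: {share}")
```

For these three rows the ratios round to the `occupy` values shown in the table. Rows with very small times can differ in the last digit, since the printed `t_mean` values are themselves rounded.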

docs/zh_cn/sdk_usage/index.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ SDK Usage Guide
 :maxdepth: 1

 quick_start
+profiler


 API Reference

docs/zh_cn/sdk_usage/profiler.md

Lines changed: 110 additions & 0 deletions
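@@ -0,0 +1,110 @@
# Pipeline speed analysis (profiler)

The SDK can record the time spent in each module of a pipeline. This feature is disabled by default. Using it takes two steps:

- Generate profiler data
- Analyze profiler data

## Generate profiler data

Take the C interface and the classification pipeline as an example. When creating the pipeline, use the interface that accepts context information, and add the profiler to the context. The full code is shown below. Running the demo normally generates the profiler data file `profiler_data.txt` in the current directory.
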
```c++
#include <cstdio>
#include <fstream>
#include <opencv2/imgcodecs/imgcodecs.hpp>
#include <string>

#include "mmdeploy/classifier.h"

int main(int argc, char* argv[]) {
  if (argc != 4) {
    fprintf(stderr, "usage:\n  image_classification device_name dump_model_directory image_path\n");
    return 1;
  }
  auto device_name = argv[1];
  auto model_path = argv[2];
  auto image_path = argv[3];
  cv::Mat img = cv::imread(image_path);
  if (!img.data) {
    fprintf(stderr, "failed to load image: %s\n", image_path);
    return 1;
  }

  mmdeploy_model_t model{};
  mmdeploy_model_create_by_path(model_path, &model);

  // create profiler and add it to context
  // profiler data will be saved to profiler_data.txt
  mmdeploy_profiler_t profiler{};
  mmdeploy_profiler_create("profiler_data.txt", &profiler);

  mmdeploy_context_t context{};
  mmdeploy_context_create_by_device(device_name, 0, &context);
  mmdeploy_context_add(context, MMDEPLOY_TYPE_PROFILER, nullptr, profiler);

  mmdeploy_classifier_t classifier{};
  int status{};
  status = mmdeploy_classifier_create_v2(model, context, &classifier);
  if (status != MMDEPLOY_SUCCESS) {
    fprintf(stderr, "failed to create classifier, code: %d\n", (int)status);
    return 1;
  }

  mmdeploy_mat_t mat{
      img.data, img.rows, img.cols, 3, MMDEPLOY_PIXEL_FORMAT_BGR, MMDEPLOY_DATA_TYPE_UINT8};

  // inference loop (100 runs)
  for (int i = 0; i < 100; i++) {
    mmdeploy_classification_t* res{};
    int* res_count{};
    status = mmdeploy_classifier_apply(classifier, &mat, 1, &res, &res_count);
    // stop on the first failed run; there is no result to release in that case
    if (status != MMDEPLOY_SUCCESS) {
      fprintf(stderr, "failed to apply classifier, code: %d\n", (int)status);
      break;
    }
    mmdeploy_classifier_release_result(res, res_count, 1);
  }

  mmdeploy_classifier_destroy(classifier);

  mmdeploy_model_destroy(model);
  mmdeploy_profiler_destroy(profiler);
  mmdeploy_context_destroy(context);

  return status == MMDEPLOY_SUCCESS ? 0 : 1;
}
```
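
## Analyze profiler data

The profiler data can be parsed with a script.

```bash
python tools/sdk_analyze.py profiler_data.txt
```

In the parsed results, `name` is the name of the node, `n_call` is the number of calls, `t_mean` is the mean time consumption, and `t_50%` and `t_90%` are percentiles of the time consumption.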

```bash
+---------------------------+--------+-------+--------+--------+-------+-------+
| name                      | occupy | usage | n_call | t_mean | t_50% | t_90% |
+===========================+========+=======+========+========+=======+=======+
| ./Pipeline                | -      | -     | 100    | 4.831  | 1.913 | 1.946 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Preprocess/Compose        | -      | -     | 100    | 0.125  | 0.118 | 0.144 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| LoadImageFromFile         | 0.017  | 0.017 | 100    | 0.081  | 0.077 | 0.098 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Resize                    | 0.003  | 0.003 | 100    | 0.012  | 0.012 | 0.013 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| CenterCrop                | 0.002  | 0.002 | 100    | 0.008  | 0.008 | 0.008 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Normalize                 | 0.002  | 0.002 | 100    | 0.009  | 0.009 | 0.009 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| ImageToTensor             | 0.002  | 0.002 | 100    | 0.008  | 0.007 | 0.007 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Collect                   | 0.001  | 0.001 | 100    | 0.005  | 0.005 | 0.005 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| resnet                    | 0.968  | 0.968 | 100    | 4.678  | 1.767 | 1.774 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| postprocess               | 0.003  | 0.003 | 100    | 0.015  | 0.015 | 0.017 |
+---------------------------+--------+-------+--------+--------+-------+-------+
```
