
Commit 1ad8071

reyoung authored and Superjomn committed

Cherry pick infer doc (#11857)

* doc/inference api (#11332)
* inference API init cn (#11731)

1 parent 653686c

3 files changed: +147 −2 lines changed
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# Inference High-level APIs

This document describes the high-level inference APIs that can be used to easily deploy a Paddle model for an application.

The APIs are declared in a single header file, `paddle_inference_api.h`; only two libraries, `libpaddle_fluid.so` and `libpaddle_fluid_api.so`, are needed.

## PaddleTensor

The `PaddleTensor` data structure provides a general tensor interface.

Its definition is

```c++
struct PaddleTensor {
  std::string name;  // variable name.
  std::vector<int> shape;
  PaddleBuf data;  // blob of data.
  PaddleDType dtype;
};
```

The data is stored in a contiguous block of memory described by `PaddleBuf`, and the tensor's data type is specified by a `PaddleDType`.
The `name` field specifies the name of the input variable;
this matters when there are multiple inputs and the framework needs to distinguish which variable to set.
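
For example, a 4x1 feed tensor of `int64` values could be described like this (a minimal sketch adapted from the example in the Chinese version of this document; the empty `name` and the feed values are placeholders):

```c++
#include "paddle_inference_api.h"

// Raw feed data owned by the caller; PaddleBuf wraps it here without copying.
int64_t data[4] = {1, 2, 3, 4};

paddle::PaddleTensor tensor{.name = "",
                            .shape = std::vector<int>({4, 1}),
                            .data = paddle::PaddleBuf(data, sizeof(data)),
                            .dtype = paddle::PaddleDType::INT64};
```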

## engine

The inference API has more than one underlying implementation; currently there are two valid engines:

- the native engine, which consists of the native operators and framework,
- the Anakin engine, which embeds the Anakin library.

The native engine takes a native Paddle model as input and supports any model trained by Paddle,
while the Anakin engine can only take an Anakin model as input (the user needs to manually transform the format first), and currently not all Paddle models are supported.

```c++
enum class PaddleEngineKind {
  kNative = 0,  // Use the native Fluid facility.
  kAnakin,      // Use Anakin for inference.
};
```

## PaddlePredictor and how to create one

The main interface is `PaddlePredictor`, which has the following methods:

- `bool Run(const std::vector<PaddleTensor>& inputs, std::vector<PaddleTensor>* output_data)`
  - takes the inputs and fills `output_data`
- `Clone`, which clones a predictor from an existing one, with the model parameters shared (see the sketch below).
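
A common use of `Clone` is serving one model from several threads while loading the parameters only once. The following is a minimal sketch, assuming `Clone()` takes no arguments and returns a `std::unique_ptr<PaddlePredictor>` as the description above suggests; the function name and threading setup are purely illustrative, and the factory method that creates `predictor` in the first place is introduced next.

```c++
#include <memory>
#include <thread>
#include <vector>

#include "paddle_inference_api.h"

// Assumes `predictor` was created earlier (see CreatePaddlePredictor below)
// and `inputs` holds the feed tensors shared by both workers.
void ServeFromTwoThreads(paddle::PaddlePredictor* predictor,
                         const std::vector<paddle::PaddleTensor>& inputs) {
  // Each worker runs on its own clone; the model parameters are shared.
  auto worker = [&inputs](std::unique_ptr<paddle::PaddlePredictor> local) {
    std::vector<paddle::PaddleTensor> outputs;
    local->Run(inputs, &outputs);
    // ... consume outputs ...
  };
  std::thread t1(worker, predictor->Clone());
  std::thread t2(worker, predictor->Clone());
  t1.join();
  t2.join();
}
```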

A factory method helps create a predictor, and the user takes ownership of the returned object.

```c++
template <typename ConfigT, PaddleEngineKind engine = PaddleEngineKind::kNative>
std::unique_ptr<PaddlePredictor> CreatePaddlePredictor(const ConfigT& config);
```
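
For instance, a native-engine predictor can be created from a `NativeConfig` and then run on a batch of input tensors. The following is a minimal sketch based on the example in the Chinese version of this document; the function name `RunNativeOnce` is illustrative and `"xxx"` stands in for a real model directory.

```c++
#include <vector>

#include "paddle_inference_api.h"

// Minimal sketch: create a native-engine predictor and run one batch.
void RunNativeOnce(const std::vector<paddle::PaddleTensor>& inputs) {
  paddle::NativeConfig config;
  config.model_dir = "xxx";  // placeholder path to the saved model directory
  config.use_gpu = false;

  auto predictor =
      paddle::CreatePaddlePredictor<paddle::NativeConfig,
                                    paddle::PaddleEngineKind::kNative>(config);

  std::vector<paddle::PaddleTensor> outputs;
  predictor->Run(inputs, &outputs);
  // ... read results out of `outputs` ...
}
```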

By specifying the engine kind and config, one can get a specific implementation.

## Reference

- [paddle_inference_api.h](./paddle_inference_api.h)
- [demos](./demo)
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
# Paddle Inference API

To make inference deployment simpler and more convenient, Fluid provides a set of high-level APIs that hide the different optimized implementations underneath.

The inference library contains:

- the header file `paddle_inference_api.h`, which defines all the interfaces
- the library files `libpaddle_fluid.so` / `libpaddle_fluid.a`
- the library files `libpaddle_inference_api.so` / `libpaddle_inference_api.a`

The API concepts are introduced in detail below.

## PaddleTensor

`PaddleTensor` defines the basic input/output data format for inference. Its definition is

```c++
struct PaddleTensor {
  std::string name;  // variable name.
  std::vector<int> shape;
  PaddleBuf data;  // blob of data.
  PaddleDType dtype;
};
```

- `name` specifies the name of the model variable that the input data corresponds to (not used yet, but it will be enabled once arbitrary targets are supported)
- `shape` is the shape of the tensor
- `data` stores the data in a contiguous block of memory inside a `PaddleBuf`; a `PaddleBuf` can either wrap external data or `malloc` its own memory, see the related definitions in the header file for details
- `dtype` is the data type of the tensor

## engine

The high-level API has several optimized implementations underneath, which we call engines. Currently there are three engines:

- the native engine, made up of Paddle's native forward operators, which naturally supports every model trained with Paddle,
- the Anakin engine, which wraps [Anakin](https://github.com/PaddlePaddle/Anakin); it performs well on some models, but it only accepts Anakin's own model format and cannot support all Paddle models,
- the TensorRT mixed engine, which integrates [TensorRT](https://developer.nvidia.com/tensorrt) at the sub-graph level; it supports all Paddle models and automatically offloads parts of the computation graph to TensorRT for acceleration (WIP)

This is implemented as

```c++
enum class PaddleEngineKind {
  kNative = 0,        // Use the native Fluid facility.
  kAnakin,            // Use Anakin for inference.
  kAutoMixedTensorRT  // Automatically mixing TensorRT with the Fluid ops.
};
```

## Inference deployment workflow

Overall there are the following steps:

1. Create a `PaddlePredictor` with a suitable config
2. Create the input `PaddleTensor`s and pass them into the `PaddlePredictor`
3. Fetch the output `PaddleTensor`s and extract the results

The following walks through a simple model end to end, with some detail code elided:

```c++
#include "paddle_inference_api.h"

// Create a config and adjust the relevant settings.
paddle::NativeConfig config;
config.model_dir = "xxx";
config.use_gpu = false;
// Create a native PaddlePredictor.
auto predictor =
    paddle::CreatePaddlePredictor<paddle::NativeConfig,
                                  paddle::PaddleEngineKind::kNative>(config);
// Create the input tensor.
int64_t data[4] = {1, 2, 3, 4};
paddle::PaddleTensor tensor{.name = "",
                            .shape = std::vector<int>({4, 1}),
                            .data = paddle::PaddleBuf(data, sizeof(data)),
                            .dtype = paddle::PaddleDType::INT64};
// Create the output tensors; their memory can be reused across runs.
std::vector<paddle::PaddleTensor> outputs;
// Run the prediction on the single input tensor.
CHECK(predictor->Run({tensor}, &outputs));
// Read the outputs ...
```

At build time, it is enough to link against `libpaddle_fluid.a/.so` and `libpaddle_inference_api.a/.so`.

## Detailed code references

- [inference demos](./demo)
- [a more complete single-threaded/multi-threaded example](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/contrib/inference/test_paddle_inference_api_impl.cc)

paddle/contrib/inference/paddle_inference_api.h

Lines changed: 1 addition & 2 deletions
@@ -109,8 +109,7 @@ class PaddlePredictor {
 
   // The common configs for all the predictors.
   struct Config {
-    std::string model_dir;      // path to the model directory.
-    bool enable_engine{false};  // Enable to execute (part of) the model on
+    std::string model_dir;  // path to the model directory.
   };
 };
 
