| 1 | +## Add Kernels for a New Device |
| 2 | + |
| 3 | +### Background |
| 4 | + |
| 5 | +PaddlePaddle Fluid has hundreds of operators. Each operator can have one or more kernels. A kernel is an implementation of the operator for a specific device, which can be a hardware device, e.g., a CUDA GPU, or a library that utilizes a device, e.g., Intel MKL, which makes full use of Xeon CPUs. |
| 6 | + |
| 7 | +[This document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md) explains how to add an operator and its kernels. The kernels of an operator are indexed by the C++ type [`OpKernelType`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md). An operator chooses the right kernel at runtime. This selection mechanism is described [here](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md). |
| 8 | + |
| 9 | +### Write Kernels for A New Device |
| 10 | + |
| 11 | +#### Add A New Device |
| 12 | + |
| 13 | +For historical reasons, we misuse the word *library* for *device*. For example, we call the device type the *library type*, as in the header file [`library_type.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/library_type.h#L24). We will correct this ASAP. |
| 14 | + |
| 15 | +To register a new device, we need to add an enum value to `LibraryType`: |
| 16 | + |
| 17 | +```cpp |
| 18 | +enum class LibraryType { |
| 19 | + kPlain = 0, |
| 20 | + kMKLDNN = 1, |
| 21 | + kCUDNN = 2, |
| 22 | +}; |
| 23 | +``` |
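As a minimal sketch of this step, the enum below adds one extra entry for an imaginary device. The name `kFPGA`, its value, and the helper `LibraryTypeName` are assumptions made for illustration only; they are not part of Paddle.

```cpp
#include <cassert>
#include <string>

// Mirrors the enum shown above, plus one hypothetical entry.
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
  kFPGA = 3,  // hypothetical new device, added for illustration only
};

// Hypothetical helper (not Paddle's API): maps each enum value to a
// readable name, which is handy for logging which kernel was chosen.
inline std::string LibraryTypeName(LibraryType type) {
  switch (type) {
    case LibraryType::kPlain:
      return "PLAIN";
    case LibraryType::kMKLDNN:
      return "MKLDNN";
    case LibraryType::kCUDNN:
      return "CUDNN";
    case LibraryType::kFPGA:
      return "FPGA";
  }
  assert(false && "unknown LibraryType");
  return "UNKNOWN";
}
```

Giving every entry an explicit value keeps the existing values stable when a new device is appended.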
| 24 | + |
| 25 | + |
| 26 | +#### Add A New [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L53) |
| 27 | + |
| 28 | +If you have a new kind of device, you first need to add a new kind of [`Place`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L53). For example, `CUDAPlace`: |
| 29 | + |
| 30 | +```cpp |
| 31 | +struct CUDAPlace { |
| 32 | + CUDAPlace() : CUDAPlace(0) {} |
| 33 | + explicit CUDAPlace(int d) : device(d) {} |
| 34 | + |
| 35 | + inline int GetDeviceId() const { return device; } |
| 36 | + // needed for variant equality comparison |
| 37 | + inline bool operator==(const CUDAPlace &o) const { |
| 38 | + return device == o.device; |
| 39 | + } |
| 40 | + inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); } |
| 41 | + |
| 42 | + int device; |
| 43 | +}; |
| 44 | + |
| 45 | +typedef boost::variant<CUDAPlace, CPUPlace> Place; |
| 46 | +``` |
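Following the `CUDAPlace` pattern above, a place for a new device might look like the sketch below. `FPGAPlace` is a hypothetical name, and the non-negative device id check is an assumption of this example rather than something the source requires.

```cpp
#include <cassert>

// Hypothetical place for an imaginary FPGA device, modeled on CUDAPlace.
struct FPGAPlace {
  FPGAPlace() : FPGAPlace(0) {}
  explicit FPGAPlace(int d) : device(d) {
    assert(d >= 0);  // assumption of this sketch: ids are non-negative
  }

  int GetDeviceId() const { return device; }
  // Equality operators are needed so the type can be compared
  // when it is stored inside the Place variant.
  bool operator==(const FPGAPlace &o) const { return device == o.device; }
  bool operator!=(const FPGAPlace &o) const { return !(*this == o); }

  int device;
};

// The new type would then be added to the variant, roughly:
//   typedef boost::variant<FPGAPlace, CUDAPlace, CPUPlace> Place;
```

The variant line is left as a comment so that the sketch compiles without Boost.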
| 47 | + |
| 48 | +#### Add [device context](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L37) |
| 49 | +After a new kind of device is added, you should add a corresponding [`DeviceContext`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L37) for it. |
| 50 | + |
| 51 | +```cpp |
| 52 | +class DeviceContext { |
| 53 | + public: |
| 54 | + virtual ~DeviceContext() {} |
| 55 | + virtual Place GetPlace() const = 0; |
| 56 | + |
| 57 | + virtual void Wait() const {} |
| 58 | +}; |
| 59 | +``` |
| 60 | + |
| 61 | +#### Implement a new [OpKernel](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L351) for your device |
| 62 | + |
| 63 | +Detailed documentation can be found in [`new_op_and_kernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md). |
| 64 | + |
| 65 | +```cpp |
| 66 | +class OpKernelBase { |
| 67 | + public: |
| 68 | + /** |
| 69 | + * ExecutionContext is the only parameter of the kernel's Compute function. |
| 70 | + * Compute gets input/output variables, state such as momentum, and |
| 71 | + * device resources such as the CUDA stream and cuBLAS handle from |
| 72 | + * ExecutionContext. Users should construct it before running the operator. |
| 73 | + */ |
| 74 | + |
| 75 | + virtual void Compute(const ExecutionContext& context) const = 0; |
| 76 | + |
| 77 | + virtual ~OpKernelBase() = default; |
| 78 | +}; |
| 79 | + |
| 80 | +template <typename T> |
| 81 | +class OpKernel : public OpKernelBase { |
| 82 | + public: |
| 83 | + using ELEMENT_TYPE = T; |
| 84 | +}; |
| 85 | +``` |
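To make the interface above concrete, here is a toy kernel that doubles every input element. `ToyScaleKernel` is hypothetical, and `ExecutionContext` is reduced to a stand-in holding raw buffers; the real `ExecutionContext` exposes variables and device resources through a richer API that this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for paddle::framework::ExecutionContext (an assumption of
// this sketch): just raw input/output buffers and an element count.
struct ExecutionContext {
  const void* input;
  void* output;
  std::size_t size;
};

class OpKernelBase {
 public:
  virtual void Compute(const ExecutionContext& context) const = 0;

  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};

// Hypothetical kernel: writes 2 * input[i] into output[i].
template <typename T>
class ToyScaleKernel : public OpKernel<T> {
 public:
  void Compute(const ExecutionContext& ctx) const override {
    assert(ctx.input != nullptr && ctx.output != nullptr);
    const T* in = static_cast<const T*>(ctx.input);
    T* out = static_cast<T*>(ctx.output);
    for (std::size_t i = 0; i < ctx.size; ++i) {
      out[i] = in[i] * static_cast<T>(2);
    }
  }
};
```

Instantiating the template once per element type (for example `ToyScaleKernel<float>` and `ToyScaleKernel<double>`) yields the per-data-type kernels that are later passed to the registration macro.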
| 86 | + |
| 87 | + |
| 88 | +#### Register the OpKernel with the framework |
| 89 | + |
| 90 | +After writing the components described above, we should register the kernel with the framework. |
| 91 | + |
| 92 | +We use `REGISTER_OP_KERNEL` to do the registration. |
| 93 | + |
| 94 | +```cpp |
| 95 | +REGISTER_OP_KERNEL( |
| 96 | + op_type, |
| 97 | + library_type, |
| 98 | + place_type, |
| 99 | + kernel0, kernel1, ...) |
| 100 | +``` |
| 101 | + |
| 102 | +`kernel0` and `kernel1` are kernels that share the same `op_type`, `library_type`, and `place_type` but differ in `data_type`. |
| 103 | + |
| 104 | +Take [`conv2d`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/conv_cudnn_op.cu.cc#L318) as an example: |
| 105 | + |
| 106 | + ```cpp |
| 107 | + REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace, |
| 108 | + paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, float>, |
| 109 | + paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, double>); |
| 110 | + |
| 111 | + REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace, |
| 112 | + paddle::operators::CUDNNConvOpKernel<float>, |
| 113 | + paddle::operators::CUDNNConvOpKernel<double>); |
| 114 | + ``` |
| 115 | + |
| 116 | +In the code above: |
| 117 | + |
| 118 | + - `conv2d` is the type/name of the operator. |
| 119 | + - `CUDNN`/`CPU` is the `library_type`. |
| 120 | + - `paddle::platform::CUDAPlace`/`CPUPlace` is the `place_type`. |
| 121 | + - The template parameter `float`/`double` on `CUDNNConvOpKernel<T>` is the `data_type`. |
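The four dimensions listed above can be pictured as a lookup key. The toy registry below is only a sketch of that idea and is not how `REGISTER_OP_KERNEL` is implemented; `ToyKernelRegistry`, `KernelKey`, and the string-based key fields are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Key: (op_type, library_type, place_type, data_type), all as strings.
using KernelKey =
    std::tuple<std::string, std::string, std::string, std::string>;
// A "kernel" here is just a callable returning its own description.
using KernelFn = std::function<std::string()>;

// Hypothetical registry: one kernel per unique key.
class ToyKernelRegistry {
 public:
  void Register(const std::string& op, const std::string& library,
                const std::string& place, const std::string& data_type,
                KernelFn fn) {
    assert(fn && "kernel must be callable");
    kernels_[KernelKey(op, library, place, data_type)] = std::move(fn);
  }

  // Returns nullptr when no kernel matches all four fields.
  const KernelFn* Find(const std::string& op, const std::string& library,
                       const std::string& place,
                       const std::string& data_type) const {
    auto it = kernels_.find(KernelKey(op, library, place, data_type));
    return it == kernels_.end() ? nullptr : &it->second;
  }

 private:
  std::map<KernelKey, KernelFn> kernels_;
};
```

In this model, registering `conv2d` for `CPU` and for `CUDNN` creates separate entries, and a lookup succeeds only when the operator, library, place, and data type all match.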