# oneDNN

## Introduction
[oneDNN](https://github.com/oneapi-src/oneDNN) is Intel's open-source, cross-platform performance library for deep learning. The [documentation](https://oneapi-src.github.io/oneDNN/) lists the primitives it supports. oneDNN has been integrated into DeepRec and is enabled by adding options to the compile command: `--config=mkl_threadpool` turns on oneDNN-accelerated computation, and adding `--config=opt` additionally enables `--copt=-march=native`, which further accelerates computation on CPUs with AVX-512 support, such as Skylake, Cascade Lake, and Ice Lake.
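
For reference, a full build invocation with these options might look like the sketch below; the pip-package target is assumed from a standard DeepRec checkout, and your exact target and extra flags may differ.

```bash
# Build DeepRec with oneDNN enabled (--config=mkl_threadpool) and
# -march=native optimizations (--config=opt), then package the wheel.
bazel build -c opt --config=opt --config=mkl_threadpool \
    //tensorflow/tools/pip_package:build_pip_package
```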

Tips: MKL was first renamed to DNNL and later to oneDNN. TensorFlow initially used MKL to accelerate operator computation; over subsequent releases, oneDNN gradually took the place of MKL, but the MKL macro definitions were retained.

Macro definitions for oneDNN in DeepRec (a usage sketch follows the table):

| Macro Definition | Values (Bold for Default) | Explanation |
| :------------------------------- | :-------------------------------------------- | :----------------------------------------------------------- |
| TF_MKL_PRIMITIVE_ONLY_FOR_RECO | **1/true**, 0/false | 1: replace only the [operators](https://github.com/alibaba/DeepRec/blob/main/tensorflow/core/graph/mkl_layout_pass.cc#L824-L840) used in recommendation models with their oneDNN versions; 0: replace all operators that oneDNN supports. |
| TF_MKL_OPTIMIZE_PRIMITIVE_MEMUSE | **1/true**, 0/false | 1: reduce main-memory usage by releasing primitives; 0: do not release primitives. |
| TF_DISABLE_MKL | **0**, 1 | 0: enable MKL; 1: disable MKL. |
| TF_MKL_NUM_INTRAOP | Integer, e.g. 14; **not set by default** | Integer: the number of intra-op threads used by oneDNN; not set: oneDNN uses at most the number of TF intra-op threads. |
| ONEDNN_VERBOSE | **0**/1/2 | The [verbosity level](https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html) of the log output by oneDNN primitives. |
| DNNL_MAX_CPU_ISA | **ALL**, AVX512_CORE_AMX, AVX512_CORE_BF16, … | The [highest ISA](https://oneapi-src.github.io/oneDNN/v2.4/dev_guide_cpu_dispatcher_control.html#run-time-controls) used by oneDNN (versions earlier than 2.5.0). |
| ONEDNN_MAX_CPU_ISA | **ALL**, AVX512_CORE_AMX, AVX512_CORE_BF16, … | The [highest ISA](https://oneapi-src.github.io/oneDNN/v2.4/dev_guide_cpu_dispatcher_control.html#run-time-controls) used by oneDNN (versions 2.5.0 and later). |
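
A minimal run-time sketch combining these variables (the values are illustrative, and `train.py` is a hypothetical training script, not part of DeepRec):

```bash
# Illustrative settings; choose values that match your model and CPU.
export TF_MKL_PRIMITIVE_ONLY_FOR_RECO=1      # replace only recommendation-model ops
export TF_MKL_OPTIMIZE_PRIMITIVE_MEMUSE=1    # release primitives to reduce memory use
export TF_MKL_NUM_INTRAOP=14                 # cap oneDNN intra-op threads at 14
export ONEDNN_MAX_CPU_ISA=AVX512_CORE_BF16   # oneDNN >= 2.5.0; use DNNL_MAX_CPU_ISA for older versions
python train.py                              # hypothetical entry point
```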

Primitives supported by oneDNN:

| Primitive | Available Types | Supported Post-ops and Attributes |
| -------------------------------------- | --------------------------- | --------------------------------------- |
| Matrix Multiplication | f32, bf16, f16, u8, s8 | Scale, Zero point, Eltwise, Sum, Binary |
| Inner Product | f32, bf16, f16, u8, s8 | Scale, Eltwise, Sum, Binary |
| Layer Normalization | f32, bf16, f16 | / |
| Batch Normalization | f32, bf16, f16, s8 | Eltwise |
| Local Response Normalization (LRN) | f32, bf16, f16 | / |
| Binary (+, -, *, /, >, <, min, max...) | f32, bf16, f16, u8, s8 | Scale, Eltwise, Sum, Binary |
| Eltwise (relu, gelu, tanh, linear...) | f32, s32, bf16, f16, u8, s8 | Binary |
| PReLU | f32, s32, bf16, s8, u8 | / |
| Sum | f32, s32, bf16, f16, u8, s8 | / |
| Reduction | f32, bf16, u8, s8 | Eltwise, Sum, Binary |
| Softmax | f32, bf16, f16 | / |
| LogSoftmax | f32, bf16 | / |
| Reorder | f32, s32, bf16, f16, u8, s8 | Scale, Sum |
| Concat | f32, s32, bf16, f16, u8, s8 | / |
| Convolution | f32, bf16, f16, u8, s8 | Scale, Zero point, Eltwise, Sum, Binary |
| Pooling | f32, s32, bf16, f16, u8, s8 | Binary |
| RNN (LSTM, GRU, Vanilla RNN...) | f32, bf16, f16, u8, s8 | / |
| Resampling | f32, s32, bf16, f16, s8, u8 | Eltwise, Sum, Binary |
| Shuffle | f32, s32, bf16, s8, u8 | / |
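
To check which of these primitives a model actually executes, one approach is to enable the verbose log described in the table above and filter the output; a sketch, again using the hypothetical `train.py`:

```bash
# oneDNN emits one CSV-style "onednn_verbose" line per primitive event,
# naming the primitive kind (matmul, inner_product, eltwise, ...).
ONEDNN_VERBOSE=1 python train.py 2>&1 | grep onednn_verbose
```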