# cuDNN.jl

Julia wrapper for [NVIDIA cuDNN](https://developer.nvidia.com/cudnn), providing
GPU-accelerated deep neural network primitives (convolutions, activations, normalization,
RNNs, etc.).

## High level interface to cuDNN functions

Deniz Yuret, Nov 6, 2020

The goal of the high-level interface is to map the low-level cuDNN calls to more natural
Julia functions. Here are some design choices I followed:

**Naming:** We try to keep the same function, argument, and type names from the cuDNN
library in the high-level interface. The wrappers for descriptors drop the `_t` suffix,
e.g. `cudnnPoolingDescriptor_t => cudnnPoolingDescriptor`.

**Descriptors:** The cuDNN functions take data and operator descriptors. Most of these
descriptors are relatively fast to create (~500 ns for a `cudnnTensorDescriptor`), so they
may not be worth preallocating for the user, but we provide keyword options anyway. We
cache descriptors (~100 ns) so we can use them as hash keys for memoization, which also
saves a bit of memory and time. All descriptor fields are `isbits` types, with the
exception of the `cudnnDropoutDescriptor`, which points to a random number generator state
and is used as a field of some other descriptors.

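The caching idea can be sketched as follows; the cache variable and the
`createPoolingDescriptor` helper are hypothetical names for illustration, not the actual
implementation:

    # Sketch: memoize descriptors keyed by their isbits field values
    const poolingDescCache = Dict{Tuple,cudnnPoolingDescriptor}()

    function cudnnPoolingDescriptor(mode, nanOpt, nbDims, window, padding, stride)
        key = (mode, nanOpt, nbDims, window, padding, stride)
        get!(poolingDescCache, key) do
            # createPoolingDescriptor: hypothetical helper wrapping the
            # low-level create/set calls
            createPoolingDescriptor(mode, nanOpt, nbDims, window, padding, stride)
        end
    end

Because descriptor fields are `isbits` (dropout aside), tuples of them hash consistently
and can serve as dictionary keys.
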
**Operator descriptors:** Descriptors such as `cudnnPoolingDescriptor` specify the options
for an operator, such as stride and padding. For operators with descriptors we have one
method that takes keyword arguments with reasonable defaults to construct the descriptor,
and another method that takes a pre-initialized descriptor as its last argument. This way a
casual user can call the first method without worrying about the descriptor format, only
specifying non-default options, whereas a layer architect can keep a preset descriptor in
the layer and pass it to the function using the second method. We try to use generic Julia
types for keyword arguments that specify default descriptor fields and convert these to the
appropriate cuDNN types during descriptor construction.

**Output arrays:** The low-level cuDNN functions take pre-allocated output arrays. The
high-level interface has one Julia function that allocates its own output array
(e.g. `cudnnPoolingForward`) and another with an exclamation mark that takes a
pre-allocated output array as its first argument (e.g. `cudnnPoolingForward!`).

**Methods:** Each cuDNN forward function may have up to four methods depending on whether
the descriptor and the output array are specified:

    cudnnPoolingForward(x; kwargs...)
    cudnnPoolingForward(x, d::cudnnPoolingDescriptor; kwargs...)
    cudnnPoolingForward!(y, x; kwargs...)
    cudnnPoolingForward!(y, x, d::cudnnPoolingDescriptor; kwargs...)

The conventional order of arguments for these public methods is:

    ([output], weights, inputs, [descriptor]; kwargs...)

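The two usage styles might look like this in user code; the keyword names (`mode`,
`window`) and the keyword descriptor constructor are illustrative, not the settled API:

    # Casual use: the descriptor is built from keyword arguments
    y = cudnnPoolingForward(x; mode = CUDNN_POOLING_MAX, window = 3)

    # Layer architect: build a descriptor once, reuse it on every call
    d = cudnnPoolingDescriptor(; mode = CUDNN_POOLING_MAX, window = 3)
    y = cudnnPoolingForward!(similar(x), x, d)
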
**AD method:** Sometimes neither the high-level nor the low-level interface is appropriate
for gradient definitions: e.g. the low-level API may not return a value, and the high-level
API may have some gradient target parameters as keyword arguments. To solve this issue the
API exposes an intermediate function with an AD suffix, e.g. `cudnnPoolingForwardAD`, that
is called by the high-level method and that makes the low-level library call. These methods
may not seem like they are doing anything useful, but they should not be removed, so that
automatic gradient packages can make use of them.

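The call chain might be sketched like this; argument lists are abbreviated and the keyword
names are illustrative:

    # High-level method: fills in defaults and allocates the output
    cudnnPoolingForward(x, d::cudnnPoolingDescriptor; o...) =
        cudnnPoolingForwardAD(x; d, o...)

    # AD method: the stable hook for gradient packages; it makes the
    # low-level library call and returns the output array
    function cudnnPoolingForwardAD(x; y, d, alpha, beta, xDesc, yDesc)
        # mirrors the C API: cudnnPoolingForward(handle, poolingDesc,
        #                     alpha, xDesc, x, beta, yDesc, y)
        cudnnPoolingForward(handle(), d, alpha, xDesc, x, beta, yDesc, y)
        return y
    end
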
**Backward functions:** The point of a high-level interface is to give the user appropriate
defaults for the many options of typical cuDNN functions. Backward functions do not have
meaningful defaults because they need to copy their options from the corresponding forward
function. Therefore we do not need high-level APIs for backward functions unless they are
useful in some other way. See Knet/src/cudnn for example uses.

**Types:** Do not specify types for array arguments. Leave the high-level functions generic
so they can be called with `CuArray`, `KnetArray`, `AutoGrad.Param`, etc. Types can and
should be specified for non-array arguments. In the API we use `nothing` to indicate
unspecified array argument values and convert these to `C_NULL` or `CU_NULL` as appropriate
only at the low-level call. Similarly, for numbers the API should accept generic types like
`Integer` or `Real` and convert these to the appropriate specific type, e.g. `Cint` or
`Cdouble`, only at the low-level call.

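A sketch of this convention with a hypothetical wrapper (`cudnnExampleForward` does not
exist in the library; it only illustrates the signatures):

    # Array arguments untyped, numbers generic, `nothing` for unspecified arrays
    function cudnnExampleForward(x; alpha::Real = 1, bias = nothing)
        # Conversions happen only at the low-level call:
        #   alpha::Real      => Ref(Cdouble(alpha))
        #   bias === nothing => CU_NULL, otherwise pass the array through
    end
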
**Workspace:** Some functions need a temporarily allocated workspace whose required size is
determined by another cuDNN call. Unfortunately, the required size may depend on factors
other than the current inputs (see [this
issue](https://github.com/FluxML/Flux.jl/issues/923#issuecomment-558671966)), so the
`@workspace` macro is used at a point as close to the library call as possible. One
exception to this is when the same workspace will be passed to the backward call, in which
case we allocate a regular `CuArray`.

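The pattern might look like the following; `workspaceSize` and `lowLevelCall` are
placeholders, and the exact `@workspace` syntax may differ from this sketch:

    # workspaceSize comes from the corresponding cudnnGet*WorkspaceSize query
    @workspace size=workspaceSize workspace -> begin
        # pass the borrowed buffer and its size to the library call
        lowLevelCall(handle(), workspace, sizeof(workspace))
    end
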
**Training vs Inference:** There is no consistent way cuDNN distinguishes training vs inference calls:
* BatchNormalization and Normalization have two separate functions: `cudnnNormalizationForwardTraining / Inference`
* RNN has an indicator argument: `fwdMode` in `cudnnRNNForward`
* MultiHeadAttn looks at the `reserveSpace` argument to decide: `NULL` means inference mode, otherwise training mode
* Dropout always runs in training mode with a non-NULL `reserveSpace` (it does not make sense in inference mode)
* Activation, convolution, pooling, softmax, optensor, addtensor, and reducetensor do not distinguish between the two modes

In the high-level API we assume inference by default and let the gradient packages override when necessary.
See the gradient implementations in Knet/src/cudnn for examples.

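In a high-level wrapper this might surface as a `training` keyword (an illustrative name)
defaulting to inference:

    # Default call runs in inference mode
    y = cudnnNormalizationForward(x; training = false)

    # A gradient rule for the same op re-enters with training enabled,
    # so that reserveSpace etc. are set up for the backward pass
    y = cudnnNormalizationForward(x; training = true)
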
**TODO:**
* Keyword arg descriptor constructors.
* Test forward functions with descriptors: check for descriptor vs kwarg incompatibility.
* Find out about `cudnnRNNSetClip_v8`.
* Test with Knet.Ops20.
* Command used to test: julia17 --project -e 'using Pkg; Pkg.API.test(; test_args=`--memcheck --jobs=1 cudnn`)'

This package is part of the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) ecosystem.