.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - performing inference on an edge device. It can be an embedded
  ASIC/FPGA, or an IP block inside an SoC (e.g. a laptop web camera). These
  devices are typically configured using registers and can work with or
  without DMA.

- Inference data-center - single/multi-user devices in a large server. This
  type of device can be stand-alone or an IP block inside an SoC or a GPU. It
  will have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device. In
  addition, these devices will usually come with some tools, such as a
  profiler and a debugger.

- Training data-center - similar to inference data-center cards, but typically
  with more computational power and memory bandwidth (e.g. HBM), and likely
  with a method of scaling up/out, i.e. connecting to other training cards
  inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made for their hardware. In addition, they will also probably
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because these devices can be IP blocks inside GPUs or have characteristics
similar to those of GPUs, the accel subsystem will use the DRM subsystem's
code and functionality, i.e. the accel core code will be part of the DRM
subsystem and an accel device will be a new type of DRM device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers who have experience with this type of
device. In addition, new features that are added for the accelerator
drivers can be of use to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to user-space via the dedicated
major number 261 and will follow this convention:

- device char files - /dev/accel/accel*
- sysfs - /sys/class/accel/accel*/
- debugfs - /sys/kernel/debug/accel/accel*/

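For illustration only (the exact ownership and permissions depend on the
system's udev configuration), the first accelerator in a system would appear
roughly as follows, with major number 261 and minor number 0::

    $ ls -l /dev/accel/
    crw------- 1 root root 261, 0 Jan 10 10:00 accel0
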
Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only will it explain how to write a new DRM driver, it also contains
the information on how to contribute, the Code of Conduct, and the coding
style and documentation guidelines. All of that applies to the accel
subsystem as well.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.

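As a rough sketch, the relevant fragment of a kernel .config could look like
the following, where CONFIG_DRM_ACCEL_FOO stands in for your driver's own
(hypothetical) Kconfig symbol::

    CONFIG_DRM=y
    CONFIG_DRM_ACCEL=y
    CONFIG_DRM_ACCEL_FOO=m
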
To expose your device as an accelerator, two changes need to be made in your
driver (as opposed to a standard DRM driver); a minimal sketch combining both
changes follows the list:

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
  set the correct file operations structure.

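What follows is a minimal sketch of these two changes for a hypothetical
driver named "foo"; the name, description and version numbers are
placeholders and are not taken from any real driver::

    #include <drm/drm_accel.h>
    #include <drm/drm_drv.h>

    /* Sets .open = accel_open() plus the standard DRM file operations. */
    DEFINE_DRM_ACCEL_FOPS(foo_accel_fops);

    static const struct drm_driver foo_accel_driver = {
            /* Compute accelerator only - no DRIVER_RENDER / DRIVER_MODESET. */
            .driver_features = DRIVER_COMPUTE_ACCEL,
            .fops            = &foo_accel_fops,
            .name            = "foo_accel",
            .desc            = "Hypothetical compute accelerator",
            .major           = 1,
            .minor           = 0,
    };

Beyond these two changes, the device is allocated and registered with the
usual DRM entry points (e.g. devm_drm_dev_alloc() and drm_dev_register()),
just like a standard DRM driver.
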
External References
===================

email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)