Commit 8c5577a

committed
doc: add documentation for accel subsystem
Add an introduction section for the accel subsystem. Most of the relevant information is in the DRM documentation, so the introduction only explains why the new subsystem exists, how the compute accelerators are exposed to user-space and what changes a standard DRM driver needs in order to register with the new accel subsystem.

Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Jeffrey Hugo <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Acked-by: Thomas Zimmermann <[email protected]>
Acked-by: Jacek Lawrynowicz <[email protected]>
Tested-by: Jacek Lawrynowicz <[email protected]>
Reviewed-by: Melissa Wen <[email protected]>
1 parent 7428ff7 commit 8c5577a

4 files changed (+129, -0 lines changed)

Documentation/accel/index.rst

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
.. SPDX-License-Identifier: GPL-2.0

====================
Compute Accelerators
====================

.. toctree::
   :maxdepth: 1

   introduction

.. only:: subproject and html

   Indices
   =======

   * :ref:`genindex`

Documentation/accel/introduction.rst

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP inside an SoC (e.g. laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single-user or multi-user devices in a large server.
  This type of device can be stand-alone or an IP inside an SoC or a GPU. It
  will have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device. In
  addition, these devices will usually have some tools, such as a profiler and
  a debugger.

- Training data-center - similar to inference data-center cards, but these
  typically have more computational power and memory bandwidth (e.g. HBM) and
  will likely have a method of scaling up/out, i.e. connecting to other
  training cards inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made for their hardware. In addition, they will probably also
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because these devices can be an IP inside GPUs or have characteristics
similar to those of GPUs, the accel subsystem will use the DRM subsystem's
code and functionality. That is, the accel core code will be part of the DRM
subsystem and an accel device will be a new type of DRM device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers who have experience with these
devices. In addition, new features that will be added for the accelerator
drivers can be of use to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate directory of the kernel
tree: drivers/accel/.

The accelerator devices will be exposed to user-space with the dedicated
major number 261 and will follow this naming convention:

- device char files - /dev/accel/accel*
- sysfs - /sys/class/accel/accel*/
- debugfs - /sys/kernel/debug/accel/accel*/

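As a rough user-space illustration of the convention above, the sketch below builds the three per-device paths for a given index and checks whether a device number carries the accel major (261) rather than the DRM major (226). The helper names (`accel_paths`, `is_accel_devnum`, `parse_sysfs_dev`, `is_accel_node`) are hypothetical. The sketch assumes glibc-style `major()`/`makedev()` from `<sys/sysmacros.h>`, and assumes the sysfs directory exposes the usual `dev` attribute holding `MAJOR:MINOR` text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Major number dedicated to accel char devices, as stated in the text. */
#define ACCEL_MAJOR 261u

/* Fill in the char-device, sysfs and debugfs paths for accel device `idx`. */
static void accel_paths(unsigned int idx, char *dev, char *sysfs,
                        char *debugfs, size_t len)
{
    assert(dev && sysfs && debugfs && len > 0);
    snprintf(dev, len, "/dev/accel/accel%u", idx);
    snprintf(sysfs, len, "/sys/class/accel/accel%u/", idx);
    snprintf(debugfs, len, "/sys/kernel/debug/accel/accel%u/", idx);
}

/* True if the device number belongs to the accel major. */
static bool is_accel_devnum(dev_t devnum)
{
    return major(devnum) == ACCEL_MAJOR;
}

/*
 * Parse the "MAJOR:MINOR" text of a sysfs `dev` attribute (assumed format)
 * into a dev_t. Returns false if the text does not match that pattern.
 */
static bool parse_sysfs_dev(const char *text, dev_t *out)
{
    unsigned int maj, min;

    if (sscanf(text, "%u:%u", &maj, &min) != 2)
        return false;
    *out = makedev(maj, min);
    return true;
}

/* True if `path` exists, is a character device and uses the accel major. */
static bool is_accel_node(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;
    return S_ISCHR(st.st_mode) && is_accel_devnum(st.st_rdev);
}
```

On a machine with one accelerator, `is_accel_node("/dev/accel/accel0")` would be expected to return true, whereas the GPU nodes under /dev/dri/ carry the DRM major 226 and are rejected.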
Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only will it explain how to write a new DRM driver, but it also
contains all the information on how to contribute, the Code of Conduct and
the coding style and documentation conventions. All of that is the same for
the accel subsystem.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.

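One way to verify the second step from user-space is to scan the kernel's configuration text for the option. The sketch below is a hypothetical helper, not part of any kernel tool. It assumes an uncompressed config file (for example the build tree's .config, or a distribution's /boot/config-* file) and assumes the option is a boolean, so an enabled build contains the line `CONFIG_DRM_ACCEL=y` while a disabled one contains `# CONFIG_DRM_ACCEL is not set`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if the config text read from `f` enables CONFIG_DRM_ACCEL. */
static bool config_stream_has_drm_accel(FILE *f)
{
    char line[512];

    assert(f != NULL);
    while (fgets(line, sizeof(line), f)) {
        /* Drop the trailing newline so the comparison is exact. */
        line[strcspn(line, "\n")] = '\0';
        if (strcmp(line, "CONFIG_DRM_ACCEL=y") == 0)
            return true;
    }
    return false;
}

/* Convenience wrapper that opens an uncompressed config file by path. */
static bool config_has_drm_accel(const char *path)
{
    FILE *f = fopen(path, "r");
    bool found;

    if (!f)
        return false;
    found = config_stream_has_drm_accel(f);
    fclose(f);
    return found;
}
```

A running kernel's /proc/config.gz (available only with CONFIG_IKCONFIG_PROC) would have to be decompressed before being passed to this helper.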
To expose your device as an accelerator, two changes are needed in your
driver (as opposed to a standard DRM driver):

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver's fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
  set up the correct file operations structure.

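The mutual-exclusion rule in the first item can be restated in a few lines of plain C. This is a user-space sketch only: the `DEMO_*` bits below are hypothetical stand-ins chosen for illustration, not the kernel's real `drm_driver_feature` values, and `demo_features_valid()` is not a kernel function. A real driver simply sets DRIVER_COMPUTE_ACCEL in its drm_driver's driver_features field.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in bits for illustration only; the real DRIVER_*
 * values are defined by the kernel and are not reproduced here.
 */
enum demo_driver_feature {
    DEMO_DRIVER_MODESET       = 1u << 0,
    DEMO_DRIVER_RENDER        = 1u << 1,
    DEMO_DRIVER_COMPUTE_ACCEL = 1u << 2,
};

/* Each stand-in flag must occupy its own bit for the masks below to work. */
static_assert((DEMO_DRIVER_MODESET & DEMO_DRIVER_RENDER) == 0 &&
              (DEMO_DRIVER_RENDER & DEMO_DRIVER_COMPUTE_ACCEL) == 0 &&
              (DEMO_DRIVER_MODESET & DEMO_DRIVER_COMPUTE_ACCEL) == 0,
              "feature bits must not overlap");

/*
 * A compute-accel feature set may not also advertise render or modeset
 * support; feature sets without the accel bit are accepted unchanged.
 */
static bool demo_features_valid(unsigned int features)
{
    if (features & DEMO_DRIVER_COMPUTE_ACCEL)
        return (features & (DEMO_DRIVER_RENDER | DEMO_DRIVER_MODESET)) == 0;
    return true;
}
```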
External References
===================

Email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)

Documentation/subsystem-apis.rst

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ needed).
    input/index
    hwmon/index
    gpu/index
+   accel/index
    security/index
    sound/index
    crypto/index

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -6839,6 +6839,7 @@ L: [email protected]
 S: Maintained
 C: irc://irc.oftc.net/dri-devel
 T: git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git
+F: Documentation/accel/
 F: drivers/accel/

 DRM DRIVERS FOR ALLWINNER A10
