Commit b1857ae

author
Kelvin Cao
committed
Merge branch 'master' into backport_4.4_to_4.7
2 parents c667398 + d705e19 commit b1857ae

2 files changed: 138 additions, 4 deletions

Documentation/design.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
# Switchtec Kernel Design Documentation
2+
3+
This document aims to provide a jumping-off point for working with the
4+
kernel code for the switchtec driver. It describes some core concepts
5+
and landmarks to help get started hacking on the code. This document
6+
may not stay up to date so when in doubt, consult the code.
7+
8+
The Switchtec kernel module is divided into two parts: switchtec.ko and
9+
ntb_hw_switchtec.ko. The former enumerates management and NTB endpoints,
10+
configures them, and provides the interface to switchtec-user. The latter
11+
provides a driver for the Linux NTB stack. ntb_hw_switchtec.ko depends on
12+
switchtec.ko.
13+
14+
## switchtec.ko
15+
16+
The main Switchtec driver enumerates the devices in the standard way
17+
for Linux (how that is done is not covered in this document; for more
18+
information on Linux Driver implementations refer to [LDD3][1] or the
19+
Kernel source code).
20+
21+
### Userspace Interface
22+
23+
Refer to the README file or switchtec_ioctl.h for more information on
24+
how the userspace interface is defined. The kernel module creates a
25+
character device for each switch that was enumerated. Reading and
26+
writing this device allows for creating MRPC commands, and a few IOCTLs
27+
are provided so userspace does not have to directly access the GAS
28+
(which requires full root permission and has security and stability
29+
implications). For the implementation of these commands refer to
30+
switchtec_fops in switchtec.c.
31+
32+
Whenever a userspace application opens a switchtec char device, the
33+
kernel creates a switchtec_user structure. This structure is used for
34+
queueing MRPC commands so each application can have one MRPC command in
35+
flight at a time and the kernel will arbitrate between the applications
36+
on a first in first out basis.
37+
38+
When the application does a write, the kernel will queue the data to be
39+
sent to the firmware. If the queue is empty, it will immediately submit
40+
the command (see mrpc_queue_cmd). A read command will store how much data
41+
is to be read and block until the command has been completed. An event
42+
interrupt indicates when the command is completed and the kernel will
43+
read the output data and store it in the switchtec_user structure (see
44+
mrpc_complete_cmd). If the read command has not yet set how much output
45+
data is expected the kernel will read all of the data into the buffer
46+
(which may be slower than expected). Once the data is read the completion
47+
in switchtec_user will signal the read command to return the data
48+
to userspace.
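
The first-in, first-out arbitration described above can be modeled in plain userspace C. This is only a sketch under stated assumptions: every name here (`user_model`, `queue_model`, `model_write`, `model_complete`) is hypothetical and does not match the driver, and locking, data buffers, and the firmware itself are left out. It shows one pending command per user, immediate submission when the hardware is idle, and the completion path submitting the next queued command.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
#define MAX_USERS 8

struct user_model {
	int id;
	bool queued;	/* a command is waiting or in flight */
	bool done;	/* output data is ready for read() */
};

struct queue_model {
	struct user_model *fifo[MAX_USERS];
	size_t head, count;
	bool busy;	/* a command has been handed to the firmware */
};

/* Submit the command at the head of the queue if the hardware is idle. */
static void model_submit(struct queue_model *q)
{
	if (q->busy || q->count == 0)
		return;
	q->busy = true;
}

/* write(): queue one command; fails if this user already has one pending. */
static bool model_write(struct queue_model *q, struct user_model *u)
{
	if (u->queued || q->count == MAX_USERS)
		return false;
	u->queued = true;
	u->done = false;
	q->fifo[(q->head + q->count) % MAX_USERS] = u;
	q->count++;
	model_submit(q);
	return true;
}

/* Completion event: finish the head command, then submit the next one. */
static struct user_model *model_complete(struct queue_model *q)
{
	struct user_model *u;

	if (!q->busy)
		return NULL;
	u = q->fifo[q->head];
	q->head = (q->head + 1) % MAX_USERS;
	q->count--;
	q->busy = false;
	u->queued = false;
	u->done = true;	/* a blocked read() would be woken here */
	model_submit(q);
	return u;
}
```

In this model a second write from the same user is rejected until its first command completes, which is how the "one MRPC command in flight" rule is represented.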
49+
50+
In case something unexpected happens the kernel has a timeout on all
51+
MRPC commands (see mrpc_timeout_work). Usually the interrupt will occur
52+
before the timeout but if it is missed the timeout will prevent the
53+
queue from being hung. Note, however, that if the firmware never indicates the
54+
command is complete this will still hang the queue.
55+
56+
### Interrupts
57+
58+
The driver sets up space for up to four MSI-X or MSI interrupts but only
59+
registers a handler for the event interrupt as designated by the
60+
vep_vector_number in the GAS region. The NTB module will also register
61+
another interrupt handler for the doorbell and message vector.
62+
63+
The event interrupt (switchtec_event_isr) first checks if the MRPC event
64+
occurred and, if so, queues mrpc_work, which will call mrpc_complete_cmd. It will
65+
then clear the EVENT_OCCURRED bit so the interrupt doesn't continue to
66+
trigger.
67+
68+
Next, the interrupt will check all the link state events in all the
69+
ports and signal a link_notifier (typically used by the NTB driver)
70+
if such an event occurs.
71+
72+
Finally, the interrupt will check all other event interrupts. If
73+
an event interrupt occurs it wakes up any process that is polling
74+
on events (see switchtec_dev_poll). It then disables the interrupt
75+
for that event. In this way, it is expected that an application will
76+
enable the interrupt it's waiting for, then call poll in a loop
77+
checking whether the expected interrupt occurred. poll will return any time
78+
any event occurs.
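
The userspace side of this pattern might look roughly like the sketch below. It assumes the event is reported through `POLLIN` or `POLLPRI` (which flag the driver actually sets is not stated here), and the helper names `wait_for_event`, `wait_for_expected`, and `countdown_check` are hypothetical. Enabling the event beforehand and reading the event state are left to the caller-supplied check function.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical callback: returns nonzero once the expected event was seen. */
typedef int (*event_check_fn)(void *ctx);

/*
 * Wait until fd reports an event or timeout_ms elapses.
 * Returns 1 if an event was reported, 0 on timeout, -1 on error.
 */
static int wait_for_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLPRI };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret < 0 ? -1 : 0;
	return (pfd.revents & (POLLIN | POLLPRI)) ? 1 : -1;
}

/*
 * poll() returns on any event, so keep polling until the check function
 * confirms that the expected one occurred. Returns 1 on success,
 * 0 on timeout, -1 on error.
 */
static int wait_for_expected(int fd, int timeout_ms,
			     event_check_fn check, void *ctx)
{
	for (;;) {
		int ret = wait_for_event(fd, timeout_ms);

		if (ret <= 0)
			return ret;
		if (check(ctx))
			return 1;
	}
}

/* Example check: treats ctx as a countdown of unrelated wakeups. */
static int countdown_check(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 0;
	}
	return 1;
}
```

A real check function would query the device for the specific event rather than count wakeups; the countdown is only there to exercise the loop.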
79+
80+
### IOCTLs
81+
82+
Several IOCTLs are provided for functions needed by
83+
switchtec-user. See the README for a description of these IOCTLs and
84+
switchtec_dev_ioctl for their implementation.
85+
86+
### Sysfs
87+
88+
There are a number of sysfs attributes provided so that userspace can
89+
easily enumerate and discover the available switchtec devices. The
90+
attributes in the system can easily be browsed in sysfs under
91+
/sys/class/switchtec.
92+
93+
These attributes are documented in Documentation/ABI/sysfs-class-switchtec.
94+
See switchtec_device_attrs in switchtec.c for their implementation.
95+
96+
## ntb_hw_switchtec.ko
97+
98+
The ntb_hw_switchtec module enumerates all devices in the switchtec class
99+
and creates NTB interfaces for any devices that are NTB endpoints.
100+
See switchtec_ntb_ops for the implementation of all the NTB operations.
101+
102+
### Shared Memory Window
103+
104+
The Switchtec NTB driver reserves one of the LUT memory windows so it
105+
can be used to provide scratch pad registers and link detection. For
106+
now, the driver sets the size of all LUT windows to be fixed at 64KB.
107+
This size allows for the combined size of all LUT windows to be
108+
sufficient that the alignment of the direct window that follows
109+
will be at least 2MB.
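
One way to read the sizing argument is as simple arithmetic. The sketch below assumes the direct window starts immediately after the LUT windows; the count of 32 windows used in the note afterwards is an illustrative assumption, not a number taken from this document, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Fixed LUT window size stated in the text: 64KB. */
#define LUT_WINDOW_SIZE (64u * 1024u)

/* Offset of a direct window that follows nr_luts LUT windows (assumed layout). */
static uint64_t direct_window_offset(unsigned int nr_luts)
{
	return (uint64_t)nr_luts * LUT_WINDOW_SIZE;
}

/* Largest power of two that divides offset (returns 0 for offset 0). */
static uint64_t natural_alignment(uint64_t offset)
{
	return offset & (~offset + 1);
}
```

Under these assumptions, 32 LUT windows of 64KB add up to exactly 2MB, so the window that follows is 2MB aligned, while 16 windows would only give 1MB alignment.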
110+
111+
### Link Management
112+
113+
The link is considered to be up when both sides have set up their shared
114+
memory window. A magic number and the link status must be read by both
115+
sides to realize that the link is up. When either side changes their
116+
link status, a specific message is sent telling the other side to check
117+
the current link state. The link state is also checked whenever the
118+
switch sends a link state change interrupt.
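
The link check can be pictured as each side validating what it finds in both shared memory windows. This is a minimal sketch: the structure layout, the field names, and the magic value are placeholders chosen for illustration and are not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder magic value for illustration; not the driver's constant. */
#define EXAMPLE_MAGIC 0x5357544Bu

/* Hypothetical view of the fields each side publishes in its window. */
struct shared_window_model {
	uint32_t magic;		/* set once the window has been initialized */
	uint32_t link_sta;	/* nonzero when that side reports link up */
};

/* A side counts as ready once its window is initialized and it reports up. */
static bool side_ready(const struct shared_window_model *w)
{
	return w->magic == EXAMPLE_MAGIC && w->link_sta != 0;
}

/* The link is up only when both the local and the peer window check out. */
static bool link_is_up(const struct shared_window_model *self,
		       const struct shared_window_model *peer)
{
	return side_ready(self) && side_ready(peer);
}
```

In this model the check would be re-run whenever the peer's "check link state" message arrives or the switch raises a link state change interrupt.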
119+
120+
### Memory windows
121+
122+
By default, the driver only provides direct memory windows to the
123+
upper layers. This is because the existing upper layers can get confused
124+
by a large number of LUT memory windows. The LUT memory windows can be
125+
enabled with the use_lut_mws parameter.
126+
127+
### Crosslink
128+
129+
The crosslink feature allows for an NTB system to be entirely symmetric
130+
such that two hosts can be identical and interchangeable. To do this a
131+
special hostless partition is created in the middle of the two hosts.
132+
This is supported by the driver and only requires a special initialization
133+
procedure (see switchtec_ntb_init_crosslink). Crosslink also reserves another
134+
one of the LUT windows to be used to window the NTB register space inside
135+
the crosslink partition. Besides this, all other NTB operations function
136+
identically to regular NTB.
137+
138+
[1]: https://lwn.net/Kernel/LDD3/

switchtec.c

Lines changed: 0 additions & 4 deletions
@@ -144,10 +144,6 @@ static void mrpc_cmd_submit(struct switchtec_dev *stdev)
 			    stuser->data, stuser->data_len);
 	iowrite32(stuser->cmd, &stdev->mmio_mrpc->cmd);
 
-	stuser->status = ioread32(&stdev->mmio_mrpc->status);
-	if (stuser->status != SWITCHTEC_MRPC_STATUS_INPROGRESS)
-		mrpc_complete_cmd(stdev);
-
 	schedule_delayed_work(&stdev->mrpc_timeout,
 			      msecs_to_jiffies(500));
 }
