Commit 0805725

Merge tag 'drm-habanalabs-next-2023-10-10' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains habanalabs driver changes for v6.7.

The notable changes are:

- uAPI changes:

  - Expose tsc clock sampling to better sync clock information in profiler.

  - Enhance engine error reporting in the info ioctl.

  - Block access to the eventfd operations through the control device.

  - Disable the option of the user to register multiple times with the same
    offset for timestamp dump by the driver. If a user wants to use the same
    offset in the timestamp buffer for different interrupt, it needs to first
    de-register the offset.

  - When exporting dma-buf (for p2p), force the user to specify size/offset
    in multiples of PAGE_SIZE. This is instead of the driver doing the
    rounding to PAGE_SIZE, which has caused the driver to map more memory
    than was intended by the user.

- New features and improvements:

  - Complete the move of the driver to the accel subsystem by removing the
    custom habanalabs class and major and registering to accel subsystem.

  - Move the firmware interface files to include/linux/habanalabs. This is a
    pre-requisite for upstreaming the NIC drivers of Gaudi (as they need to
    include those files).

  - Perform device hard-reset upon PCIe AXI drain event to prevent the failure
    from cascading to different IP blocks in the SoC. In secured environments,
    this is done automatically by the firmware.

  - Print device name when it is removed for better debuggability.

  - Add support for trace of dma map sgtable operations.

  - Optimize handling of user interrupts by splitting the interrupts to two
    lists. One list for fast handling and second list for handling with
    timestamp recording, which is slower.

  - Prevent double device hard-reset due to 2 adjacent H/W events.

  - Set device status 'malfunction' while in rmmod.

- Firmware related fixes:

  - Extend preboot timeout because preboot loading might take longer than
    expected in certain cases.

  - Add a protection mechanism for the Event Queue. In case it is full, the
    firmware will be able to notify about it through a dedicated interrupt.

  - Perform device hard-reset in case scrubbing of memory has failed.

- Bug fixes and code cleanups:

  - Small fixes of dma-buf handling in Gaudi2, such as handling an offset != 0,
    using the correct exported size, creation of sg table.

  - Fix spmu mask creation.

  - Fix bug in wait for cs completion for decoder workloads.

  - Cleanup Greco name from documentation.

  - Fix bug in recording timestamp during cs completion interrupt handling.

  - Fix CoreSight ETF configuration and flush logic.

  - Fix small bug in hpriv_list handling (the list that contains the private
    data per process that opens our device).

Signed-off-by: Dave Airlie <[email protected]>

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEE7TEboABC71LctBLFZR1NuKta54AFAmUlHoQACgkQZR1NuKta
# 54DsXQf8CW+W4iWJf5UDTj/E/giu9rVRrsUsU0hhCcXbecIxRsLObYXtulENu5/u
# VuEAo/tAvo0LUKi8pdIv6ernDKaxZ1+fimlfXMCzllAA/ts3yp1NgunprsIsx3tv
# YgcJ2GNR8UlVZ1qYuZl+4dOTyD0yfRMROUXBe7wqKnUXOEepOiLBxq6W15tZiJnx
# L+V0yGkNk6pAoADIXLW9EgEXiN/bJZCXGPWp06i/Nz7cHIHJGoV59wAqftqllCtk
# 8ZMkLByjlQKPhc5AgWBtKE8EGVip3sm7b/Q2Gq0ZXdZiebyVJ+AjuuDOdtq1UCIw
# Rcp2576E7rByIBu3RAFlrioWhuR5Zw==
# =2ien
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 10 Oct 2023 19:51:00 AEST
# gpg: using RSA key ED311BA00042EF52DCB412C5651D4DB8AB5AE780
# gpg: Can't check signature: No public key

From: Oded Gabbay <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2 parents 614351f + 4db74c0 commit 0805725
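
The dma-buf item in the message above moves the PAGE_SIZE rounding from the driver to the caller, so user space has to pass an offset and size that are already page multiples. A minimal Python sketch of that alignment arithmetic follows. It assumes the host page size reported by `mmap.PAGESIZE` matches the kernel's PAGE_SIZE, and the helper names are hypothetical; it does not call the driver's uAPI.

```python
import mmap

# Assumption: the interpreter's page size equals the kernel's PAGE_SIZE.
PAGE_SIZE = mmap.PAGESIZE


def is_page_aligned(value: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True when value is a non-negative multiple of page_size."""
    return value >= 0 and value % page_size == 0


def round_up_to_page(value: int, page_size: int = PAGE_SIZE) -> int:
    """Round value up to the next multiple of page_size."""
    return -(-value // page_size) * page_size


def check_export_args(offset: int, size: int, page_size: int = PAGE_SIZE) -> None:
    """Hypothetical pre-flight check: raise ValueError on unaligned offset/size."""
    if not is_page_aligned(offset, page_size):
        raise ValueError(f"offset {offset:#x} is not a multiple of {page_size}")
    if not is_page_aligned(size, page_size):
        raise ValueError(f"size {size:#x} is not a multiple of {page_size}")
```

A caller that previously relied on the driver's rounding could apply `round_up_to_page()` itself, keeping in mind that this maps more memory than the unrounded request, which is the side effect the commit message wants the user to opt into explicitly.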

32 files changed: +1919 -1063 lines changed

Documentation/ABI/testing/debugfs-driver-habanalabs

Lines changed: 41 additions & 41 deletions
@@ -1,4 +1,4 @@
-What: /sys/kernel/debug/habanalabs/hl<n>/addr
+What: /sys/kernel/debug/accel/<n>/addr
 Date: Jan 2019
 KernelVersion: 5.1
@@ -8,34 +8,34 @@ Description: Sets the device address to be used for read or write through
 only when the IOMMU is disabled.
 The acceptable value is a string that starts with "0x"

-What: /sys/kernel/debug/habanalabs/hl<n>/clk_gate
+What: /sys/kernel/debug/accel/<n>/clk_gate
 Date: May 2020
 KernelVersion: 5.8
 Description: This setting is now deprecated as clock gating is handled solely by the f/w

-What: /sys/kernel/debug/habanalabs/hl<n>/command_buffers
+What: /sys/kernel/debug/accel/<n>/command_buffers
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Displays a list with information about the currently allocated
 command buffers

-What: /sys/kernel/debug/habanalabs/hl<n>/command_submission
+What: /sys/kernel/debug/accel/<n>/command_submission
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Displays a list with information about the currently active
 command submissions

-What: /sys/kernel/debug/habanalabs/hl<n>/command_submission_jobs
+What: /sys/kernel/debug/accel/<n>/command_submission_jobs
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Displays a list with detailed information about each JOB (CB) of
 each active command submission

-What: /sys/kernel/debug/habanalabs/hl<n>/data32
+What: /sys/kernel/debug/accel/<n>/data32
 Date: Jan 2019
 KernelVersion: 5.1
@@ -50,7 +50,7 @@ Description: Allows the root user to read or write directly through the
 If the IOMMU is disabled, it also allows the root user to read
 or write from the host a device VA of a host mapped memory

-What: /sys/kernel/debug/habanalabs/hl<n>/data64
+What: /sys/kernel/debug/accel/<n>/data64
 Date: Jan 2020
 KernelVersion: 5.6
@@ -65,7 +65,7 @@ Description: Allows the root user to read or write 64 bit data directly
 If the IOMMU is disabled, it also allows the root user to read
 or write from the host a device VA of a host mapped memory

-What: /sys/kernel/debug/habanalabs/hl<n>/data_dma
+What: /sys/kernel/debug/accel/<n>/data_dma
 Date: Apr 2021
 KernelVersion: 5.13
@@ -79,26 +79,26 @@ Description: Allows the root user to read from the device's internal
 a very long time.
 This interface doesn't support concurrency in the same device.
 In GAUDI and GOYA, this action can cause undefined behavior
-in case the it is done while the device is executing user
+in case it is done while the device is executing user
 workloads.
 Only supported on GAUDI at this stage.

-What: /sys/kernel/debug/habanalabs/hl<n>/device
+What: /sys/kernel/debug/accel/<n>/device
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Enables the root user to set the device to specific state.
 Valid values are "disable", "enable", "suspend", "resume".
 User can read this property to see the valid values

-What: /sys/kernel/debug/habanalabs/hl<n>/device_release_watchdog_timeout
+What: /sys/kernel/debug/accel/<n>/device_release_watchdog_timeout
 Date: Oct 2022
 KernelVersion: 6.2
 Description: The watchdog timeout value in seconds for a device release upon
 certain error cases, after which the device is reset.

-What: /sys/kernel/debug/habanalabs/hl<n>/dma_size
+What: /sys/kernel/debug/accel/<n>/dma_size
 Date: Apr 2021
 KernelVersion: 5.13
@@ -108,7 +108,7 @@ Description: Specify the size of the DMA transaction when using DMA to read
 When the write is finished, the user can read the "data_dma"
 blob

-What: /sys/kernel/debug/habanalabs/hl<n>/dump_razwi_events
+What: /sys/kernel/debug/accel/<n>/dump_razwi_events
 Date: Aug 2022
 KernelVersion: 5.20
@@ -117,38 +117,38 @@ Description: Dumps all razwi events to dmesg if exist.
 the routine will clear the status register.
 Usage: cat dump_razwi_events

-What: /sys/kernel/debug/habanalabs/hl<n>/dump_security_violations
+What: /sys/kernel/debug/accel/<n>/dump_security_violations
 Date: Jan 2021
 KernelVersion: 5.12
 Description: Dumps all security violations to dmesg. This will also ack
 all security violations meanings those violations will not be
 dumped next time user calls this API

-What: /sys/kernel/debug/habanalabs/hl<n>/engines
+What: /sys/kernel/debug/accel/<n>/engines
 Date: Jul 2019
 KernelVersion: 5.3
 Description: Displays the status registers values of the device engines and
 their derived idle status

-What: /sys/kernel/debug/habanalabs/hl<n>/i2c_addr
+What: /sys/kernel/debug/accel/<n>/i2c_addr
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Sets I2C device address for I2C transaction that is generated
 by the device's CPU, Not available when device is loaded with secured
 firmware

-What: /sys/kernel/debug/habanalabs/hl<n>/i2c_bus
+What: /sys/kernel/debug/accel/<n>/i2c_bus
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Sets I2C bus address for I2C transaction that is generated by
 the device's CPU, Not available when device is loaded with secured
 firmware

-What: /sys/kernel/debug/habanalabs/hl<n>/i2c_data
+What: /sys/kernel/debug/accel/<n>/i2c_data
 Date: Jan 2019
 KernelVersion: 5.1
@@ -157,79 +157,79 @@ Description: Triggers an I2C transaction that is generated by the device's
 reading from the file generates a read transaction, Not available
 when device is loaded with secured firmware

-What: /sys/kernel/debug/habanalabs/hl<n>/i2c_len
+What: /sys/kernel/debug/accel/<n>/i2c_len
 Date: Dec 2021
 KernelVersion: 5.17
 Description: Sets I2C length in bytes for I2C transaction that is generated by
 the device's CPU, Not available when device is loaded with secured
 firmware

-What: /sys/kernel/debug/habanalabs/hl<n>/i2c_reg
+What: /sys/kernel/debug/accel/<n>/i2c_reg
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Sets I2C register id for I2C transaction that is generated by
 the device's CPU, Not available when device is loaded with secured
 firmware

-What: /sys/kernel/debug/habanalabs/hl<n>/led0
+What: /sys/kernel/debug/accel/<n>/led0
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Sets the state of the first S/W led on the device, Not available
 when device is loaded with secured firmware

-What: /sys/kernel/debug/habanalabs/hl<n>/led1
+What: /sys/kernel/debug/accel/<n>/led1
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Sets the state of the second S/W led on the device, Not available
 when device is loaded with secured firmware

-What: /sys/kernel/debug/habanalabs/hl<n>/led2
+What: /sys/kernel/debug/accel/<n>/led2
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Sets the state of the third S/W led on the device, Not available
 when device is loaded with secured firmware

-What: /sys/kernel/debug/habanalabs/hl<n>/memory_scrub
+What: /sys/kernel/debug/accel/<n>/memory_scrub
 Date: May 2022
 KernelVersion: 5.19
 Description: Allows the root user to scrub the dram memory. The scrubbing
 value can be set using the debugfs file memory_scrub_val.

-What: /sys/kernel/debug/habanalabs/hl<n>/memory_scrub_val
+What: /sys/kernel/debug/accel/<n>/memory_scrub_val
 Date: May 2022
 KernelVersion: 5.19
 Description: The value to which the dram will be set to when the user
 scrubs the dram using 'memory_scrub' debugfs file and
 the scrubbing value when using module param 'memory_scrub'

-What: /sys/kernel/debug/habanalabs/hl<n>/mmu
+What: /sys/kernel/debug/accel/<n>/mmu
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Displays the hop values and physical address for a given ASID
 and virtual address. The user should write the ASID and VA into
 the file and then read the file to get the result.
 e.g. to display info about VA 0x1000 for ASID 1 you need to do:
-echo "1 0x1000" > /sys/kernel/debug/habanalabs/hl0/mmu
+echo "1 0x1000" > /sys/kernel/debug/accel/0/mmu

-What: /sys/kernel/debug/habanalabs/hl<n>/mmu_error
+What: /sys/kernel/debug/accel/<n>/mmu_error
 Date: Mar 2021
 KernelVersion: 5.12
 Description: Check and display page fault or access violation mmu errors for
 all MMUs specified in mmu_cap_mask.
 e.g. to display error info for MMU hw cap bit 9, you need to do:
-echo "0x200" > /sys/kernel/debug/habanalabs/hl0/mmu_error
-cat /sys/kernel/debug/habanalabs/hl0/mmu_error
+echo "0x200" > /sys/kernel/debug/accel/0/mmu_error
+cat /sys/kernel/debug/accel/0/mmu_error

-What: /sys/kernel/debug/habanalabs/hl<n>/monitor_dump
+What: /sys/kernel/debug/accel/<n>/monitor_dump
 Date: Mar 2022
 KernelVersion: 5.19
@@ -243,7 +243,7 @@ Description: Allows the root user to dump monitors status from the device's
 This interface doesn't support concurrency in the same device.
 Only supported on GAUDI.

-What: /sys/kernel/debug/habanalabs/hl<n>/monitor_dump_trig
+What: /sys/kernel/debug/accel/<n>/monitor_dump_trig
 Date: Mar 2022
 KernelVersion: 5.19
@@ -253,22 +253,22 @@ Description: Triggers dump of monitor data. The value to trigger the operatio
 When the write is finished, the user can read the "monitor_dump"
 blob

-What: /sys/kernel/debug/habanalabs/hl<n>/set_power_state
+What: /sys/kernel/debug/accel/<n>/set_power_state
 Date: Jan 2019
 KernelVersion: 5.1
 Description: Sets the PCI power state. Valid values are "1" for D0 and "2"
 for D3Hot

-What: /sys/kernel/debug/habanalabs/hl<n>/skip_reset_on_timeout
+What: /sys/kernel/debug/accel/<n>/skip_reset_on_timeout
 Date: Jun 2021
 KernelVersion: 5.13
 Description: Sets the skip reset on timeout option for the device. Value of
 "0" means device will be reset in case some CS has timed out,
 otherwise it will not be reset.

-What: /sys/kernel/debug/habanalabs/hl<n>/state_dump
+What: /sys/kernel/debug/accel/<n>/state_dump
 Date: Oct 2021
 KernelVersion: 5.15
@@ -279,37 +279,37 @@ Description: Gets the state dump occurring on a CS timeout or failure.
 Writing an integer X discards X state dumps, so that the
 next read would return X+1-st newest state dump.

-What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
+What: /sys/kernel/debug/accel/<n>/stop_on_err
 Date: Mar 2020
 KernelVersion: 5.6
 Description: Sets the stop-on_error option for the device engines. Value of
 "0" is for disable, otherwise enable.
 Relevant only for GOYA and GAUDI.

-What: /sys/kernel/debug/habanalabs/hl<n>/timeout_locked
+What: /sys/kernel/debug/accel/<n>/timeout_locked
 Date: Sep 2021
 KernelVersion: 5.16
 Description: Sets the command submission timeout value in seconds.

-What: /sys/kernel/debug/habanalabs/hl<n>/userptr
+What: /sys/kernel/debug/accel/<n>/userptr
 Date: Jan 2019
 KernelVersion: 5.1
-Description: Displays a list with information about the currently user
+Description: Displays a list with information about the current user
 pointers (user virtual addresses) that are pinned and mapped
 to DMA addresses

-What: /sys/kernel/debug/habanalabs/hl<n>/userptr_lookup
+What: /sys/kernel/debug/accel/<n>/userptr_lookup
 Date: Oct 2021
 KernelVersion: 5.15
 Description: Allows to search for specific user pointers (user virtual
 addresses) that are pinned and mapped to DMA addresses, and see
 their resolution to the specific dma address.

-What: /sys/kernel/debug/habanalabs/hl<n>/vm
+What: /sys/kernel/debug/accel/<n>/vm
 Date: Jan 2019
 KernelVersion: 5.1
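
Every hunk above applies the same rename, from `/sys/kernel/debug/habanalabs/hl<n>/<node>` to `/sys/kernel/debug/accel/<n>/<node>`. Scripts that hard-code the old layout could be adapted with a small translation helper along these lines. This is a sketch only: `to_accel_path` is a hypothetical name, and it assumes the device index carries over unchanged, as the `hl0` to `accel/0` examples in the diff suggest; the index on a given system should be checked rather than assumed.

```python
import re

# Old debugfs layout documented before this commit:
#   /sys/kernel/debug/habanalabs/hl<n>/<node>
_OLD_DEBUGFS_PATH = re.compile(r"^/sys/kernel/debug/habanalabs/hl(\d+)(/.*)?$")


def to_accel_path(path: str) -> str:
    """Translate an old habanalabs debugfs path to the accel layout.

    Paths that do not match the old layout are returned unchanged.
    Assumption: the numeric index is the same in both layouts.
    """
    match = _OLD_DEBUGFS_PATH.match(path)
    if match is None:
        return path
    index = match.group(1)
    rest = match.group(2) or ""
    return f"/sys/kernel/debug/accel/{index}{rest}"
```

For example, `to_accel_path("/sys/kernel/debug/habanalabs/hl0/mmu")` yields the path used in the updated `mmu` usage line of the diff.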
