Commit 5fbff81

dma-fence: basic lockdep annotations
Design is similar to the lockdep annotations for workers, but with some
twists:

- We use a read-lock for the execution/worker/completion side, so that
  this explicit annotation can be more liberally sprinkled around. With
  read locks lockdep isn't going to complain if the read-side isn't
  nested the same way under all circumstances, so ABBA deadlocks are ok.
  Which they are, since this is an annotation only.

- We're using non-recursive lockdep read lock mode, since in recursive
  read lock mode lockdep does not catch read side hazards. And we _very_
  much want read side hazards to be caught. For full details of this
  limitation see

  commit e914985
  Author: Peter Zijlstra <[email protected]>
  Date:   Wed Aug 23 13:13:11 2017 +0200

      locking/lockdep/selftests: Add mixed read-write ABBA tests

- To allow nesting of the read-side explicit annotations we explicitly
  keep track of the nesting. lock_is_held() allows us to do that.

- The wait-side annotation is a write lock, and entirely done within
  dma_fence_wait() for everyone by default.

- To be able to freely annotate helper functions I want to make it ok to
  call dma_fence_begin/end_signalling from soft/hardirq context. First
  attempt was using the hardirq locking context for the write side in
  lockdep, but this forces all normal spinlocks nested within
  dma_fence_begin/end_signalling to be spinlocks. That's bollocks.

  The approach now is to simply check in_atomic(), and for these cases
  entirely rely on the might_sleep() check in dma_fence_wait(). That
  will catch any wrong nesting against spinlocks from soft/hardirq
  contexts.

The idea here is that every code path that's critical for eventually
signalling a dma_fence should be annotated with
dma_fence_begin/end_signalling. The annotation ideally starts right
after a dma_fence is published (added to a dma_resv, exposed as a
sync_file fd, attached to a drm_syncobj fd, or anything else that makes
the dma_fence visible to other kernel threads), up to and including the
dma_fence_wait(). Examples are irq handlers, the scheduler rt threads,
the tail of execbuf (after the corresponding fences are visible), any
workers that end up signalling dma_fences and really anything else. Not
annotated should be code paths that only complete fences
opportunistically as the gpu progresses, like e.g. shrinker/eviction
code.

The main class of deadlocks this is supposed to catch are:

Thread A:

	mutex_lock(A);
	mutex_unlock(A);

	dma_fence_signal();

Thread B:

	mutex_lock(A);
	dma_fence_wait();
	mutex_unlock(A);

Thread B is blocked on A signalling the fence, but A never gets around
to that because it cannot acquire the lock A.

Note that dma_fence_wait() is allowed to be nested within
dma_fence_begin/end_signalling sections. To allow this to happen the
read lock needs to be upgraded to a write lock, which means that if any
other lock is acquired between the dma_fence_begin_signalling() call and
the call to dma_fence_wait(), and is still held, this will result in an
immediate lockdep complaint. The only other option would be to not
annotate such calls, defeating the point. Therefore, to avoid false
positives, these annotations cannot be sprinkled over the code entirely
mindlessly.

Originally I hoped that the cross-release lockdep extensions would
alleviate the need for explicit annotations:

https://lwn.net/Articles/709849/

But there are a few reasons why that's not an option:

- It's not happening in upstream, since it got reverted due to too many
  false positives:

  commit e966eae
  Author: Ingo Molnar <[email protected]>
  Date:   Tue Dec 12 12:31:16 2017 +0100

      locking/lockdep: Remove the cross-release locking checks

      This code (CONFIG_LOCKDEP_CROSSRELEASE=y and
      CONFIG_LOCKDEP_COMPLETIONS=y), while it found a number of old bugs
      initially, was also causing too many false positives that caused
      people to disable lockdep - which is arguably a worse overall
      outcome.

- cross-release uses the complete() call to annotate the end of critical
  sections, for dma_fence that would be dma_fence_signal(). But we do
  not want all dma_fence_signal() calls to be treated as critical, since
  many are opportunistic cleanup of gpu requests. If these get stuck
  there's still the main completion interrupt and workers who can
  unblock everyone. Automatically annotating all dma_fence_signal()
  calls would hence cause false positives.

- cross-release had some educated guesses for when a critical section
  starts, like fresh syscall or fresh work callback. This would again
  cause false positives without explicit annotations, since for
  dma_fence the critical section only starts when we publish a fence.

- Furthermore there can be cases where a thread never does a
  dma_fence_signal, but is still critical for reaching completion of
  fences. One example would be a scheduler kthread which picks up jobs
  and pushes them into hardware, where the interrupt handler or another
  completion thread calls dma_fence_signal(). But if the scheduler
  thread hangs, then all the fences hang, hence we need to manually
  annotate it. cross-release aimed to solve this by chaining
  cross-release dependencies, but the dependency from scheduler thread
  to the completion interrupt handler goes through hw where
  cross-release code can't observe it.

In short, without manual annotations and careful review of the start and
end of critical sections, cross-release dependency tracking doesn't
work. We need explicit annotations.

v2: handle soft/hardirq ctx better against write side and don't forget
EXPORT_SYMBOL, drivers can't use this otherwise.

v3: Kerneldoc.

v4: Some spelling fixes from Mika

v5: Amend commit message to explain in detail why cross-release isn't
the solution.

v6: Pull out misplaced .rst hunk.

Acked-by: Christian König <[email protected]>
Acked-by: Dave Airlie <[email protected]>
Cc: Felix Kuehling <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Reviewed-by: Maarten Lankhorst <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Thomas Hellstrom <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Chris Wilson <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Christian König <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
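
The Thread A / Thread B deadlock described in the commit message can be illustrated with a tiny user-space model of lock-order tracking. This is only a sketch under loud assumptions: it is not lockdep and not kernel code, it ignores the read/write distinction the commit relies on, and the names `lock_class`, `dep`, `reachable` and `record_dependency` are hypothetical. It records "acquired while held" edges between lock classes and reports when a new edge closes a cycle, which is roughly the kind of check the annotation lets lockdep perform.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes; only the names echo the commit message. */
enum lock_class { LOCK_A, DMA_FENCE_MAP, NUM_CLASSES };

/* dep[x][y] is true once y has been acquired while x was held. */
static bool dep[NUM_CLASSES][NUM_CLASSES];

/* Depth-first search: is "to" reachable from "from" via recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record that "acquired" was taken while "held" was held. Returns true
 * if "held" was already reachable from "acquired", i.e. the new edge
 * closes a cycle in the dependency graph.
 */
static bool record_dependency(enum lock_class held, enum lock_class acquired)
{
	bool seen[NUM_CLASSES] = { false };
	bool cycle = reachable(acquired, held, seen);

	dep[held][acquired] = true;
	return cycle;
}
```

In this model, Thread B (waiting on the fence while holding A) corresponds to `record_dependency(LOCK_A, DMA_FENCE_MAP)`, and Thread A (taking A inside a signalling section) to `record_dependency(DMA_FENCE_MAP, LOCK_A)`. The second call reports the cycle even though the two threads never actually ran concurrently.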
1 parent 23f166c commit 5fbff81

3 files changed, +179 -0 lines changed

Documentation/driver-api/dma-buf.rst

Lines changed: 6 additions & 0 deletions
@@ -133,6 +133,12 @@ DMA Fences
 .. kernel-doc:: drivers/dma-buf/dma-fence.c
    :doc: DMA fences overview
 
+DMA Fence Signalling Annotations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/dma-buf/dma-fence.c
+   :doc: fence signalling annotation
+
 DMA Fences Functions Reference
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

drivers/dma-buf/dma-fence.c

Lines changed: 161 additions & 0 deletions
@@ -110,6 +110,160 @@ u64 dma_fence_context_alloc(unsigned num)
 }
 EXPORT_SYMBOL(dma_fence_context_alloc);
 
+/**
+ * DOC: fence signalling annotation
+ *
+ * Proving correctness of all the kernel code around &dma_fence through code
+ * review and testing is tricky for a few reasons:
+ *
+ * * It is a cross-driver contract, and therefore all drivers must follow the
+ *   same rules for lock nesting order, calling contexts for various functions
+ *   and anything else significant for in-kernel interfaces. But it is also
+ *   impossible to test all drivers in a single machine, hence brute-force N vs.
+ *   N testing of all combinations is impossible. Even just limiting to the
+ *   possible combinations is infeasible.
+ *
+ * * There is an enormous amount of driver code involved. For render drivers
+ *   there's the tail of command submission, after fences are published,
+ *   scheduler code, interrupt and workers to process job completion,
+ *   and timeout, gpu reset and gpu hang recovery code. Plus for integration
+ *   with core mm with have &mmu_notifier, respectively &mmu_interval_notifier,
+ *   and &shrinker. For modesetting drivers there's the commit tail functions
+ *   between when fences for an atomic modeset are published, and when the
+ *   corresponding vblank completes, including any interrupt processing and
+ *   related workers. Auditing all that code, across all drivers, is not
+ *   feasible.
+ *
+ * * Due to how many other subsystems are involved and the locking hierarchies
+ *   this pulls in there is extremely thin wiggle-room for driver-specific
+ *   differences. &dma_fence interacts with almost all of the core memory
+ *   handling through page fault handlers via &dma_resv, dma_resv_lock() and
+ *   dma_resv_unlock(). On the other side it also interacts through all
+ *   allocation sites through &mmu_notifier and &shrinker.
+ *
+ * Furthermore lockdep does not handle cross-release dependencies, which means
+ * any deadlocks between dma_fence_wait() and dma_fence_signal() can't be caught
+ * at runtime with some quick testing. The simplest example is one thread
+ * waiting on a &dma_fence while holding a lock::
+ *
+ *     lock(A);
+ *     dma_fence_wait(B);
+ *     unlock(A);
+ *
+ * while the other thread is stuck trying to acquire the same lock, which
+ * prevents it from signalling the fence the previous thread is stuck waiting
+ * on::
+ *
+ *     lock(A);
+ *     unlock(A);
+ *     dma_fence_signal(B);
+ *
+ * By manually annotating all code relevant to signalling a &dma_fence we can
+ * teach lockdep about these dependencies, which also helps with the validation
+ * headache since now lockdep can check all the rules for us::
+ *
+ *    cookie = dma_fence_begin_signalling();
+ *    lock(A);
+ *    unlock(A);
+ *    dma_fence_signal(B);
+ *    dma_fence_end_signalling(cookie);
+ *
+ * For using dma_fence_begin_signalling() and dma_fence_end_signalling() to
+ * annotate critical sections the following rules need to be observed:
+ *
+ * * All code necessary to complete a &dma_fence must be annotated, from the
+ *   point where a fence is accessible to other threads, to the point where
+ *   dma_fence_signal() is called. Un-annotated code can contain deadlock issues,
+ *   and due to the very strict rules and many corner cases it is infeasible to
+ *   catch these just with review or normal stress testing.
+ *
+ * * &struct dma_resv deserves a special note, since the readers are only
+ *   protected by rcu. This means the signalling critical section starts as soon
+ *   as the new fences are installed, even before dma_resv_unlock() is called.
+ *
+ * * The only exception are fast paths and opportunistic signalling code, which
+ *   calls dma_fence_signal() purely as an optimization, but is not required to
+ *   guarantee completion of a &dma_fence. The usual example is a wait IOCTL
+ *   which calls dma_fence_signal(), while the mandatory completion path goes
+ *   through a hardware interrupt and possible job completion worker.
+ *
+ * * To aid composability of code, the annotations can be freely nested, as long
+ *   as the overall locking hierarchy is consistent. The annotations also work
+ *   both in interrupt and process context. Due to implementation details this
+ *   requires that callers pass an opaque cookie from
+ *   dma_fence_begin_signalling() to dma_fence_end_signalling().
+ *
+ * * Validation against the cross driver contract is implemented by priming
+ *   lockdep with the relevant hierarchy at boot-up. This means even just
+ *   testing with a single device is enough to validate a driver, at least as
+ *   far as deadlocks with dma_fence_wait() against dma_fence_signal() are
+ *   concerned.
+ */
+#ifdef CONFIG_LOCKDEP
+struct lockdep_map	dma_fence_lockdep_map = {
+	.name = "dma_fence_map"
+};
+
+/**
+ * dma_fence_begin_signalling - begin a critical DMA fence signalling section
+ *
+ * Drivers should use this to annotate the beginning of any code section
+ * required to eventually complete &dma_fence by calling dma_fence_signal().
+ *
+ * The end of these critical sections are annotated with
+ * dma_fence_end_signalling().
+ *
+ * Returns:
+ *
+ * Opaque cookie needed by the implementation, which needs to be passed to
+ * dma_fence_end_signalling().
+ */
+bool dma_fence_begin_signalling(void)
+{
+	/* explicitly nesting ... */
+	if (lock_is_held_type(&dma_fence_lockdep_map, 1))
+		return true;
+
+	/* rely on might_sleep check for soft/hardirq locks */
+	if (in_atomic())
+		return true;
+
+	/* ... and non-recursive readlock */
+	lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_);
+
+	return false;
+}
+EXPORT_SYMBOL(dma_fence_begin_signalling);
+
+/**
+ * dma_fence_end_signalling - end a critical DMA fence signalling section
+ *
+ * Closes a critical section annotation opened by dma_fence_begin_signalling().
+ */
+void dma_fence_end_signalling(bool cookie)
+{
+	if (cookie)
+		return;
+
+	lock_release(&dma_fence_lockdep_map, _RET_IP_);
+}
+EXPORT_SYMBOL(dma_fence_end_signalling);
+
+void __dma_fence_might_wait(void)
+{
+	bool tmp;
+
+	tmp = lock_is_held_type(&dma_fence_lockdep_map, 1);
+	if (tmp)
+		lock_release(&dma_fence_lockdep_map, _THIS_IP_);
+	lock_map_acquire(&dma_fence_lockdep_map);
+	lock_map_release(&dma_fence_lockdep_map);
+	if (tmp)
+		lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
+}
+#endif
+
+
 /**
  * dma_fence_signal_locked - signal completion of a fence
  * @fence: the fence to signal
@@ -170,14 +324,19 @@ int dma_fence_signal(struct dma_fence *fence)
 {
 	unsigned long flags;
 	int ret;
+	bool tmp;
 
 	if (!fence)
 		return -EINVAL;
 
+	tmp = dma_fence_begin_signalling();
+
 	spin_lock_irqsave(fence->lock, flags);
 	ret = dma_fence_signal_locked(fence);
 	spin_unlock_irqrestore(fence->lock, flags);
 
+	dma_fence_end_signalling(tmp);
+
 	return ret;
 }
 EXPORT_SYMBOL(dma_fence_signal);
@@ -210,6 +369,8 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 
 	might_sleep();
 
+	__dma_fence_might_wait();
+
 	trace_dma_fence_wait_start(fence);
 	if (fence->ops->wait)
 		ret = fence->ops->wait(fence, intr, timeout);
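
The cookie returned by dma_fence_begin_signalling() encodes whether the matching dma_fence_end_signalling() call has anything to release. A minimal user-space sketch of that control flow follows. It is an assumption-laden model, not the kernel implementation: `model_read_held` and `model_in_atomic` are hypothetical stand-ins for `lock_is_held_type()` and `in_atomic()`, and no real lockdep state is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state, used only by this model. */
static int  model_read_held;	/* 1 while the modelled read lock is held */
static bool model_in_atomic;	/* set by the caller to mimic in_atomic() */

/* Follows the branch structure of dma_fence_begin_signalling(). */
static bool model_begin_signalling(void)
{
	if (model_read_held)	/* nested section: nothing to acquire */
		return true;
	if (model_in_atomic)	/* atomic context: annotation is skipped */
		return true;
	model_read_held = 1;	/* outermost section takes the read lock */
	return false;
}

/* Follows dma_fence_end_signalling(): only a false cookie releases. */
static void model_end_signalling(bool cookie)
{
	if (cookie)
		return;
	model_read_held = 0;
}
```

In this model only the outermost section in process context gets a `false` cookie, so only its end call drops the lock. Nested sections and atomic-context callers get `true`, and their end calls are no-ops, which is why the annotations can be nested freely.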

include/linux/dma-fence.h

Lines changed: 12 additions & 0 deletions
@@ -357,6 +357,18 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
 	} while (1);
 }
 
+#ifdef CONFIG_LOCKDEP
+bool dma_fence_begin_signalling(void);
+void dma_fence_end_signalling(bool cookie);
+#else
+static inline bool dma_fence_begin_signalling(void)
+{
+	return true;
+}
+static inline void dma_fence_end_signalling(bool cookie) {}
+static inline void __dma_fence_might_wait(void) {}
+#endif
+
 int dma_fence_signal(struct dma_fence *fence);
 int dma_fence_signal_locked(struct dma_fence *fence);
 signed long dma_fence_default_wait(struct dma_fence *fence,
