Commit 7ecc6aa

Documentation/locking/locktypes: Further clarifications and wordsmithing

The documentation of rw_semaphores is wrong as it claims that the non-owner
reader release is not supported by RT. That's just history biased memory
distortion.

Split the 'Owner semantics' section up and add separate sections for
semaphore and rw_semaphore to reflect reality.

Aside of that the following updates are done:

 - Add pseudo code to document the spinlock state preserving mechanism on
   PREEMPT_RT

 - Wordsmith the bitspinlock and lock nesting sections

Co-developed-by: Paul McKenney <[email protected]>
Signed-off-by: Paul McKenney <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

1 parent cf226c4 commit 7ecc6aa

1 file changed: +98 -50 lines changed

Documentation/locking/locktypes.rst

Lines changed: 98 additions & 50 deletions
@@ -67,6 +67,17 @@ can have suffixes which apply further protections:
  _irqsave/restore()  Save and disable / restore interrupt disabled state
  =================== ====================================================
 
+Owner semantics
+===============
+
+The aforementioned lock types except semaphores have strict owner
+semantics:
+
+  The context (task) that acquired the lock must release it.
+
+rw_semaphores have a special interface which allows non-owner release for
+readers.
+
 
 rtmutex
 =======
@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts. This conversion allows spinlock_t
 and rwlock_t to be implemented via RT-mutexes.
 
 
+semaphore
+=========
+
+semaphore is a counting semaphore implementation.
+
+Semaphores are often used for both serialization and waiting, but new use
+cases should instead use separate serialization and wait mechanisms, such
+as mutexes and completions.
+
+semaphores and PREEMPT_RT
+----------------------------
+
+PREEMPT_RT does not change the semaphore implementation because counting
+semaphores have no concept of owners, thus preventing PREEMPT_RT from
+providing priority inheritance for semaphores. After all, an unknown
+owner cannot be boosted. As a consequence, blocking on semaphores can
+result in priority inversion.
+
+
+rw_semaphore
+============
+
+rw_semaphore is a multiple readers and single writer lock mechanism.
+
+On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+writer starvation.
+
+rw_semaphore complies by default with the strict owner semantics, but there
+exist special-purpose interfaces that allow non-owner release for readers.
+These interfaces work independent of the kernel configuration.
+
+rw_semaphore and PREEMPT_RT
+---------------------------
+
+PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
+implementation, thus changing the fairness:
+
+Because an rw_semaphore writer cannot grant its priority to multiple
+readers, a preempted low-priority reader will continue holding its lock,
+thus starving even high-priority writers. In contrast, because readers
+can grant their priority to a writer, a preempted low-priority writer will
+have its priority boosted until it releases the lock, thus preventing that
+writer from starving readers.
+
+
 raw_spinlock_t and spinlock_t
 =============================
 
@@ -102,7 +158,7 @@ critical section is tiny, thus avoiding RT-mutex overhead.
 spinlock_t
 ----------
 
-The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+The semantics of spinlock_t change with the state of PREEMPT_RT.
 
 On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
 and has exactly the same semantics.
@@ -140,15 +196,39 @@ PREEMPT_RT kernels preserve all other spinlock_t semantics:
    kernels leave task state untouched. However, PREEMPT_RT must change
    task state if the task blocks during acquisition. Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it.
+   restores it, as shown below::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+         task->state = TASK_UNINTERRUPTIBLE
+         schedule()
+                                        lock wakeup
+                                          task->state = task->saved_state
 
    Other types of wakeups would normally unconditionally set the task state
    to RUNNING, but that does not work here because the task must remain
    blocked until the lock becomes available. Therefore, when a non-lock
    wakeup attempts to awaken a task blocked waiting for a spinlock, it
    instead sets the saved state to RUNNING. Then, when the lock
    acquisition completes, the lock wakeup sets the task state to the saved
-   state, in this case setting it to RUNNING.
+   state, in this case setting it to RUNNING::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+         task->state = TASK_UNINTERRUPTIBLE
+         schedule()
+                                        non lock wakeup
+                                          task->saved_state = TASK_RUNNING
+
+                                        lock wakeup
+                                          task->state = task->saved_state
+
+   This ensures that the real wakeup cannot be lost.
+
 
 rwlock_t
 ========
@@ -228,17 +308,16 @@ preemption on PREEMPT_RT kernels::
 bit spinlocks
 -------------
 
-Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
-substituted by an RT-mutex based implementation for obvious reasons.
-
-The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
-caveats vs. raw_spinlock_t apply.
+PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+small to accommodate an RT-mutex. Therefore, the semantics of bit
+spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+caveats also apply to bit spinlocks.
 
-Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
-this requires conditional (#ifdef'ed) code changes at the usage site while
-the spinlock_t substitution is simply done by the compiler and the
-conditionals are restricted to header files and core implementation of the
-locking primitives and the usage sites do not require any changes.
+Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+using conditional (#ifdef'ed) code changes at the usage site. In contrast,
+usage-site changes are not needed for the spinlock_t substitution.
+Instead, conditionals in header files and the core locking implementation
+enable the compiler to do the substitution transparently.
 
 
 Lock type nesting rules
@@ -254,46 +333,15 @@ The most basic rules are:
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These rules apply in general independent of CONFIG_PREEMPT_RT.
+These constraints apply both in PREEMPT_RT and otherwise.
 
-As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
-spinning to sleeping this has obviously restrictions how they can nest with
-raw_spinlock_t.
-
-This results in the following nest ordering:
+The fact that PREEMPT_RT changes the lock category of spinlock_t and
+rwlock_t from spinning to sleeping means that they cannot be acquired while
+holding a raw spinlock. This results in the following nesting ordering:
 
   1) Sleeping locks
   2) spinlock_t and rwlock_t
   3) raw_spinlock_t and bit spinlocks
 
-Lockdep is aware of these constraints to ensure that they are respected.
-
-
-Owner semantics
-===============
-
-Most lock types in the Linux kernel have strict owner semantics, i.e. the
-context (task) which acquires a lock has to release it.
-
-There are two exceptions:
-
-  - semaphores
-  - rwsems
-
-semaphores have no owner semantics for historical reason, and as such
-trylock and release operations can be called from any context. They are
-often used for both serialization and waiting purposes. That's generally
-discouraged and should be replaced by separate serialization and wait
-mechanisms, such as mutexes and completions.
-
-rwsems have grown interfaces which allow non owner release for special
-purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
-substitutes all locking primitives except semaphores with RT-mutex based
-implementations to provide priority inheritance for all lock types except
-the truly spinning ones. Priority inheritance on ownerless locks is
-obviously impossible.
-
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+Lockdep will complain if these constraints are violated, both in
+PREEMPT_RT and otherwise.
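
The state-preserving mechanism that this commit documents for spinlock_t on PREEMPT_RT can also be modelled as a small user-space sketch. This is not kernel code: the task structure, the blocked_on_lock flag, the enum values, and every function name below are hypothetical stand-ins, and the whole thing runs single-threaded. It only mirrors the order of the state assignments shown in the pseudo code.

```c
#include <assert.h>

/* Illustrative stand-ins for kernel task states; not the kernel's values. */
enum task_state {
	TASK_RUNNING,
	TASK_INTERRUPTIBLE,
	TASK_UNINTERRUPTIBLE,
};

/* Hypothetical, heavily reduced model of a task. */
struct task {
	enum task_state state;
	enum task_state saved_state;
	int blocked_on_lock;	/* set while the task waits for the spinlock */
};

/* Contended lock(): save the caller's state before blocking. */
static void block_on_lock(struct task *t)
{
	t->saved_state = t->state;
	t->state = TASK_UNINTERRUPTIBLE;
	t->blocked_on_lock = 1;
	/* The kernel would call schedule() at this point. */
}

/*
 * Non-lock wakeup: a lock waiter must stay blocked, so only its saved
 * state is updated. Any other task is made runnable directly.
 */
static void non_lock_wakeup(struct task *t)
{
	if (t->blocked_on_lock)
		t->saved_state = TASK_RUNNING;
	else
		t->state = TASK_RUNNING;
}

/* Lock wakeup: the lock is available, so restore whatever was saved. */
static void lock_wakeup(struct task *t)
{
	assert(t->blocked_on_lock);
	t->state = t->saved_state;
	t->blocked_on_lock = 0;
}

/*
 * Model one contended acquisition. If racing_wakeup is nonzero, a
 * non-lock wakeup arrives while the task still waits for the lock.
 * Returns the task state observed after the lock wakeup.
 */
static enum task_state contended_lock(int racing_wakeup)
{
	struct task t = { .state = TASK_INTERRUPTIBLE };

	block_on_lock(&t);
	if (racing_wakeup) {
		non_lock_wakeup(&t);
		/* The task is still blocked on the lock. */
		assert(t.state == TASK_UNINTERRUPTIBLE);
	}
	lock_wakeup(&t);
	return t.state;
}
```

In this model, contended_lock(0) hands back the TASK_INTERRUPTIBLE state the caller set before lock(), while contended_lock(1) ends in TASK_RUNNING, which corresponds to the commit's statement that the real wakeup cannot be lost.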
