@@ -67,6 +67,17 @@ can have suffixes which apply further protections:
_irqsave/restore() Save and disable / restore interrupt disabled state
=================== ====================================================

+ Owner semantics
+ ===============
+
+ The aforementioned lock types except semaphores have strict owner
+ semantics:
+
+ The context (task) that acquired the lock must release it.
+
+ rw_semaphores have a special interface which allows non-owner release for
+ readers.
+

rtmutex
=======
@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts. This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.

+ semaphore
+ =========
+
+ semaphore is a counting semaphore implementation.
+
+ Semaphores are often used for both serialization and waiting, but new use
+ cases should instead use separate serialization and wait mechanisms, such
+ as mutexes and completions.
+
+ semaphores and PREEMPT_RT
+ -------------------------
+
+ PREEMPT_RT does not change the semaphore implementation because counting
+ semaphores have no concept of owners, thus preventing PREEMPT_RT from
+ providing priority inheritance for semaphores. After all, an unknown
+ owner cannot be boosted. As a consequence, blocking on semaphores can
+ result in priority inversion.
+
+
+ rw_semaphore
+ ============
+
+ rw_semaphore is a multiple readers and single writer lock mechanism.
+
+ On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+ writer starvation.
+
+ rw_semaphore complies by default with the strict owner semantics, but there
+ exist special-purpose interfaces that allow non-owner release for readers.
+ These interfaces work independently of the kernel configuration.
+
+ rw_semaphore and PREEMPT_RT
+ ---------------------------
+
+ PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
+ implementation, thus changing the fairness:
+
+ Because an rw_semaphore writer cannot grant its priority to multiple
+ readers, a preempted low-priority reader will continue holding its lock,
+ thus starving even high-priority writers. In contrast, because readers
+ can grant their priority to a writer, a preempted low-priority writer will
+ have its priority boosted until it releases the lock, thus preventing that
+ writer from starving readers.
+
+
raw_spinlock_t and spinlock_t
=============================
@@ -102,7 +158,7 @@ critical section is tiny, thus avoiding RT-mutex overhead.

spinlock_t
----------

- The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+ The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
and has exactly the same semantics.
@@ -140,15 +196,39 @@ PREEMPT_RT kernels preserve all other spinlock_t semantics:

kernels leave task state untouched. However, PREEMPT_RT must change
task state if the task blocks during acquisition. Therefore, it saves
the current task state before blocking and the corresponding lock wakeup
- restores it.
+ restores it, as shown below::
+
+   task->state = TASK_INTERRUPTIBLE
+   lock()
+     block()
+       task->saved_state = task->state
+       task->state = TASK_UNINTERRUPTIBLE
+       schedule()
+   lock wakeup
+     task->state = task->saved_state

Other types of wakeups would normally unconditionally set the task state
to RUNNING, but that does not work here because the task must remain
blocked until the lock becomes available. Therefore, when a non-lock
wakeup attempts to awaken a task blocked waiting for a spinlock, it
instead sets the saved state to RUNNING. Then, when the lock
acquisition completes, the lock wakeup sets the task state to the saved
- state, in this case setting it to RUNNING.
+ state, in this case setting it to RUNNING::
+
+   task->state = TASK_INTERRUPTIBLE
+   lock()
+     block()
+       task->saved_state = task->state
+       task->state = TASK_UNINTERRUPTIBLE
+       schedule()
+       non lock wakeup
+         task->saved_state = TASK_RUNNING
+
+   lock wakeup
+     task->state = task->saved_state
+
+ This ensures that the real wakeup cannot be lost.
+

rwlock_t
========
@@ -228,17 +308,16 @@ preemption on PREEMPT_RT kernels::

bit spinlocks
-------------

- Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
- substituted by an RT-mutex based implementation for obvious reasons.
-
- The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
- caveats vs. raw_spinlock_t apply.
+ PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+ small to accommodate an RT-mutex. Therefore, the semantics of bit
+ spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+ caveats also apply to bit spinlocks.

- Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
- this requires conditional (#ifdef'ed) code changes at the usage site while
- the spinlock_t substitution is simply done by the compiler and the
- conditionals are restricted to header files and core implementation of the
- locking primitives and the usage sites do not require any changes.
+ Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+ using conditional (#ifdef'ed) code changes at the usage site. In contrast,
+ usage-site changes are not needed for the spinlock_t substitution.
+ Instead, conditionals in header files and the core locking implementation
+ enable the compiler to do the substitution transparently.

Lock type nesting rules
@@ -254,46 +333,15 @@ The most basic rules are:

- Spinning lock types can nest inside sleeping lock types.

- These rules apply in general independent of CONFIG_PREEMPT_RT.
+ These constraints apply both in PREEMPT_RT and otherwise.

- As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
- spinning to sleeping this has obviously restrictions how they can nest with
- raw_spinlock_t.
-
- This results in the following nest ordering:
+ The fact that PREEMPT_RT changes the lock category of spinlock_t and
+ rwlock_t from spinning to sleeping means that they cannot be acquired while
+ holding a raw spinlock. This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t and rwlock_t
  3) raw_spinlock_t and bit spinlocks

- Lockdep is aware of these constraints to ensure that they are respected.
-
-
- Owner semantics
- ===============
-
- Most lock types in the Linux kernel have strict owner semantics, i.e. the
- context (task) which acquires a lock has to release it.
-
- There are two exceptions:
-
-   - semaphores
-   - rwsems
-
- semaphores have no owner semantics for historical reason, and as such
- trylock and release operations can be called from any context. They are
- often used for both serialization and waiting purposes. That's generally
- discouraged and should be replaced by separate serialization and wait
- mechanisms, such as mutexes and completions.
-
- rwsems have grown interfaces which allow non owner release for special
- purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
- substitutes all locking primitives except semaphores with RT-mutex based
- implementations to provide priority inheritance for all lock types except
- the truly spinning ones. Priority inheritance on ownerless locks is
- obviously impossible.
-
- For now the rwsem non-owner release excludes code which utilizes it from
- being used on PREEMPT_RT enabled kernels. In same cases this can be
- mitigated by disabling portions of the code, in other cases the complete
- functionality has to be disabled until a workable solution has been found.
+ Lockdep will complain if these constraints are violated, both in
+ PREEMPT_RT and otherwise.