@@ -174,6 +174,173 @@ running one, no real task switch occurs but interrupts are disabled nonetheless:
| | irq_entry
+---------------+ irq_enable

+ Monitor nrp
+ -----------
+
+ The need resched preempts (nrp) monitor ensures that preemption requires
+ ``need_resched``. Only kernel preemption is considered, since preemption
+ while returning to userspace, for this monitor, is indistinguishable from
+ ``sched_switch_yield`` (described in the sssw monitor).
+ A kernel preemption occurs whenever ``__schedule`` is called with the
+ preemption flag set to true (e.g. from preempt_enable or when exiting from an
+ interrupt). This type of preemption occurs after the need for
+ ``rescheduling`` has been set. This is not valid for the *lazy* variant of the
+ flag, which causes only userspace preemption.
+ A ``schedule_entry_preempt`` may or may not involve a task switch: in the
+ latter case, a task goes through the scheduler from a preemption context but
+ is picked as the next task to run. Since the scheduler runs, this clears the
+ need to reschedule. The ``any_thread_running`` state does not imply that the
+ monitored task is not running, as this monitor does not track the outcome of
+ scheduling.
+
+ In theory, a preemption can only occur after the ``need_resched`` flag is set. In
+ practice, however, it is possible to see a preemption where the flag is not
+ set. This can happen in one specific condition::
+
+   need_resched
+                    preempt_schedule()
+                                            preempt_schedule_irq()
+                                                    __schedule()
+   !need_resched
+                    __schedule()
+
+ In the situation above, standard preemption starts (e.g. from preempt_enable
+ when the flag is set), then an interrupt occurs before scheduling and
+ schedules on its exit path, which clears the ``need_resched`` flag.
+ When the preempted task runs again, the standard preemption started earlier
+ resumes, although the flag is no longer set. The monitor treats this as a
+ nested preemption (the ``nested_preempt`` state), which allows another
+ preemption without re-setting the flag. This relaxes the monitor constraints
+ and may produce false negatives (i.e. accept preemptions that are not really
+ nested), but it makes the monitor more robust and able to validate other
+ scenarios.
+ For simplicity, the monitor starts in ``preempt_irq``, although no interrupt
+ occurred, as the situation above is hard to pinpoint::
+
+                 schedule_entry
+                 irq_entry     #===========================================#
+   +-------------------------- H                                           H
+   |                           H                                           H
+   +-------------------------> H            any_thread_running             H
+                               H                                           H
+   +-------------------------> H                                           H
+   |                           #===========================================#
+   | schedule_entry              |                       ^
+   | schedule_entry_preempt      | sched_need_resched    | schedule_entry
+   |                             |                       | schedule_entry_preempt
+   |                             v                       |
+   |                           +----------------------+  |
+   |                      +--- |                      |  |
+   |   sched_need_resched |    |     rescheduling     | -+
+   |                      +--> |                      |
+   |                           +----------------------+
+   |                             | irq_entry
+   |                             v
+   |                           +----------------------+
+   |                           |                      | ---+
+   |                      ---> |                      |    | sched_need_resched
+   |                           |     preempt_irq      |    | irq_entry
+   |                           |                      | <--+
+   |                           |                      | <--+
+   |                           +----------------------+    |
+   |                             | schedule_entry          | sched_need_resched
+   |                             | schedule_entry_preempt  |
+   |                             v                         |
+   |                           +-----------------------+   |
+   +-------------------------- |    nested_preempt     | --+
+                               +-----------------------+
+                                 ^ irq_entry         |
+                                 +-------------------+
+
+ Because of how the ``need_resched`` flag in the preemption count works on
+ arm64, this monitor is unstable on that architecture: it often records a
+ preemption when the flag is not set, even with the workaround above.
+ For the time being, the monitor is disabled by default on arm64.
+
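As a rough illustration, the transitions drawn in the nrp diagram above can be transcribed into a table-driven automaton. This is only a sketch and not the kernel's generated monitor code: the names ``NRP`` and ``run_nrp`` are hypothetical, the table is a hand transcription of the diagram, and the event names are the diagram labels.

```python
# Hand transcription of the nrp diagram: state -> {event: next state}.
# NRP and run_nrp are illustrative names, not kernel identifiers.
NRP = {
    "any_thread_running": {
        "schedule_entry": "any_thread_running",
        "irq_entry": "any_thread_running",
        "sched_need_resched": "rescheduling",
    },
    "rescheduling": {
        "sched_need_resched": "rescheduling",
        "schedule_entry": "any_thread_running",
        "schedule_entry_preempt": "any_thread_running",
        "irq_entry": "preempt_irq",
    },
    "preempt_irq": {
        "sched_need_resched": "preempt_irq",
        "irq_entry": "preempt_irq",
        "schedule_entry": "nested_preempt",
        "schedule_entry_preempt": "nested_preempt",
    },
    "nested_preempt": {
        "irq_entry": "nested_preempt",
        "sched_need_resched": "preempt_irq",
        "schedule_entry": "any_thread_running",
        "schedule_entry_preempt": "any_thread_running",
    },
}


def run_nrp(events, state="preempt_irq"):
    """Replay events; return (state, index of first rejected event or None)."""
    for index, event in enumerate(events):
        next_state = NRP[state].get(event)
        if next_state is None:
            return state, index  # no such edge in the diagram: violation
        state = next_state
    return state, None


# The nested preemption scenario: the flag is set, an interrupt preempts
# first, then the preemption started earlier resumes without the flag.
nested = ["sched_need_resched", "irq_entry",
          "schedule_entry_preempt", "schedule_entry_preempt"]
print(run_nrp(nested, state="any_thread_running"))
# -> ('any_thread_running', None)

# A preemption without a previous need_resched is rejected.
print(run_nrp(["schedule_entry_preempt"], state="any_thread_running"))
# -> ('any_thread_running', 0)
```

The default start state is ``preempt_irq``, as stated in the text; the examples pass ``any_thread_running`` explicitly to show the behaviour after a scheduling has already happened.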
+ Monitor sssw
+ ------------
+
+ The set state sleep and wakeup (sssw) monitor ensures that ``set_state`` to
+ sleepable leads to sleeping and that sleeping tasks require a wakeup. It
+ includes the following types of switch:
+
+ * ``switch_suspend``:
+   a task puts itself to sleep; this can happen only after explicitly setting
+   the task to ``sleepable``. After a task is suspended, it needs to be woken up
+   (``waking`` state) before being switched in again.
+   Setting the task's state to ``sleepable`` can be reverted before switching if
+   it is woken up or set to ``runnable``.
+ * ``switch_blocking``:
+   a special case of a ``switch_suspend`` where the task is waiting on a
+   sleeping RT lock (``PREEMPT_RT`` only). It is common to see wakeup and set
+   state events racing with each other, which leads the model to perceive this
+   type of switch when the task is not set to sleepable. This is a limitation
+   of the model on SMP systems, and workarounds may slow down the system.
+ * ``switch_preempt``:
+   a task switch as a result of kernel preemption (``schedule_entry_preempt`` in
+   the nrp model).
+ * ``switch_yield``:
+   a task explicitly calls the scheduler or is preempted while returning to
+   userspace. It can happen after a ``yield`` system call, from the idle task or
+   if the ``need_resched`` flag is set. By definition, a task cannot yield while
+   ``sleepable`` as that would be a suspension. A special case of a yield occurs
+   when a task in ``TASK_INTERRUPTIBLE`` calls the scheduler while a signal is
+   pending. The task doesn't go through the usual blocking/waking and is set
+   back to runnable; the resulting switch (if any) looks like a yield to the
+   ``signal_wakeup`` state and is followed by the signal delivery. From this
+   state, the monitor expects a signal even if it sees a wakeup event; this is
+   not strictly necessary, but it rules out false negatives.
+
+ This monitor doesn't include a running state: ``sleepable`` and ``runnable``
+ refer only to the task's desired state, and the task could be scheduled out
+ in either (e.g. due to preemption). However, it does include the event
+ ``sched_switch_in`` to represent when a task is allowed to become running. This
+ can also be triggered by preemption, but it cannot occur after the task got to
+ ``sleeping`` and before a ``wakeup`` occurs::
+
+     +--------------------------------------------------------------------------+
+     |                                                                          |
+     |                                                                          |
+     | switch_suspend          |                                                |
+     | switch_blocking         |                                                |
+     v                         v                                                |
+   +----------+              #==========================#   set_state_runnable  |
+   |          |              H                          H   wakeup              |
+   |          |              H                          H   switch_in           |
+   |          |              H                          H   switch_yield        |
+   | sleeping |              H                          H   switch_preempt      |
+   |          |              H                          H   signal_deliver      |
+   |          |  switch_     H                          H ------+               |
+   |          |  _blocking   H         runnable         H       |               |
+   |          | <----------- H                          H <-----+               |
+   +----------+              H                          H                       |
+     | wakeup                H                          H                       |
+     +---------------------> H                          H                       |
+                             H                          H                       |
+                 +---------> H                          H                       |
+                 |           #==========================#                       |
+                 |             |                ^                               |
+                 |             |                | set_state_runnable            |
+                 |             |                | wakeup                        |
+                 |    set_state_sleepable       |      +------------------------+
+                 |             v                |      |
+                 |           +--------------------------+ set_state_sleepable
+                 |           |                          | switch_in
+                 |           |                          | switch_preempt
+          signal_deliver     |        sleepable         | signal_deliver
+                 |           |                          | ------+
+                 |           |                          |       |
+                 |           |                          | <-----+
+                 |           +--------------------------+
+                 |             |                ^
+                 |       switch_yield           | set_state_sleepable
+                 |             v                |
+                 |           +---------------+  |
+                 +---------- | signal_wakeup | -+
+                             +---------------+
+                               ^           | switch_in
+                               |           | switch_preempt
+                               |           | switch_yield
+                               +-----------+ wakeup
+
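The sssw diagram above can be read the same way, as a transition table replayed over a stream of events. The sketch below is a hand transcription under stated assumptions, not the kernel's monitor: ``SSSW``, ``_edges`` and ``run_sssw`` are hypothetical names, the event names are the diagram labels, and ``runnable`` is assumed to be the initial state.

```python
def _edges(target, *events):
    """Helper (illustrative only): map every listed event to one target."""
    return {event: target for event in events}


# Hand transcription of the sssw diagram: state -> {event: next state}.
SSSW = {
    "runnable": {
        **_edges("runnable", "set_state_runnable", "wakeup", "switch_in",
                 "switch_yield", "switch_preempt", "signal_deliver"),
        **_edges("sleepable", "set_state_sleepable"),
        **_edges("sleeping", "switch_blocking"),
    },
    "sleepable": {
        **_edges("sleepable", "set_state_sleepable", "switch_in",
                 "switch_preempt", "signal_deliver"),
        **_edges("runnable", "set_state_runnable", "wakeup"),
        **_edges("sleeping", "switch_suspend", "switch_blocking"),
        **_edges("signal_wakeup", "switch_yield"),
    },
    "sleeping": _edges("runnable", "wakeup"),
    "signal_wakeup": {
        **_edges("signal_wakeup", "switch_in", "switch_preempt",
                 "switch_yield", "wakeup"),
        **_edges("runnable", "signal_deliver"),
        **_edges("sleepable", "set_state_sleepable"),
    },
}


def run_sssw(events, state="runnable"):
    """Replay events; return (state, index of first rejected event or None)."""
    for index, event in enumerate(events):
        next_state = SSSW[state].get(event)
        if next_state is None:
            return state, index  # no such edge in the diagram: violation
        state = next_state
    return state, None


# A regular sleep: set sleepable, suspend, get woken up, switch in again.
print(run_sssw(["set_state_sleepable", "switch_suspend",
                "wakeup", "switch_in"]))
# -> ('runnable', None)

# Suspending without setting the state to sleepable first is rejected.
print(run_sssw(["switch_suspend"]))
# -> ('runnable', 0)
```

A sleeping task that is switched in before any wakeup is also rejected by this table, since ``sleeping`` has ``wakeup`` as its only outgoing event.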
References
----------