Commit e8440a8

glemco authored and rostedt committed
rv: Add nrp and sssw per-task monitors
Add 2 per-task monitors as part of the sched model:

* nrp: need-resched preempts
    Monitor to ensure preemption requires need resched.
* sssw: set state sleep and wakeup
    Monitor to ensure sched_set_state to sleepable leads to sleeping and
    sleeping tasks require wakeup.

Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Tomas Glozar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Clark Williams <[email protected]>
Cc: John Kacur <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Gabriele Monaco <[email protected]>
Acked-by: Nam Cao <[email protected]>
Tested-by: Nam Cao <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
1 parent d0096c2 commit e8440a8

15 files changed: +728 -0 lines changed


Documentation/trace/rv/monitor_sched.rst

Lines changed: 167 additions & 0 deletions
@@ -174,6 +174,173 @@ running one, no real task switch occurs but interrupts are disabled nonetheless:
 | | irq_entry
 +---------------+ irq_enable
 
+Monitor nrp
+-----------
+
+The need resched preempts (nrp) monitor ensures that preemption requires
+``need_resched``. Only kernel preemption is considered because, for this
+monitor, preemption while returning to userspace is indistinguishable from
+``sched_switch_yield`` (described in the sssw monitor).
+A kernel preemption occurs whenever ``__schedule`` is called with the
+preemption flag set to true (e.g. from ``preempt_enable`` or when exiting from
+interrupts). This type of preemption occurs after the need for
+``rescheduling`` has been set.
+This does not hold for the *lazy* variant of the flag, which causes only
+userspace preemption.
+A ``schedule_entry_preempt`` may or may not involve a task switch. In the
+latter case, a task goes through the scheduler from a preemption context but
+is picked as the next task to run. Since the scheduler runs, this clears the
+need to reschedule. The ``any_thread_running`` state does not imply that the
+monitored task is not running, as this monitor does not track the outcome of
+scheduling.
+
+In theory, a preemption can only occur after the ``need_resched`` flag is set. In
+practice, however, it is possible to see a preemption where the flag is not
+set. This can happen in one specific condition::
+
+  need_resched
+                   preempt_schedule()
+                                           preempt_schedule_irq()
+                                                   __schedule()
+  !need_resched
+                   __schedule()
+
+In the situation above, standard preemption starts (e.g. from
+``preempt_enable`` when the flag is set), but an interrupt occurs before
+scheduling and schedules on its exit path, which clears the ``need_resched``
+flag.
+When the preempted task runs again, the standard preemption started earlier
+resumes, although the flag is no longer set. The monitor treats this as a
+nested preemption (the ``nested_preempt`` state), which allows another
+preemption without setting the flag again. This condition relaxes the monitor
+constraints and may produce false negatives (i.e. accept preemptions that are
+not real nested preemptions), but it makes the monitor more robust and able to
+validate other scenarios.
+For simplicity, the monitor starts in ``preempt_irq``, although no interrupt
+occurred, as the situation above is hard to pinpoint::
+
+                schedule_entry
+                irq_entry         #===========================================#
+      +-------------------------- H                                           H
+      |                           H                                           H
+      +-------------------------> H             any_thread_running            H
+                                  H                                           H
+      +-------------------------> H                                           H
+      |                           #===========================================#
+      | schedule_entry              |                       ^
+      | schedule_entry_preempt      | sched_need_resched    | schedule_entry
+      |                             |                      schedule_entry_preempt
+      |                             v                       |
+      |                           +----------------------+  |
+      |                      +--- |                      |  |
+      |   sched_need_resched |    |     rescheduling     | -+
+      |                      +--> |                      |
+      |                           +----------------------+
+      |                             | irq_entry
+      |                             v
+      |                           +----------------------+
+      |                           |                      | ---+
+      |                      ---> |                      |    | sched_need_resched
+      |                           |     preempt_irq      |    | irq_entry
+      |                           |                      | <--+
+      |                           |                      | <--+
+      |                           +----------------------+    |
+      |                             | schedule_entry          | sched_need_resched
+      |                             | schedule_entry_preempt  |
+      |                             v                         |
+      |                           +-----------------------+   |
+      +-------------------------- |    nested_preempt     | --+
+                                  +-----------------------+
+                                    ^ irq_entry         |
+                                    +-------------------+
+
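The nrp automaton described above can also be read as a transition table. The following user-space sketch encodes one reading of the diagram and replays a list of events against it. It is not the generated `nrp.h` from this commit: the enum identifiers, `nrp_table` and `nrp_accepts()` are hypothetical names chosen for illustration, and the edges are transcribed from the diagram under the assumption that any edge not drawn is a violation.

```c
#include <stdbool.h>
#include <stddef.h>

/* States and events as named in the diagram; identifiers are illustrative. */
enum nrp_state {
	PREEMPT_IRQ,		/* initial state, as stated in the text */
	ANY_THREAD_RUNNING,
	NESTED_PREEMPT,
	RESCHEDULING,
	NRP_INVALID,		/* no edge drawn: the monitor would react */
};

enum nrp_event {
	IRQ_ENTRY,
	SCHED_NEED_RESCHED,
	SCHEDULE_ENTRY,
	SCHEDULE_ENTRY_PREEMPT,
	NRP_EVENT_MAX,
};

/* One row per state, one column per event, transcribed from the diagram. */
static const enum nrp_state nrp_table[NRP_INVALID][NRP_EVENT_MAX] = {
	[PREEMPT_IRQ] = {
		[IRQ_ENTRY]              = PREEMPT_IRQ,
		[SCHED_NEED_RESCHED]     = PREEMPT_IRQ,
		[SCHEDULE_ENTRY]         = NESTED_PREEMPT,
		[SCHEDULE_ENTRY_PREEMPT] = NESTED_PREEMPT,
	},
	[ANY_THREAD_RUNNING] = {
		[IRQ_ENTRY]              = ANY_THREAD_RUNNING,
		[SCHED_NEED_RESCHED]     = RESCHEDULING,
		[SCHEDULE_ENTRY]         = ANY_THREAD_RUNNING,
		/* preemption without need_resched: the property being checked */
		[SCHEDULE_ENTRY_PREEMPT] = NRP_INVALID,
	},
	[NESTED_PREEMPT] = {
		[IRQ_ENTRY]              = NESTED_PREEMPT,
		[SCHED_NEED_RESCHED]     = PREEMPT_IRQ,
		[SCHEDULE_ENTRY]         = ANY_THREAD_RUNNING,
		[SCHEDULE_ENTRY_PREEMPT] = ANY_THREAD_RUNNING,
	},
	[RESCHEDULING] = {
		[IRQ_ENTRY]              = PREEMPT_IRQ,
		[SCHED_NEED_RESCHED]     = RESCHEDULING,
		[SCHEDULE_ENTRY]         = ANY_THREAD_RUNNING,
		[SCHEDULE_ENTRY_PREEMPT] = ANY_THREAD_RUNNING,
	},
};

/* Replay n events starting from `start`; false as soon as an edge is missing. */
static bool nrp_accepts(enum nrp_state start, const enum nrp_event *ev, size_t n)
{
	enum nrp_state s = start;

	for (size_t i = 0; i < n; i++) {
		s = nrp_table[s][ev[i]];
		if (s == NRP_INVALID)
			return false;
	}
	return true;
}
```

With this table, the nested scenario (`SCHED_NEED_RESCHED`, `IRQ_ENTRY`, `SCHEDULE_ENTRY_PREEMPT`, `SCHEDULE_ENTRY_PREEMPT`) replayed from `ANY_THREAD_RUNNING` is accepted, while a lone `SCHEDULE_ENTRY_PREEMPT` from the same state is rejected.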
+Due to how the ``need_resched`` flag on the preemption count works on arm64,
+this monitor is unstable on that architecture, as it often records preemption
+when the flag is not set, even in the presence of the workaround above.
+For the time being, the monitor is disabled by default on arm64.
+
+Monitor sssw
+------------
+
+The set state sleep and wakeup (sssw) monitor ensures that ``set_state`` to
+sleepable leads to sleeping and that sleeping tasks require a wakeup. It
+includes the following types of switch:
+
+* ``switch_suspend``:
+  a task puts itself to sleep. This can happen only after explicitly setting
+  the task to ``sleepable``. After a task is suspended, it needs to be woken up
+  (``waking`` state) before being switched in again.
+  Setting the task's state to ``sleepable`` can be reverted before switching if
+  it is woken up or set to ``runnable``.
+* ``switch_blocking``:
+  a special case of ``switch_suspend`` where the task is waiting on a
+  sleeping RT lock (``PREEMPT_RT`` only). It is common to see wakeup and set
+  state events racing with each other, which leads the model to perceive this
+  type of switch when the task is not set to sleepable. This is a limitation of
+  the model on SMP systems, and workarounds may slow down the system.
+* ``switch_preempt``:
+  a task switch as a result of kernel preemption (``schedule_entry_preempt`` in
+  the nrp model).
+* ``switch_yield``:
+  a task explicitly calls the scheduler or is preempted while returning to
+  userspace. It can happen after a ``yield`` system call, from the idle task or
+  if the ``need_resched`` flag is set. By definition, a task cannot yield while
+  ``sleepable``, as that would be a suspension. A special case of a yield occurs
+  when a task in ``TASK_INTERRUPTIBLE`` calls the scheduler while a signal is
+  pending. The task doesn't go through the usual blocking/waking and is set
+  back to runnable. The resulting switch (if any) looks like a yield to the
+  ``signal_wakeup`` state and is followed by the signal delivery. From this
+  state, the monitor expects a signal even if it sees a wakeup event, although
+  not necessary, to rule out false negatives.
+
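The four switch types listed above can be summarised as a small decision function. The sketch below is an illustration only and is not taken from `sssw.c`, which is not shown here. The `prev_intent` enum, the `sssw_switch` enum and `classify_switch()` are hypothetical names, and the assumption is that a switch can be classified from two inputs: whether it was a kernel preemption, and what the outgoing task had requested before entering the scheduler.

```c
#include <stdbool.h>

/* What the outgoing task requested before calling the scheduler (hypothetical). */
enum prev_intent {
	PREV_RUNNABLE,		/* still wants to run */
	PREV_SLEEPABLE,		/* explicitly set itself to a sleeping state */
	PREV_RTLOCK_WAIT,	/* waiting on a sleeping RT lock (PREEMPT_RT only) */
};

/* The switch types named in the list above. */
enum sssw_switch {
	SWITCH_SUSPEND,
	SWITCH_BLOCKING,
	SWITCH_PREEMPT,
	SWITCH_YIELD,
};

static enum sssw_switch classify_switch(bool kernel_preempt, enum prev_intent prev)
{
	/* Kernel preemption can hit a task regardless of its desired state. */
	if (kernel_preempt)
		return SWITCH_PREEMPT;

	switch (prev) {
	case PREV_RTLOCK_WAIT:
		return SWITCH_BLOCKING;	/* special case of a suspension */
	case PREV_SLEEPABLE:
		return SWITCH_SUSPEND;	/* the task puts itself to sleep */
	case PREV_RUNNABLE:
	default:
		return SWITCH_YIELD;	/* explicit schedule or userspace preemption */
	}
}
```

Checking the preemption flag first mirrors the statement that ``sleepable`` and ``runnable`` only describe the desired state of a task, which can still be scheduled out by preemption.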
+This monitor doesn't include a running state: ``sleepable`` and ``runnable``
+refer only to the task's desired state, and the task could be scheduled out
+(e.g. due to preemption) in either of them. However, it does include the event
+``sched_switch_in`` to represent when a task is allowed to become running. This
+can also be triggered by preemption, but cannot occur after the task got to
+``sleeping`` and before a ``wakeup`` occurs::
+
+     +--------------------------------------------------------------------------+
+     |                                                                          |
+     |                                                                          |
+     | switch_suspend           |                                               |
+     | switch_blocking          |                                               |
+     v                          v                                               |
+   +----------+              #==========================#   set_state_runnable  |
+   |          |              H                          H   wakeup              |
+   |          |              H                          H   switch_in           |
+   |          |              H                          H   switch_yield        |
+   | sleeping |              H                          H   switch_preempt      |
+   |          |              H                          H   signal_deliver      |
+   |          |  switch_     H                          H ------+               |
+   |          |  _blocking   H         runnable         H       |               |
+   |          | <----------- H                          H <-----+               |
+   +----------+              H                          H                       |
+     | wakeup                H                          H                       |
+     +---------------------> H                          H                       |
+                             H                          H                       |
+                 +---------> H                          H                       |
+                 |           #==========================#                       |
+                 |             |                ^                               |
+                 |             |                | set_state_runnable            |
+                 |             |                | wakeup                        |
+                 |    set_state_sleepable       |      +------------------------+
+                 |             v                |      |
+                 |           +--------------------------+ set_state_sleepable
+                 |           |                          | switch_in
+                 |           |                          | switch_preempt
+          signal_deliver     |        sleepable         | signal_deliver
+                 |           |                          | ------+
+                 |           |                          |       |
+                 |           |                          | <-----+
+                 |           +--------------------------+
+                 |             |                ^
+                 |       switch_yield           | set_state_sleepable
+                 |             v                |
+                 |           +---------------+  |
+                 +---------- | signal_wakeup | -+
+                             +---------------+
+                               ^           | switch_in
+                               |           | switch_preempt
+                               |           | switch_yield
+                               +-----------+ wakeup
+
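As with nrp, the sssw diagram can be transcribed into a lookup table. The sketch below is a user-space illustration of one reading of the diagram, not the generated `sssw.h`. The identifiers (`sssw_table`, `sssw_step()`, the `EV_` prefixed events) are hypothetical, and every edge not drawn in the diagram is assumed to be a violation.

```c
enum sssw_state {
	SSSW_INVALID = 0,	/* zero, so unlisted table entries mean "no edge" */
	RUNNABLE,		/* initial state */
	SLEEPABLE,
	SLEEPING,
	SIGNAL_WAKEUP,
	SSSW_STATE_MAX,
};

enum sssw_event {
	EV_SET_STATE_RUNNABLE,
	EV_SET_STATE_SLEEPABLE,
	EV_SWITCH_BLOCKING,
	EV_SWITCH_IN,
	EV_SWITCH_PREEMPT,
	EV_SWITCH_SUSPEND,
	EV_SWITCH_YIELD,
	EV_WAKEUP,
	EV_SIGNAL_DELIVER,
	SSSW_EVENT_MAX,
};

/* Only the edges drawn in the diagram are listed; the rest default to invalid. */
static const enum sssw_state sssw_table[SSSW_STATE_MAX][SSSW_EVENT_MAX] = {
	[RUNNABLE] = {
		[EV_SET_STATE_RUNNABLE]  = RUNNABLE,
		[EV_WAKEUP]              = RUNNABLE,
		[EV_SWITCH_IN]           = RUNNABLE,
		[EV_SWITCH_YIELD]        = RUNNABLE,
		[EV_SWITCH_PREEMPT]      = RUNNABLE,
		[EV_SIGNAL_DELIVER]      = RUNNABLE,
		[EV_SET_STATE_SLEEPABLE] = SLEEPABLE,
		[EV_SWITCH_BLOCKING]     = SLEEPING,
	},
	[SLEEPABLE] = {
		[EV_SET_STATE_SLEEPABLE] = SLEEPABLE,
		[EV_SWITCH_IN]           = SLEEPABLE,
		[EV_SWITCH_PREEMPT]      = SLEEPABLE,
		[EV_SIGNAL_DELIVER]      = SLEEPABLE,
		[EV_SET_STATE_RUNNABLE]  = RUNNABLE,
		[EV_WAKEUP]              = RUNNABLE,
		[EV_SWITCH_SUSPEND]      = SLEEPING,
		[EV_SWITCH_BLOCKING]     = SLEEPING,
		[EV_SWITCH_YIELD]        = SIGNAL_WAKEUP,
	},
	[SLEEPING] = {
		/* a sleeping task can only leave this state through a wakeup */
		[EV_WAKEUP]              = RUNNABLE,
	},
	[SIGNAL_WAKEUP] = {
		[EV_SWITCH_IN]           = SIGNAL_WAKEUP,
		[EV_SWITCH_PREEMPT]      = SIGNAL_WAKEUP,
		[EV_SWITCH_YIELD]        = SIGNAL_WAKEUP,
		[EV_WAKEUP]              = SIGNAL_WAKEUP,
		[EV_SET_STATE_SLEEPABLE] = SLEEPABLE,
		[EV_SIGNAL_DELIVER]      = RUNNABLE,
	},
};

/* Single transition; returns SSSW_INVALID when the diagram has no such edge. */
static enum sssw_state sssw_step(enum sssw_state s, enum sssw_event e)
{
	return sssw_table[s][e];
}
```

Giving the invalid state the value zero lets the designated initializers list only the drawn edges. Under this reading, `sssw_step(RUNNABLE, EV_SWITCH_SUSPEND)` and `sssw_step(SLEEPING, EV_SWITCH_IN)` both yield `SSSW_INVALID`, which matches the two properties the monitor is named after.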
 References
 ----------
 
kernel/trace/rv/Kconfig

Lines changed: 2 additions & 0 deletions
@@ -55,6 +55,8 @@ source "kernel/trace/rv/monitors/snroc/Kconfig"
 source "kernel/trace/rv/monitors/scpd/Kconfig"
 source "kernel/trace/rv/monitors/snep/Kconfig"
 source "kernel/trace/rv/monitors/sts/Kconfig"
+source "kernel/trace/rv/monitors/nrp/Kconfig"
+source "kernel/trace/rv/monitors/sssw/Kconfig"
 # Add new sched monitors here
 
 source "kernel/trace/rv/monitors/rtapp/Kconfig"

kernel/trace/rv/Makefile

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ obj-$(CONFIG_RV_MON_RTAPP) += monitors/rtapp/rtapp.o
 obj-$(CONFIG_RV_MON_PAGEFAULT) += monitors/pagefault/pagefault.o
 obj-$(CONFIG_RV_MON_SLEEP) += monitors/sleep/sleep.o
 obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
+obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
+obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o

kernel/trace/rv/monitors/nrp/Kconfig

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_NRP
+	depends on RV
+	depends on RV_MON_SCHED
+	default y if !ARM64
+	select DA_MON_EVENTS_ID
+	bool "nrp monitor"
+	help
+	  Monitor to ensure preemption requires need resched.
+	  This monitor is part of the sched monitors collection.
+
+	  This monitor is unstable on arm64, say N unless you are testing it.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst

kernel/trace/rv/monitors/nrp/nrp.c

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "nrp"
+
+#include <trace/events/irq.h>
+#include <trace/events/sched.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "nrp.h"
+
+static struct rv_monitor rv_nrp;
+DECLARE_DA_MON_PER_TASK(nrp, unsigned char);
+
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/trace/irq_vectors.h>
+
+static void handle_vector_irq_entry(void *data, int vector)
+{
+	da_handle_event_nrp(current, irq_entry_nrp);
+}
+
+static void attach_vector_irq(void)
+{
+	rv_attach_trace_probe("nrp", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_attach_trace_probe("nrp", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_attach_trace_probe("nrp", reschedule_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("nrp", call_function_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("nrp", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+static void detach_vector_irq(void)
+{
+	rv_detach_trace_probe("nrp", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_detach_trace_probe("nrp", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_detach_trace_probe("nrp", reschedule_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("nrp", call_function_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("nrp", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+#else
+/* We assume irq_entry tracepoints are sufficient on other architectures */
+static void attach_vector_irq(void) { }
+static void detach_vector_irq(void) { }
+#endif
+
+static void handle_irq_entry(void *data, int irq, struct irqaction *action)
+{
+	da_handle_event_nrp(current, irq_entry_nrp);
+}
+
+static void handle_sched_need_resched(void *data, struct task_struct *tsk,
+				      int cpu, int tif)
+{
+	/*
+	 * Although need_resched leads to both the rescheduling and preempt_irq
+	 * states, it is safer to start the monitor always in preempt_irq,
+	 * which may not mirror the system state but makes the monitor simpler,
+	 */
+	if (tif == TIF_NEED_RESCHED)
+		da_handle_start_event_nrp(tsk, sched_need_resched_nrp);
+}
+
+static void handle_schedule_entry(void *data, bool preempt)
+{
+	if (preempt)
+		da_handle_event_nrp(current, schedule_entry_preempt_nrp);
+	else
+		da_handle_event_nrp(current, schedule_entry_nrp);
+}
+
+static int enable_nrp(void)
+{
+	int retval;
+
+	retval = da_monitor_init_nrp();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("nrp", irq_handler_entry, handle_irq_entry);
+	rv_attach_trace_probe("nrp", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_attach_trace_probe("nrp", sched_entry_tp, handle_schedule_entry);
+	attach_vector_irq();
+
+	return 0;
+}
+
+static void disable_nrp(void)
+{
+	rv_nrp.enabled = 0;
+
+	rv_detach_trace_probe("nrp", irq_handler_entry, handle_irq_entry);
+	rv_detach_trace_probe("nrp", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_detach_trace_probe("nrp", sched_entry_tp, handle_schedule_entry);
+	detach_vector_irq();
+
+	da_monitor_destroy_nrp();
+}
+
+static struct rv_monitor rv_nrp = {
+	.name = "nrp",
+	.description = "need resched preempts.",
+	.enable = enable_nrp,
+	.disable = disable_nrp,
+	.reset = da_monitor_reset_all_nrp,
+	.enabled = 0,
+};
+
+static int __init register_nrp(void)
+{
+	return rv_register_monitor(&rv_nrp, &rv_sched);
+}
+
+static void __exit unregister_nrp(void)
+{
+	rv_unregister_monitor(&rv_nrp);
+}
+
+module_init(register_nrp);
+module_exit(unregister_nrp);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <[email protected]>");
+MODULE_DESCRIPTION("nrp: need resched preempts.");
