Commit 2004cef

Merge tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot
   de Oliveira's last major contribution to the kernel:

     "SCHED_DEADLINE servers can help fixing starvation issues of low
      priority tasks (e.g., SCHED_OTHER) when higher priority tasks
      monopolize CPU cycles. Today we have RT Throttling; DEADLINE
      servers should be able to replace and improve that."

   (Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef
   Esmat, Huang Shijie)

 - Preparatory changes for sched_ext integration:
     - Use set_next_task(.first) where required
     - Fix up set_next_task() implementations
     - Clean up DL server vs. core sched
     - Split up put_prev_task_balance()
     - Rework pick_next_task()
     - Combine the last put_prev_task() and the first set_next_task()
     - Rework dl_server
     - Add put_prev_task(.next)

   (Peter Zijlstra, with a fix by Tejun Heo)

 - Complete the EEVDF transition and refine EEVDF scheduling:
     - Implement delayed dequeue
     - Allow shorter slices to wakeup-preempt
     - Use sched_attr::sched_runtime to set request/slice suggestion
     - Document the new feature flags
     - Remove unused and duplicate-functionality fields
     - Simplify & unify pick_next_task_fair()
     - Misc debuggability enhancements

   (Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin
   Schneider and Chuyi Zhou)

 - Initialize the vruntime of a new task when it is first enqueued,
   resulting in significant decrease in latency of newly woken tasks
   (Zhang Qiao)

 - Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
   (K Prateek Nayak, Peter Zijlstra)

 - Clean up and clarify the usage of rt_task() (Qais Yousef)

 - Preempt SCHED_IDLE entities in strict cgroup hierarchies
   (Tianchen Ding)

 - Clarify the documentation of time units for deadline scheduler
   parameters (Christian Loehle)

 - Remove the HZ_BW chicken-bit feature flag introduced a year ago,
   the original change seems to be working fine (Phil Auld)

 - Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie,
   Peilin He, Qais Yousef and Vincent Guittot)

* tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched/cpufreq: Use NSEC_PER_MSEC for deadline task
  cpufreq/cppc: Use NSEC_PER_MSEC for deadline task
  sched/deadline: Clarify nanoseconds in uapi
  sched/deadline: Convert schedtool example to chrt
  sched/debug: Fix the runnable tasks output
  sched: Fix sched_delayed vs sched_core
  kernel/sched: Fix util_est accounting for DELAY_DEQUEUE
  kthread: Fix task state in kthread worker if being frozen
  sched/pelt: Use rq_clock_task() for hw_pressure
  sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c
  sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
  sched: Add put_prev_task(.next)
  sched: Rework dl_server
  sched: Combine the last put_prev_task() and the first set_next_task()
  sched: Rework pick_next_task()
  sched: Split up put_prev_task_balance()
  sched: Clean up DL server vs core sched
  sched: Fixup set_next_task() implementations
  sched: Use set_next_task(.first) where required
  sched/fair: Properly deactivate sched_delayed task upon class change
  ...

2 parents 509d2cd + bc9057d commit 2004cef

32 files changed: +1695 -747 lines changed

Documentation/scheduler/sched-deadline.rst

Lines changed: 6 additions & 8 deletions
@@ -749,21 +749,19 @@ Appendix A. Test suite
  of the command line options. Please refer to rt-app documentation for more
  details (`<rt-app-sources>/doc/*.json`).
 
- The second testing application is a modification of schedtool, called
- schedtool-dl, which can be used to setup SCHED_DEADLINE parameters for a
- certain pid/application. schedtool-dl is available at:
- https://github.com/scheduler-tools/schedtool-dl.git.
+ The second testing application is done using chrt which has support
+ for SCHED_DEADLINE.
 
  The usage is straightforward::
 
-  # schedtool -E -t 10000000:100000000 -e ./my_cpuhog_app
+  # chrt -d -T 10000000 -D 100000000 0 ./my_cpuhog_app
 
  With this, my_cpuhog_app is put to run inside a SCHED_DEADLINE reservation
- of 10ms every 100ms (note that parameters are expressed in microseconds).
- You can also use schedtool to create a reservation for an already running
+ of 10ms every 100ms (note that parameters are expressed in nanoseconds).
+ You can also use chrt to create a reservation for an already running
  application, given that you know its pid::
 
-  # schedtool -E -t 10000000:100000000 my_app_pid
+  # chrt -d -T 10000000 -D 100000000 -p 0 my_app_pid
 
 Appendix B. Minimal main()
 ==========================
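
The hunk above corrects the documented unit from microseconds to nanoseconds: the values 10000000 and 100000000 passed to chrt are 10 ms and 100 ms expressed in nanoseconds. The following userspace sketch only checks that arithmetic. The `dl_params` struct and its helper functions are hypothetical names introduced for illustration and are not part of the kernel or of chrt; the ordering rule runtime <= deadline <= period is the one described in the sched(7) manual page, and the sketch assumes the period equals the deadline because the chrt example passes no separate period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL	/* nanoseconds in one millisecond */

/* Hypothetical holder for the three SCHED_DEADLINE parameters, in ns. */
struct dl_params {
	uint64_t runtime_ns;
	uint64_t deadline_ns;
	uint64_t period_ns;
};

/* Convert millisecond values into the nanosecond units the kernel expects. */
static struct dl_params dl_params_from_ms(uint64_t runtime_ms,
					  uint64_t deadline_ms,
					  uint64_t period_ms)
{
	struct dl_params p = {
		.runtime_ns  = runtime_ms * NSEC_PER_MSEC,
		.deadline_ns = deadline_ms * NSEC_PER_MSEC,
		.period_ns   = period_ms * NSEC_PER_MSEC,
	};
	return p;
}

/* Ordering check only: 0 < runtime <= deadline <= period. */
static bool dl_params_valid(const struct dl_params *p)
{
	return p->runtime_ns > 0 &&
	       p->runtime_ns <= p->deadline_ns &&
	       p->deadline_ns <= p->period_ns;
}

/* Share of one CPU reserved, in whole percent (rounded down). */
static unsigned int dl_params_bandwidth_pct(const struct dl_params *p)
{
	assert(p->period_ns > 0);	/* avoid dividing by zero */
	return (unsigned int)(p->runtime_ns * 100 / p->period_ns);
}
```

Calling `dl_params_from_ms(10, 100, 100)` yields the same two numbers that appear on the chrt command line, and `dl_params_bandwidth_pct()` reports the reservation as 10 percent of one CPU. The sketch deliberately stops short of calling sched_setattr(), which needs suitable privileges.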

drivers/cpufreq/cppc_cpufreq.c

Lines changed: 3 additions & 3 deletions
@@ -224,9 +224,9 @@ static void __init cppc_freq_invariance_init(void)
 		 * Fake (unused) bandwidth; workaround to "fix"
 		 * priority inheritance.
 		 */
-		.sched_runtime = 1000000,
-		.sched_deadline = 10000000,
-		.sched_period = 10000000,
+		.sched_runtime = NSEC_PER_MSEC,
+		.sched_deadline = 10 * NSEC_PER_MSEC,
+		.sched_period = 10 * NSEC_PER_MSEC,
 	};
 	int ret;
 

fs/bcachefs/six.c

Lines changed: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
 	 */
 	rcu_read_lock();
 	struct task_struct *owner = READ_ONCE(lock->owner);
-	bool ret = owner ? owner_on_cpu(owner) : !rt_task(current);
+	bool ret = owner ? owner_on_cpu(owner) : !rt_or_dl_task(current);
 	rcu_read_unlock();
 
 	return ret;

fs/proc/base.c

Lines changed: 1 addition & 1 deletion
@@ -2626,7 +2626,7 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
 	}
 
 	task_lock(p);
-	if (task_is_realtime(p))
+	if (rt_or_dl_task_policy(p))
 		slack_ns = 0;
 	else if (slack_ns == 0)
 		slack_ns = p->default_timer_slack_ns;

include/linux/ioprio.h

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
 	if (task->policy == SCHED_IDLE)
 		return IOPRIO_CLASS_IDLE;
-	else if (task_is_realtime(task))
+	else if (rt_or_dl_task_policy(task))
 		return IOPRIO_CLASS_RT;
 	else
 		return IOPRIO_CLASS_BE;

include/linux/sched.h

Lines changed: 24 additions & 4 deletions
@@ -149,8 +149,9 @@ struct user_event_mm;
  * Special states are those that do not use the normal wait-loop pattern. See
  * the comment with set_special_state().
  */
-#define is_special_task_state(state) \
-	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | TASK_DEAD))
+#define is_special_task_state(state) \
+	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | \
+		    TASK_DEAD | TASK_FROZEN))
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 # define debug_normal_state_change(state_value) \
@@ -541,9 +542,14 @@ struct sched_entity {
 	struct rb_node run_node;
 	u64 deadline;
 	u64 min_vruntime;
+	u64 min_slice;
 
 	struct list_head group_node;
-	unsigned int on_rq;
+	unsigned char on_rq;
+	unsigned char sched_delayed;
+	unsigned char rel_deadline;
+	unsigned char custom_slice;
+	/* hole */
 
 	u64 exec_start;
 	u64 sum_exec_runtime;
@@ -639,12 +645,26 @@ struct sched_dl_entity {
 	 *
 	 * @dl_overrun tells if the task asked to be informed about runtime
 	 * overruns.
+	 *
+	 * @dl_server tells if this is a server entity.
+	 *
+	 * @dl_defer tells if this is a deferred or regular server. For
+	 * now only defer server exists.
+	 *
+	 * @dl_defer_armed tells if the deferrable server is waiting
+	 * for the replenishment timer to activate it.
+	 *
+	 * @dl_defer_running tells if the deferrable server is actually
+	 * running, skipping the defer phase.
 	 */
 	unsigned int dl_throttled : 1;
 	unsigned int dl_yielded : 1;
 	unsigned int dl_non_contending : 1;
 	unsigned int dl_overrun : 1;
 	unsigned int dl_server : 1;
+	unsigned int dl_defer : 1;
+	unsigned int dl_defer_armed : 1;
+	unsigned int dl_defer_running : 1;
 
 	/*
 	 * Bandwidth enforcement timer. Each -deadline task has its
@@ -672,7 +692,7 @@ struct sched_dl_entity {
 	 */
 	struct rq *rq;
 	dl_server_has_tasks_f server_has_tasks;
-	dl_server_pick_f server_pick;
+	dl_server_pick_f server_pick_task;
 
 #ifdef CONFIG_RT_MUTEXES
 	/*
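
The first hunk for this file widens the `is_special_task_state()` mask so that `TASK_FROZEN` also counts as a special state. The userspace sketch below shows the effect of that one extra bit. The numeric values are an assumption: they are believed to mirror include/linux/sched.h around this commit but are copied here purely for illustration, and the `EX_` names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed state bits, believed to match include/linux/sched.h at this
 * commit. Verify against the header you build with before relying on them.
 */
#define EX_TASK_INTERRUPTIBLE	0x0001	/* TASK_INTERRUPTIBLE */
#define EX_TASK_STOPPED		0x0004	/* __TASK_STOPPED */
#define EX_TASK_TRACED		0x0008	/* __TASK_TRACED */
#define EX_TASK_PARKED		0x0040	/* TASK_PARKED */
#define EX_TASK_DEAD		0x0080	/* TASK_DEAD */
#define EX_TASK_FROZEN		0x8000	/* TASK_FROZEN */

/* Mask as it was before the hunk. */
#define EX_SPECIAL_OLD \
	(EX_TASK_STOPPED | EX_TASK_TRACED | EX_TASK_PARKED | EX_TASK_DEAD)

/* Mask after the hunk: the same bits plus TASK_FROZEN. */
#define EX_SPECIAL_NEW	(EX_SPECIAL_OLD | EX_TASK_FROZEN)

static bool is_special_old(unsigned int state)
{
	return (state & EX_SPECIAL_OLD) != 0;
}

static bool is_special_new(unsigned int state)
{
	return (state & EX_SPECIAL_NEW) != 0;
}

/* Small self-check that exercises both masks. */
static void special_state_selfcheck(void)
{
	assert(!is_special_old(EX_TASK_FROZEN));	/* not special before */
	assert(is_special_new(EX_TASK_FROZEN));		/* special after */
	assert(is_special_new(EX_TASK_DEAD));		/* unchanged */
	assert(!is_special_new(EX_TASK_INTERRUPTIBLE));	/* normal wait state */
}
```

Only the frozen state changes classification; every state that was special before remains special, and ordinary wait states are unaffected.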

include/linux/sched/deadline.h

Lines changed: 7 additions & 7 deletions
@@ -10,16 +10,16 @@
 
 #include <linux/sched.h>
 
-#define MAX_DL_PRIO 0
-
-static inline int dl_prio(int prio)
+static inline bool dl_prio(int prio)
 {
-	if (unlikely(prio < MAX_DL_PRIO))
-		return 1;
-	return 0;
+	return unlikely(prio < MAX_DL_PRIO);
 }
 
-static inline int dl_task(struct task_struct *p)
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
+static inline bool dl_task(struct task_struct *p)
 {
 	return dl_prio(p->prio);
 }

include/linux/sched/prio.h

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO 100
+#define MAX_DL_PRIO 0
 
 #define MAX_PRIO (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2)

include/linux/sched/rt.h

Lines changed: 27 additions & 6 deletions
@@ -6,19 +6,40 @@
 
 struct task_struct;
 
-static inline int rt_prio(int prio)
+static inline bool rt_prio(int prio)
 {
-	if (unlikely(prio < MAX_RT_PRIO))
-		return 1;
-	return 0;
+	return unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO);
 }
 
-static inline int rt_task(struct task_struct *p)
+static inline bool rt_or_dl_prio(int prio)
+{
+	return unlikely(prio < MAX_RT_PRIO);
+}
+
+/*
+ * Returns true if a task has a priority that belongs to RT class. PI-boosted
+ * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
+ */
+static inline bool rt_task(struct task_struct *p)
 {
 	return rt_prio(p->prio);
 }
 
-static inline bool task_is_realtime(struct task_struct *tsk)
+/*
+ * Returns true if a task has a priority that belongs to RT or DL classes.
+ * PI-boosted tasks will return true. Use rt_or_dl_task_policy() to ignore
+ * PI-boosted tasks.
+ */
+static inline bool rt_or_dl_task(struct task_struct *p)
+{
+	return rt_or_dl_prio(p->prio);
+}
+
+/*
+ * Returns true if a task has a policy that belongs to RT or DL classes.
+ * PI-boosted tasks will return false.
+ */
+static inline bool rt_or_dl_task_policy(struct task_struct *tsk)
 {
 	int policy = tsk->policy;
 
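
The hunk above narrows `rt_prio()` so that it no longer matches deadline priorities, and adds `rt_or_dl_prio()` for callers that want the old, wider range. The userspace sketch below restates the three priority predicates with the `MAX_DL_PRIO` and `MAX_RT_PRIO` values from the diff and contrasts them with a policy-based check. Several parts are assumptions made for illustration: `struct toy_task`, `old_rt_prio()` and `toy_rt_or_dl_policy()` are hypothetical names, the body of the policy check is a guess because the hunk is truncated before it, and the `EX_SCHED_*` numbers are the policy values from the Linux uapi headers.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DL_PRIO	0	/* from include/linux/sched/prio.h in this diff */
#define MAX_RT_PRIO	100

/* Policy numbers as defined by the Linux uapi headers. */
#define EX_SCHED_NORMAL		0
#define EX_SCHED_FIFO		1
#define EX_SCHED_RR		2
#define EX_SCHED_DEADLINE	6

/* Hypothetical stand-in for the two task_struct fields used here. */
struct toy_task {
	int prio;	/* effective priority, may be PI-boosted */
	int policy;	/* policy the task actually requested */
};

static bool dl_prio(int prio)       { return prio < MAX_DL_PRIO; }
static bool rt_prio(int prio)       { return prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO; }
static bool rt_or_dl_prio(int prio) { return prio < MAX_RT_PRIO; }

/* Behaviour of rt_prio() before this change, kept for comparison. */
static bool old_rt_prio(int prio)   { return prio < MAX_RT_PRIO; }

/* Assumed policy-based check; ignores priority inheritance boosts. */
static bool toy_rt_or_dl_policy(const struct toy_task *t)
{
	return t->policy == EX_SCHED_FIFO || t->policy == EX_SCHED_RR ||
	       t->policy == EX_SCHED_DEADLINE;
}

static void prio_selfcheck(void)
{
	/* A deadline task runs at prio MAX_DL_PRIO - 1, i.e. -1. */
	struct toy_task dl = { .prio = MAX_DL_PRIO - 1, .policy = EX_SCHED_DEADLINE };
	/* A normal task boosted into the RT range by priority inheritance. */
	struct toy_task boosted = { .prio = 10, .policy = EX_SCHED_NORMAL };

	assert(dl_prio(dl.prio));
	assert(old_rt_prio(dl.prio));		/* old helper matched DL too */
	assert(!rt_prio(dl.prio));		/* new helper does not */
	assert(rt_or_dl_prio(dl.prio));

	assert(rt_or_dl_prio(boosted.prio));	/* priority view: looks RT */
	assert(!toy_rt_or_dl_policy(&boosted));	/* policy view: still normal */
}
```

The last two assertions illustrate the distinction the new comments draw: priority-based helpers report a PI-boosted task as realtime, while policy-based helpers do not.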

include/uapi/linux/sched/types.h

Lines changed: 3 additions & 3 deletions
@@ -58,9 +58,9 @@
  *
  * This is reflected by the following fields of the sched_attr structure:
  *
- * @sched_deadline representative of the task's deadline
- * @sched_runtime representative of the task's runtime
- * @sched_period representative of the task's period
+ * @sched_deadline representative of the task's deadline in nanoseconds
+ * @sched_runtime representative of the task's runtime in nanoseconds
+ * @sched_period representative of the task's period in nanoseconds
  *
  * Given this task model, there are a multiplicity of scheduling algorithms
  * and policies, that can be used to ensure all the tasks will make their
