Commit bf76f23

Merge tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core scheduler changes:

   - Better tracking of maximum lag of tasks in presence of different
     slices duration, for better handling of lag in the fair scheduler
     (Vincent Guittot)

   - Clean up and standardize #if/#else/#endif markers throughout the
     entire scheduler code base (Ingo Molnar)

   - Make SMP unconditional: build the SMP scheduler's data structures
     and logic on UP kernel too, even though they are not used, to
     simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
     blocks from the scheduler (Ingo Molnar)

   - Reorganize cgroup bandwidth control interface handling for better
     interfacing with sched_ext (Tejun Heo)

  Balancing:

   - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
     Mason)

   - Remove sched_domain_topology_level::flags to simplify the code
     (Prateek Nayak)

   - Simplify and clean up build_sched_topology() (Li Chen)

   - Optimize build_sched_topology() on large machines (Li Chen)

  Real-time scheduling:

   - Add initial version of proxy execution: a mechanism for
     mutex-owning tasks to inherit the scheduling context of higher
     priority waiters.

     Currently limited to a single runqueue and conditional on
     CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
     Valentin Schneider)

   - Deadline scheduler (Juri Lelli):
      - Fix dl_servers initialization order (Juri Lelli)
      - Fix DL scheduler's root domain reinitialization logic (Juri
        Lelli)
      - Fix accounting bugs after global limits change (Juri Lelli)
      - Fix scalability regression by implementing less aggressive
        dl_server handling (Peter Zijlstra)

  PSI:

   - Improve scalability by optimizing psi_group_change() cpu_clock()
     usage (Peter Zijlstra)

  Rust changes:

   - Make Task, CondVar and PollCondVar methods inline to avoid
     unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)

   - Add might_sleep() support for Rust code: Rust's "#[track_caller]"
     mechanism is used so that Rust's might_sleep() doesn't need to be
     defined as a macro (Fujita Tomonori)

   - Introduce file_from_location() (Boqun Feng)

  Debugging & instrumentation:

   - Make clangd usable with scheduler source code files again (Peter
     Zijlstra)

   - tools: Add root_domains_dump.py which dumps root domains info (Juri
     Lelli)

   - tools: Add dl_bw_dump.py for printing bandwidth accounting info
     (Juri Lelli)

  Misc cleanups & fixes:

   - Remove play_idle() (Feng Lee)

   - Fix check_preemption_disabled() (Sebastian Andrzej Siewior)

   - Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
     Claudio R. Goncalves)

   - Correct the comment in place_entity() (wang wei)"

* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
  sched/idle: Remove play_idle()
  sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
  sched: Start blocked_on chain processing in find_proxy_task()
  sched: Fix proxy/current (push,pull)ability
  sched: Add an initial sketch of the find_proxy_task() function
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Move update_curr_task logic into update_curr_se
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  locking/mutex: Rework task_struct::blocked_on
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched/topology: Remove sched_domain_topology_level::flags
  x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
  x86/smpboot: moves x86_topology to static initialize and truncate
  x86/smpboot: remove redundant CONFIG_SCHED_SMT
  smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
  tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
  tools/sched: Add root_domains_dump.py which dumps root domains info
  sched/deadline: Fix accounting after global limits change
  sched/deadline: Reset extra_bw to max_bw when clearing root domains
  sched/deadline: Initialize dl_servers after SMP
  ...

2 parents 14bed9b + 1b5f145 commit bf76f23

68 files changed, 1472 insertions(+), 1463 deletions(-)

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 5 additions & 0 deletions
@@ -6410,6 +6410,11 @@
 	sa1100ir	[NET]
 			See drivers/net/irda/sa1100_ir.c.

+	sched_proxy_exec= [KNL]
+			Enables or disables "proxy execution" style
+			solution to mutex-based priority inversion.
+			Format: <bool>
+
 	sched_verbose	[KNL,EARLY] Enables verbose scheduler debug messages.

 	schedstats=	[KNL,X86] Enable or disable scheduled statistics.
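
The documentation hunk above only states `Format: <bool>` for `sched_proxy_exec=`. As a rough illustration of what a boolean boot argument of this kind might accept, here is a minimal user-space sketch. It assumes the value is parsed with rules similar to the kernel's `kstrtobool()` (first character `y/t/1` or `n/f/0`, plus `on`/`off`); this diff does not show the actual parser, and `demo_parse_bool()` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Hypothetical helper, loosely modelled on the kernel's kstrtobool().
 * Returns 0 and stores 0 or 1 in *res on success, -1 on unrecognized input.
 */
int demo_parse_bool(const char *s, int *res)
{
	if (s == NULL || res == NULL)
		return -1;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = 1;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = 0;
		return 0;
	case 'o': case 'O':
		/* Distinguish "on" from "off" by the second character. */
		if (s[1] == 'n' || s[1] == 'N') {
			*res = 1;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = 0;
			return 0;
		}
		return -1;
	default:
		/* Covers the empty string and anything else. */
		return -1;
	}
}
```

Under that assumption, a command line containing `sched_proxy_exec=0` or `sched_proxy_exec=off` would disable the feature, and `sched_proxy_exec=1` or `sched_proxy_exec=on` would enable it.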

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -22319,6 +22319,7 @@ F:	include/linux/wait.h
 F:	include/uapi/linux/sched.h
 F:	kernel/fork.c
 F:	kernel/sched/
+F:	tools/sched/

 SCHEDULER - SCHED_EXT
 R:	Tejun Heo <[email protected]>

arch/powerpc/kernel/smp.c

Lines changed: 10 additions & 15 deletions
@@ -1700,28 +1700,23 @@ static void __init build_sched_topology(void)
 #ifdef CONFIG_SCHED_SMT
 	if (has_big_cores) {
 		pr_info("Big cores detected but using small core scheduling\n");
-		powerpc_topology[i++] = (struct sched_domain_topology_level){
-			smallcore_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT)
-		};
+		powerpc_topology[i++] =
+			SDTL_INIT(smallcore_smt_mask, powerpc_smt_flags, SMT);
 	} else {
-		powerpc_topology[i++] = (struct sched_domain_topology_level){
-			cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT)
-		};
+		powerpc_topology[i++] = SDTL_INIT(cpu_smt_mask, powerpc_smt_flags, SMT);
 	}
 #endif
 	if (shared_caches) {
-		powerpc_topology[i++] = (struct sched_domain_topology_level){
-			shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE)
-		};
+		powerpc_topology[i++] =
+			SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
 	}
+
 	if (has_coregroup_support()) {
-		powerpc_topology[i++] = (struct sched_domain_topology_level){
-			cpu_mc_mask, powerpc_shared_proc_flags, SD_INIT_NAME(MC)
-		};
+		powerpc_topology[i++] =
+			SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
 	}
-	powerpc_topology[i++] = (struct sched_domain_topology_level){
-		cpu_cpu_mask, powerpc_shared_proc_flags, SD_INIT_NAME(PKG)
-	};
+
+	powerpc_topology[i++] = SDTL_INIT(cpu_cpu_mask, powerpc_shared_proc_flags, PKG);

 	/* There must be one trailing NULL entry left. */
 	BUG_ON(i >= ARRAY_SIZE(powerpc_topology) - 1);

arch/s390/kernel/topology.c

Lines changed: 5 additions & 5 deletions
@@ -531,11 +531,11 @@ static const struct cpumask *cpu_drawer_mask(int cpu)
 }

 static struct sched_domain_topology_level s390_topology[] = {
-	{ cpu_thread_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
-	{ cpu_book_mask, SD_INIT_NAME(BOOK) },
-	{ cpu_drawer_mask, SD_INIT_NAME(DRAWER) },
-	{ cpu_cpu_mask, SD_INIT_NAME(PKG) },
+	SDTL_INIT(cpu_thread_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
+	SDTL_INIT(cpu_book_mask, NULL, BOOK),
+	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
+	SDTL_INIT(cpu_cpu_mask, NULL, PKG),
 	{ NULL, },
 };
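
The powerpc and s390 hunks above replace positional initializers with `SDTL_INIT(mask, flags, NAME)`, and the s390 entries that previously omitted a flags callback now pass `NULL` explicitly. The definition of `SDTL_INIT()` is not part of the files shown here, so the following is only a user-space sketch of how such a helper could be built from a compound literal with designated fields. All `demo_*` and `DEMO_*` names and the simplified callback types are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the kernel's real callback types differ. */
typedef int (*demo_mask_fn)(int cpu);
typedef int (*demo_flags_fn)(void);

struct demo_topology_level {
	demo_mask_fn   mask;
	demo_flags_fn  sd_flags;
	const char    *name;
};

/* Assumed shape: a compound literal whose name is stringified from a token. */
#define DEMO_SDTL_INIT(maskfn, flagsfn, dname) \
	((struct demo_topology_level){ \
		.mask = (maskfn), .sd_flags = (flagsfn), .name = #dname })

static int demo_thread_mask(int cpu) { return cpu; }
static int demo_book_mask(int cpu)   { return cpu / 8; }
static int demo_smt_flags(void)      { return 1; }

/* Fill out[] with at most cap levels and return how many were written. */
size_t demo_build_levels(struct demo_topology_level *out, size_t cap)
{
	size_t i = 0;

	if (i < cap)
		out[i++] = DEMO_SDTL_INIT(demo_thread_mask, demo_smt_flags, SMT);
	if (i < cap)	/* A level without a flags callback passes NULL explicitly. */
		out[i++] = DEMO_SDTL_INIT(demo_book_mask, NULL, BOOK);
	return i;
}
```

Naming each field makes a missing flags callback visible at the call site, which is a plausible reason the s390 table gained explicit `NULL` arguments.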

arch/x86/kernel/smpboot.c

Lines changed: 24 additions & 27 deletions
@@ -478,44 +478,41 @@ static int x86_cluster_flags(void)
  */
 static bool x86_has_numa_in_package;

-static struct sched_domain_topology_level x86_topology[6];
-
-static void __init build_sched_topology(void)
-{
-	int i = 0;
-
-#ifdef CONFIG_SCHED_SMT
-	x86_topology[i++] = (struct sched_domain_topology_level){
-		cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT)
-	};
-#endif
+static struct sched_domain_topology_level x86_topology[] = {
+	SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
 #ifdef CONFIG_SCHED_CLUSTER
-	x86_topology[i++] = (struct sched_domain_topology_level){
-		cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS)
-	};
+	SDTL_INIT(cpu_clustergroup_mask, x86_cluster_flags, CLS),
 #endif
 #ifdef CONFIG_SCHED_MC
-	x86_topology[i++] = (struct sched_domain_topology_level){
-		cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC)
-	};
+	SDTL_INIT(cpu_coregroup_mask, x86_core_flags, MC),
 #endif
+	SDTL_INIT(cpu_cpu_mask, x86_sched_itmt_flags, PKG),
+	{ NULL },
+};
+
+static void __init build_sched_topology(void)
+{
+	struct sched_domain_topology_level *topology = x86_topology;
+
 	/*
-	 * When there is NUMA topology inside the package skip the PKG domain
-	 * since the NUMA domains will auto-magically create the right spanning
-	 * domains based on the SLIT.
+	 * When there is NUMA topology inside the package invalidate the
+	 * PKG domain since the NUMA domains will auto-magically create the
+	 * right spanning domains based on the SLIT.
 	 */
-	if (!x86_has_numa_in_package) {
-		x86_topology[i++] = (struct sched_domain_topology_level){
-			cpu_cpu_mask, x86_sched_itmt_flags, SD_INIT_NAME(PKG)
-		};
+	if (x86_has_numa_in_package) {
+		unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
+
+		memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
 	}

 	/*
-	 * There must be one trailing NULL entry left.
+	 * Drop the SMT domains if there is only one thread per-core
+	 * since it'll get degenerated by the scheduler anyways.
 	 */
-	BUG_ON(i >= ARRAY_SIZE(x86_topology)-1);
+	if (cpu_smt_num_threads <= 1)
+		++topology;

-	set_sched_topology(x86_topology);
+	set_sched_topology(topology);
 }

 void set_cpu_sibling_map(int cpu)
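
The x86 hunk above switches from filling the array at boot to a static table that is trimmed in place: the PKG slot (second to last) is zeroed when NUMA lives inside the package, and the start pointer is advanced past SMT when there is one thread per core. The user-space sketch below mimics that trimming on a simplified table. It assumes that a zeroed entry is treated as the end of the table by whoever walks it; the `demo_*` names and the `name == NULL` terminator are illustrative, not the kernel's structures.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified level: a NULL name marks the end of the table (assumption). */
struct demo_level {
	const char *name;
};

static struct demo_level demo_topology[] = {
	{ "SMT" }, { "CLS" }, { "MC" }, { "PKG" }, { NULL },
};

/* Return the table to register, after trimming it like the hunk above. */
struct demo_level *demo_pick_topology(int numa_in_package,
				      unsigned int smt_threads)
{
	struct demo_level *topology = demo_topology;

	if (numa_in_package) {
		/* PKG is the last real entry, just before the terminator. */
		size_t pkgdom = DEMO_ARRAY_SIZE(demo_topology) - 2;

		memset(&demo_topology[pkgdom], 0, sizeof(demo_topology[pkgdom]));
	}

	/* SMT is the first entry, so skipping it is a pointer bump. */
	if (smt_threads <= 1)
		++topology;

	return topology;
}

/* Count entries up to the first terminator. */
size_t demo_count_levels(const struct demo_level *t)
{
	size_t n = 0;

	while (t[n].name != NULL)
		n++;
	return n;
}
```

Note that zeroing the PKG slot permanently modifies the static table, which is acceptable for a function that runs once at boot but would matter if the sketch were called repeatedly with different arguments.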

include/linux/cpu.h

Lines changed: 0 additions & 5 deletions
@@ -187,11 +187,6 @@ static inline void arch_cpu_finalize_init(void) { }

 void play_idle_precise(u64 duration_ns, u64 latency_ns);

-static inline void play_idle(unsigned long duration_us)
-{
-	play_idle_precise(duration_us * NSEC_PER_USEC, U64_MAX);
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 void cpuhp_report_idle_dead(void);
 #else
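
The removed `play_idle()` wrapper only converted microseconds to nanoseconds and passed `U64_MAX` as the latency limit, so any remaining caller can make the same call to `play_idle_precise()` directly. The sketch below reproduces that conversion in user space; `demo_play_idle_precise()` is a recording stub standing in for the kernel function, and every `demo_*` / `DEMO_*` name is hypothetical.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_USEC 1000ULL

/* Last arguments seen by the stub, so the conversion can be inspected. */
uint64_t demo_last_duration_ns;
uint64_t demo_last_latency_ns;

/* Stub standing in for play_idle_precise(); it only records its arguments. */
void demo_play_idle_precise(uint64_t duration_ns, uint64_t latency_ns)
{
	demo_last_duration_ns = duration_ns;
	demo_last_latency_ns = latency_ns;
}

/* What a former play_idle(duration_us) call site would do inline. */
void demo_idle_for_us(unsigned long duration_us)
{
	demo_play_idle_precise(duration_us * DEMO_NSEC_PER_USEC, UINT64_MAX);
}
```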

include/linux/preempt.h

Lines changed: 0 additions & 9 deletions
@@ -369,8 +369,6 @@ static inline void preempt_notifier_init(struct preempt_notifier *notifier,

 #endif

-#ifdef CONFIG_SMP
-
 /*
  * Migrate-Disable and why it is undesired.
  *
@@ -429,13 +427,6 @@ static inline void preempt_notifier_init(struct preempt_notifier *notifier,
 extern void migrate_disable(void);
 extern void migrate_enable(void);

-#else
-
-static inline void migrate_disable(void) { }
-static inline void migrate_enable(void) { }
-
-#endif /* CONFIG_SMP */
-
 /**
  * preempt_disable_nested - Disable preemption inside a normally preempt disabled section
  *

include/linux/psi_types.h

Lines changed: 2 additions & 4 deletions
@@ -84,11 +84,9 @@ enum psi_aggregators {
 struct psi_group_cpu {
 	/* 1st cacheline updated by the scheduler */

-	/* Aggregator needs to know of concurrent changes */
-	seqcount_t seq ____cacheline_aligned_in_smp;
-
 	/* States of the tasks belonging to this group */
-	unsigned int tasks[NR_PSI_TASK_COUNTS];
+	unsigned int tasks[NR_PSI_TASK_COUNTS]
+			____cacheline_aligned_in_smp;

 	/* Aggregate pressure state derived from the tasks */
 	u32 state_mask;
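
With the `seq` member gone, the hunk above moves the cacheline alignment attribute onto `tasks[]`, which becomes the first member of the struct. The C11 sketch below shows the general effect of aligning a first member: the member stays at offset 0 and the whole struct inherits the alignment and is padded to a multiple of it. The 64-byte line size, the array length and all `demo_*` / `DEMO_*` names are assumptions for illustration; the kernel uses its own per-architecture macros.

```c
#include <stdalign.h>
#include <stddef.h>

#define DEMO_CACHELINE 64	/* assumed line size; varies by architecture */
#define DEMO_NR_COUNTS 4	/* hypothetical stand-in for the array length */

struct demo_group_cpu {
	/* Aligning the first member aligns the start of the struct. */
	alignas(DEMO_CACHELINE) unsigned int tasks[DEMO_NR_COUNTS];

	/* Later members share the same line while they fit. */
	unsigned int state_mask;
};

/* Compile-time checks of the layout described above. */
_Static_assert(offsetof(struct demo_group_cpu, tasks) == 0,
	       "tasks[] should start the struct");
_Static_assert(alignof(struct demo_group_cpu) == DEMO_CACHELINE,
	       "struct should inherit the member's alignment");
```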
