
Commit de10553

Merge tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:

 - Fix the incorrect handling of atomic offset updates in
   reserve_eilvt_offset()

   The return value of atomic_cmpxchg() is not compared against the old
   value, it is compared against the new value, which makes the loop
   take two rounds on success. Convert it to atomic_try_cmpxchg() which
   does the right thing.

 - Handle IO/APIC-less systems correctly

   When the IO/APIC is not advertised by ACPI then the computation of
   the lower bound for dynamically allocated interrupts like MSI goes
   wrong. This lower bound is used to exclude the IO/APIC legacy GSI
   space as that must stay reserved for the legacy interrupts.

   In case that the system, e.g. a VM, does not advertise an IO/APIC
   the lower bound stays at 0. 0 is an invalid interrupt number except
   for the legacy timer interrupt on x86. The return value is unchecked
   in the core code, so it ends up allocating interrupt number 0, which
   is subsequently considered to be invalid by the caller, e.g. the MSI
   allocation code.

   A similar problem was already cured for device tree based systems
   years ago, but that missed - or did not envision - the zero IO/APIC
   case. Consolidate the zero check and return the provided "from"
   argument to the core code call site, which is guaranteed to be
   greater than 0.

 - Simplify the X2APIC cluster CPU mask logic for CPU hotplug

   Per-cluster CPU masks are required for X2APIC in cluster mode to
   determine the correct cluster for a target CPU when calculating the
   destination for IPIs.

   These masks are established when CPUs are brought up. The first CPU
   in a cluster must allocate a new cluster CPU mask. As this happens
   during the early startup of a CPU, where memory allocations cannot
   be done, the mask has to be allocated by the control CPU.

   The current implementation allocates a cluster mask just in case,
   and if the to-be-brought-up CPU is the first in a cluster, the CPU
   takes over this allocation from a global pointer. This works nicely
   in the fully serialized CPU bringup scenario which is used today,
   but would fail completely for parallel bringup of CPUs.

   The cluster association of a CPU can be computed from the APIC ID
   which is enumerated by ACPI/MADT, so the cluster CPU masks can be
   preallocated and associated upfront and the upcoming CPUs just need
   to set their corresponding bit. Aside of preparing for parallel
   bringup, this is a valuable simplification on its own.

 - Remove global variables which control the early startup of secondary
   CPUs on 64-bit

   The only information which is needed by a starting CPU is the Linux
   CPU number. The CPU number allows it to retrieve the rest of the
   required data from already existing per-CPU storage.

   So instead of initial_stack, early_gdt_descr and initial_gs, provide
   a new variable smpboot_control which contains the Linux CPU number
   for now. The starting CPU can retrieve and compute all required
   information for startup from there.

   Aside of being a cleanup, this is also preparing for parallel CPU
   bringup, where starting CPUs will look up their Linux CPU number via
   the APIC ID when smpboot_control has the corresponding control bit
   set.

 - Make cc_vendor globally accessible

   Subsequent parallel bringup changes require access to cc_vendor
   because confidential computing platforms need special treatment in
   the early startup phase vs. CPUID and APIC ID readouts.

   The change makes cc_vendor global and provides stub accessors in
   case that CONFIG_ARCH_HAS_CC_PLATFORM is not set.
   This was merged from the x86/cc branch in anticipation of further
   parallel bringup commits which require access to cc_vendor. Due to
   late discoveries of fundamental issues with those patches, these
   commits never happened.

   The merge commit is unfortunately in the middle of the APIC commits,
   so unraveling it would have required a rebase or revert. As the
   parallel bringup seems to be well on its way for 6.5, this would be
   just pointless churn. As the commit does not contain any functional
   change, it's not a risk to keep it.

* tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
  x86/apic: Fix atomic update of offset in reserve_eilvt_offset()
  x86/coco: Export cc_vendor
  x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
  x86/smpboot: Remove initial_gs
  x86/smpboot: Remove early_gdt_descr on 64-bit
  x86/smpboot: Remove initial_stack on 64-bit
  x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
2 parents e798978 + 5af507b commit de10553

File tree: 13 files changed (+205, -116 lines)

arch/x86/coco/core.c

Lines changed: 4 additions & 9 deletions
@@ -13,7 +13,7 @@
 #include <asm/coco.h>
 #include <asm/processor.h>
 
-static enum cc_vendor vendor __ro_after_init;
+enum cc_vendor cc_vendor __ro_after_init;
 static u64 cc_mask __ro_after_init;
 
 static bool intel_cc_platform_has(enum cc_attr attr)
@@ -99,7 +99,7 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 
 bool cc_platform_has(enum cc_attr attr)
 {
-	switch (vendor) {
+	switch (cc_vendor) {
 	case CC_VENDOR_AMD:
 		return amd_cc_platform_has(attr);
 	case CC_VENDOR_INTEL:
@@ -119,7 +119,7 @@ u64 cc_mkenc(u64 val)
 	 * - for AMD, bit *set* means the page is encrypted
 	 * - for AMD with vTOM and for Intel, *clear* means encrypted
 	 */
-	switch (vendor) {
+	switch (cc_vendor) {
 	case CC_VENDOR_AMD:
 		if (sev_status & MSR_AMD64_SNP_VTOM)
 			return val & ~cc_mask;
@@ -135,7 +135,7 @@ u64 cc_mkenc(u64 val)
 u64 cc_mkdec(u64 val)
 {
 	/* See comment in cc_mkenc() */
-	switch (vendor) {
+	switch (cc_vendor) {
 	case CC_VENDOR_AMD:
 		if (sev_status & MSR_AMD64_SNP_VTOM)
 			return val | cc_mask;
@@ -149,11 +149,6 @@ u64 cc_mkdec(u64 val)
 }
 EXPORT_SYMBOL_GPL(cc_mkdec);
 
-__init void cc_set_vendor(enum cc_vendor v)
-{
-	vendor = v;
-}
-
 __init void cc_set_mask(u64 mask)
 {
 	cc_mask = mask;

arch/x86/include/asm/coco.h

Lines changed: 20 additions & 3 deletions
@@ -10,13 +10,30 @@ enum cc_vendor {
 	CC_VENDOR_INTEL,
 };
 
-void cc_set_vendor(enum cc_vendor v);
-void cc_set_mask(u64 mask);
-
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+extern enum cc_vendor cc_vendor;
+
+static inline enum cc_vendor cc_get_vendor(void)
+{
+	return cc_vendor;
+}
+
+static inline void cc_set_vendor(enum cc_vendor vendor)
+{
+	cc_vendor = vendor;
+}
+
+void cc_set_mask(u64 mask);
 u64 cc_mkenc(u64 val);
 u64 cc_mkdec(u64 val);
 #else
+static inline enum cc_vendor cc_get_vendor(void)
+{
+	return CC_VENDOR_NONE;
+}
+
+static inline void cc_set_vendor(enum cc_vendor vendor) { }
+
 static inline u64 cc_mkenc(u64 val)
 {
 	return val;

arch/x86/include/asm/processor.h

Lines changed: 5 additions & 1 deletion
@@ -647,7 +647,11 @@ static inline void spin_lock_prefetch(const void *x)
 #define KSTK_ESP(task)		(task_pt_regs(task)->sp)
 
 #else
-#define INIT_THREAD { }
+extern unsigned long __end_init_task[];
+
+#define INIT_THREAD {							\
+	.sp = (unsigned long)&__end_init_task - sizeof(struct pt_regs),	\
+}
 
 extern unsigned long KSTK_ESP(struct task_struct *task);
 

arch/x86/include/asm/realmode.h

Lines changed: 0 additions & 1 deletion
@@ -59,7 +59,6 @@ extern struct real_mode_header *real_mode_header;
 extern unsigned char real_mode_blob_end[];
 
 extern unsigned long initial_code;
-extern unsigned long initial_gs;
 extern unsigned long initial_stack;
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 extern unsigned long initial_vc_handler;

arch/x86/include/asm/smp.h

Lines changed: 4 additions & 1 deletion
@@ -199,5 +199,8 @@ extern void nmi_selftest(void);
 #define nmi_selftest() do { } while (0)
 #endif
 
-#endif /* __ASSEMBLY__ */
+extern unsigned int smpboot_control;
+
+#endif /* !__ASSEMBLY__ */
+
 #endif /* _ASM_X86_SMP_H */

arch/x86/kernel/acpi/sleep.c

Lines changed: 18 additions & 5 deletions
@@ -111,13 +111,26 @@ int x86_acpi_suspend_lowlevel(void)
 	saved_magic = 0x12345678;
 #else /* CONFIG_64BIT */
 #ifdef CONFIG_SMP
-	initial_stack = (unsigned long)temp_stack + sizeof(temp_stack);
-	early_gdt_descr.address =
-			(unsigned long)get_cpu_gdt_rw(smp_processor_id());
-	initial_gs = per_cpu_offset(smp_processor_id());
+	/*
+	 * As each CPU starts up, it will find its own stack pointer
+	 * from its current_task->thread.sp. Typically that will be
+	 * the idle thread for a newly-started AP, or even the boot
+	 * CPU which will find it set to &init_task in the static
+	 * per-cpu data.
+	 *
+	 * Make the resuming CPU use the temporary stack at startup
+	 * by setting current->thread.sp to point to that. The true
+	 * %rsp will be restored with the rest of the CPU context,
+	 * by do_suspend_lowlevel(). And unwinders don't care about
+	 * the abuse of ->thread.sp because it's a dead variable
+	 * while the thread is running on the CPU anyway; the true
+	 * value is in the actual %rsp register.
+	 */
+	current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack);
+	smpboot_control = smp_processor_id();
 #endif
 	initial_code = (unsigned long)wakeup_long64;
-	saved_magic = 0x123456789abcdef0L;
+	saved_magic = 0x123456789abcdef0L;
 #endif /* CONFIG_64BIT */
 
 	/*

arch/x86/kernel/apic/apic.c

Lines changed: 2 additions & 3 deletions
@@ -422,10 +422,9 @@ static unsigned int reserve_eilvt_offset(int offset, unsigned int new)
 		if (vector && !eilvt_entry_is_changeable(vector, new))
 			/* may not change if vectors are different */
 			return rsvd;
-		rsvd = atomic_cmpxchg(&eilvt_offsets[offset], rsvd, new);
-	} while (rsvd != new);
+	} while (!atomic_try_cmpxchg(&eilvt_offsets[offset], &rsvd, new));
 
-	rsvd &= ~APIC_EILVT_MASKED;
+	rsvd = new & ~APIC_EILVT_MASKED;
 	if (rsvd && rsvd != vector)
 		pr_info("LVT offset %d assigned for vector 0x%02x\n",
 			offset, rsvd);
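The conversion above follows the usual try-cmpxchg loop shape: atomic_cmpxchg() returns the value that was found in memory, and the old loop compared that return value against the new value, so even a successful exchange needed a second pass before the loop condition became true. atomic_try_cmpxchg() instead returns a boolean and updates the expected value in place on failure. A minimal, self-contained sketch of that pattern, using C11 <stdatomic.h> as a stand-in for the kernel's atomic_t API (hypothetical example, not kernel code):

	/*
	 * Hypothetical illustration: atomic_compare_exchange_strong() behaves
	 * like the kernel's atomic_try_cmpxchg().  It returns true on success
	 * and, on failure, writes the value actually found in memory back into
	 * 'expected', so the loop re-tests the freshly observed value.
	 */
	#include <stdatomic.h>
	#include <stdbool.h>

	static atomic_uint slot;

	/* Reserve 'new' in 'slot' unless a conflicting value is already there. */
	static bool reserve_slot(unsigned int new)
	{
		unsigned int expected = atomic_load(&slot);

		do {
			if (expected != 0 && expected != new)
				return false;	/* already taken by someone else */
		} while (!atomic_compare_exchange_strong(&slot, &expected, new));

		return true;	/* 'slot' now holds 'new' */
	}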

arch/x86/kernel/apic/io_apic.c

Lines changed: 9 additions & 5 deletions
@@ -2478,17 +2478,21 @@ static int io_apic_get_redir_entries(int ioapic)
 
 unsigned int arch_dynirq_lower_bound(unsigned int from)
 {
+	unsigned int ret;
+
 	/*
 	 * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use
 	 * gsi_top if ioapic_dynirq_base hasn't been initialized yet.
 	 */
-	if (!ioapic_initialized)
-		return gsi_top;
+	ret = ioapic_dynirq_base ? : gsi_top;
+
 	/*
-	 * For DT enabled machines ioapic_dynirq_base is irrelevant and not
-	 * updated. So simply return @from if ioapic_dynirq_base == 0.
+	 * For DT enabled machines ioapic_dynirq_base is irrelevant and
+	 * always 0. gsi_top can be 0 if there is no IO/APIC registered.
+	 * 0 is an invalid interrupt number for dynamic allocations. Return
+	 * @from instead.
 	 */
-	return ioapic_dynirq_base ? : from;
+	return ret ? : from;
 }
 
 #ifdef CONFIG_X86_32

arch/x86/kernel/apic/x2apic_cluster.c

Lines changed: 82 additions & 44 deletions
@@ -9,11 +9,7 @@
 
 #include "local.h"
 
-struct cluster_mask {
-	unsigned int	clusterid;
-	int		node;
-	struct cpumask	mask;
-};
+#define apic_cluster(apicid) ((apicid) >> 4)
 
 /*
  * __x2apic_send_IPI_mask() possibly needs to read
@@ -23,8 +19,7 @@ struct cluster_mask {
 static u32 *x86_cpu_to_logical_apicid __read_mostly;
 
 static DEFINE_PER_CPU(cpumask_var_t, ipi_mask);
-static DEFINE_PER_CPU_READ_MOSTLY(struct cluster_mask *, cluster_masks);
-static struct cluster_mask *cluster_hotplug_mask;
+static DEFINE_PER_CPU_READ_MOSTLY(struct cpumask *, cluster_masks);
 
 static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
 {
@@ -60,18 +55,18 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 
 	/* Collapse cpus in a cluster so a single IPI per cluster is sent */
 	for_each_cpu(cpu, tmpmsk) {
-		struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu);
+		struct cpumask *cmsk = per_cpu(cluster_masks, cpu);
 
 		dest = 0;
-		for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask)
+		for_each_cpu_and(clustercpu, tmpmsk, cmsk)
 			dest |= x86_cpu_to_logical_apicid[clustercpu];
 
 		if (!dest)
 			continue;
 
 		__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
 		/* Remove cluster CPUs from tmpmask */
-		cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
+		cpumask_andnot(tmpmsk, tmpmsk, cmsk);
 	}
 
 	local_irq_restore(flags);
@@ -105,55 +100,98 @@ static u32 x2apic_calc_apicid(unsigned int cpu)
 
 static void init_x2apic_ldr(void)
 {
-	struct cluster_mask *cmsk = this_cpu_read(cluster_masks);
-	u32 cluster, apicid = apic_read(APIC_LDR);
-	unsigned int cpu;
+	struct cpumask *cmsk = this_cpu_read(cluster_masks);
 
-	x86_cpu_to_logical_apicid[smp_processor_id()] = apicid;
+	BUG_ON(!cmsk);
 
-	if (cmsk)
-		goto update;
-
-	cluster = apicid >> 16;
-	for_each_online_cpu(cpu) {
-		cmsk = per_cpu(cluster_masks, cpu);
-		/* Matching cluster found. Link and update it. */
-		if (cmsk && cmsk->clusterid == cluster)
-			goto update;
+	cpumask_set_cpu(smp_processor_id(), cmsk);
+}
+
+/*
+ * As an optimisation during boot, set the cluster_mask for all present
+ * CPUs at once, to prevent each of them having to iterate over the others
+ * to find the existing cluster_mask.
+ */
+static void prefill_clustermask(struct cpumask *cmsk, unsigned int cpu, u32 cluster)
+{
+	int cpu_i;
+
+	for_each_present_cpu(cpu_i) {
+		struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu_i);
+		u32 apicid = apic->cpu_present_to_apicid(cpu_i);
+
+		if (apicid == BAD_APICID || cpu_i == cpu || apic_cluster(apicid) != cluster)
+			continue;
+
+		if (WARN_ON_ONCE(*cpu_cmsk == cmsk))
+			continue;
+
+		BUG_ON(*cpu_cmsk);
+		*cpu_cmsk = cmsk;
 	}
-	cmsk = cluster_hotplug_mask;
-	cmsk->clusterid = cluster;
-	cluster_hotplug_mask = NULL;
-update:
-	this_cpu_write(cluster_masks, cmsk);
-	cpumask_set_cpu(smp_processor_id(), &cmsk->mask);
 }
 
-static int alloc_clustermask(unsigned int cpu, int node)
+static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
 {
+	struct cpumask *cmsk = NULL;
+	unsigned int cpu_i;
+
+	/*
+	 * At boot time, the CPU present mask is stable. The cluster mask is
+	 * allocated for the first CPU in the cluster and propagated to all
+	 * present siblings in the cluster. If the cluster mask is already set
+	 * on entry to this function for a given CPU, there is nothing to do.
+	 */
 	if (per_cpu(cluster_masks, cpu))
 		return 0;
+
+	if (system_state < SYSTEM_RUNNING)
+		goto alloc;
+
 	/*
-	 * If a hotplug spare mask exists, check whether it's on the right
-	 * node. If not, free it and allocate a new one.
+	 * On post boot hotplug for a CPU which was not present at boot time,
+	 * iterate over all possible CPUs (even those which are not present
+	 * any more) to find any existing cluster mask.
	 */
-	if (cluster_hotplug_mask) {
-		if (cluster_hotplug_mask->node == node)
-			return 0;
-		kfree(cluster_hotplug_mask);
+	for_each_possible_cpu(cpu_i) {
+		u32 apicid = apic->cpu_present_to_apicid(cpu_i);
+
+		if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
+			cmsk = per_cpu(cluster_masks, cpu_i);
+			/*
+			 * If the cluster is already initialized, just store
+			 * the mask and return. There's no need to propagate.
+			 */
+			if (cmsk) {
+				per_cpu(cluster_masks, cpu) = cmsk;
+				return 0;
+			}
+		}
 	}
-
-	cluster_hotplug_mask = kzalloc_node(sizeof(*cluster_hotplug_mask),
-					    GFP_KERNEL, node);
-	if (!cluster_hotplug_mask)
+	/*
	 * No CPU in the cluster has ever been initialized, so fall through to
	 * the boot time code which will also populate the cluster mask for any
	 * other CPU in the cluster which is (now) present.
	 */
+alloc:
+	cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
+	if (!cmsk)
 		return -ENOMEM;
-	cluster_hotplug_mask->node = node;
+	per_cpu(cluster_masks, cpu) = cmsk;
+	prefill_clustermask(cmsk, cpu, cluster);
+
 	return 0;
 }
 
 static int x2apic_prepare_cpu(unsigned int cpu)
 {
-	if (alloc_clustermask(cpu, cpu_to_node(cpu)) < 0)
+	u32 phys_apicid = apic->cpu_present_to_apicid(cpu);
+	u32 cluster = apic_cluster(phys_apicid);
+	u32 logical_apicid = (cluster << 16) | (1 << (phys_apicid & 0xf));
+
+	x86_cpu_to_logical_apicid[cpu] = logical_apicid;
+
+	if (alloc_clustermask(cpu, cluster, cpu_to_node(cpu)) < 0)
 		return -ENOMEM;
 	if (!zalloc_cpumask_var(&per_cpu(ipi_mask, cpu), GFP_KERNEL))
 		return -ENOMEM;
@@ -162,10 +200,10 @@ static int x2apic_prepare_cpu(unsigned int cpu)
 
 static int x2apic_dead_cpu(unsigned int dead_cpu)
 {
-	struct cluster_mask *cmsk = per_cpu(cluster_masks, dead_cpu);
+	struct cpumask *cmsk = per_cpu(cluster_masks, dead_cpu);
 
 	if (cmsk)
-		cpumask_clear_cpu(dead_cpu, &cmsk->mask);
+		cpumask_clear_cpu(dead_cpu, cmsk);
 	free_cpumask_var(per_cpu(ipi_mask, dead_cpu));
 	return 0;
 }
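A worked example for illustration (not part of the diff): with a physical x2APIC ID of 0x27, apic_cluster() yields 0x27 >> 4 = 2, and the derived logical APIC ID is (2 << 16) | (1 << (0x27 & 0xf)) = 0x20080, i.e. cluster 2 with the bit for intra-cluster position 7 set. Because the cluster is a pure function of the ACPI/MADT-enumerated APIC ID, x2apic_prepare_cpu() can compute it on the control CPU before the target CPU ever runs, which is what allows the cluster masks to be preallocated and shared by all CPUs of a cluster instead of being handed over through the old cluster_hotplug_mask pointer.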

arch/x86/kernel/asm-offsets.c

Lines changed: 1 addition & 0 deletions
@@ -115,6 +115,7 @@ static void __used common(void)
 	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
 	OFFSET(X86_top_of_stack, pcpu_hot, top_of_stack);
+	OFFSET(X86_current_task, pcpu_hot, current_task);
 #ifdef CONFIG_CALL_DEPTH_TRACKING
 	OFFSET(X86_call_depth, pcpu_hot, call_depth);
 #endif
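Tying the pieces together: the new X86_current_task offset lets the startup path find the booting CPU's task, and with it the stack, purely from the Linux CPU number placed in smpboot_control. The following is a conceptual C rendering of that lookup, shown only for illustration under stated assumptions - the real consumer is early startup assembly, and the helper name here is hypothetical:

	/*
	 * Conceptual sketch (kernel context assumed, hypothetical helper):
	 * shows how the Linux CPU number alone can replace the removed
	 * initial_gs / early_gdt_descr / initial_stack globals.
	 */
	static void setup_secondary_from_cpu_nr(void)
	{
		unsigned int cpu = smpboot_control;	/* Linux CPU number */

		/* What initial_gs used to carry: this CPU's per-CPU base. */
		unsigned long gsbase = per_cpu_offset(cpu);

		/* What early_gdt_descr used to carry: this CPU's GDT address. */
		unsigned long gdt = (unsigned long)get_cpu_gdt_rw(cpu);

		/* What initial_stack used to carry: the stack of the CPU's task. */
		struct task_struct *tsk = per_cpu(pcpu_hot.current_task, cpu);
		unsigned long sp = tsk->thread.sp;

		/* ... load GSBASE, the GDT and %rsp from these values ... */
		(void)gsbase; (void)gdt; (void)sp;
	}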
