Commit cc95a07
x86/apic: Reduce cache line misses in __x2apic_send_IPI_mask()
Using per-cpu storage for @x86_cpu_to_logical_apicid is not optimal.
Broadcast IPI will need at least one cache line per cpu to access this
field.
__x2apic_send_IPI_mask() is using standard bitmask operators.
By converting x86_cpu_to_logical_apicid to an array, we divide by 16x
number of needed cache lines, because we find 16 values per cache
line. CPU prefetcher can kick nicely.
Also move @cluster_masks to READ_MOSTLY section to avoid false sharing.
Tested on a dual socket host with 256 cpus, cost for a full broadcast
is now 11 usec instead of 33 usec.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]1 parent 3906fe9 commit cc95a07
1 file changed
+21
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
19 | 25 | | |
20 | | - | |
| 26 | + | |
21 | 27 | | |
22 | 28 | | |
23 | 29 | | |
| |||
27 | 33 | | |
28 | 34 | | |
29 | 35 | | |
30 | | - | |
| 36 | + | |
31 | 37 | | |
32 | 38 | | |
33 | 39 | | |
| |||
58 | 64 | | |
59 | 65 | | |
60 | 66 | | |
61 | | - | |
| 67 | + | |
62 | 68 | | |
63 | 69 | | |
64 | 70 | | |
| |||
94 | 100 | | |
95 | 101 | | |
96 | 102 | | |
97 | | - | |
| 103 | + | |
98 | 104 | | |
99 | 105 | | |
100 | 106 | | |
| |||
103 | 109 | | |
104 | 110 | | |
105 | 111 | | |
106 | | - | |
| 112 | + | |
107 | 113 | | |
108 | 114 | | |
109 | 115 | | |
| |||
166 | 172 | | |
167 | 173 | | |
168 | 174 | | |
| 175 | + | |
| 176 | + | |
169 | 177 | | |
170 | 178 | | |
171 | 179 | | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
172 | 185 | | |
173 | 186 | | |
174 | 187 | | |
| 188 | + | |
| 189 | + | |
175 | 190 | | |
176 | 191 | | |
177 | 192 | | |
| |||
0 commit comments