Commit d0d895e

jl_cpu_threads: exclude big.LITTLE efficiency cores (#42099)
* jl_cpu_threads: exclude big.LITTLE efficiency cores

On big.LITTLE systems, we generally only want to spawn as many threads/tasks as there are performance cores. By default, we want to leave the efficiency cores alone, as they may end up choking on the heavy workloads we are likely to schedule. Even something as simple as starting `julia` and initializing OpenBLAS on each thread can cause a system-wide latency spike as the efficiency cores struggle to chew through the momentary workload.

To fix this, we attempt to identify when we are running on a big.LITTLE system (the only one currently widely supported is the Apple M1), and we subtract out the known number of efficiency cores.

Once macOS 12 is released, we will be able to use the official API for enumerating the perflevels of the available cores, demonstrated in this PR to pytorch's cpuinfo repository [0].

[0] https://github.com/pytorch/cpuinfo/blob/8ab2db2d405436f1014ed603021545b3b6b6f1ae/src/arm/mach/init.c#L161-L163

* whitespace
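The perflevel query mentioned above could look roughly like the following. This is a minimal sketch, not the code in this commit: the function names `perf_core_count` and `default_thread_count` are hypothetical, and it assumes the `hw.perflevel0.logicalcpu` sysctl available on macOS 12 and later, with perflevel 0 taken to be the highest-performance level. On other systems, or if the query fails, it falls back to `sysconf(_SC_NPROCESSORS_ONLN)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper (not part of Julia): returns the number of
// performance cores, or -1 if the system does not report perf levels.
int perf_core_count(void)
{
#if defined(__APPLE__)
    int32_t ncpu = 0;
    size_t len = sizeof(ncpu);
    // Assumption: perflevel 0 is the highest-performance level and this
    // sysctl only exists on macOS 12+; older systems fail the lookup.
    if (sysctlbyname("hw.perflevel0.logicalcpu", &ncpu, &len, NULL, 0) == 0 && ncpu > 0)
        return (int)ncpu;
#endif
    return -1;
}

// Hypothetical helper: prefer the performance-core count, otherwise
// fall back to the number of online processors, never returning < 1.
int default_thread_count(void)
{
    int n = perf_core_count();
    if (n < 1) {
        long online = sysconf(_SC_NPROCESSORS_ONLN);
        n = (online < 1) ? 1 : (int)online;
    }
    assert(n >= 1);
    return n;
}
```

Querying the perf level directly would avoid hard-coding a per-chip efficiency-core count, which is why the commit message treats the family check as a stopgap.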
1 parent 4541922 commit d0d895e

1 file changed: +22 -0 lines changed

src/sys.c

Lines changed: 22 additions & 0 deletions
@@ -587,6 +587,15 @@ typedef DWORD (WINAPI *GAPC)(WORD);
 #endif
 #endif
 
+// Apple's M1 processor is a big.LITTLE style processor, with 4x "performance"
+// cores, and 4x "efficiency" cores. Because Julia expects to be able to run
+// things like heavy linear algebra workloads on all cores, it's best for us
+// to only spawn as many threads as there are performance cores. Once macOS
+// 12 is released, we'll be able to query the multiple "perf levels" of the
+// cores of a CPU (see this PR [0] to pytorch/cpuinfo for an example) but
+// until it's released, we will just recognize the M1 by its CPU family
+// identifier, then subtract how many efficiency cores we know it has.
+
 JL_DLLEXPORT int jl_cpu_threads(void) JL_NOTSAFEPOINT
 {
 #if defined(HW_AVAILCPU) && defined(HW_NCPU)
@@ -599,6 +608,19 @@ JL_DLLEXPORT int jl_cpu_threads(void) JL_NOTSAFEPOINT
         sysctl(nm, 2, &count, &len, NULL, 0);
         if (count < 1) { count = 1; }
     }
+
+#if defined(__APPLE__) && defined(_CPU_AARCH64_)
+    // Manually subtract efficiency cores for Apple's big.LITTLE cores
+    int32_t family = 0;
+    len = 4;
+    sysctlbyname("hw.cpufamily", &family, &len, NULL, 0);
+    if (family >= 1 && count > 1) {
+        if (family == CPUFAMILY_ARM_FIRESTORM_ICESTORM) {
+            // We know the Apple M1 has 4 efficiency cores, so subtract them out.
+            count -= 4;
+        }
+    }
+#endif
     return count;
 #elif defined(_SC_NPROCESSORS_ONLN)
     long count = sysconf(_SC_NPROCESSORS_ONLN);
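
One edge case in the second hunk: the subtraction is guarded only by `count > 1`, so if the reported count were ever 4 or fewer, the result would be zero or negative. A hedged sketch of a clamped variant follows. `subtract_efficiency_cores` is a hypothetical helper, not part of this commit, and flooring the result at 1 is an assumption modeled on the existing `if (count < 1) { count = 1; }` guard.

```c
#include <assert.h>

// Hypothetical helper (not in src/sys.c): remove a known number of
// efficiency cores from a CPU count, never returning fewer than 1.
int subtract_efficiency_cores(int count, int efficiency_cores)
{
    assert(efficiency_cores >= 0);
    // Nothing to subtract from a single-core (or invalid) count.
    if (count <= 1)
        return 1;
    count -= efficiency_cores;
    // Assumption: mirror the existing guard and floor the result at 1.
    return (count < 1) ? 1 : count;
}
```

With the M1 values from the diff, `subtract_efficiency_cores(8, 4)` yields 4, matching what the committed code computes for a full 8-core count.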
