
Commit ca15562

anematode, PikaCat-OuO
authored andcommitted
Make shared history allocation aware of non-uniform cache access
Although shared history has been successful overall, it led to some speed issues with large numbers of threads. Originally we just split it by NUMA node, but on systems with non-unified L3 caches (most AMD workstation and server CPUs, and some Intel E-core-based server CPUs), this can still cause a speed penalty with the default configuration. We therefore decided to further subdivide the shared history based on the L3 cache structure.

Based on this test, the original SPRTs, and speed experiments, we decided that grouping L3 domains to reach 32 threads per SharedHistories object was a reasonable balance for affected systems, but we may revisit this in the future. See the PR for full details.

In an extreme case, a single-socket EPYC 9755 configured with one NUMA domain per socket, the nps increases from:
Nodes/second : 182827480
to:
Nodes/second : 229118365
(a gain of roughly 25%).

In many cases, when L3 caches are already shared between many threads, or when several NUMA nodes are already configured per socket, this patch does not change the default. The default setting can be adjusted with the existing NumaPolicy option.

No functional change.
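The grouping described in the commit message can be pictured with a greedy sketch like the one below. It is only an illustration under stated assumptions: `L3Domain` and `bundle_l3_domains` are hypothetical names, and the engine's real implementation (not shown on this page) may order, split, or merge domains differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one L3 cache domain: the logical CPUs that
// share a single L3 cache. Assumed for illustration; not an engine type.
struct L3Domain {
    std::vector<int> cpus;
};

// Greedily merge adjacent L3 domains (e.g. within one NUMA node) into
// bundles of at most `maxThreads` logical CPUs. A domain that alone exceeds
// `maxThreads` is kept whole as its own bundle rather than split.
std::vector<std::vector<int>> bundle_l3_domains(const std::vector<L3Domain>& domains,
                                                std::size_t                  maxThreads) {
    assert(maxThreads > 0);

    std::vector<std::vector<int>> bundles;
    std::vector<int>              current;

    for (const L3Domain& d : domains)
    {
        // Close the current bundle if this domain would push it past the cap.
        if (!current.empty() && current.size() + d.cpus.size() > maxThreads)
        {
            bundles.push_back(current);
            current.clear();
        }
        current.insert(current.end(), d.cpus.begin(), d.cpus.end());
    }

    if (!current.empty())
        bundles.push_back(current);

    return bundles;
}
```

With a cap of 32, a hypothetical node made of 16 L3 domains of 16 threads each would be paired into 8 bundles of 32 threads, while a node whose L3 domains already hold 32 or more threads each would keep one bundle per L3 domain. The greedy merge of neighbouring domains is a simplifying choice for the sketch, not a claim about how the engine orders domains.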
1 parent fa93bc3 commit ca15562


2 files changed: +309 -99 lines


src/engine.cpp

Lines changed: 9 additions & 3 deletions
```diff
@@ -47,9 +47,15 @@ constexpr auto StartFEN = "rnbakabnr/9/1c5c1/p1p1p1p1p/9/9/P1P1P1P1P/1C5C1/9/R
 constexpr int MaxHashMB = Is64Bit ? 33554432 : 2048;
 int MaxThreads = std::max(1024, 4 * int(get_hardware_concurrency()));
 
+// The default configuration will attempt to group L3 domains up to 32 threads.
+// This size was found to be a good balance between the Elo gain of increased
+// history sharing and the speed loss from more cross-cache accesses (see
+// PR#6526). The user can always explicitly override this behavior.
+constexpr NumaAutoPolicy DefaultNumaPolicy = BundledL3Policy{32};
+
 Engine::Engine(std::optional<std::string> path) :
     binaryDirectory(path ? CommandLine::get_binary_directory(*path) : ""),
-    numaContext(NumaConfig::from_system()),
+    numaContext(NumaConfig::from_system(DefaultNumaPolicy)),
     states(new std::deque<StateInfo>(1)),
     threads(),
     networks(numaContext,
```
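`NumaAutoPolicy` and `BundledL3Policy` are not defined in the engine.cpp hunks shown here. A minimal sketch, assuming `NumaAutoPolicy` is a `std::variant` over plain policy structs (an assumption; `SystemNumaPolicy` and `target_group_size` are hypothetical names), shows how a single `constexpr` default such as `BundledL3Policy{32}` can be stored and later inspected with `std::visit`:

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>
#include <variant>

// Hypothetical policy types, assumed for illustration only; the engine's
// real definitions are not shown on this page.
struct SystemNumaPolicy {};  // one shared-history group per NUMA node

struct BundledL3Policy {
    std::size_t bundleSize;  // target threads per bundle of merged L3 domains
};

using NumaAutoPolicy = std::variant<SystemNumaPolicy, BundledL3Policy>;

// Mirrors the default introduced in engine.cpp.
constexpr NumaAutoPolicy DefaultNumaPolicy = BundledL3Policy{32};

// Target thread count for one shared-history group on a NUMA node that has
// `nodeThreads` logical CPUs. Under the bundled policy, a single L3 domain
// larger than the target would still form its own, larger group.
constexpr std::size_t target_group_size(const NumaAutoPolicy& policy,
                                        std::size_t           nodeThreads) {
    return std::visit(
      [nodeThreads](const auto& p) -> std::size_t {
          using P = std::decay_t<decltype(p)>;
          if constexpr (std::is_same_v<P, BundledL3Policy>)
              return std::min(p.bundleSize, nodeThreads);
          else
              return nodeThreads;  // SystemNumaPolicy: the whole node
      },
      policy);
}
```

Keeping the default as a `constexpr` value and passing it to each `NumaConfig::from_system(...)` call, as the diff does, means the constructor and the "auto"/"system"/"hardware" option paths all start from the same policy; the variant representation here is just one plausible way to make such a value both compile-time constant and easy to extend.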
```diff
@@ -176,12 +182,12 @@ void Engine::set_position(const std::string& fen, const std::vector<std::string>
 void Engine::set_numa_config_from_option(const std::string& o) {
     if (o == "auto" || o == "system")
     {
-        numaContext.set_numa_config(NumaConfig::from_system());
+        numaContext.set_numa_config(NumaConfig::from_system(DefaultNumaPolicy));
     }
     else if (o == "hardware")
     {
         // Don't respect affinity set in the system.
-        numaContext.set_numa_config(NumaConfig::from_system(false));
+        numaContext.set_numa_config(NumaConfig::from_system(DefaultNumaPolicy, false));
     }
     else if (o == "none")
     {
```
