Commit 2ff3850

Author: Jonathan Peyton
[OpenMP] Add absolute KMP_HW_SUBSET functionality (#85326)
Users can put a : in front of the KMP_HW_SUBSET value to indicate that the specified subset is an "absolute" subset. Currently, when a user specifies KMP_HW_SUBSET=1t, it is translated to KMP_HW_SUBSET="*s,*c,1t", where * means "use all of". A user who wants only one thread out of the entire topology can now specify KMP_HW_SUBSET=:1t.

Along with the absolute syntax comes a fix that makes newer machines easier to use with only the 3-level topology syntax. When a user specifies KMP_HW_SUBSET=1s,4c,2t on a machine that actually has 4 layers (say 1s,2m,3c,2t as the entire machine), the user gets an unexpected "too many resources asked" message because KMP_HW_SUBSET currently translates the "4c" value to mean 4 cores per module. To help users out, the runtime can assume that these newer layers (module in this case) should be ignored if they are not specified, but the topology should always take into account the socket, core, and thread layers.
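
The "too many resources asked" case in the message above can be made concrete with a minimal sketch. This is not the runtime's code: `TopoLayer` and `units_per_targeted_parent` are hypothetical names introduced only for illustration, and the sketch assumes a uniform machine described by one ratio per layer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One detected topology layer (hypothetical type, for illustration only).
struct TopoLayer {
  int ratio;     // units of this layer per unit of the layer directly above
  bool targeted; // false for layers the request ignores (e.g. module)
};

// Number of units of layers[level] available under the nearest targeted
// layer above it. Ignored layers in between are folded into the count.
int units_per_targeted_parent(const std::vector<TopoLayer> &layers,
                              std::size_t level) {
  assert(level < layers.size());
  int count = layers[level].ratio;
  // Walk upward from the layer just above `level`.
  for (std::size_t i = level; i-- > 0;) {
    if (layers[i].targeted)
      break;
    count *= layers[i].ratio;
  }
  return count;
}
```

For the 1s,2m,3c,2t machine, treating the module layer as ignored yields 6 cores per socket, so a request for 4c fits. Treating the module layer as targeted yields 3 cores per module, which is the situation that previously triggered the warning.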
1 parent cc308f6 commit 2ff3850

4 files changed, with 256 additions and 77 deletions

openmp/docs/design/Runtimes.rst

Lines changed: 22 additions & 2 deletions
@@ -496,7 +496,9 @@ An extended syntax is available when ``KMP_TOPOLOGY_METHOD=hwloc``. Depending on
 resources are detected, you may be able to specify additional resources, such as
 NUMA domains and groups of hardware resources that share certain cache levels.
 
-**Basic syntax:** ``[num_units|*]ID[@offset][:attribute] [,[num_units|*]ID[@offset][:attribute]...]``
+**Basic syntax:** ``[:][num_units|*]ID[@offset][:attribute] [,[num_units|*]ID[@offset][:attribute]...]``
+
+An optional colon (:) can be specified at the beginning of the syntax to specify an explicit hardware subset. The default is an implicit hardware subset.
 
 Supported unit IDs are not case-insensitive.
 
@@ -547,6 +549,18 @@ When any numa or tile units are specified in ``KMP_HW_SUBSET`` and the hwloc
 topology method is available, the ``KMP_TOPOLOGY_METHOD`` will be automatically
 set to hwloc, so there is no need to set it explicitly.
 
+For an **explicit hardware subset**, if one or more topology layers detected by the
+runtime are omitted from the subset, then those topology layers are ignored.
+Only explicitly specified topology layers are used in the subset.
+
+For an **implicit hardware subset**, it is implied that the socket, core, and thread
+topology types should be included in the subset. Other topology layers are not
+implicitly included and are ignored if they are not specified in the subset.
+Because the socket, core and thread topology types are always included in
+implicit hardware subsets, when they are omitted, it is assumed that all
+available resources of that type should be used. Implicit hardware subsets are
+the default.
+
 If you don't specify one or more types of resource, such as socket or thread,
 all available resources of that type are used.
 
@@ -565,7 +579,7 @@ This variable does not work if ``KMP_AFFINITY=disabled``.
 **Default:** If omitted, the default value is to use all the
 available hardware resources.
 
-**Examples:**
+**Implicit Hardware Subset Examples:**
 
 * ``2s,4c,2t``: Use the first 2 sockets (s0 and s1), the first 4 cores on each
   socket (c0 - c3), and 2 threads per core.
@@ -590,6 +604,12 @@ available hardware resources.
 * ``*c:eff1@3``: Use all available sockets, skip the first three cores of
   efficiency 1, and then use the rest of the available cores of efficiency 1.
 
+Explicit Hardware Subset Examples:
+
+* ``:2s,6t`` Use exactly the first two sockets and 6 threads per socket.
+* ``:1t@7`` Skip the first 7 threads (t0-t6) and use exactly one thread (t7).
+* ``:5c,1t`` Use exactly the first 5 cores (c0-c4) and the first thread on each core.
+
 To see the result of the setting, you can specify ``verbose`` modifier in
 ``KMP_AFFINITY`` environment variable. The OpenMP run-time library will output
 to ``stderr`` the information about the discovered hardware topology before and
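
The documented syntax, including the new optional leading colon, can be illustrated with a small parser. This is a simplified sketch and not the runtime's parser: `parse_hw_subset`, `SubsetItem`, `ParsedSubset`, and `kUseAll` are hypothetical names, attributes (the `:attribute` part) and whitespace are not handled, and treating an omitted count as "use all" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

constexpr int kUseAll = -1; // hypothetical stand-in for "use all of"

// One parsed item such as "2s" or "1t@7" (attributes are not handled).
struct SubsetItem {
  int num;        // requested count, or kUseAll for '*' or an omitted count
  std::string id; // unit ID such as "s", "c", or "t"
  int offset;     // value after '@', 0 when absent
};

struct ParsedSubset {
  bool is_explicit; // true when the value starts with ':'
  std::vector<SubsetItem> items;
};

ParsedSubset parse_hw_subset(const std::string &value) {
  ParsedSubset result{false, {}};
  std::size_t start = 0;
  if (!value.empty() && value[0] == ':') {
    result.is_explicit = true; // leading colon marks an explicit subset
    start = 1;
  }
  std::stringstream stream(value.substr(start));
  std::string token;
  while (std::getline(stream, token, ',')) {
    SubsetItem item{kUseAll, "", 0};
    std::size_t i = 0;
    if (i < token.size() && token[i] == '*') {
      ++i; // '*' keeps the "use all" default
    } else if (i < token.size() &&
               std::isdigit(static_cast<unsigned char>(token[i]))) {
      item.num = 0;
      while (i < token.size() &&
             std::isdigit(static_cast<unsigned char>(token[i])))
        item.num = item.num * 10 + (token[i++] - '0');
    }
    // Everything up to an optional '@' is the unit ID.
    const std::size_t at = token.find('@', i);
    item.id = token.substr(i, at == std::string::npos ? at : at - i);
    if (at != std::string::npos)
      item.offset = std::stoi(token.substr(at + 1));
    assert(!item.id.empty());
    result.items.push_back(item);
  }
  return result;
}
```

With this sketch, `:1t@7` parses as an explicit subset with one item (count 1, ID `t`, offset 7), while `2s,*c,2t` parses as an implicit subset with three items.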

openmp/runtime/src/kmp_affinity.cpp

Lines changed: 95 additions & 75 deletions
@@ -987,41 +987,6 @@ void kmp_topology_t::canonicalize(int npackages, int ncores_per_pkg,
   _discover_uniformity();
 }
 
-// Represents running sub IDs for a single core attribute where
-// attribute values have SIZE possibilities.
-template <size_t SIZE, typename IndexFunc> struct kmp_sub_ids_t {
-  int last_level; // last level in topology to consider for sub_ids
-  int sub_id[SIZE]; // The sub ID for a given attribute value
-  int prev_sub_id[KMP_HW_LAST];
-  IndexFunc indexer;
-
-public:
-  kmp_sub_ids_t(int last_level) : last_level(last_level) {
-    KMP_ASSERT(last_level < KMP_HW_LAST);
-    for (size_t i = 0; i < SIZE; ++i)
-      sub_id[i] = -1;
-    for (size_t i = 0; i < KMP_HW_LAST; ++i)
-      prev_sub_id[i] = -1;
-  }
-  void update(const kmp_hw_thread_t &hw_thread) {
-    int idx = indexer(hw_thread);
-    KMP_ASSERT(idx < (int)SIZE);
-    for (int level = 0; level <= last_level; ++level) {
-      if (hw_thread.sub_ids[level] != prev_sub_id[level]) {
-        if (level < last_level)
-          sub_id[idx] = -1;
-        sub_id[idx]++;
-        break;
-      }
-    }
-    for (int level = 0; level <= last_level; ++level)
-      prev_sub_id[level] = hw_thread.sub_ids[level];
-  }
-  int get_sub_id(const kmp_hw_thread_t &hw_thread) const {
-    return sub_id[indexer(hw_thread)];
-  }
-};
-
 #if KMP_AFFINITY_SUPPORTED
 static kmp_str_buf_t *
 __kmp_hw_get_catalog_core_string(const kmp_hw_attr_t &attr, kmp_str_buf_t *buf,
@@ -1084,9 +1049,12 @@ bool kmp_topology_t::filter_hw_subset() {
   // First, sort the KMP_HW_SUBSET items by the machine topology
   __kmp_hw_subset->sort();
 
+  __kmp_hw_subset->canonicalize(__kmp_topology);
+
   // Check to see if KMP_HW_SUBSET is a valid subset of the detected topology
   bool using_core_types = false;
   bool using_core_effs = false;
+  bool is_absolute = __kmp_hw_subset->is_absolute();
   int hw_subset_depth = __kmp_hw_subset->get_depth();
   kmp_hw_t specified[KMP_HW_LAST];
   int *topology_levels = (int *)KMP_ALLOCA(sizeof(int) * hw_subset_depth);
@@ -1124,12 +1092,14 @@ bool kmp_topology_t::filter_hw_subset() {
 
     // Check to see if each layer's num & offset parameters are valid
     max_count = get_ratio(level);
-    if (max_count < 0 ||
-        (num != kmp_hw_subset_t::USE_ALL && num + offset > max_count)) {
-      bool plural = (num > 1);
-      KMP_AFF_WARNING(__kmp_affinity, AffHWSubsetManyGeneric,
-                      __kmp_hw_get_catalog_string(type, plural));
-      return false;
+    if (!is_absolute) {
+      if (max_count < 0 ||
+          (num != kmp_hw_subset_t::USE_ALL && num + offset > max_count)) {
+        bool plural = (num > 1);
+        KMP_AFF_WARNING(__kmp_affinity, AffHWSubsetManyGeneric,
+                        __kmp_hw_get_catalog_string(type, plural));
+        return false;
+      }
     }
 
     // Check to see if core attributes are consistent
@@ -1192,7 +1162,7 @@ bool kmp_topology_t::filter_hw_subset() {
     }
 
     // Check that the number of requested cores with attributes is valid
-    if (using_core_types || using_core_effs) {
+    if ((using_core_types || using_core_effs) && !is_absolute) {
       for (int j = 0; j < item.num_attrs; ++j) {
         int num = item.num[j];
         int offset = item.offset[j];
@@ -1248,46 +1218,92 @@ bool kmp_topology_t::filter_hw_subset() {
     }
   }
 
-  struct core_type_indexer {
-    int operator()(const kmp_hw_thread_t &t) const {
-      switch (t.attrs.get_core_type()) {
-      case KMP_HW_CORE_TYPE_UNKNOWN:
-      case KMP_HW_MAX_NUM_CORE_TYPES:
-        return 0;
-#if KMP_ARCH_X86 || KMP_ARCH_X86_64
-      case KMP_HW_CORE_TYPE_ATOM:
-        return 1;
-      case KMP_HW_CORE_TYPE_CORE:
-        return 2;
-#endif
-      }
-      KMP_ASSERT2(false, "Unhandled kmp_hw_thread_t enumeration");
-      KMP_BUILTIN_UNREACHABLE;
+  // For keeping track of sub_ids for an absolute KMP_HW_SUBSET
+  // or core attributes (core type or efficiency)
+  int prev_sub_ids[KMP_HW_LAST];
+  int abs_sub_ids[KMP_HW_LAST];
+  int core_eff_sub_ids[KMP_HW_MAX_NUM_CORE_EFFS];
+  int core_type_sub_ids[KMP_HW_MAX_NUM_CORE_TYPES];
+  for (size_t i = 0; i < KMP_HW_LAST; ++i) {
+    abs_sub_ids[i] = -1;
+    prev_sub_ids[i] = -1;
+  }
+  for (size_t i = 0; i < KMP_HW_MAX_NUM_CORE_EFFS; ++i)
+    core_eff_sub_ids[i] = -1;
+  for (size_t i = 0; i < KMP_HW_MAX_NUM_CORE_TYPES; ++i)
+    core_type_sub_ids[i] = -1;
+
+  // Determine which hardware threads should be filtered.
+
+  // Helpful to determine if a topology layer is targeted by an absolute subset
+  auto is_targeted = [&](int level) {
+    if (is_absolute) {
+      for (int i = 0; i < hw_subset_depth; ++i)
+        if (topology_levels[i] == level)
+          return true;
+      return false;
     }
+    // If not absolute KMP_HW_SUBSET, then every layer is seen as targeted
+    return true;
   };
-  struct core_eff_indexer {
-    int operator()(const kmp_hw_thread_t &t) const {
-      return t.attrs.get_core_eff();
+
+  // Helpful to index into core type sub Ids array
+  auto get_core_type_index = [](const kmp_hw_thread_t &t) {
+    switch (t.attrs.get_core_type()) {
+    case KMP_HW_CORE_TYPE_UNKNOWN:
+    case KMP_HW_MAX_NUM_CORE_TYPES:
+      return 0;
+#if KMP_ARCH_X86 || KMP_ARCH_X86_64
+    case KMP_HW_CORE_TYPE_ATOM:
+      return 1;
+    case KMP_HW_CORE_TYPE_CORE:
+      return 2;
+#endif
     }
+    KMP_ASSERT2(false, "Unhandled kmp_hw_thread_t enumeration");
+    KMP_BUILTIN_UNREACHABLE;
   };
 
-  kmp_sub_ids_t<KMP_HW_MAX_NUM_CORE_TYPES, core_type_indexer> core_type_sub_ids(
-      core_level);
-  kmp_sub_ids_t<KMP_HW_MAX_NUM_CORE_EFFS, core_eff_indexer> core_eff_sub_ids(
-      core_level);
+  // Helpful to index into core efficiencies sub Ids array
+  auto get_core_eff_index = [](const kmp_hw_thread_t &t) {
+    return t.attrs.get_core_eff();
+  };
 
-  // Determine which hardware threads should be filtered.
   int num_filtered = 0;
   kmp_affin_mask_t *filtered_mask;
   KMP_CPU_ALLOC(filtered_mask);
   KMP_CPU_COPY(filtered_mask, __kmp_affin_fullMask);
   for (int i = 0; i < num_hw_threads; ++i) {
     kmp_hw_thread_t &hw_thread = hw_threads[i];
-    // Update type_sub_id
-    if (using_core_types)
-      core_type_sub_ids.update(hw_thread);
-    if (using_core_effs)
-      core_eff_sub_ids.update(hw_thread);
+
+    // Figure out the absolute sub ids and core eff/type sub ids
+    if (is_absolute || using_core_effs || using_core_types) {
+      for (int level = 0; level < get_depth(); ++level) {
+        if (hw_thread.sub_ids[level] != prev_sub_ids[level]) {
+          bool found_targeted = false;
+          for (int j = level; j < get_depth(); ++j) {
+            bool targeted = is_targeted(j);
+            if (!found_targeted && targeted) {
+              found_targeted = true;
+              abs_sub_ids[j]++;
+              if (j == core_level && using_core_effs)
+                core_eff_sub_ids[get_core_eff_index(hw_thread)]++;
+              if (j == core_level && using_core_types)
+                core_type_sub_ids[get_core_type_index(hw_thread)]++;
+            } else if (targeted) {
+              abs_sub_ids[j] = 0;
+              if (j == core_level && using_core_effs)
+                core_eff_sub_ids[get_core_eff_index(hw_thread)] = 0;
+              if (j == core_level && using_core_types)
+                core_type_sub_ids[get_core_type_index(hw_thread)] = 0;
+            }
+          }
+          break;
+        }
+      }
+      for (int level = 0; level < get_depth(); ++level)
+        prev_sub_ids[level] = hw_thread.sub_ids[level];
+    }
 
     // Check to see if this hardware thread should be filtered
     bool should_be_filtered = false;
bool should_be_filtered = false;
@@ -1322,20 +1338,24 @@ bool kmp_topology_t::filter_hw_subset() {
         int num = hw_subset_item.num[attr_idx];
         int offset = hw_subset_item.offset[attr_idx];
         if (using_core_types)
-          sub_id = core_type_sub_ids.get_sub_id(hw_thread);
+          sub_id = core_type_sub_ids[get_core_type_index(hw_thread)];
         else
-          sub_id = core_eff_sub_ids.get_sub_id(hw_thread);
+          sub_id = core_eff_sub_ids[get_core_eff_index(hw_thread)];
         if (sub_id < offset ||
             (num != kmp_hw_subset_t::USE_ALL && sub_id >= offset + num)) {
           should_be_filtered = true;
           break;
         }
       } else {
+        int sub_id;
         int num = hw_subset_item.num[0];
         int offset = hw_subset_item.offset[0];
-        if (hw_thread.sub_ids[level] < offset ||
-            (num != kmp_hw_subset_t::USE_ALL &&
-             hw_thread.sub_ids[level] >= offset + num)) {
+        if (is_absolute)
+          sub_id = abs_sub_ids[level];
+        else
+          sub_id = hw_thread.sub_ids[level];
+        if (sub_id < offset ||
+            (num != kmp_hw_subset_t::USE_ALL && sub_id >= offset + num)) {
           should_be_filtered = true;
           break;
         }
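
The running-counter idea behind `abs_sub_ids` in the hunks above can be sketched in isolation. This is a simplified model and not the runtime's implementation: `select_threads`, `LevelRequest`, `make_uniform_machine`, and `kUseAll` are hypothetical names, core attributes are left out, and each hardware thread is assumed to be a plain vector of per-level sub IDs listed in topology order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kUseAll = -1; // hypothetical stand-in for "use all of"

// Request for one topology level. Untargeted levels are ignored entirely.
struct LevelRequest {
  bool targeted;
  int num;    // units to keep, or kUseAll
  int offset; // units to skip first
};

// Returns the indices of the hardware threads that survive the filter.
std::vector<std::size_t>
select_threads(const std::vector<std::vector<int>> &hw_threads,
               const std::vector<LevelRequest> &requests) {
  const std::size_t depth = requests.size();
  std::vector<int> prev_ids(depth, -1);
  std::vector<int> abs_ids(depth, -1);
  std::vector<std::size_t> kept;
  for (std::size_t t = 0; t < hw_threads.size(); ++t) {
    const std::vector<int> &ids = hw_threads[t];
    assert(ids.size() == depth);
    // Find the outermost level whose ID changed since the previous thread.
    for (std::size_t level = 0; level < depth; ++level) {
      if (ids[level] == prev_ids[level])
        continue;
      // The first targeted level at or below the change advances its
      // counter; every deeper targeted level restarts at zero.
      bool found_targeted = false;
      for (std::size_t j = level; j < depth; ++j) {
        if (!requests[j].targeted)
          continue;
        if (!found_targeted) {
          found_targeted = true;
          ++abs_ids[j];
        } else {
          abs_ids[j] = 0;
        }
      }
      break;
    }
    prev_ids = ids;
    // Keep the thread only if every targeted counter is inside its window.
    bool keep = true;
    for (std::size_t level = 0; level < depth && keep; ++level) {
      const LevelRequest &r = requests[level];
      if (!r.targeted)
        continue;
      const int id = abs_ids[level];
      if (id < r.offset || (r.num != kUseAll && id >= r.offset + r.num))
        keep = false;
    }
    if (keep)
      kept.push_back(t);
  }
  return kept;
}

// Builds a uniform machine: sockets x cores per socket x threads per core.
std::vector<std::vector<int>> make_uniform_machine(int sockets, int cores,
                                                   int threads) {
  std::vector<std::vector<int>> machine;
  for (int s = 0; s < sockets; ++s)
    for (int c = 0; c < cores; ++c)
      for (int t = 0; t < threads; ++t)
        machine.push_back({s, c, t});
  return machine;
}
```

On a hypothetical 2-socket, 4-cores-per-socket, 2-threads-per-core machine, modelling `:2s,6t` keeps 6 threads on each socket because the thread counter keeps running across the ignored core layer, and modelling `:1t@7` keeps only the eighth hardware thread of the whole machine.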

openmp/runtime/src/kmp_affinity.h

Lines changed: 44 additions & 0 deletions
@@ -1172,6 +1172,50 @@ class kmp_hw_subset_t {
     qsort(items, depth, sizeof(item_t), hw_subset_compare);
   }
   bool specified(kmp_hw_t type) const { return ((set & (1ull << type)) > 0); }
+
+  // Canonicalize the KMP_HW_SUBSET value if it is not an absolute subset.
+  // This means putting each of {sockets, cores, threads} in the topology if
+  // they are not specified:
+  // e.g., 1s,2c => 1s,2c,*t | 2c,1t => *s,2c,1t | 1t => *s,*c,1t | etc.
+  // e.g., 3module => *s,3module,*c,*t
+  // By doing this, the runtime assumes users who fiddle with KMP_HW_SUBSET
+  // are expecting the traditional sockets/cores/threads topology. For newer
+  // hardware, there can be intervening layers like dies/tiles/modules
+  // (usually corresponding to a cache level). So when a user asks for
+  // 1s,6c,2t and the topology is really 1s,2modules,4cores,2threads, the user
+  // should get 12 hardware threads across 6 cores and effectively ignore the
+  // module layer.
+  void canonicalize(const kmp_topology_t *top) {
+    // Layers to target for KMP_HW_SUBSET canonicalization
+    kmp_hw_t targeted[] = {KMP_HW_SOCKET, KMP_HW_CORE, KMP_HW_THREAD};
+
+    // Do not target-layer-canonicalize absolute KMP_HW_SUBSETS
+    if (is_absolute())
+      return;
+
+    // Do not target-layer-canonicalize KMP_HW_SUBSETS when the
+    // topology doesn't have these layers
+    for (kmp_hw_t type : targeted)
+      if (top->get_level(type) == KMP_HW_UNKNOWN)
+        return;
+
+    // Put targeted layers in topology if they do not exist
+    for (kmp_hw_t type : targeted) {
+      bool found = false;
+      for (int i = 0; i < get_depth(); ++i) {
+        if (top->get_equivalent_type(items[i].type) == type) {
+          found = true;
+          break;
+        }
+      }
+      if (!found) {
+        push_back(USE_ALL, type, 0, kmp_hw_attr_t{});
+      }
+    }
+    sort();
+    // Set as an absolute topology that only targets the targeted layers
+    set_absolute();
+  }
   void dump() const {
     printf("**********************\n");
     printf("*** kmp_hw_subset: ***\n");
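
The canonicalization described in the comment above can be sketched as a standalone function. This is a reduced model under stated assumptions, not the class from the patch: `LayerType`, `Item`, `Subset`, and `kUseAll` are hypothetical, the layer order is a fixed four-entry ranking rather than the detected machine topology, and the early return for topologies that lack a socket, core, or thread layer is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kUseAll = -1; // hypothetical stand-in for "use all of"

// Hypothetical fixed ranking, outermost layer first.
enum class LayerType { Socket = 0, Module = 1, Core = 2, Thread = 3 };

struct Item {
  LayerType type;
  int num; // requested count, or kUseAll
};

struct Subset {
  bool absolute = false;
  std::vector<Item> items;
};

// Adds "use all" entries for any missing socket, core, or thread layer,
// orders the items outermost first, and marks the result as absolute so
// that only the listed layers are targeted afterwards.
void canonicalize(Subset &subset) {
  if (subset.absolute)
    return; // absolute subsets are taken exactly as written
  for (LayerType type :
       {LayerType::Socket, LayerType::Core, LayerType::Thread}) {
    const bool found =
        std::any_of(subset.items.begin(), subset.items.end(),
                    [type](const Item &item) { return item.type == type; });
    if (!found)
      subset.items.push_back({type, kUseAll});
  }
  std::sort(subset.items.begin(), subset.items.end(),
            [](const Item &a, const Item &b) { return a.type < b.type; });
  subset.absolute = true;
  assert(subset.items.size() >= 3); // socket, core, and thread are present
}
```

In this model, `1t` becomes `*s,*c,1t` and `3module` becomes `*s,3module,*c,*t`, matching the examples in the comment, while a subset already marked absolute (the `:1t` form) is left with its single item.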
