Commit 4887d09

grondo authored and mergify[bot] committed
shell: fix hwloc_distrib() usage in affinity plugin
Problem: The job shell uses hwloc_distrib() incorrectly to distribute cores to tasks when cpu-affinity=per-task is specified, resulting in strange, uneven distribution of cores to tasks.

Using the source for hwloc-distrib(1) as a reference, distribute cores to tasks by creating a separate hwloc_obj_t "root" per core in the current topology, then calling hwloc_distrib() explicitly with those roots. This seems to force the even distribution of cores to tasks that is expected.

Fixes #4525
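To make "even distribution" concrete, here is a minimal sketch of the outcome the message describes as expected. It is a simplified model only, not hwloc_distrib()'s actual algorithm and not code from this commit: the helper name distribute_even and its first/count output arrays are hypothetical, and cores are treated as a flat, contiguously numbered list.

```c
/* Hypothetical helper, not part of flux-core or hwloc.
 *
 * Models an even split of `cores` cores over `ntasks` tasks: task i is
 * assigned the contiguous range [first[i], first[i] + count[i]).
 * If there are more tasks than cores, tasks share cores so that no
 * task is left with an empty set (an assumption of this sketch).
 *
 * Returns 0 on success, -1 on invalid arguments.
 */
int distribute_even (int cores, int ntasks, int *first, int *count)
{
    if (cores <= 0 || ntasks <= 0 || !first || !count)
        return -1;

    for (int i = 0; i < ntasks; i++) {
        /* Scale task boundaries onto the core index space. */
        int start = (int)((long long)i * cores / ntasks);
        int end = (int)((long long)(i + 1) * cores / ntasks);

        first[i] = start;
        /* Oversubscribed case: give the task the one core at `start`. */
        count[i] = (end > start) ? end - start : 1;
    }
    return 0;
}
```

Under this model, 4 cores over 2 tasks yields the ranges [0,2) and [2,4), and 8 cores over 3 tasks yields counts of 2, 3 and 3. Whether hwloc_distrib() produces exactly these sets on a real topology depends on the topology and the hwloc version.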
1 parent 6c8a176 commit 4887d09

1 file changed, +22 -5 lines changed

src/shell/affinity.c

Lines changed: 22 additions & 5 deletions
@@ -63,22 +63,39 @@ static hwloc_cpuset_t *distribute_tasks (hwloc_topology_t topo,
                                          hwloc_cpuset_t cset,
                                          int ntasks)
 {
-    hwloc_obj_t obj[1];
+    hwloc_obj_t *roots;
     hwloc_cpuset_t *cpusetp = NULL;
+    int cores;
+    int depth;
 
     /* restrict topology to current cpuset */
-    if (cset && topology_restrict (topo, cset) < 0)
+    if (cset && topology_restrict (topo, cset) < 0) {
+        shell_log_errno ("topology_restrict failed");
         return NULL;
+    }
     /* create cpuset array for ntasks */
     if (!(cpusetp = calloc (ntasks, sizeof (hwloc_cpuset_t))))
         return NULL;
-    /* Distribute starting at root over remaining objects */
-    obj[0] = hwloc_get_root_obj (topo);
+
+    depth = hwloc_get_type_depth (topo, HWLOC_OBJ_CORE);
+    cores = hwloc_get_nbobjs_by_depth (topo, depth);
+    if (cores <= 0 || !(roots = calloc (cores, sizeof (*roots)))) {
+        shell_log_error ("failed to allocate %d roots for hwloc distrib",
+                         cores);
+        return NULL;
+    }
+
+    for (int i = 0; i < cores; i++)
+        roots[i] = hwloc_get_obj_by_depth (topo, depth, i);
+
+    shell_trace ("distributing %d tasks across %d cores", ntasks, cores);
 
     /* NB: hwloc_distrib() will alloc ntasks cpusets in cpusetp, which
      * later need to be destroyed with hwloc_bitmap_free().
      */
-    hwloc_distrib (topo, obj, 1, cpusetp, ntasks, INT_MAX, 0);
+    hwloc_distrib (topo, roots, cores, cpusetp, ntasks, depth, 0);
+
+    free (roots);
     return (cpusetp);
 }
 