Commit b267524

rmaps: fixed the ordering of mpirun target nodes
Fixed the desync of job nodelists between mpirun and the orted daemons.

The issue was observed when launching via RSH, because the user can provide the nodes in an arbitrary order relative to HNP placement. The mpirun process propagates the daemons' nodelist order to the nodes, but the HNP itself was assembling its nodelist from the user-provided order. As a result, rank assignment was calculated differently on orted and mpirun.

Consider the following example:
* User launches mpirun on node cn2.
* Hostlist is cn1,cn2,cn3,cn4; ppn=1
* mpirun passes hostlist cn[2:2,1,3-4]@0(4) to the orteds

As a result, mpirun will assign rank 0 to cn1, while orted will assign rank 0 to cn2 (because orted sees cn2 as the first element in the node list).

Signed-off-by: Boris Karasev <[email protected]>
(cherry picked from commit 52e81ee)
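The mismatch in the example can be reproduced with a minimal sketch. It assumes ppn=1, so a node's rank is its index in whichever list the process walks. The helper `rank_of` and the two arrays are hypothetical names for illustration, not ORTE symbols.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: with ppn=1, a node's rank equals its position in
 * the list being walked. Returns -1 if the node is not in the list. */
static int rank_of(const char *const *nodes, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (0 == strcmp(nodes[i], name)) {
            return (int)i;
        }
    }
    return -1;
}

/* Order as given by the user: what the HNP (mpirun) used before the fix. */
static const char *const user_order[] = { "cn1", "cn2", "cn3", "cn4" };

/* Order encoded in cn[2:2,1,3-4]@0(4): what the orteds see. */
static const char *const daemon_order[] = { "cn2", "cn1", "cn3", "cn4" };
```

Under these assumptions, `rank_of(user_order, 4, "cn1")` and `rank_of(daemon_order, 4, "cn2")` both return 0, so the two sides disagree about which node hosts rank 0, while cn3 and cn4 get the same rank in either list.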
1 parent 4324c00 commit b267524

1 file changed: +9 -7 lines changed

orte/mca/rmaps/base/rmaps_base_support_fns.c

Lines changed: 9 additions & 7 deletions
@@ -14,6 +14,8 @@
  * All rights reserved.
  * Copyright (c) 2014-2017 Intel, Inc. All rights reserved.
  * Copyright (c) 2016 IBM Corporation. All rights reserved.
+ * Copyright (c) 2018 Mellanox Technologies, Inc.
+ * All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -253,13 +255,12 @@ int orte_rmaps_base_get_target_nodes(opal_list_t *allocated_nodes, orte_std_cntr
         /* find the nodes in our node array and assemble them
          * in daemon order if the vm was launched
          */
-        while (NULL != (item = opal_list_remove_first(&nodes))) {
-            nptr = (orte_node_t*)item;
+        for (i=0; i < orte_node_pool->size; i++) {
             nd = NULL;
-            for (i=0; i < orte_node_pool->size; i++) {
-                if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) {
-                    continue;
-                }
+            if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) {
+                continue;
+            }
+            OPAL_LIST_FOREACH_SAFE(nptr, next, &nodes, orte_node_t) {
                 if (0 != strcmp(node->name, nptr->name)) {
                     OPAL_OUTPUT_VERBOSE((10, orte_rmaps_base_framework.framework_output,
                                          "NODE %s DOESNT MATCH NODE %s",
@@ -332,8 +333,9 @@ int orte_rmaps_base_get_target_nodes(opal_list_t *allocated_nodes, orte_std_cntr
                     /* reset us back to the end for the next node */
                     nd = (orte_node_t*)opal_list_get_last(allocated_nodes);
                 }
+                opal_list_remove_item(&nodes, (opal_list_item_t*)nptr);
+                OBJ_RELEASE(nptr);
             }
-            OBJ_RELEASE(nptr);
         }
         OBJ_DESTRUCT(&nodes);
         /* now prune for usage and compute total slots */
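The loop inversion in this diff can be sketched without the OPAL list types. The version below is a simplified stand-in, not the ORTE implementation: plain string arrays replace `orte_node_pool` and the `nodes` list, a `used` flag replaces `opal_list_remove_item`, and the function name `order_by_pool`, the 64-entry cap, and the early `break` on a match are all assumptions of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_REQ 64  /* arbitrary cap, an assumption of this sketch */

/* Walk `pool` (daemon order) in the outer loop and look each entry up in
 * `requested` (user order). A matched name is appended to `out` and marked
 * consumed, standing in for the remove-on-match step in the patch.
 * `out` must have room for n_req entries. Returns the number written. */
static size_t order_by_pool(const char *const *pool, size_t n_pool,
                            const char *const *requested, size_t n_req,
                            const char **out)
{
    bool used[SKETCH_MAX_REQ] = { false };
    size_t n_out = 0;

    if (n_req > SKETCH_MAX_REQ) {
        return 0;
    }
    for (size_t i = 0; i < n_pool; i++) {
        if (NULL == pool[i]) {
            continue;  /* empty slot, like a NULL pointer-array item */
        }
        for (size_t j = 0; j < n_req; j++) {
            if (used[j] || 0 != strcmp(pool[i], requested[j])) {
                continue;
            }
            out[n_out++] = requested[j];
            used[j] = true;
            break;  /* sketch choice: one match per pool entry */
        }
    }
    return n_out;
}
```

Because the pool drives the outer loop, the output follows daemon order regardless of how the user listed the hosts. With a pool of cn2, cn1, cn3, cn4 and a request of cn1, cn2, cn3, cn4, the result starts with cn2, which is the order the orteds use.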
