Commit ea35e47

Author: Ralph Castain
Fat SMPs (i.e., systems whose nodes contain large numbers of CPUs) were failing to start due to connection failures in the opal/pmix support. The root cause was that (a) we were setting the client socket to non-blocking before calling connect, and (b) the server was using the event library to harvest the accepts and also performed the handshake inside that event. So the server would back up beyond the connection backlog limit, and we would fail.

Changing the client to leave its socket blocking during the connect doesn't solve the problem by itself: you also have to introduce a sleep delay once the backlog is hit to avoid simply machine-gunning your way through retries. That delay is difficult to tune, as you don't want to unnecessarily prolong startup time.

We've solved this before by adding a listening thread that simply reaps accepts and shoves them into the event library for subsequent processing. This would resolve the problem, but meant yet another daemon-level thread. So I centralized the listening thread support and let multiple elements register listeners on it. Thus, each daemon now has a single listening thread that reaps accepts from multiple sources. For now, the orte/pmix server and the oob/usock support are using it; I'll add the oob/tcp component later.

This still didn't fully resolve the SMP problem, especially on coprocessor cards (e.g., KNC). Removing the shared memory dstore support further improved the behavior. It looks like there is some kind of memory paging issue there that needs further investigation. Given that the shared memory support was about to be lost when I bring over the PMIx integration (until it is restored in that library), it seemed reasonable to just remove it at this point.
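
The centralized listening thread described above can be illustrated with a minimal sketch. This is not the code from this commit: the names `register_listener`, `start_listeners`, `stop_listeners`, `enqueue_fd`, and `open_loopback_listener` are hypothetical, and a plain pipe stands in for the event library. The point is only that a single thread does nothing but `accept()` on every registered socket and hand the new descriptor off, so a slow handshake never holds up the accept queue.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_LISTENERS 8

/* Hypothetical callback type: receives each freshly accepted socket. */
typedef void (*accept_cb_t)(int client_fd, void *ctx);
typedef struct { int fd; accept_cb_t cb; void *ctx; } listener_t;

static listener_t listeners[MAX_LISTENERS];
static int n_listeners = 0;
static int stop_pipe[2] = {-1, -1};
static pthread_t listen_thread;

/* Register a listening socket. Must be called before start_listeners(). */
int register_listener(int fd, accept_cb_t cb, void *ctx)
{
    assert(cb != NULL);
    if (n_listeners >= MAX_LISTENERS) return -1;
    listeners[n_listeners++] = (listener_t){fd, cb, ctx};
    return 0;
}

/* The single thread: reap accepts from every source, defer all other work. */
static void *listen_loop(void *arg)
{
    struct pollfd pfds[MAX_LISTENERS + 1];
    (void)arg;
    for (;;) {
        for (int i = 0; i < n_listeners; i++)
            pfds[i] = (struct pollfd){listeners[i].fd, POLLIN, 0};
        pfds[n_listeners] = (struct pollfd){stop_pipe[0], POLLIN, 0};
        if (poll(pfds, (nfds_t)n_listeners + 1, -1) < 0) {
            if (errno == EINTR) continue;
            break;
        }
        if (pfds[n_listeners].revents) break;   /* stop requested */
        for (int i = 0; i < n_listeners; i++) {
            if (!(pfds[i].revents & POLLIN)) continue;
            int c = accept(listeners[i].fd, NULL, NULL);
            if (c >= 0) listeners[i].cb(c, listeners[i].ctx);
        }
    }
    return NULL;
}

int start_listeners(void)
{
    if (pipe(stop_pipe) != 0) return -1;
    return pthread_create(&listen_thread, NULL, listen_loop, NULL);
}

void stop_listeners(void)
{
    char b = 1;
    ssize_t n = write(stop_pipe[1], &b, 1);   /* wake the poll() */
    (void)n;
    pthread_join(listen_thread, NULL);
    close(stop_pipe[0]);
    close(stop_pipe[1]);
}

/* Example callback: queue the fd on a pipe (ctx points to its write end).
 * The pipe is a stand-in for pushing an event into an event library. */
void enqueue_fd(int client_fd, void *ctx)
{
    ssize_t n = write(*(int *)ctx, &client_fd, sizeof(client_fd));
    if (n != (ssize_t)sizeof(client_fd)) close(client_fd);
}

/* Helper for trying the sketch: TCP listener on 127.0.0.1, OS-chosen port. */
int open_loopback_listener(int backlog, struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, len) != 0 ||
        listen(fd, backlog) != 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

To try it, open one or more listeners, register each with `register_listener`, call `start_listeners`, and read accepted descriptors from the pipe on whatever thread performs the handshake. Registration is deliberately not thread-safe here; a real implementation would need locking or a registration phase that completes before the thread starts.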
1 parent b1c100c commit ea35e47

22 files changed: 787 additions, 1718 deletions

opal/mca/dstore/base/dstore_base_frame.c

Lines changed: 1 addition & 2 deletions
@@ -1,7 +1,7 @@
 /*
  * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2012-2013 Los Alamos National Security, Inc. All rights reserved.
- * Copyright (c) 2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014-2015 Intel, Inc. All rights reserved.
  * Copyright (c) 2014-2015 Research Organization for Information Science
  * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
@@ -43,7 +43,6 @@ opal_dstore_base_API_t opal_dstore = {
 opal_dstore_base_t opal_dstore_base = {0};
 
 int opal_dstore_internal = -1;
-int opal_dstore_modex = -1;
 
 static int opal_dstore_base_frame_close(void)
 {

opal/mca/dstore/dstore.h

Lines changed: 0 additions & 3 deletions
@@ -43,9 +43,6 @@ BEGIN_C_DECLS
  * as someone figures out how to separate the various
  * datastore channels
  */
-OPAL_DECLSPEC extern int opal_dstore_internal;
-OPAL_DECLSPEC extern int opal_dstore_modex;
-
 OPAL_DECLSPEC extern int opal_dstore_peer;
 OPAL_DECLSPEC extern int opal_dstore_internal;
 OPAL_DECLSPEC extern int opal_dstore_nonpeer;

opal/mca/dstore/sm/Makefile.am

Lines changed: 0 additions & 36 deletions
This file was deleted.
