1 | 1 | # -*- text -*-  | 
2 | 2 | #  | 
3 |  | -# Copyright (c) 2009-2014 Cisco Systems, Inc.  All rights reserved.  | 
 | 3 | +# Copyright (c) 2009-2016 Cisco Systems, Inc.  All rights reserved.  | 
4 | 4 | # Copyright (c) 2015-2016 The University of Tennessee and The University  | 
5 | 5 | #                         of Tennessee Research Foundation.  All rights  | 
6 | 6 | #                         reserved.  | 
@@ -59,21 +59,35 @@ most common causes when it does occur are:  | 
59 | 59 |   * The operating system ran out of file descriptors  | 
60 | 60 |   * The operating system ran out of memory  | 
61 | 61 |  | 
 | 62 | +Your Open MPI job will likely hang (or crash) until the failure  | 
 | 63 | +resason is fixed (e.g., more file descriptors and/or memory becomes  | 
 | 64 | +available), and may eventually timeout / abort.  | 
 | 65 | + | 
 | 66 | +  Local host: %s  | 
 | 67 | +  PID:        %d  | 
 | 68 | +  Errno:      %d (%s)  | 
 | 69 | +#  | 
62 | 70 | [unsuported progress thread]  | 
63 | 71 | WARNING: Support for the TCP progress thread has not been compiled in.  | 
64 | 72 | Fall back to the normal progress.  | 
65 | 73 |  | 
66 | 74 |   Local host: %s  | 
67 | 75 |   Value:      %s  | 
68 | 76 |   Message:    %s  | 
69 |  | - | 
70 | 77 | #  | 
 | 78 | +[peer hung up]  | 
 | 79 | +An MPI communication peer process has unexpectedly disconnected.  This  | 
 | 80 | +usually indicates a failure in the peer process (e.g., a crash or  | 
 | 81 | +otherwise exiting without calling MPI_FINALIZE first).  | 
71 | 82 |  | 
72 |  | -Your Open MPI job will likely hang until the failure resason is fixed  | 
73 |  | -(e.g., more file descriptors and/or memory becomes available), and may  | 
74 |  | -eventually timeout / abort.  | 
 | 83 | +Although this local MPI process will likely now behave unpredictably  | 
 | 84 | +(it may even hang or crash), the root cause of this problem is the  | 
 | 85 | +failure of the peer -- that is what you need to investigate.  For  | 
 | 86 | +example, there may be a core file that you can examine.  More  | 
 | 87 | +generally: such peer hangups are frequently caused by application bugs  | 
 | 88 | +or other external events.  | 
75 | 89 |  | 
76 | 90 |   Local host: %s  | 
77 |  | -  PID:        %d  | 
78 |  | -  Errno:      %d (%s)  | 
 | 91 | +  Local PID:  %d  | 
 | 92 | +  Peer host:  %s  | 
79 | 93 | #  | 