1 | 1 | # -*- text -*- |
2 | 2 | # |
3 | | -# Copyright (c) 2009-2014 Cisco Systems, Inc. All rights reserved. |
| 3 | +# Copyright (c) 2009-2016 Cisco Systems, Inc. All rights reserved. |
4 | 4 | # Copyright (c) 2015-2016 The University of Tennessee and The University |
5 | 5 | # of Tennessee Research Foundation. All rights |
6 | 6 | # reserved. |
@@ -59,21 +59,35 @@ most common causes when it does occur are: |
59 | 59 | * The operating system ran out of file descriptors |
60 | 60 | * The operating system ran out of memory |
61 | 61 |
|
| 62 | +Your Open MPI job will likely hang (or crash) until the failure |
| 63 | +reason is fixed (e.g., more file descriptors and/or memory becomes |
| 64 | +available), and may eventually time out / abort. |
| 65 | + |
| 66 | + Local host: %s |
| 67 | + PID: %d |
| 68 | + Errno: %d (%s) |
| 69 | +# |
62 | 70 | [unsuported progress thread] |
63 | 71 | WARNING: Support for the TCP progress thread has not been compiled in. |
64 | 72 | Fall back to the normal progress. |
65 | 73 |
|
66 | 74 | Local host: %s |
67 | 75 | Value: %s |
68 | 76 | Message: %s |
69 | | - |
70 | 77 | # |
| 78 | +[peer hung up] |
| 79 | +An MPI communication peer process has unexpectedly disconnected. This |
| 80 | +usually indicates a failure in the peer process (e.g., a crash or |
| 81 | +otherwise exiting without calling MPI_FINALIZE first). |
71 | 82 |
|
72 | | -Your Open MPI job will likely hang until the failure reason is fixed |
73 | | -(e.g., more file descriptors and/or memory becomes available), and may |
74 | | -eventually time out / abort. |
| 83 | +Although this local MPI process will likely now behave unpredictably |
| 84 | +(it may even hang or crash), the root cause of this problem is the |
| 85 | +failure of the peer -- that is what you need to investigate. For |
| 86 | +example, there may be a core file that you can examine. More |
| 87 | +generally: such peer hangups are frequently caused by application bugs |
| 88 | +or other external events. |
75 | 89 |
|
76 | 90 | Local host: %s |
77 | | - PID: %d |
78 | | - Errno: %d (%s) |
| 91 | + Local PID: %d |
| 92 | + Peer host: %s |
79 | 93 | # |
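
For readers unfamiliar with how a "peer hung up" condition is typically observed on a TCP-style stream socket, the following is a minimal sketch in Python. It is not Open MPI's implementation: the names `PEER_HUNG_UP_TEMPLATE`, `peer_hung_up`, and `format_peer_hung_up` are hypothetical, the peer host name `peer-node` is made up, and the template only mirrors the `%s` / `%d` fields of the new `[peer hung up]` message. It assumes the common convention that a read of zero bytes on a connected stream socket means the remote side has closed the connection.

```python
import os
import socket

# Hypothetical template that mirrors the fields of the [peer hung up]
# help message; the real text lives in the help file shown in the diff.
PEER_HUNG_UP_TEMPLATE = (
    "An MPI communication peer process has unexpectedly disconnected.\n"
    "  Local host: %s\n"
    "  Local PID:  %d\n"
    "  Peer host:  %s\n"
)


def peer_hung_up(sock):
    """Return True if the peer closed its end of the stream.

    On a connected stream socket, recv() returning zero bytes signals
    that the remote side has shut down the connection.
    """
    return sock.recv(1) == b""


def format_peer_hung_up(local_host, local_pid, peer_host):
    """Fill the printf-style placeholders of the hypothetical template."""
    return PEER_HUNG_UP_TEMPLATE % (local_host, local_pid, peer_host)


# Simulate a peer that exits without an orderly shutdown handshake:
# create a connected socket pair and close one end immediately.
local, peer = socket.socketpair()
peer.close()

hung_up = peer_hung_up(local)
message = format_peer_hung_up(socket.gethostname(), os.getpid(), "peer-node")
local.close()

if hung_up:
    print(message)
```

The sketch separates detection (`peer_hung_up`) from reporting (`format_peer_hung_up`), which matches the intent of the message: the local process only notices the disconnect, while the root cause has to be investigated on the peer side.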