Commit 481a869

OWLS-76974: Add more debugging/help when node manager fails to start (#2393)
* Add more debugging/help when node manager fails to start
1 parent eba7b99 commit 481a869

1 file changed

+35
-4
lines changed

operator/src/main/resources/scripts/startNodeManager.sh

Lines changed: 35 additions & 4 deletions
@@ -337,13 +337,44 @@ while [ 1 -eq 1 ]; do
     break
   fi
   if [ $((SECONDS - $start_secs)) -ge $max_wait_secs ]; then
-    trace INFO "Trying to put a node manager thread dump in '$nodemgr_out_file'."
-    kill -3 `jps -l | grep weblogic.NodeManager | awk '{ print $1 }'`
+    pid=$(jps | grep NodeManager | awk '{ print $1 }')
+    if [ -z $pid ]; then
+      trace INFO "Node manager process id not found. Cannot create thread dump."
+    else
+      trace INFO "Node manager process id is '$pid'."
+      trace INFO "Trying to put a node manager thread dump in '$nodemgr_out_file'."
+      kill -3 $pid
+      if [ -x "$(command -v $JAVA_HOME/bin/jcmd)" ]; then
+        trace INFO "Node manager thread dump:"
+        $JAVA_HOME/bin/jcmd $pid Thread.print
+      fi
+    fi
+    trace INFO "Entropy: "
+    cat /proc/sys/kernel/random/entropy_avail
     trace INFO "Contents of node manager log '$nodemgr_log_file':"
     cat ${nodemgr_log_file}
     trace INFO "Contents of node manager out '$nodemgr_out_file':"
-    cat ${nodemgr_out_file}
-    trace SEVERE "Node manager failed to start within $max_wait_secs seconds."
+    cat ${NODEMGR_OUT_FILE}
+
+    trace SEVERE $(cat << EOF
+The node manager failed to start within $max_wait_secs seconds.
+To increase this timeout, define the NODE_MANAGER_MAX_WAIT
+environment variable in your domain resource, and set it higher
+than $max_wait_secs. To diagnose the problem, see the above INFO
+messages for node manager log contents, stdout contents, pid,
+thread dump, and entropy. If the log and stdout contents are
+sparse and reveal no errors, then the node manager may be stalled
+while generating entropy -- especially if entropy is below 500.
+If entropy is the problem, then for testing purposes you can
+temporarily work around this problem by specifying
+'-Djava.security.egd=file:/dev/./urandom' in a USER_MEM_ARGS
+environment variable defined via your domain resource, but
+for production purposes the problem should be solved by following
+the guidance in
+'https://docs.oracle.com/en/middleware/fusion-middleware/weblogic-server/12.2.1.4/nodem/starting_nodemgr.html#GUID-53961E3A-D8E1-4556-B78A-9A56B676D57E'
+(search for keyword 'rngd').
+EOF
+)
     exit 1
   fi
 done
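
The SEVERE message above asks the reader to compare the printed entropy value against 500 by eye. As a rough illustration only, that comparison could be sketched as a small helper. The function name `entropy_status`, its arguments, and the "low"/"ok"/"unknown" labels are hypothetical and are not part of `startNodeManager.sh`. The default path and the 500 threshold are taken from the diff.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the commit: classifies an entropy reading.
# $1: file holding the reading (defaults to the kernel file used in the diff)
# $2: threshold (defaults to 500, the figure quoted in the SEVERE message)
# Prints "low" below the threshold, "ok" otherwise, and "unknown" when the
# file is unreadable or does not contain a plain non-negative integer.
entropy_status() {
  entropy_file="${1:-/proc/sys/kernel/random/entropy_avail}"
  threshold="${2:-500}"

  if [ ! -r "$entropy_file" ]; then
    echo "unknown"
    return 0
  fi

  avail=$(cat "$entropy_file")
  case "$avail" in
    ''|*[!0-9]*)
      # Empty or non-numeric content: do not guess.
      echo "unknown"
      ;;
    *)
      if [ "$avail" -lt "$threshold" ]; then
        echo "low"
      else
        echo "ok"
      fi
      ;;
  esac
}
```

Called with no arguments, `entropy_status` reads the same kernel file the script prints. Passing a file path as the first argument lets the check run against a saved reading instead.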
