26 changes: 19 additions & 7 deletions dracut/99kdumpbase/kdump.sh
@@ -68,7 +68,7 @@ get_kdump_confs() {
KDUMP_POST="$config_val"
;;
fence_kdump_args)
-    FENCE_KDUMP_ARGS="$config_val"
+    FENCE_KDUMP_ARGS="$config_val -i 1"
;;
fence_kdump_nodes)
FENCE_KDUMP_NODES="$config_val"
@@ -533,10 +533,6 @@ wait_online_network() {

get_host_ip() {

Member

Hi @pfliu, get_host_ip already waits for the network to be ready. Maybe a simpler solution is to modify get_host_ip so that it also waits for the network when fence_kdump is in use? Note that fence_kdump_notify already has the code to check whether fence_kdump is configured, so get_host_ip can reuse it. Btw, I assume fence_kdump_notify is what sends the fence message, so it would be better to move get_host_ip before it.
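
A minimal sketch of what that suggestion could look like, written so it runs outside the kdump initramfs. The names `is_fence_kdump`, `need_host_ip`, `get_host_ip_sketch`, and the two `KDUMP_CONF_*` variables are hypothetical stand-ins, not code from this PR; the real helpers read kdump.conf and the real get_host_ip sets HOST_IP.

```shell
#!/bin/sh
# Hypothetical stand-ins for values that kdump.sh reads from kdump.conf.
KDUMP_CONF_SSH=""               # stand-in for the "ssh" option
KDUMP_CONF_FENCE_NODES="node2"  # stand-in for "fence_kdump_nodes"

# Stubbed target checks (assumptions, simplified for illustration).
is_nfs_dump_target() { return 1; }
is_ssh_dump_target() { printf '%s\n' "$KDUMP_CONF_SSH" | grep -q @; }

# Hypothetical name for the "is fence_kdump configured?" check.
is_fence_kdump() { [ -n "$KDUMP_CONF_FENCE_NODES" ]; }

# Succeeds when the network is needed, either for a remote dump
# target or for sending the fence message.
need_host_ip() {
    is_nfs_dump_target || is_ssh_dump_target || is_fence_kdump
}

get_host_ip_sketch() {
    if ! need_host_ip; then
        echo "skip"
        return 0
    fi
    # The real get_host_ip would wait for the network here.
    echo "wait"
}

get_host_ip_sketch
```

With these stubs, clearing `KDUMP_CONF_FENCE_NODES` makes the function skip the wait, while setting `KDUMP_CONF_SSH` to something like `root@host` makes it wait again.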

Collaborator Author

@coiby, sorry for the late reply. The assumption here is that saving the vmcore is more important than sending out the fence message.
Since we have observed long waits for the network to become ready, I am a little worried that the cluster manager may forcefully reboot the crashed machine during that period. If that happens, the vmcore will not be saved.

But I am open to this option. What is your opinion now?

thanks

Member

Thanks for the clarification! If I understand correctly, the reason we send out the fence message is exactly to tell the cluster manager not to reboot the machine while vmcore dumping is in progress, right? But I agree there is no need to wait for the network to be ready first, since FENCE_KDUMP_SEND can keep sending messages until it succeeds. So if the purpose of fence_kdump is to make sure vmcore dumping is not interrupted, does that mean there is no need to wait for the network to be ready after vmcore dumping has finished?
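
The "keep sending until it succeeds" behaviour referred to above can be illustrated with a small retry loop. This is only a toy model under stated assumptions, not how fence_kdump_send is implemented: `network_ready`, `send_until_ok`, `INTERVAL`, and `READY_AFTER` are all hypothetical names, and network readiness is simulated with a counter.

```shell
#!/bin/sh
# Toy model: the "network" becomes ready after READY_AFTER checks.
INTERVAL=0      # seconds between attempts (the patch passes "-i 1")
READY_AFTER=3   # number of checks that fail before the network is up
tries=0

# Hypothetical readiness probe; succeeds from check READY_AFTER+1 onward.
network_ready() {
    tries=$((tries + 1))
    [ "$tries" -gt "$READY_AFTER" ]
}

# Hypothetical sender: retry at a fixed interval until the probe succeeds.
send_until_ok() {
    attempts=0
    while ! network_ready; do
        attempts=$((attempts + 1))
        sleep "$INTERVAL"
    done
    echo "sent after $attempts failed attempts"
}

send_until_ok
```

A short interval shortens the gap between the network coming up and the first message that actually leaves the machine, which appears to be the motivation for appending `-i 1` to FENCE_KDUMP_ARGS.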

-    if ! is_nfs_dump_target && ! is_ssh_dump_target; then
-        return 0
-    fi

_kdump_remote_ip=$(getarg kdump_remote_ip=)

if [ -z "$_kdump_remote_ip" ]; then
@@ -560,6 +556,14 @@ get_host_ip() {
HOST_IP=$_kdumpip
}

+remote_dump_wait_host_ip() {
+
+    if ! is_nfs_dump_target && ! is_ssh_dump_target; then
+        return 0
+    fi
+    get_host_ip
+}

read_kdump_confs() {
if [ ! -f "$KDUMP_CONFIG_FILE" ]; then
derror "$KDUMP_CONFIG_FILE not found"
@@ -659,8 +663,8 @@ fi
read_kdump_confs
fence_kdump_notify

-if ! get_host_ip; then
-    derror "get_host_ip exited with non-zero status!"
+if ! remote_dump_wait_host_ip; then
+    derror "remote_dump_wait_host_ip exited with non-zero status!"
exit 1
fi

@@ -690,4 +694,12 @@ if [ $DUMP_RETVAL -ne 0 ]; then
fi

kdump_test_set_status "success"
+# fence_kdump_send may fail to send a message due to slow network initialization.
+# Let's wait for the network to be ready and retry.
+if require_fence_message; then
+    get_host_ip
+    # Give fence_kdump_send a chance to send out the message.
+    sleep 2
+fi

do_final_action
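
The post-dump step added in this hunk can be exercised in isolation with stubs. In the sketch below, `post_dump_wait`, `FENCE_NODES`, and `GRACE` are hypothetical names introduced for illustration, and `kdump_get_conf_val` and `get_host_ip` are replaced by trivial stand-ins; the real functions read kdump.conf and wait for the network.

```shell
#!/bin/sh
# Stand-in for the configured fence_kdump_nodes value (assumption).
FENCE_NODES="node2 node3"
GRACE=0   # the patch sleeps 2 seconds; 0 keeps the sketch fast

# Stub: the real helper looks the option up in kdump.conf.
kdump_get_conf_val() {
    [ "$1" = "fence_kdump_nodes" ] && printf '%s\n' "$FENCE_NODES"
    return 0
}

# Same test as the patch's require_fence_message, written compactly.
require_fence_message() {
    [ -n "$(kdump_get_conf_val fence_kdump_nodes)" ]
}

# Stub: the real get_host_ip blocks until the network is usable.
get_host_ip() { echo "waiting for network"; }

# Hypothetical wrapper around the tail of kdump.sh.
post_dump_wait() {
    if require_fence_message; then
        get_host_ip
        # Give the sender a chance to get a message out.
        sleep "$GRACE"
    fi
    echo "final action"
}

post_dump_wait
```

When `FENCE_NODES` is empty, the wrapper goes straight to the final action, so machines without fence_kdump configured do not pay for the extra wait.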
8 changes: 8 additions & 0 deletions kdump-lib-initramfs.sh
@@ -112,6 +112,14 @@ get_mntpoint_from_target()
echo $_mntpoint
}

+require_fence_message()
+{
+    if [ -n "$(kdump_get_conf_val fence_kdump_nodes)" ]; then
+        return 0
+    fi
+    return 1
+}

is_ssh_dump_target()
{
kdump_get_conf_val ssh | grep -q @