@liutgnu liutgnu commented Nov 12, 2025

For an NFS dump target, kdumpctl and mkdumprd use the mount command to mount the target on a local directory. However, mount has no time limit here, so when the network connection is unstable it can take a very long time to finish, whether it succeeds or fails.

For example, we can emulate packet loss with the following commands:

$ tc qdisc add dev tun0 root handle 1: prio
$ tc qdisc add dev tun0 parent 1:3 handle 30: netem delay 2000ms 1000ms loss 80%
$ tc filter add dev tun0 protocol ip parent 1:0 prio 3 u32 match ip dst <nfs_server_ip> flowid 1:3

This emulates 80% packet loss on traffic to <nfs_server_ip>. We then invoke the NFS mount:

$ time mount -t nfs <nfs_server_ip>:<nfs_dir> /mnt

real	354m56.200s
user	0m0.003s
sys		0m0.013s

$ mount | grep nfs
<nfs_server_ip>:<nfs_dir> on /mnt type nfs4 (...)

As we can see, although the mount eventually succeeded, it took nearly 6 hours to finish. Waiting that long makes no sense; the mount should be aborted early so that the user can check for network issues and retry.
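For anyone reproducing the experiment, the emulation should be removed afterwards so that it does not affect later traffic. The following is only a sketch: the `run` and `netem_teardown` helpers and the `DRY_RUN` variable are hypothetical names introduced for illustration, and `tun0` is the device from the commands above. By default the commands are only printed, so the sketch can be tried without root privileges.

```shell
#!/bin/bash
# Hypothetical helper (not part of this patch): undo the packet-loss
# emulation on a network device. With DRY_RUN=1 (the default here) the
# tc commands are only printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

netem_teardown() {
    local dev=$1
    # Show the current qdisc setup first, for the record.
    run tc qdisc show dev "$dev"
    # Deleting the root qdisc also removes the child qdiscs and
    # filters attached below it.
    run tc qdisc del dev "$dev" root
}

netem_teardown tun0
```

Running it with `DRY_RUN=0` as root executes the two tc commands instead of printing them.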

This issue was also noticed in QE testing: a Beaker test task that does not finish within a specific time period is killed and marked as failed. We should set a timeout for the NFS mount to prevent it from trying to connect to the NFS server indefinitely.

This patch sets the timeout to 5 minutes for kdumpctl and 10 minutes for mkdumprd. I think these values are long enough for normal cases while still bailing out early if the network does encounter problems.
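To illustrate the general idea (this is not the code from the patch itself), a command can be bounded with coreutils `timeout`. In the sketch below, `run_with_timeout` is a hypothetical helper name and the 10-second kill grace period is an assumed value; only the 5-minute limit comes from the description above.

```shell
#!/bin/bash
# Hypothetical helper, not the code from this patch: run a command under
# coreutils timeout(1) and report how it ended.
run_with_timeout() {
    local limit=$1 rc
    shift
    # -k 10: if the command is still running 10 seconds after the
    # initial SIGTERM, send SIGKILL as well (assumed grace period).
    timeout -k 10 "$limit" "$@"
    rc=$?
    case $rc in
        0)
            return 0
            ;;
        124 | 137)
            # 124: the time limit was reached.
            # 137 (128+9): the command had to be killed with SIGKILL.
            echo "timed out or killed after $limit: $*" >&2
            ;;
        *)
            echo "failed with status $rc: $*" >&2
            ;;
    esac
    return "$rc"
}

# Intended use, with the placeholders from the description above:
#   run_with_timeout 5m mount -t nfs <nfs_server_ip>:<nfs_dir> /mnt
# Harmless demonstration that needs neither root nor an NFS server:
run_with_timeout 1s sleep 5 || echo "exit status: $?"
```

Checking for status 124 lets the caller tell a timeout apart from an ordinary mount failure and print a more specific hint, such as asking the user to check whether the NFS server is reachable.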

@gemini-code-assist

Summary of Changes

Hello @liutgnu, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces crucial timeout mechanisms for Network File System (NFS) mount operations within the kdumpctl and mkdumprd utilities. Previously, unstable network conditions could cause these mounts to hang for extended periods, impacting system reliability and test execution. The new timeouts ensure that NFS mount attempts fail gracefully within a defined timeframe, allowing for quicker diagnosis of network problems and improving overall system resilience.

Highlights

  • NFS Mount Timeout in kdumpctl: Implemented a 5-minute timeout for NFS mount operations within the kdumpctl script to prevent indefinite hangs during network instability.
  • NFS Mount Timeout in mkdumprd: Introduced a 10-minute timeout for NFS mount operations in mkdumprd to ensure timely failure and user feedback when network issues occur.
  • Improved NFS Error Message: Enhanced the error message for NFS mount failures in mkdumprd to explicitly suggest checking NFS server accessibility, providing clearer guidance to users.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a timeout for NFS mount operations in kdumpctl and mkdumprd to prevent indefinite hangs during network instability. The changes are well-implemented and address the issue described. The updated error message in mkdumprd is also a helpful addition. I have a couple of minor suggestions to initialize variables for improved code clarity and robustness.

Signed-off-by: Tao Liu <[email protected]>