-
Notifications
You must be signed in to change notification settings - Fork 11
Enhance libvirt ssh timeouts and feedback #184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request significantly enhances the user experience when connecting to libvirt VMs by reducing the SSH timeout and providing valuable boot feedback through console scraping. The introduction of a retry mechanism for SSH connections is a robust improvement. The code is generally well-structured, but I've identified a critical deadlock issue in the console feedback implementation that needs to be addressed. I've also suggested a refactoring to improve maintainability by reducing code duplication.
e8e7983 to
8bf9a2d
Compare
crates/kit/src/libvirt/ssh.rs
Outdated
|
|
||
| // Check if retryable (common errors only) | ||
| let stderr_str = String::from_utf8_lossy(&output.stderr); | ||
| let is_retryable = stderr_str.contains("Connection refused") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems really fragile. I'd say we just default to retrying on all errors instead.
That said, what we could also do is direct polling of the socket without forking the ssh binary, and only start spawning SSH when something is at least listening on the port?
There's a ton of prior art on this topic of course...probably worth looking at what e.g. Ansible does?
8bf9a2d to
dad9814
Compare
crates/kit/src/libvirt/ssh.rs
Outdated
| let common_opts = crate::ssh::CommonSshOptions { | ||
| strict_host_keys: self.strict_host_keys, | ||
| connect_timeout: self.timeout, | ||
| server_alive_interval: 60, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit this one could also be a const. Or probably even better we should have it be part of the impl Default for the struct and use ..Default::default()
crates/kit/src/libvirt/ssh.rs
Outdated
| } | ||
| } | ||
|
|
||
| // For command execution: retry with timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is conceptually a bit dangerous as we might end up running the command multiple times.
It seems simpler to have a "connectivity check" that happens for both interactive vs not (per above) and then always do a oneshot attempt at the command.
| Self { | ||
| strict_host_keys: false, | ||
| connect_timeout: 30, | ||
| connect_timeout: 5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one could probably be even shorter, maybe 1s even for interactive use cases - might as well burn a few more CPU cycles to get the user into a shell faster.
dad9814 to
1525494
Compare
Reduce default SSH timeout from 30s to 5s for faster failure detection. Implement 60-second time-based retry with 1-second polling intervals. Do connectivity check first with retries, then oneshot command execution to prevent dangerous retries of user commands. Signed-off-by: gursewak1997 <[email protected]>
1525494 to
76c1f1e
Compare
Reduce default SSH timeout from 30s to 5s for faster failure detection. When SSH is not ready, retry up to 2 times while scraping the VM console via to show boot progress. This provides users
with feedback during the boot process instead of silent waiting.
Changes:
Addresses the idea to speed up timeouts and ensure feedback by scraping hvc0 console output when vsock is unavailable.
Fixes: #164