Skip to content

Conversation

@gursewak1997
Copy link
Collaborator

@gursewak1997 gursewak1997 commented Dec 19, 2025

Reduce default SSH timeout from 30s to 5s for faster failure detection. When SSH is not ready, retry up to 2 times while scraping the VM console via to show boot progress. This provides users
with feedback during the boot process instead of silent waiting.

Changes:

  • Reduce default SSH connection timeout from 30s to 5s
  • Implement retry logic with console scraping between attempts
  • Show systemd boot messages, service starts, and SSH readiness
  • Properly clean up virsh console child processes to avoid zombies
  • Thread-based console reading to prevent indefinite blocking

Addresses the idea to speed up timeouts and ensure feedback by scraping hvc0 console output when vsock is unavailable.

Fixes: #164

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances the user experience when connecting to libvirt VMs by reducing the SSH timeout and providing valuable boot feedback through console scraping. The introduction of a retry mechanism for SSH connections is a robust improvement. The code is generally well-structured, but I've identified a critical deadlock issue in the console feedback implementation that needs to be addressed. I've also suggested a refactoring to improve maintainability by reducing code duplication.

@gursewak1997 gursewak1997 force-pushed the enhance-libvirt-ssh-timeouts branch 2 times, most recently from e8e7983 to 8bf9a2d Compare December 22, 2025 03:13

// Check if retryable (common errors only)
let stderr_str = String::from_utf8_lossy(&output.stderr);
let is_retryable = stderr_str.contains("Connection refused")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems really fragile. I'd say we just default to retrying on all errors instead.

That said, what we could also do is direct polling of the socket without forking the ssh binary, and only start spawning SSH when something is at least listening on the port?

There's a ton of prior art on this topic of course...probably worth looking at what e.g. Ansible does?

@gursewak1997 gursewak1997 force-pushed the enhance-libvirt-ssh-timeouts branch from 8bf9a2d to dad9814 Compare January 5, 2026 21:50
let common_opts = crate::ssh::CommonSshOptions {
strict_host_keys: self.strict_host_keys,
connect_timeout: self.timeout,
server_alive_interval: 60,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit this one could also be a const. Or probably even better we should have it be part of the impl Default for the struct and use ..Default::default()

}
}

// For command execution: retry with timeout
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is conceptually a bit dangerous as we might end up running the command multiple times.

It seems simpler to have a "connectivity check" that happens for both interactive vs not (per above) and then always do a oneshot attempt at the command.

Self {
strict_host_keys: false,
connect_timeout: 30,
connect_timeout: 5,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one could probably be even shorter, maybe 1s even for interactive use cases - might as well burn a few more CPU cycles to get the user into a shell faster.

@cgwalters cgwalters force-pushed the enhance-libvirt-ssh-timeouts branch from dad9814 to 1525494 Compare January 16, 2026 21:53
Reduce default SSH timeout from 30s to 5s for faster failure detection.
Implement 60-second time-based retry with 1-second polling intervals.
Do connectivity check first with retries, then oneshot command execution
to prevent dangerous retries of user commands.

Signed-off-by: gursewak1997 <[email protected]>
@gursewak1997 gursewak1997 force-pushed the enhance-libvirt-ssh-timeouts branch from 1525494 to 76c1f1e Compare January 19, 2026 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enhance libvirt ssh timeouts and feedback

3 participants