Skip to content

Conversation

@mpryc
Copy link

@mpryc mpryc commented Dec 17, 2025

Prevents OADP CLI operation hanging indefinitely.

This PR adds a --request-timeout flag (following kubectl conventions)
to all OADP CLI commands, with full support for cluster unreachability:

  • Add --request-timeout flag to nonadmin backup logs and describe commands
    (takes precedence over OADP_CLI_REQUEST_TIMEOUT env var)
  • Add timeoutFactory wrapper that applies dial timeout to all Velero
    commands by overriding all client-creating methods (KubeClient,
    DynamicClient, DiscoveryClient, KubebuilderClient, KubebuilderWatchClient)
  • Add renameTimeoutFlag to rename Velero's --timeout to --request-timeout
    for consistent kubectl-style CLI experience
  • Set custom net.Dialer with timeout to handle TCP dial timeouts
  • Add context-based timeout handling with proper cancellation detection
  • Add FormatDownloadRequestTimeoutError for helpful timeout diagnostics
  • Add tests for timeout functionality (global timeout, config application,
    dialer timeout behavior)

The timeout now applies to both HTTP requests and TCP connection
attempts across all commands, ensuring the CLI times out quickly when
the cluster is unreachable instead of waiting for the default ~30s TCP
timeout.

This PR includes similar changes to #99

Prevents OADP CLI operation hanging indefinitely.

Introduced default timeout of 10min  which can be
controlled via OADP_CLI_HTTP_TIMEOUT variable.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Michal Pryc <[email protected]>
This commit adds a --request-timeout flag (following kubectl conventions)
to all OADP CLI commands, with full support for cluster unreachability:

- Add --request-timeout flag to nonadmin backup logs and describe commands
  (takes precedence over OADP_CLI_REQUEST_TIMEOUT env var)
- Add timeoutFactory wrapper that applies dial timeout to all Velero
  commands by overriding all client-creating methods (KubeClient,
  DynamicClient, DiscoveryClient, KubebuilderClient, KubebuilderWatchClient)
- Add renameTimeoutFlag to rename Velero's --timeout to --request-timeout
  for consistent kubectl-style CLI experience
- Set custom net.Dialer with timeout to handle TCP dial timeouts
- Add context-based timeout handling with proper cancellation detection
- Add FormatDownloadRequestTimeoutError for helpful timeout diagnostics
- Add tests for timeout functionality (global timeout, config application,
  dialer timeout behavior)

The timeout now applies to both HTTP requests and TCP connection
attempts across all commands, ensuring the CLI times out quickly when
the cluster is unreachable instead of waiting for the default ~30s TCP
timeout.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Signed-off-by: Michal Pryc <[email protected]>
Comment on lines +40 to +42
// TimeoutEnvVar is the environment variable name that can be used to override the default timeout.
// Example: OADP_CLI_REQUEST_TIMEOUT=30m kubectl oadp nonadmin backup logs my-backup
const TimeoutEnvVar = "OADP_CLI_REQUEST_TIMEOUT"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally should document valid formats, ie. days month year valid? only hour/min/second?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could mention/link the libraries used to parse which has doc.

}
statusInfo = fmt.Sprintf(" Current status: %s.", strings.Join(conditions, ", "))
}
return fmt.Errorf("timed out after %v waiting for NonAdminDownloadRequest %q to be processed.%s", timeout, req.Name, statusInfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("timed out after %v waiting for NonAdminDownloadRequest %q to be processed.%s", timeout, req.Name, statusInfo)
return fmt.Errorf("timed out after %v waiting for NonAdminDownloadRequest %q to be processed. statusInfo: %s", timeout, req.Name, statusInfo)

Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few nits. could be follow up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants