
fix(pkg): use %w for proper error wrapping #609

Merged
ArangoGutierrez merged 1 commit into NVIDIA:main from ArangoGutierrez:fix/error-wrapping-pkg
Feb 6, 2026


@ArangoGutierrez
Collaborator

Summary

Replace all fmt.Errorf("...: %v", err) with fmt.Errorf("...: %w", err) in the pkg/ directory for proper error chain preservation.

Motivation

Using %v to format errors breaks the error chain, making it impossible for callers to use errors.Is() or errors.As() to inspect wrapped errors. This makes debugging difficult and prevents proper error handling patterns.

Changes

111 instances fixed across 13 files:

| Package | File | Changes |
|---|---|---|
| pkg/provider/aws | cluster.go | 18 |
| pkg/provider/aws | create.go | 23 |
| pkg/provider/aws | delete.go | 17 |
| pkg/provider/aws | dryrun.go | 1 |
| pkg/provisioner | provisioner.go | 35 |
| pkg/provisioner/templates | container-toolkit.go | 1 |
| pkg/provisioner/templates | containerd.go | 1 |
| pkg/provisioner/templates | crio.go | 1 |
| pkg/provisioner/templates | docker.go | 1 |
| pkg/provisioner/templates | kubernetes.go | 3 |
| pkg/provisioner/templates | nv-driver.go | 1 |
| pkg/utils | ip.go | 3 |
| pkg/utils | kubeconfig.go | 6 |

What Changed

// Before
return fmt.Errorf("failed to create VPC: %v", err)

// After  
return fmt.Errorf("failed to create VPC: %w", err)

Not Changed

  • Error slices ([]error) - each %w operand must be a single error value, so a slice cannot be wrapped directly
  • Non-fmt.Errorf contexts (loggers, test helpers)
  • Files in cmd/ or vendor/ directories

Test plan

  • go build ./pkg/... - compiles successfully
  • go test ./pkg/... - verify no regressions

Copilot AI review requested due to automatic review settings February 4, 2026 16:13
@coveralls

coveralls commented Feb 4, 2026

Pull Request Test Coverage Report for Build 21754753844

Details

  • 22 of 111 (19.82%) changed or added relevant lines in 13 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 45.946%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/provisioner/templates/container-toolkit.go | 0 | 1 | 0.0% |
| pkg/provisioner/templates/containerd.go | 0 | 1 | 0.0% |
| pkg/provisioner/templates/crio.go | 0 | 1 | 0.0% |
| pkg/provisioner/templates/docker.go | 0 | 1 | 0.0% |
| pkg/provisioner/templates/nv-driver.go | 0 | 1 | 0.0% |
| pkg/utils/ip.go | 2 | 3 | 66.67% |
| pkg/provisioner/templates/kubernetes.go | 0 | 3 | 0.0% |
| pkg/provider/aws/create.go | 18 | 23 | 78.26% |
| pkg/utils/kubeconfig.go | 0 | 6 | 0.0% |
| pkg/provider/aws/delete.go | 1 | 17 | 5.88% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 21753110877 | 0.0% |
| Covered Lines | 2091 |
| Relevant Lines | 4551 |

💛 - Coveralls

Copilot AI left a comment

Pull request overview

Updates error wrapping across pkg/ to preserve error chains (switching fmt.Errorf(...: %v, err) to %w) and also introduces new CLI surface area (new validate/provision commands and additional command wiring in cmd/cli/main.go).

Changes:

  • Replace %v with %w in fmt.Errorf calls across pkg/ for proper error wrapping.
  • Add new CLI commands: holodeck validate and holodeck provision.
  • Expand CLI command registration/help text in cmd/cli/main.go (including new, currently-missing command packages).

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| pkg/utils/kubeconfig.go | Wrap SSH/session/file transfer errors with %w for chain preservation. |
| pkg/utils/ip.go | Wrap HTTP request/response errors with %w. |
| pkg/provisioner/templates/nv-driver.go | Wrap template execution errors with %w. |
| pkg/provisioner/templates/kubernetes.go | Wrap template execution and file/working-dir errors with %w. |
| pkg/provisioner/templates/docker.go | Wrap template execution errors with %w. |
| pkg/provisioner/templates/crio.go | Wrap template execution errors with %w. |
| pkg/provisioner/templates/containerd.go | Wrap template execution errors with %w. |
| pkg/provisioner/templates/container-toolkit.go | Wrap template execution errors with %w. |
| pkg/provisioner/provisioner.go | Wrap provisioning/SSH/SFTP/template errors with %w. |
| pkg/provider/aws/dryrun.go | Wrap image-check errors with %w. |
| pkg/provider/aws/delete.go | Wrap delete/condition-update/AWS API errors with %w. |
| pkg/provider/aws/create.go | Wrap AWS resource creation/tagging/waiter errors with %w. |
| pkg/provider/aws/cluster.go | Wrap cluster-creation workflow errors with %w (including goroutine error propagation). |
| cmd/cli/validate/validate.go | New validate command for env file, required fields, SSH keys, AWS creds, component checks. |
| cmd/cli/provision/provision.go | New provision command for instance-based or direct SSH provisioning (+ kubeconfig download). |
| cmd/cli/main.go | Registers additional CLI commands and updates help/examples. |


env, err := jyaml.UnmarshalFromFile[v1alpha1.Environment](m.envFile)
if err != nil {
	return nil, fmt.Errorf("invalid YAML: %v", err)

Copilot AI Feb 4, 2026

validateEnvFile uses fmt.Errorf("invalid YAML: %v", err), which drops the wrapped error for errors.Is/As. Use %w when returning the underlying parse error.

Suggested change
- return nil, fmt.Errorf("invalid YAML: %v", err)
+ return nil, fmt.Errorf("invalid YAML: %w", err)

Copilot uses AI. Check for mistakes.
Comment on lines 97 to 173
func (m *command) run() error {
	results := make([]ValidationResult, 0)
	hasErrors := false
	hasWarnings := false

	// 1. Validate environment file exists and is valid YAML
	env, err := m.validateEnvFile()
	if err != nil {
		results = append(results, ValidationResult{
			Check:   "Environment file",
			Passed:  false,
			Message: err.Error(),
		})
		hasErrors = true
		m.printResults(results)
		return fmt.Errorf("validation failed")
	}
	results = append(results, ValidationResult{
		Check:   "Environment file",
		Passed:  true,
		Message: "Valid YAML structure",
	})

	// 2. Validate required fields
	fieldResults := m.validateRequiredFields(env)
	for _, r := range fieldResults {
		results = append(results, r)
		if !r.Passed {
			hasErrors = true
		}
	}

	// 3. Validate SSH keys
	keyResults := m.validateSSHKeys(env)
	for _, r := range keyResults {
		results = append(results, r)
		if !r.Passed {
			hasErrors = true
		}
	}

	// 4. Validate AWS credentials (if AWS provider)
	if env.Spec.Provider == v1alpha1.ProviderAWS {
		awsResult := m.validateAWSCredentials()
		results = append(results, awsResult)
		if !awsResult.Passed {
			if strings.Contains(awsResult.Message, "warning") {
				hasWarnings = true
			} else {
				hasErrors = true
			}
		}
	}

	// 5. Validate component configuration
	compResults := m.validateComponents(env)
	for _, r := range compResults {
		results = append(results, r)
		if !r.Passed {
			hasWarnings = true
		}
	}

	// Print results
	m.printResults(results)

	// Determine exit status
	if hasErrors {
		return fmt.Errorf("validation failed with errors")
	}
	if hasWarnings && m.strict {
		return fmt.Errorf("validation failed with warnings (strict mode)")
	}

	m.log.Info("\n✅ Validation passed")
	return nil
}

Copilot AI Feb 4, 2026

Most cmd/cli/* subcommands have unit tests (e.g., cmd/cli/delete/delete_test.go, cmd/cli/dryrun/dryrun_test.go). This new command introduces substantial logic but adds no tests; please add at least basic table-driven tests for required-field validation and key/AWS credential checks (mockable paths).

Copilot generated this review using guidance from repository custom instructions.
Comment on lines 126 to 167
		Action: func(c *cli.Context) error {
			if m.sshMode {
				return m.runSSHMode()
			}

			if c.NArg() != 1 {
				return fmt.Errorf("instance ID is required (or use --ssh mode)")
			}
			return m.runInstanceMode(c.Args().Get(0))
		},
	}

	return &provisionCmd
}

func (m *command) runInstanceMode(instanceID string) error {
	// Get instance details
	manager := instances.NewManager(m.log, m.cachePath)
	instance, err := manager.GetInstance(instanceID)
	if err != nil {
		return fmt.Errorf("failed to get instance: %v", err)
	}

	// Load environment
	env, err := jyaml.UnmarshalFromFile[v1alpha1.Environment](instance.CacheFile)
	if err != nil {
		return fmt.Errorf("failed to read environment: %v", err)
	}

	m.log.Info("Provisioning instance %s...", instanceID)

	// Run provisioning based on instance type
	if env.Spec.Cluster != nil && env.Status.Cluster != nil && len(env.Status.Cluster.Nodes) > 0 {
		if err := m.runClusterProvision(&env); err != nil {
			return err
		}
	} else {
		if err := m.runSingleNodeProvision(&env); err != nil {
			return err
		}
	}

Copilot AI Feb 4, 2026

Most cmd/cli/* subcommands have unit tests, but this new command adds significant branching behavior (instance vs SSH mode, cluster vs single-node) without tests. Add unit tests for argument/flag validation and host URL selection logic to prevent regressions.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines 410 to 414
  	var err error
  	key, err := os.ReadFile(keyPath) // nolint:gosec
  	if err != nil {
- 		return nil, fmt.Errorf("failed to read key file: %v", err)
+ 		return nil, fmt.Errorf("failed to read key file: %w", err)
  	}

Copilot AI Feb 4, 2026

connectOrDie reads keyPath directly with os.ReadFile, so values like ~/.ssh/id_rsa will fail (no shell expansion). Several CLI paths/examples use ~, and cmd/cli/validate even expands it, so validation may pass but provisioning can still fail. Consider expanding ~ (and/or making the path absolute) before reading the key.

cmd/cli/main.go Outdated
Comment on lines 22 to 35
	"github.com/NVIDIA/holodeck/cmd/cli/cleanup"
	"github.com/NVIDIA/holodeck/cmd/cli/create"
	"github.com/NVIDIA/holodeck/cmd/cli/delete"
	"github.com/NVIDIA/holodeck/cmd/cli/describe"
	"github.com/NVIDIA/holodeck/cmd/cli/dryrun"
	"github.com/NVIDIA/holodeck/cmd/cli/get"
	"github.com/NVIDIA/holodeck/cmd/cli/list"
	oscmd "github.com/NVIDIA/holodeck/cmd/cli/os"
	"github.com/NVIDIA/holodeck/cmd/cli/provision"
	"github.com/NVIDIA/holodeck/cmd/cli/scp"
	"github.com/NVIDIA/holodeck/cmd/cli/ssh"
	"github.com/NVIDIA/holodeck/cmd/cli/status"
	"github.com/NVIDIA/holodeck/cmd/cli/update"
	"github.com/NVIDIA/holodeck/cmd/cli/validate"

Copilot AI Feb 4, 2026

cmd/cli/main.go imports and registers subcommands (describe/get/scp/ssh/update) that do not exist under cmd/cli/ in this PR/repo, which will break compilation due to missing packages. Either add the missing command packages or remove these imports/command registrations until they’re implemented.

Comment on lines 112 to 116
	c.Commands = []*cli.Command{
		cleanup.NewCommand(log),
		create.NewCommand(log),
		delete.NewCommand(log),
		describe.NewCommand(log),

Copilot AI Feb 4, 2026

PR title/description indicate a scoped change in pkg/ to improve error wrapping, but this file also adds multiple new CLI commands to the top-level app. If these CLI additions are intentional, the PR metadata should be updated; otherwise, the cmd/ changes should be split out to keep this PR focused.

		}
	}

	// Update provisioned status

Copilot AI Feb 4, 2026

env.Labels can be nil when unmarshaled; writing to it directly can panic. Initialize the map before setting instances.InstanceProvisionedLabelKey.

Suggested change
- 	// Update provisioned status
+ 	// Update provisioned status
+ 	if env.Labels == nil {
+ 		env.Labels = make(map[string]string)
+ 	}

Comment on lines 144 to 153
	instance, err := manager.GetInstance(instanceID)
	if err != nil {
		return fmt.Errorf("failed to get instance: %v", err)
	}

	// Load environment
	env, err := jyaml.UnmarshalFromFile[v1alpha1.Environment](instance.CacheFile)
	if err != nil {
		return fmt.Errorf("failed to read environment: %v", err)
	}

Copilot AI Feb 4, 2026

This file wraps underlying errors with fmt.Errorf(... %v, err) in multiple places (e.g., reading instance/env). That loses the error chain and is inconsistent with the stated goal of this PR; use %w where returning a wrapped error so callers can use errors.Is/As.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
@ArangoGutierrez ArangoGutierrez enabled auto-merge (squash) February 6, 2026 15:01
@ArangoGutierrez ArangoGutierrez merged commit e253b11 into NVIDIA:main Feb 6, 2026
19 checks passed
ArangoGutierrez added a commit to ArangoGutierrez/holodeck that referenced this pull request Feb 10, 2026
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
ArangoGutierrez added a commit to ArangoGutierrez/holodeck that referenced this pull request Feb 10, 2026
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
ArangoGutierrez added a commit to ArangoGutierrez/holodeck that referenced this pull request Feb 10, 2026
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
ArangoGutierrez added a commit to ArangoGutierrez/holodeck that referenced this pull request Feb 12, 2026
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
ArangoGutierrez added a commit to ArangoGutierrez/holodeck that referenced this pull request Feb 13, 2026
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
ArangoGutierrez added a commit to ArangoGutierrez/holodeck that referenced this pull request Feb 13, 2026
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>