fix(aws): implement cleanup on partial creation failures#612
Conversation
Pull Request Test Coverage Report for Build 21756076629Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
There was a problem hiding this comment.
Pull request overview
This pull request aims to implement automatic rollback for partial AWS resource creation failures by introducing a cleanup stack pattern. However, the PR also includes substantial unrelated changes adding new CLI commands (validate, provision, and registrations for describe, get, scp, ssh, update).
Changes:
- Implements LIFO cleanup stack in AWS provider's Create() method to rollback resources on failure
- Adds new
validatecommand to pre-validate environment files before creation - Adds new
provisioncommand to provision or re-provision existing instances
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| pkg/provider/aws/create.go | Adds cleanup stack pattern with defer to rollback created AWS resources on failure |
| cmd/cli/validate/validate.go | New command to validate environment files, SSH keys, AWS credentials, and component dependencies |
| cmd/cli/provision/provision.go | New command supporting instance-mode and SSH-mode provisioning with kubeconfig download |
| cmd/cli/main.go | Registers 7 new CLI commands (validate, provision, describe, get, scp, ssh, update) |
cmd/cli/provision/provision.go
Outdated
| env.Spec.HostUrl = m.host | ||
| env.Spec.PrivateKey = m.keyPath | ||
| env.Spec.Username = m.username |
There was a problem hiding this comment.
Bug: The field paths are incorrect. Based on the Environment type definition, these should be env.Spec.Instance.HostUrl (not env.Spec.HostUrl), env.Spec.Auth.PrivateKey (not env.Spec.PrivateKey), and env.Spec.Auth.Username (not env.Spec.Username). This code will not compile or will set wrong fields.
| env.Spec.HostUrl = m.host | |
| env.Spec.PrivateKey = m.keyPath | |
| env.Spec.Username = m.username | |
| env.Spec.Instance.HostUrl = m.host | |
| env.Spec.Auth.PrivateKey = m.keyPath | |
| env.Spec.Auth.Username = m.username |
cmd/cli/provision/provision.go
Outdated
| } | ||
|
|
||
| // Create provisioner and run | ||
| p, err := provisioner.New(m.log, env.Spec.PrivateKey, env.Spec.Username, hostUrl) |
There was a problem hiding this comment.
Bug: The field paths are incorrect. Based on the Environment type definition, these should be env.Spec.Auth.PrivateKey and env.Spec.Auth.Username (not env.Spec.PrivateKey and env.Spec.Username). This code will not compile or will access wrong fields.
cmd/cli/provision/provision.go
Outdated
| env.Spec.PrivateKey, | ||
| env.Spec.Username, |
There was a problem hiding this comment.
Bug: The field paths are incorrect. Based on the Environment type definition, these should be env.Spec.Auth.PrivateKey and env.Spec.Auth.Username (not env.Spec.PrivateKey and env.Spec.Username). This code will not compile or will access wrong fields.
cmd/cli/provision/provision.go
Outdated
| } | ||
| } | ||
| } else if env.Spec.Provider == v1alpha1.ProviderSSH { | ||
| return env.Spec.HostUrl, nil |
There was a problem hiding this comment.
Bug: The field path is incorrect. Based on the Environment type definition, this should be env.Spec.Instance.HostUrl (not env.Spec.HostUrl). This code will not compile or will access the wrong field.
| return env.Spec.HostUrl, nil | |
| return env.Spec.Instance.HostUrl, nil |
cmd/cli/main.go
Outdated
| describe.NewCommand(log), | ||
| dryrun.NewCommand(log), | ||
| get.NewCommand(log), | ||
| list.NewCommand(log), | ||
| oscmd.NewCommand(log), | ||
| provision.NewCommand(log), | ||
| scp.NewCommand(log), | ||
| ssh.NewCommand(log), | ||
| status.NewCommand(log), | ||
| update.NewCommand(log), | ||
| validate.NewCommand(log), |
There was a problem hiding this comment.
This PR includes changes to register multiple new CLI commands (describe, get, provision, scp, ssh, update, validate) that appear unrelated to the stated purpose of "implement cleanup on partial creation failures". Only the validate and provision commands have implementation files in this PR, but the others are registered without implementations. This makes the PR scope unclear and harder to review. Consider splitting these changes into separate PRs focused on specific features.
| return p.deleteEC2Instances(cleanupCache) | ||
| }) | ||
|
|
||
| // Save objects ID's into a cache file |
There was a problem hiding this comment.
Bug: If updateAvailableCondition fails after all resources are successfully created, the defer cleanup will be triggered and roll back all successfully created resources. This would leave the AWS resources in limbo - created but not tracked. Consider setting a flag to disable cleanup after all resources are successfully created, or clearing the cleanup stack before this call.
| // Save objects ID's into a cache file | |
| // Save objects ID's into a cache file. At this point all AWS resources | |
| // have been successfully created, so we must not run the deferred | |
| // cleanup logic on failure, otherwise we would delete live resources | |
| // that are no longer tracked. Clear the cleanup stack before updating | |
| // the available condition. | |
| cleanupStack = nil |
cmd/cli/provision/provision.go
Outdated
|
|
||
| // getKubeconfigPath returns the path to save kubeconfig | ||
| func getKubeconfigPath(instanceID string) string { | ||
| homeDir, err := os.UserHomeDir() | ||
| if err != nil { | ||
| return fmt.Sprintf("kubeconfig-%s", instanceID) | ||
| } | ||
| kubeDir := filepath.Join(homeDir, ".kube") | ||
| _ = os.MkdirAll(kubeDir, 0755) | ||
| return filepath.Join(kubeDir, fmt.Sprintf("config-%s", instanceID)) | ||
| } |
There was a problem hiding this comment.
Dead code: The function getKubeconfigPath is defined but never called. Consider removing it if it's not needed, or use it if it was intended to be used somewhere.
| // getKubeconfigPath returns the path to save kubeconfig | |
| func getKubeconfigPath(instanceID string) string { | |
| homeDir, err := os.UserHomeDir() | |
| if err != nil { | |
| return fmt.Sprintf("kubeconfig-%s", instanceID) | |
| } | |
| kubeDir := filepath.Join(homeDir, ".kube") | |
| _ = os.MkdirAll(kubeDir, 0755) | |
| return filepath.Join(kubeDir, fmt.Sprintf("config-%s", instanceID)) | |
| } |
cmd/cli/provision/provision.go
Outdated
| } | ||
| } | ||
|
|
||
| // Update provisioned status |
There was a problem hiding this comment.
Potential nil pointer: If env.Labels is nil, this will panic. Consider initializing the map if it's nil before setting the key, e.g., if env.Labels == nil { env.Labels = make(map[string]string) }.
| // Update provisioned status | |
| // Update provisioned status | |
| if env.Labels == nil { | |
| env.Labels = make(map[string]string) | |
| } |
386ff08 to
e9424ea
Compare
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
e9424ea to
1851e25
Compare
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Summary
Implement automatic rollback when
Create()fails partway through, preventing orphaned AWS resources.Changes
Cleanup Stack Pattern
Resources Covered
deleteVPC()deleteSubnet()deleteInternetGateway()deleteRouteTable()deleteSecurityGroups()deleteEC2Instances()Impact
Test plan
go build ./pkg/provider/aws/...- compiles