Skip to content

Commit 75af502

Browse files
feat(cli): complete CLI with full CRUD operations (NVIDIA#563) (NVIDIA#621)
* docs: add CLI CRUD and verbosity design document Design for issue NVIDIA#563 covering: - New CLI commands (describe, get, ssh, scp, update) - Global verbosity flags (-q, -v, -d) - Output formatting infrastructure - Implementation tasks and testing strategy Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * docs: add CLI CRUD implementation plan Detailed task-by-task implementation plan for: - Global verbosity flags (-q, -v, -d) - Unit tests for all new CLI commands - Output formatter tests 11 tasks with TDD approach and commit checkpoints. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * feat(logger): add verbosity levels with Debug and Trace methods Add Verbosity type with four levels (Quiet, Normal, Verbose, Debug) to control log output. Info, Check, and Warning methods now respect verbosity settings, while Error always prints. New Debug and Trace methods provide additional logging granularity for troubleshooting. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * feat(cli): add global verbosity flags (-q, --verbose, -d) Add global flags for controlling output verbosity: - --quiet, -q: Suppress non-error output - --verbose: Enable verbose output - --debug, -d: Enable debug-level logging (also reads DEBUG env var) Flag precedence: --debug > --verbose > --quiet Note: -v was not used for --verbose to avoid conflict with built-in --version flag. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * refactor(cli): rename list --quiet to --ids-only Rename the --quiet flag to --ids-only to avoid conflict with the new global --quiet flag that controls verbosity. The --ids-only flag now only outputs instance IDs, one per line. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * test(output): add unit tests for output formatting package Add comprehensive unit tests for the pkg/output package covering: - ValidFormats() and IsValidFormat() validation functions - NewFormatter() with empty, valid, and invalid format inputs - Formatter.Format() accessor method - PrintJSON() with struct, map, and slice data types - PrintYAML() with struct, map, and nested struct data - Print() dispatch logic for JSON, YAML, and table formats - PrintTable() with mock TableData interface - TablePrinter fluent interface (Header, Row, Flush) - SetWriter() for output redirection - Error message quality for invalid formats Achieves 97.9% code coverage. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * feat(output): add output formatting package with table/JSON/YAML support Adds pkg/output with Formatter, TablePrinter, and TableData interface for consistent output formatting across all CLI commands. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * feat(cli): add output formatting to status command Adds -o flag (json/yaml/table) to holodeck status with structured output types for JSON/YAML serialization. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * feat(cli): add describe, get, ssh, scp, update commands with tests New commands for issue NVIDIA#563: - describe: detailed instance introspection with JSON/YAML output - get kubeconfig: download kubeconfig from instance - get ssh-config: generate SSH config entry - ssh: SSH into instance or run remote commands - scp: copy files to/from instance via SFTP - update: update instance configuration (add components, labels) All commands include unit tests for core logic. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * test(list): update tests for --ids-only flag rename Updates list_test.go to use --ids-only instead of the old --quiet/-q flag which was renamed to avoid conflict with the global -q flag. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * fix(cli): fix nil map panic and SSH command quoting C1: Add nil check for env.Labels before writing provisioned label in update --reprovision. Previously panicked when no --label flags were provided and the cached environment had no labels. C2: Replace incorrect containsSpace+fmt.Sprintf("%q") quoting with simple strings.Join. SSH session.Run always passes through the remote shell, so Go-style quoting was wrong. Document this behavior. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * refactor(cli): extract shared utilities and fix review issues - Extract getHostURL and connectSSH into cmd/cli/common package, eliminating duplication across ssh, scp, and get commands (I1) - Use path instead of filepath for remote SFTP paths to ensure correct POSIX separators on all platforms (I2) - Log warnings for skipped files during recursive remote copy (I3) - Use c.IsSet() for flags with defaults to prevent overwriting existing config with default values (I4) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * fix(ci): resolve lint issues, update go.mod, and remove plan docs - Fix gofmt, gosec, staticcheck, unconvert, and errcheck lint issues - Promote gopkg.in/yaml.v3 from indirect to direct dependency - Remove planning documents (development artifacts) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * fix(security): correct misleading host key comments (CWE-322) Address CodeQL alert NVIDIA#6 (go/insecure-hostkeycallback) by replacing the inaccurate comment about "no pre-established host keys". The env file requires SSH authentication keys (client-to-server), but server host keys are a separate concern — they are generated at instance boot with no trusted distribution channel to the client. The security trade-off is now explicitly documented with CWE reference. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * refactor(cli): use common.GetHostURL in update runProvision (S4) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * fix(cli): consolidate update command cache file writes (C2) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * feat(cli): add -q as alias for --ids-only for backwards compatibility (S1) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * refactor(cli): extract SSH retry constants in ConnectSSH (S2) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * docs(cli): document SSH remote command quoting behavior (S3) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * fix(logger): warnings always print regardless of verbosity (M1) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * refactor(cli): consolidate GetHostURL tests into common package (M3) Move duplicate GetHostURL tests from cmd/cli/get/ and cmd/cli/ssh/ into cmd/cli/common/host_test.go. Merge overlapping test cases and add a new TestGetHostURL_Cluster_FallbackToFirstNode case that covers the fallback path when no control-plane node exists. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * fix(logger): make Verbosity thread-safe with atomic.Int32 (M6) Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * fix(logger): add nolint directive for gosec G115 in SetVerbosity Verbosity is an iota (0-3) so the int->int32 cast cannot overflow. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> --------- Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
1 parent 7c21161 commit 75af502

File tree

19 files changed

+4229
-135
lines changed

19 files changed

+4229
-135
lines changed

cmd/cli/common/host.go

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
/*
2+
* Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
17+
package common
18+
19+
import (
20+
"fmt"
21+
"os"
22+
"time"
23+
24+
"golang.org/x/crypto/ssh"
25+
26+
"github.com/NVIDIA/holodeck/api/holodeck/v1alpha1"
27+
"github.com/NVIDIA/holodeck/internal/logger"
28+
"github.com/NVIDIA/holodeck/pkg/provider/aws"
29+
)
30+
31+
// GetHostURL resolves the SSH-reachable host URL for an environment.
32+
// If nodeName is set, it looks for that specific node.
33+
// If preferControlPlane is true and no nodeName is set, it prefers a control-plane node.
34+
// Falls back to the first available node, then single-node properties.
35+
func GetHostURL(env *v1alpha1.Environment, nodeName string, preferControlPlane bool) (string, error) {
36+
// For multinode clusters, find the appropriate node
37+
if env.Spec.Cluster != nil && env.Status.Cluster != nil && len(env.Status.Cluster.Nodes) > 0 {
38+
if nodeName != "" {
39+
for _, node := range env.Status.Cluster.Nodes {
40+
if node.Name == nodeName {
41+
return node.PublicIP, nil
42+
}
43+
}
44+
return "", fmt.Errorf("node %q not found in cluster", nodeName)
45+
}
46+
47+
if preferControlPlane {
48+
for _, node := range env.Status.Cluster.Nodes {
49+
if node.Role == "control-plane" {
50+
return node.PublicIP, nil
51+
}
52+
}
53+
}
54+
55+
// Fallback to first node
56+
return env.Status.Cluster.Nodes[0].PublicIP, nil
57+
}
58+
59+
// Single node - get from properties
60+
switch env.Spec.Provider {
61+
case v1alpha1.ProviderAWS:
62+
for _, p := range env.Status.Properties {
63+
if p.Name == aws.PublicDnsName {
64+
return p.Value, nil
65+
}
66+
}
67+
case v1alpha1.ProviderSSH:
68+
return env.Spec.HostUrl, nil
69+
}
70+
71+
return "", fmt.Errorf("unable to determine host URL")
72+
}
73+
74+
const (
75+
// sshMaxRetries is the number of SSH connection attempts before giving up.
76+
sshMaxRetries = 3
77+
// sshRetryDelay is the delay between SSH connection retry attempts.
78+
sshRetryDelay = 2 * time.Second
79+
)
80+
81+
// ConnectSSH establishes an SSH connection with retries.
82+
//
83+
// Host key verification is disabled because server host keys are generated
84+
// at instance boot time and there is no trusted channel to distribute them
85+
// to the client beforehand. The env file's privateKey/publicKey fields are
86+
// SSH *authentication* keys (client-to-server), not server host keys.
87+
func ConnectSSH(log *logger.FunLogger, keyPath, userName, hostUrl string) (*ssh.Client, error) {
88+
key, err := os.ReadFile(keyPath) //nolint:gosec // keyPath is from trusted env config
89+
if err != nil {
90+
return nil, fmt.Errorf("failed to read key file %s: %v", keyPath, err)
91+
}
92+
93+
signer, err := ssh.ParsePrivateKey(key)
94+
if err != nil {
95+
return nil, fmt.Errorf("failed to parse private key: %v", err)
96+
}
97+
98+
config := &ssh.ClientConfig{
99+
User: userName,
100+
Auth: []ssh.AuthMethod{
101+
ssh.PublicKeys(signer),
102+
},
103+
// Server host keys are generated at instance boot with no trusted
104+
// distribution channel; disable verification (CWE-322 accepted risk).
105+
HostKeyCallback: ssh.InsecureIgnoreHostKey(), //nolint:gosec
106+
Timeout: 30 * time.Second,
107+
}
108+
109+
var client *ssh.Client
110+
for i := 0; i < sshMaxRetries; i++ {
111+
client, err = ssh.Dial("tcp", hostUrl+":22", config)
112+
if err == nil {
113+
return client, nil
114+
}
115+
log.Warning("Connection attempt %d failed: %v", i+1, err)
116+
time.Sleep(sshRetryDelay)
117+
}
118+
119+
return nil, fmt.Errorf("failed to connect after %d attempts: %v", sshMaxRetries, err)
120+
}

cmd/cli/common/host_test.go

Lines changed: 176 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,176 @@
1+
/*
2+
* Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
17+
package common
18+
19+
import (
20+
"testing"
21+
22+
"github.com/NVIDIA/holodeck/api/holodeck/v1alpha1"
23+
"github.com/NVIDIA/holodeck/pkg/provider/aws"
24+
)
25+
26+
func TestGetHostURL_AWS_SingleNode(t *testing.T) {
27+
env := &v1alpha1.Environment{
28+
Spec: v1alpha1.EnvironmentSpec{
29+
Provider: v1alpha1.ProviderAWS,
30+
},
31+
Status: v1alpha1.EnvironmentStatus{
32+
Properties: []v1alpha1.Properties{
33+
{Name: aws.PublicDnsName, Value: "ec2-1-2-3-4.compute.amazonaws.com"},
34+
},
35+
},
36+
}
37+
38+
url, err := GetHostURL(env, "", false)
39+
if err != nil {
40+
t.Fatalf("unexpected error: %v", err)
41+
}
42+
if url != "ec2-1-2-3-4.compute.amazonaws.com" {
43+
t.Errorf("expected DNS name, got %s", url)
44+
}
45+
}
46+
47+
func TestGetHostURL_SSH(t *testing.T) {
48+
env := &v1alpha1.Environment{
49+
Spec: v1alpha1.EnvironmentSpec{
50+
Provider: v1alpha1.ProviderSSH,
51+
Instance: v1alpha1.Instance{
52+
HostUrl: "192.168.1.100",
53+
},
54+
},
55+
}
56+
57+
url, err := GetHostURL(env, "", false)
58+
if err != nil {
59+
t.Fatalf("unexpected error: %v", err)
60+
}
61+
if url != "192.168.1.100" {
62+
t.Errorf("expected 192.168.1.100, got %s", url)
63+
}
64+
}
65+
66+
func TestGetHostURL_NoProperties(t *testing.T) {
67+
env := &v1alpha1.Environment{
68+
Spec: v1alpha1.EnvironmentSpec{
69+
Provider: v1alpha1.ProviderAWS,
70+
},
71+
Status: v1alpha1.EnvironmentStatus{
72+
Properties: []v1alpha1.Properties{},
73+
},
74+
}
75+
76+
_, err := GetHostURL(env, "", false)
77+
if err == nil {
78+
t.Error("expected error for missing properties")
79+
}
80+
}
81+
82+
func TestGetHostURL_Cluster_PreferControlPlane(t *testing.T) {
83+
env := &v1alpha1.Environment{
84+
Spec: v1alpha1.EnvironmentSpec{
85+
Provider: v1alpha1.ProviderAWS,
86+
Cluster: &v1alpha1.ClusterSpec{},
87+
},
88+
Status: v1alpha1.EnvironmentStatus{
89+
Cluster: &v1alpha1.ClusterStatus{
90+
Nodes: []v1alpha1.NodeStatus{
91+
{Name: "worker-0", Role: "worker", PublicIP: "10.0.0.1"},
92+
{Name: "cp-0", Role: "control-plane", PublicIP: "10.0.0.2"},
93+
},
94+
},
95+
},
96+
}
97+
98+
url, err := GetHostURL(env, "", true)
99+
if err != nil {
100+
t.Fatalf("unexpected error: %v", err)
101+
}
102+
if url != "10.0.0.2" {
103+
t.Errorf("expected control-plane IP 10.0.0.2, got %s", url)
104+
}
105+
}
106+
107+
func TestGetHostURL_Cluster_FallbackToFirstNode(t *testing.T) {
108+
env := &v1alpha1.Environment{
109+
Spec: v1alpha1.EnvironmentSpec{
110+
Provider: v1alpha1.ProviderAWS,
111+
Cluster: &v1alpha1.ClusterSpec{},
112+
},
113+
Status: v1alpha1.EnvironmentStatus{
114+
Cluster: &v1alpha1.ClusterStatus{
115+
Nodes: []v1alpha1.NodeStatus{
116+
{Name: "worker-0", Role: "worker", PublicIP: "10.0.0.1"},
117+
{Name: "worker-1", Role: "worker", PublicIP: "10.0.0.2"},
118+
},
119+
},
120+
},
121+
}
122+
123+
url, err := GetHostURL(env, "", true)
124+
if err != nil {
125+
t.Fatalf("unexpected error: %v", err)
126+
}
127+
if url != "10.0.0.1" {
128+
t.Errorf("expected first node IP 10.0.0.1, got %s", url)
129+
}
130+
}
131+
132+
func TestGetHostURL_Cluster_SpecificNode(t *testing.T) {
133+
env := &v1alpha1.Environment{
134+
Spec: v1alpha1.EnvironmentSpec{
135+
Provider: v1alpha1.ProviderAWS,
136+
Cluster: &v1alpha1.ClusterSpec{},
137+
},
138+
Status: v1alpha1.EnvironmentStatus{
139+
Cluster: &v1alpha1.ClusterStatus{
140+
Nodes: []v1alpha1.NodeStatus{
141+
{Name: "cp-0", Role: "control-plane", PublicIP: "10.0.0.1"},
142+
{Name: "worker-0", Role: "worker", PublicIP: "10.0.0.2"},
143+
},
144+
},
145+
},
146+
}
147+
148+
url, err := GetHostURL(env, "worker-0", false)
149+
if err != nil {
150+
t.Fatalf("unexpected error: %v", err)
151+
}
152+
if url != "10.0.0.2" {
153+
t.Errorf("expected worker IP 10.0.0.2, got %s", url)
154+
}
155+
}
156+
157+
func TestGetHostURL_Cluster_NodeNotFound(t *testing.T) {
158+
env := &v1alpha1.Environment{
159+
Spec: v1alpha1.EnvironmentSpec{
160+
Provider: v1alpha1.ProviderAWS,
161+
Cluster: &v1alpha1.ClusterSpec{},
162+
},
163+
Status: v1alpha1.EnvironmentStatus{
164+
Cluster: &v1alpha1.ClusterStatus{
165+
Nodes: []v1alpha1.NodeStatus{
166+
{Name: "cp-0", Role: "control-plane", PublicIP: "10.0.0.1"},
167+
},
168+
},
169+
},
170+
}
171+
172+
_, err := GetHostURL(env, "nonexistent", false)
173+
if err == nil {
174+
t.Error("expected error for nonexistent node")
175+
}
176+
}

0 commit comments

Comments
 (0)