Commit c8a3521

WIP: Make lambda work - except for the actual request handling
Tool: gitpod/catfood.gitpod.cloud
1 parent f40e565 commit c8a3521

18 files changed (+1850, -142 lines)

.goreleaser.yaml

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ builds:
     ignore:
       - goos: windows
         goarch: arm64
+    ldflags:
+      - -s -w -extldflags=-static
     binary: gitpod-network-check
 
 archives:

gitpod-network-check/.gitignore

Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-
-gitpod-network-check
+gitpod-network-check
+*.zip

gitpod-network-check/Makefile

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+.PHONY: build
+
+build:
+	GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -a -ldflags="-s -w -extldflags=-static" -o gitpod-network-check main.go

gitpod-network-check/README.md

Lines changed: 55 additions & 15 deletions
@@ -44,17 +44,28 @@ A CLI to check if your network setup is suitable for the installation of Gitpod.
 pod-subnets: subnet-028d11dce93b8eefc, subnet-04ec8257d95c434b7,subnet-00a83550ce709f39c
 https-hosts: accounts.google.com, github.com
 instance-ami: # put your custom ami id here if you want to use it, otherwise it will using latest ubuntu AMI from aws
-api-endpoint: # optional, put your API endpoint regional sub-domain here to test connectivity, like when the execute-api vpc endpoint is not in the same account as Gitpod
+api-endpoint: # optional, put your API endpoint regional sub-domain here to test connectivity, like when the execute-api vpc endpoint is not in the same account as Gitpod
+# lambda-role-arn: arn:aws:iam::123456789012:role/MyExistingLambdaRole # Optional: Use existing IAM Role for Lambda mode
+# lambda-sg-id: sg-0123456789abcdef0 # Optional: Use existing Security Group for Lambda mode
 ```
 
-note: if using a custom AMI, please ensure the [SSM agent](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-linux.html) and [curl](https://curl.se/) are both installed. We rely on SSM's [SendCommand](https://docs.aws.amazon.com/code-library/latest/ug/ssm_example_ssm_SendCommand_section.html) to test HTTPS connectivity.
+**Note:** The `lambda-role-arn` and `lambda-sg-id` fields correspond to the `--lambda-role-arn` and `--lambda-sg-id` command-line flags, respectively. Setting them in the config file or via environment variables (e.g., `NTCHK_LAMBDA_ROLE_ARN`) achieves the same result.
+
+**EC2 Mode Note:** If using a custom AMI (`instance-ami`), please ensure the [SSM agent](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-linux.html) and [curl](https://curl.se/) are both installed. We rely on SSM's [SendCommand](https://docs.aws.amazon.com/code-library/latest/ug/ssm_example_ssm_SendCommand_section.html) to test HTTPS connectivity in EC2 mode.
 
 2. Run the network diagnosis
 
-To start the diagnosis, the the command: `./gitpod-network-check diagnose`
+The tool supports different modes for running the checks, specified by the `--mode` flag (`ec2`, `lambda`, `local`).
+
+**Using EC2 Mode (Default):**
+
+This mode launches temporary EC2 instances in your specified subnets to perform the network checks. This most closely simulates the environment where Gitpod components will run.
+
+To start the diagnosis using EC2 mode: `./gitpod-network-check diagnose --mode ec2` (or simply `./gitpod-network-check diagnose` as EC2 is the default).
 
 ```console
-./gitpod-network-check diagnose
+# Example output for EC2 mode
+./gitpod-network-check diagnose --mode ec2
 INFO[0000] ℹ️ Running with region `eu-central-1`, main subnet `[subnet-0ed211f14362b224f subnet-041703e62a05d2024]`, pod subnet `[subnet-075c44edead3b062f subnet-06eb311c6b92e0f29]`, hosts `[accounts.google.com https://github.com]`, ami ``, and API endpoint ``
 INFO[0000] ✅ Main Subnets are valid
 INFO[0000] ✅ Pod Subnets are valid
@@ -116,22 +127,51 @@ A CLI to check if your network setup is suitable for the installation of Gitpod.
 INFO[0306] ✅ Security group 'sg-00d4a66a7840ebd67' deleted
 ```
 
+**Using Lambda Mode:**
+
+This mode uses AWS Lambda functions deployed into your specified subnets to perform the network checks. It avoids the need to launch full EC2 instances but has its own prerequisites.
+
+* **Prerequisites for Lambda Mode:**
+* **IAM Permissions:** The AWS credentials used to run `gitpod-network-check` need permissions to manage Lambda functions, IAM roles, security groups, and CloudWatch Logs. Specifically, it needs to perform actions like: `lambda:CreateFunction`, `lambda:GetFunction`, `lambda:DeleteFunction`, `lambda:InvokeFunction`, `iam:CreateRole`, `iam:GetRole`, `iam:DeleteRole`, `iam:AttachRolePolicy`, `iam:DetachRolePolicy`, `ec2:CreateSecurityGroup`, `ec2:DescribeSecurityGroups`, `ec2:DeleteSecurityGroup`, `ec2:AuthorizeSecurityGroupEgress`, `ec2:DescribeSubnets`, `logs:DeleteLogGroup`.
+* **Network Connectivity:** Lambda functions running within a VPC need a route to the internet or required AWS service endpoints. This typically requires a **NAT Gateway** in your VPC or **VPC Endpoints** for all necessary services (e.g., STS, CloudWatch Logs, ECR, S3, DynamoDB, and any target HTTPS hosts). Without proper outbound connectivity, the Lambda checks will fail.
+
+* **Running Lambda Mode:**
+To start the diagnosis using Lambda mode:
+```bash
+./gitpod-network-check diagnose --mode lambda
+```
+
+* **Using Existing Resources (Lambda Mode):**
+If you have pre-existing IAM roles or Security Groups you want the Lambda functions to use, you can specify them using flags. This will prevent the tool from creating or deleting these specific resources.
+```bash
+./gitpod-network-check diagnose --mode lambda \
+--lambda-role-arn arn:aws:iam::123456789012:role/MyExistingLambdaRole \
+--lambda-sg-id sg-0123456789abcdef0
+```
+
+* **Example Output (Lambda Mode):**
+The output will be similar to EC2 mode but will show Lambda function creation/invocation instead of EC2 instance management.
+
+**Using Local Mode:**
+
+This mode runs the checks directly from the machine where you execute the CLI. It's useful for basic outbound connectivity tests but **does not** accurately reflect the network environment within your AWS subnets.
+
+To start the diagnosis using local mode: `./gitpod-network-check diagnose --mode local`
+
 3. Clean up after network diagnosis
 
-Dianosis is designed to do clean-up before it finishes. However, if the process terminates unexpectedly, you may clean-up AWS resources it creates like so:
+The `diagnose` command is designed to clean up the AWS resources it creates (EC2 instances, Lambda functions, IAM roles, Security Groups, CloudWatch Log groups) before it finishes. However, if the process terminates unexpectedly, you can manually trigger cleanup using the `clean` command. This command respects the `--mode` flag to clean up resources specific to that mode.
 
-```console
-./gitpod-network-check clean
-INFO[0000] ✅ Main Subnets are valid
-INFO[0000] ✅ Pod Subnets are valid
-INFO[0000] ✅ Instances terminated
-INFO[0000] Cleaning up: Waiting for 2 minutes so network interfaces are deleted
-INFO[0121] ✅ Role 'GitpodNetworkCheck' deleted
-INFO[0121] ✅ Instance profile deleted
-INFO[0122] ✅ Security group 'sg-0a6119dcb6a564fc1' deleted
-INFO[0122] ✅ Security group 'sg-07373362953212e54' deleted
+```bash
+# Clean up resources potentially left by EC2 mode
+./gitpod-network-check clean --mode ec2
+
+# Clean up resources potentially left by Lambda mode
+./gitpod-network-check clean --mode lambda
 ```
 
+**Note:** The `clean` command will *not* delete IAM roles or Security Groups if they were provided using the `--lambda-role-arn` or `--lambda-sg-id` flags during the `diagnose` run.
+
 ## FAQ
 
 If the EC2 instances are timing out, or you cannot connect to them with Session Manager, be sure to add the following policies.
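The README says `lambda-role-arn` and `lambda-sg-id` can also be set through environment variables, and it gives `NTCHK_LAMBDA_ROLE_ARN` as the example. The Go sketch below shows the flag-to-variable naming that this example implies. It assumes every flag follows the same `NTCHK_` prefix and dash-to-underscore rule, so `NTCHK_LAMBDA_SG_ID` is inferred rather than documented. `envVarName` is a hypothetical helper and is not part of the tool.

```go
package main

import (
	"fmt"
	"strings"
)

// envPrefix is taken from the README example NTCHK_LAMBDA_ROLE_ARN.
// ASSUMPTION: all flags share this prefix and naming rule.
const envPrefix = "NTCHK_"

// envVarName (hypothetical) maps a flag name such as "lambda-role-arn" to an
// environment variable name: add the prefix, upper-case, and replace dashes.
func envVarName(flag string) string {
	return envPrefix + strings.ToUpper(strings.ReplaceAll(flag, "-", "_"))
}

func main() {
	for _, flag := range []string{"lambda-role-arn", "lambda-sg-id"} {
		fmt.Printf("--%s -> %s\n", flag, envVarName(flag))
	}
}
```

If the tool binds its configuration with spf13/viper, combining `SetEnvPrefix("NTCHK")`, `SetEnvKeyReplacer(strings.NewReplacer("-", "_"))` and `AutomaticEnv()` produces the same names. This commit does not show whether it does.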

gitpod-network-check/cmd/checks.go

Lines changed: 22 additions & 10 deletions
@@ -12,20 +12,32 @@ import (
 	testrunner "github.com/gitpod-io/enterprise-deployment-toolkit/gitpod-network-check/pkg/runner"
 )
 
+var skipCleanup bool
+
+func init() {
+	checkCommand.Flags().BoolVar(&skipCleanup, "skip-cleanup", false, "Skip the cleanup false (default: false). Useful for debugging purposes.")
+	NetworkCheckCmd.AddCommand(checkCommand)
+}
+
 var checkCommand = &cobra.Command{ // nolint:gochecknoglobals
-	PersistentPreRunE: validateArguments,
-	Use:               "diagnose",
-	Short:             "Runs the network check diagnosis",
-	SilenceUsage:      false,
+	PreRunE:      validateArguments,
+	Use:          "diagnose",
+	Short:        "Runs the network check diagnosis",
+	SilenceUsage: false,
 	RunE: func(cmd *cobra.Command, args []string) error {
 		ctx := cmd.Context()
 
-		runner, err := testrunner.NewRunner(ctx, flags.Mode, &networkConfig)
+		runner, err := testrunner.NewRunner(ctx, Flags.Mode, &NetworkConfig)
 		if err != nil {
 			return fmt.Errorf("❌ failed to create test runner: %v", err)
 		}
 
 		defer (func() {
+			if skipCleanup {
+				log.Info("⚠️ Skipping cleanup, because --skip-cleanup flag is set.")
+				return
+			}
+
 			// Ensure runner was actually assigned before trying to clean up
 			if runner == nil {
 				log.Info("ℹ️ No runner initialized, skipping cleanup.")
@@ -45,12 +57,12 @@ var checkCommand = &cobra.Command{ // nolint:gochecknoglobals
 			return fmt.Errorf("❌ failed to prepare: %v", err)
 		}
 
-		for _, testset := range flags.SelectedTestsets {
+		for _, testset := range Flags.SelectedTestsets {
 			log.Infof("ℹ️ Running testset: %s", testset)
 
 			ts := checks.TestSets[checks.TestsetName(testset)]
-			serviceEndpoints, subnetType := ts(&networkConfig)
-			subnets := Filter(networkConfig.GetAllSubnets(), func(subnet checks.Subnet) bool {
+			serviceEndpoints, subnetType := ts(&NetworkConfig)
+			subnets := Filter(NetworkConfig.GetAllSubnets(), func(subnet checks.Subnet) bool {
 				return subnet.Type == subnetType
 			})
 
@@ -73,8 +85,8 @@ var checkCommand = &cobra.Command{ // nolint:gochecknoglobals
 
 func validateArguments(cmd *cobra.Command, args []string) error {
 	// Validate testsets if specified
-	if len(flags.SelectedTestsets) > 0 {
-		for _, testset := range flags.SelectedTestsets {
+	if len(Flags.SelectedTestsets) > 0 {
+		for _, testset := range Flags.SelectedTestsets {
 			if _, exists := checks.TestSets[checks.TestsetName(testset)]; !exists {
 				return fmt.Errorf("Invalid testset: %s. Available testsets: %v",
 					testset,

gitpod-network-check/cmd/cleanup.go

Lines changed: 8 additions & 4 deletions
@@ -10,14 +10,14 @@ import (
 )
 
 var cleanCommand = &cobra.Command{ // nolint:gochecknoglobals
-	Use:          "clean",
-	Short:        "Explicitly cleans up after the network check diagnosis",
-	SilenceUsage: false,
+	Use:          "clean",
+	Short:        "Explicitly cleans up after the network check diagnosis",
+	SilenceUsage: false,
 	RunE: func(cmd *cobra.Command, args []string) error {
 		ctx := cmd.Context()
 
 		log.Infof("ℹ️ Running cleanup")
-		runner, err := runner.NewRunner(ctx, flags.Mode, &networkConfig)
+		runner, err := runner.LoadRunnerFromTags(ctx, Flags.Mode, &NetworkConfig)
 		if err != nil {
 			return fmt.Errorf("❌ failed to create test runner: %v", err)
 		}
@@ -31,3 +31,7 @@ var cleanCommand = &cobra.Command{ // nolint:gochecknoglobals
 		return nil
 	},
 }
+
+func init() {
+	NetworkCheckCmd.AddCommand(cleanCommand)
+}
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+package cmd
+
+import (
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"time"
+
+	log "github.com/sirupsen/logrus"
+	"github.com/spf13/cobra"
+
+	"github.com/gitpod-io/enterprise-deployment-toolkit/gitpod-network-check/pkg/lambda_types"
+)
+
+var lambdaHandlerCmd = &cobra.Command{
+	Use:    "lambda-handler",
+	Short:  "Internal command to execute network checks within AWS Lambda (reads JSON request from stdin, writes JSON response to stdout)",
+	Hidden: true, // Hide this command from user help output
+	PersistentPreRun: func(cmd *cobra.Command, args []string) {
+		// override parent, as we don't care about the config or other flags
+	},
+	RunE: func(cmd *cobra.Command, args []string) error {
+		// Lambda environment might not have sophisticated logging setup, print directly
+		fmt.Fprintln(os.Stderr, "Lambda Handler: Starting execution.")
+
+		// Read request payload from stdin
+		stdinBytes, err := io.ReadAll(os.Stdin)
+		if err != nil {
+			fmt.Fprintf(os.Stderr, "Lambda Handler: Error reading stdin: %v\n", err)
+			return fmt.Errorf("error reading stdin: %w", err)
+		}
+
+		var request lambda_types.CheckRequest
+		err = json.Unmarshal(stdinBytes, &request)
+		if err != nil {
+			fmt.Fprintf(os.Stderr, "Lambda Handler: Error unmarshalling request JSON: %v\n", err)
+			fmt.Fprintf(os.Stderr, "Lambda Handler: Received input: %s\n", string(stdinBytes))
+			return fmt.Errorf("error unmarshalling request: %w", err)
+		}
+
+		fmt.Fprintf(os.Stderr, "Lambda Handler: Received check request for %d endpoints.\n", len(request.Endpoints))
+
+		response := lambda_types.CheckResponse{
+			Results: make(map[string]lambda_types.CheckResult),
+		}
+
+		client := &http.Client{
+			Timeout: 10 * time.Second, // Slightly longer timeout for Lambda environment?
+		}
+
+		// Perform checks (similar logic to the previous dedicated handler)
+		for name, url := range request.Endpoints {
+			fmt.Fprintf(os.Stderr, "Lambda Handler: Checking endpoint: %s (%s)\n", name, url)
+			// Use context from command if needed, otherwise background context is fine here
+			req, err := http.NewRequestWithContext(cmd.Context(), "GET", url, nil)
+			if err != nil {
+				response.Results[name] = lambda_types.CheckResult{Success: false, Error: fmt.Sprintf("failed to create request: %v", err)}
+				fmt.Fprintf(os.Stderr, " -> Failed (request creation): %v\n", err)
+				continue
+			}
+
+			resp, err := client.Do(req)
+			if err != nil {
+				response.Results[name] = lambda_types.CheckResult{Success: false, Error: fmt.Sprintf("HTTP request failed: %v", err)}
+				fmt.Fprintf(os.Stderr, " -> Failed (HTTP request): %v\n", err)
+			} else {
+				resp.Body.Close() // Ensure body is closed
+				if resp.StatusCode >= 200 && resp.StatusCode < 300 {
+					response.Results[name] = lambda_types.CheckResult{Success: true}
+					fmt.Fprintf(os.Stderr, " -> Success (Status: %d)\n", resp.StatusCode)
+				} else {
+					response.Results[name] = lambda_types.CheckResult{Success: false, Error: fmt.Sprintf("unexpected status code: %d", resp.StatusCode)}
+					fmt.Fprintf(os.Stderr, " -> Failed (Status: %d)\n", resp.StatusCode)
+				}
+			}
+		}
+
+		// Marshal response payload to stdout
+		responseBytes, err := json.Marshal(response)
+		if err != nil {
+			fmt.Fprintf(os.Stderr, "Lambda Handler: Error marshalling response JSON: %v\n", err)
+			return fmt.Errorf("error marshalling response: %w", err)
+		}
+
+		_, err = fmt.Fprint(os.Stdout, string(responseBytes))
+		if err != nil {
+			fmt.Fprintf(os.Stderr, "Lambda Handler: Error writing response to stdout: %v\n", err)
+			return fmt.Errorf("error writing response: %w", err)
+		}
+
+		fmt.Fprintln(os.Stderr, "Lambda Handler: Execution complete.")
+		return nil
+	},
+	// Disable flag parsing for this internal command as it gets input via stdin
+	DisableFlagParsing: true,
+}
+
+func init() {
+	// Note: We don't add this to networkCheckCmd directly in init() here
+	// because it might interfere with normal flag parsing if not careful.
+	// It will be added in the main Execute() function or similar central place.
+	// For now, just define the command struct.
+	// We also need to ensure logging doesn't interfere with stdout JSON output.
+	// Maybe configure logging to stderr specifically for this command?
+	lambdaHandlerCmd.PersistentPreRun = func(cmd *cobra.Command, args []string) {
+		// Ensure logs go to stderr for this command to keep stdout clean for JSON
+		log.SetOutput(os.Stderr)
+	}
+
+	NetworkCheckCmd.AddCommand(lambdaHandlerCmd) // Register the hidden lambda handler command
+}
