Commit 9ed4045

[checks] wait for healthiness and termination (#16)
* [checks] add a status OK (healthiness) checker. Also removes the extra minute of wait in between EC2 start and SSM testing
* Use a terminateWaiter
* Log the AMI used for testing
* Log when we start terminating
* Log as we terminate
* Add some padding
* Wait 5m for healthiness (it takes 3m)
* Update docs
* fix syntax
1 parent df6132c commit 9ed4045

5 files changed: +89 -45 lines changed

gitpod-network-check/README.md

Lines changed: 59 additions & 32 deletions
@@ -55,38 +55,65 @@ A CLI to check if your network setup is suitable for the installation of Gitpod.
 
 ```console
 ./gitpod-network-check diagnose
-INFO[0000] ✅ Main Subnets are valid
-INFO[0000] ✅ Pod Subnets are valid
-INFO[0000] ℹ️ Checking prerequisites
-INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ec2messages is configured
-INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssm is configured
-INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.ssmmessages is configured
-INFO[0001] ℹ️ Launching EC2 instance in a Main subnet
-INFO[0007] ℹ️ Launching EC2 instance in a Pod subnet
-INFO[0009] ℹ️ Waiting for EC2 instances to become ready (can take up to 2 minutes)
-INFO[0167] ✅ EC2 Instances are now running successfully
-INFO[0167] ℹ️ Connecting to SSM...
-INFO[0175] ℹ️ Checking if the required AWS Services can be reached from the ec2 instances
-INFO[0178] ✅ Autoscaling is available
-INFO[0179] ✅ CloudFormation is available
-INFO[0179] ✅ CloudWatch is available
-INFO[0180] ✅ EC2 is available
-INFO[0181] ✅ EC2messages is available
-INFO[0182] ✅ ECR is available
-INFO[0183] ✅ ECR Api is available
-INFO[0184] ✅ EKS is available
-INFO[0185] ✅ Elastic LoadBalancing is available
-INFO[0185] ✅ KMS is available
-INFO[0186] ✅ Kinesis Firehose is available
-INFO[0187] ✅ SSM is available
-INFO[0188] ✅ SSMmessages is available
-INFO[0189] ✅ SecretsManager is available
-INFO[0190] ✅ Sts is available
-INFO[0190] ✅ DynamoDB is available
-INFO[0191] ✅ S3 is available
-INFO[0194] ✅ accounts.google.com is available
-INFO[0194] ✅ github.com is available
-INFO[0194] ✅ Instances terminated
+INFO[0000] ℹ️ Running with region `eu-central-1`, main subnet `[subnet-0ed211f14362b224f subnet-041703e62a05d2024]`, pod subnet `[subnet-075c44edead3b062f subnet-06eb311c6b92e0f29]`, hosts `[accounts.google.com https://github.com]`, ami ``, and API endpoint ``
+INFO[0000] ✅ Main Subnets are valid
+INFO[0000] ✅ Pod Subnets are valid
+INFO[0000] ℹ️ Checking prerequisites
+INFO[0000] ℹ️ VPC endpoint com.amazonaws.eu-central-1.ec2messages is not configured, testing service connectivity...
+INFO[0000] ✅ Service ec2messages.eu-central-1.amazonaws.com has connectivity
+INFO[0000] ℹ️ VPC endpoint com.amazonaws.eu-central-1.ssm is not configured, testing service connectivity...
+INFO[0000] ✅ Service ssm.eu-central-1.amazonaws.com has connectivity
+INFO[0000] ℹ️ VPC endpoint com.amazonaws.eu-central-1.ssmmessages is not configured, testing service connectivity...
+INFO[0000] ✅ Service ssmmessages.eu-central-1.amazonaws.com has connectivity
+INFO[0000] ✅ VPC endpoint com.amazonaws.eu-central-1.execute-api is configured
+INFO[0001] ✅ IAM role created and policy attached
+INFO[0001] ℹ️ Launching EC2 instances in Main subnets
+INFO[0001] ℹ️ Created security group with ID: sg-0784ba9ba1731f522
+INFO[0002] ℹ️ Instance type t2.micro shall be used
+INFO[0009] ℹ️ Created security group with ID: sg-088d7ea455ba271f5
+INFO[0010] ℹ️ Instance type t2.micro shall be used
+INFO[0011] ℹ️ Main EC2 instances: [i-00675f1d3d0162acb i-041d127c852b5c1ab]
+INFO[0011] ℹ️ Launching EC2 instances in a Pod subnets
+INFO[0012] ℹ️ Created security group with ID: sg-03575b98e15e8b184
+INFO[0012] ℹ️ Instance type t2.micro shall be used
+INFO[0014] ℹ️ Created security group with ID: sg-00d4a66a7840ebd67
+INFO[0014] ℹ️ Instance type t2.micro shall be used
+INFO[0016] ℹ️ Pod EC2 instances: [i-00e2b26e784c900c6 i-077cbced73ee64c1d]
+INFO[0016] ℹ️ Waiting for EC2 instances to become Running (times out in 4 minutes)
+INFO[0021] ℹ️ Waiting for EC2 instances to become Healthy (times out in 4 minutes)
+INFO[0199] ✅ EC2 Instances are now running successfully
+INFO[0199] ℹ️ Connecting to SSM...
+INFO[0199] ℹ️ Checking if the required AWS Services can be reached from the ec2 instances in the pod subnet
+INFO[0201] ✅ Autoscaling is available
+INFO[0202] ✅ CloudFormation is available
+INFO[0203] ✅ CloudWatch is available
+INFO[0204] ✅ EC2 is available
+INFO[0205] ✅ EC2messages is available
+INFO[0206] ✅ ECR is available
+INFO[0206] ✅ ECR Api is available
+INFO[0207] ✅ EKS is available
+INFO[0209] ✅ Elastic LoadBalancing is available
+INFO[0210] ✅ KMS is available
+INFO[0211] ✅ Kinesis Firehose is available
+INFO[0212] ✅ SSM is available
+INFO[0212] ✅ SSMmessages is available
+INFO[0214] ✅ SecretsManager is available
+INFO[0215] ✅ Sts is available
+INFO[0215] ℹ️ Checking if certain AWS Services can be reached from ec2 instances in the main subnet
+INFO[0216] ✅ DynamoDB is available
+INFO[0217] ✅ S3 is available
+INFO[0217] ℹ️ Checking if hosts can be reached with HTTPS from ec2 instances in the main subnets
+INFO[0218] ✅ accounts.google.com is available
+INFO[0219] ✅ https://github.com is available
+INFO[0219] ℹ️ Terminating EC2 instances
+INFO[0219] ℹ️ Waiting for EC2 instances to Terminate (times out in 4 minutes)
+INFO[0304] ✅ Instances terminated
+INFO[0305] ✅ Role 'GitpodNetworkCheck' deleted
+INFO[0305] ✅ Instance profile deleted
+INFO[0305] ✅ Security group 'sg-0784ba9ba1731f522' deleted
+INFO[0306] ✅ Security group 'sg-088d7ea455ba271f5' deleted
+INFO[0306] ✅ Security group 'sg-03575b98e15e8b184' deleted
+INFO[0306] ✅ Security group 'sg-00d4a66a7840ebd67' deleted
 ```
 
 3. Clean up after network diagnosis

gitpod-network-check/cmd/checks.go

Lines changed: 13 additions & 6 deletions
@@ -82,14 +82,23 @@ var checkCommand = &cobra.Command{ // nolint:gochecknoglobals
 		log.Infof("ℹ️ Pod EC2 instances: %v", podInstanceIds)
 		InstanceIds = append(InstanceIds, podInstanceIds...)
 
-		log.Infof("ℹ️ Waiting for EC2 instances to become ready (can take up to 2 minutes)")
-		waiter := ec2.NewInstanceRunningWaiter(ec2Client, func(irwo *ec2.InstanceRunningWaiterOptions) {
+		log.Infof("ℹ️ Waiting for EC2 instances to become Running (times out in 4 minutes)")
+		runningWaiter := ec2.NewInstanceRunningWaiter(ec2Client, func(irwo *ec2.InstanceRunningWaiterOptions) {
 			irwo.MaxDelay = 15 * time.Second
 			irwo.MinDelay = 5 * time.Second
 		})
-		err = waiter.Wait(cmd.Context(), &ec2.DescribeInstancesInput{InstanceIds: InstanceIds}, *aws.Duration(4 * time.Minute))
+		err = runningWaiter.Wait(cmd.Context(), &ec2.DescribeInstancesInput{InstanceIds: InstanceIds}, *aws.Duration(4 * time.Minute))
 		if err != nil {
-			return fmt.Errorf("❌ Nodes never got ready: %v", err)
+			return fmt.Errorf("❌ Nodes never got Running: %v", err)
+		}
+		log.Infof("ℹ️ Waiting for EC2 instances to become Healthy (times out in 5 minutes)")
+		waitstatusOK := ec2.NewInstanceStatusOkWaiter(ec2Client, func(isow *ec2.InstanceStatusOkWaiterOptions) {
+			isow.MaxDelay = 15 * time.Second
+			isow.MinDelay = 5 * time.Second
+		})
+		err = waitstatusOK.Wait(cmd.Context(), &ec2.DescribeInstanceStatusInput{InstanceIds: InstanceIds}, *aws.Duration(5 * time.Minute))
+		if err != nil {
+			return fmt.Errorf("❌ Nodes never got Healthy: %v", err)
 		}
 		log.Info("✅ EC2 Instances are now running successfully")
 
@@ -99,8 +108,6 @@ var checkCommand = &cobra.Command{ // nolint:gochecknoglobals
 			return fmt.Errorf("❌ could not connect to SSM: %w", err)
 		}
 
-		time.Sleep(time.Minute)
-
 		log.Infof("ℹ️ Checking if the required AWS Services can be reached from the ec2 instances in the pod subnet")
 		serviceEndpoints := map[string]string{
 			"SSM": fmt.Sprintf("https://ssm.%s.amazonaws.com", networkConfig.AwsRegion),

gitpod-network-check/cmd/common.go

Lines changed: 14 additions & 4 deletions
@@ -66,17 +66,27 @@ func cleanup(ctx context.Context, svc *ec2.Client, iamsvc *iam.Client) {
 	}
 
 	if len(InstanceIds) > 0 {
+		log.Info("ℹ️ Terminating EC2 instances")
 		_, err := svc.TerminateInstances(ctx, &ec2.TerminateInstancesInput{
 			InstanceIds: InstanceIds,
 		})
 		if err != nil {
 			log.WithError(err).WithField("instanceIds", InstanceIds).Warnf("Failed to cleanup instances, please cleanup manually")
 		}
 
-		log.Info("✅ Instances terminated")
-
-		log.Info("Cleaning up: Waiting for 2 minutes so network interfaces are deleted")
-		time.Sleep(2 * time.Minute)
+		terminateWaiter := ec2.NewInstanceTerminatedWaiter(svc, func(itwo *ec2.InstanceTerminatedWaiterOptions) {
+			itwo.MaxDelay = 15 * time.Second
+			itwo.MinDelay = 5 * time.Second
+		})
+		log.Info("ℹ️ Waiting for EC2 instances to Terminate (times out in 4 minutes)")
+		err = terminateWaiter.Wait(ctx, &ec2.DescribeInstancesInput{InstanceIds: InstanceIds}, *aws.Duration(4 * time.Minute))
+		if err != nil {
+			log.WithError(err).Warn("Failed to wait for instances to terminate")
+			log.Warn("Waiting 2 minutes so network interfaces are deleted")
+			time.Sleep(2 * time.Minute)
+		} else {
+			log.Info("✅ Instances terminated")
+		}
 	}
 
 	if len(Roles) == 0 {

gitpod-network-check/cmd/root.go

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ func init() {
 	networkCheckCmd.PersistentFlags().StringVar(&networkConfig.InstanceAMI, "instance-ami", "", "Custom ec2 instance AMI id, if not set will use latest ubuntu")
 	networkCheckCmd.PersistentFlags().StringVar(&networkConfig.ApiEndpoint, "api-endpoint", "", "The Gitpod Enterprise control plane's regional API endpoint subdomain")
 	bindFlags(networkCheckCmd, v)
-	log.Infof("ℹ️ Running with region `%s`, main subnet `%v`, pod subnet `%v`, hosts `%v`, and api endpoint `%v`", networkConfig.AwsRegion, networkConfig.MainSubnets, networkConfig.PodSubnets, networkConfig.HttpsHosts, networkConfig.ApiEndpoint)
+	log.Infof("ℹ️ Running with region `%s`, main subnet `%v`, pod subnet `%v`, hosts `%v`, ami `%v`, and API endpoint `%v`", networkConfig.AwsRegion, networkConfig.MainSubnets, networkConfig.PodSubnets, networkConfig.HttpsHosts, networkConfig.InstanceAMI, networkConfig.ApiEndpoint)
 }
 
 func readConfigFile() *viper.Viper {

gitpod-network-check/gitpod-network-check.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 log-level: debug # Options: debug, info, warning, error
 region: eu-central-1
-main-subnets: subnet-03ed4c7f3f10ee64a, subnet-03ae0d9e3ad063d83
-pod-subnets: subnet-09704642a44a1ae9b, subnet-0fc43a731956656cd
+main-subnets: subnet-0ed211f14362b224f, subnet-041703e62a05d2024
+pod-subnets: subnet-075c44edead3b062f, subnet-06eb311c6b92e0f29
 https-hosts: accounts.google.com, https://github.com
 # put your custom ami id here if you want to use it, otherwise it will using latest ubuntu AMI from aws
 instance-ami:
