Commit debe6c0

Merge pull request #1026 from shiftstack/devstack-on-openstack
✨Devstack on openstack and multi-AZ support
2 parents 8b17f0e + 8176039 commit debe6c0

32 files changed

+1576
-742
lines changed

Makefile

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -125,7 +125,7 @@ test: ## Run tests
125125
E2E_GINKGO_ARGS ?= -stream
126126
.PHONY: test-e2e ## Run e2e tests using clusterctl
127127
test-e2e: $(GINKGO) $(KIND) $(KUSTOMIZE) e2e-image test-e2e-image-prerequisites ## Run e2e tests
128-
time $(GINKGO) -trace -progress -v -tags=e2e --nodes=$(E2E_GINKGO_PARALLEL) $(E2E_GINKGO_ARGS) ./test/e2e/suites/e2e/... -- -config-path="$(E2E_CONF_PATH)" -artifacts-folder="$(ARTIFACTS)" --data-folder="$(E2E_DATA_DIR)" $(E2E_ARGS)
128+
time $(GINKGO) --failFast -trace -progress -v -tags=e2e --nodes=$(E2E_GINKGO_PARALLEL) $(E2E_GINKGO_ARGS) ./test/e2e/suites/e2e/... -- -config-path="$(E2E_CONF_PATH)" -artifacts-folder="$(ARTIFACTS)" --data-folder="$(E2E_DATA_DIR)" $(E2E_ARGS)
129129

130130
.PHONY: e2e-image
131131
e2e-image: CONTROLLER_IMG_TAG = "gcr.io/k8s-staging-capi-openstack/capi-openstack-controller:e2e"

docs/book/src/SUMMARY.md

Lines changed: 2 additions & 0 deletions
@@ -7,3 +7,5 @@
77
- [external cloud provider](./topics/external-cloud-provider.md)
88
- [move from bootstrap](./topics/mover.md)
99
- [trouble shooting](./topics/troubleshooting.md)
10+
- [Development](./development/development.md)
11+
- [Hacking CI](./development/ci.md)

docs/book/src/development/ci.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
2+
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
3+
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
4+
5+
- [Hacking CI for the E2E tests](#hacking-ci-for-the-e2e-tests)
6+
- [Prow](#prow)
7+
- [DevStack](#devstack)
8+
- [Configuration](#configuration)
9+
- [Build order](#build-order)
10+
- [Networking](#networking)
11+
- [Availability zones](#availability-zones)
12+
- [Connecting to DevStack](#connecting-to-devstack)
13+
14+
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
15+
16+
# Hacking CI for the E2E tests
17+
18+
## Prow
19+
20+
CAPO tests are executed by Prow. They are defined in the [Kubernetes test-infra repository](https://github.com/kubernetes/test-infra/tree/master/config/jobs/kubernetes-sigs/cluster-api-provider-openstack). The E2E tests run as a presubmit. They run in a docker container in Prow infrastructure which contains a checkout of the CAPO tree under test. The entry point for tests is `scripts/ci-e2e.sh`, which is defined in the job in Prow.
21+
22+
## DevStack
23+
24+
The E2E tests require an OpenStack cloud to run against, which we provision during the test with DevStack. The project has access to capacity on GCP, so we provision DevStack on 2 GCP instances.
25+
26+
The entry point for the creation of the test DevStack is `hack/ci/create_devstack.sh`, which is executed by `scripts/ci-e2e.sh`. We create 2 instances: `controller` and `worker`. Each will provision itself via cloud-init using config defined in `hack/ci/cloud-init`.
27+
28+
### Configuration
29+
30+
We configure a 2-node DevStack. `controller` is running:
31+
32+
* All control plane services
33+
* Nova: all services, including compute
34+
* Glance: all services
35+
* Octavia: all services
36+
* Neutron: all services with ML2/OVS, including L3 agent
37+
* Cinder: all services, including volume with default LVM/iSCSI backend
38+
39+
`worker` is running:
40+
41+
* Nova: compute only
42+
* Neutron: agent only (not L3 agent)
43+
* Cinder: volume only with default LVM/iSCSI backend
44+
45+
`controller` is using the `n2-standard-16` machine type with 16 vCPUs and 64 GB RAM. `worker` is using the `n2-standard-8` machine type with 8 vCPUs and 32 GB RAM. Each job has a quota limit of 24 vCPUs.
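
The quota arithmetic can be checked with a short sketch. The variable names are illustrative and do not come from the CI scripts; the numbers are the ones stated above.

```shell
# vCPU counts as stated in this document (variable names are hypothetical).
controller_vcpus=16   # n2-standard-16
worker_vcpus=8        # n2-standard-8
quota_vcpus=24        # per-job quota limit

total_vcpus=$((controller_vcpus + worker_vcpus))
headroom=$((quota_vcpus - total_vcpus))
echo "DevStack uses ${total_vcpus} of ${quota_vcpus} vCPUs; ${headroom} left"
```

If these figures are current, the two instances consume the whole quota, so a larger machine type or a third instance would not fit without a quota change.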
46+
47+
### Build order
48+
49+
We build `controller` first, and then `worker`. We let `worker` build asynchronously because tests which don't require a second AZ can run without it while it builds. A systemd job defined in the cloud-init of `controller` polls for `worker` coming up and automatically configures it.
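
The polling behaviour can be illustrated with a generic retry helper. This is only a sketch of the idea, not the systemd job defined in the cloud-init config: `wait_for` and `worker_is_up` are hypothetical names, and probing via SSH as the `cloud` user is an assumption.

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or ATTEMPTS runs out.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    while [ "${attempts}" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "${delay}"
    done
    return 1
}

# Hypothetical probe: the worker counts as "up" once it accepts an SSH login.
worker_is_up() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "cloud@$1" true
}

# Example (not executed here): poll every 10 seconds for up to 30 minutes.
# wait_for 180 10 worker_is_up 10.0.2.16 && echo "worker is up"
```

Because the helper returns as soon as the probe succeeds, tests that do not need the second AZ are never blocked on it; only the step that configures `worker` waits.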
50+
51+
### Networking
52+
53+
Both instances share a common network which uses the CIDR defined in `PRIVATE_NETWORK_CIDR` in `hack/ci/create_devstack.sh`. Each instance has a single IP on this network:
54+
55+
* `controller`: `10.0.2.15`
56+
* `worker`: `10.0.2.16`
57+
58+
In addition, DevStack will create a floating IP network using the CIDR defined in `FLOATING_RANGE` in `hack/ci/create_devstack.sh`. As the Neutron L3 agent is only running on the controller, all of this traffic is handled on the controller, even if the source is an instance running on the worker. The controller creates `iptables` rules to NAT this traffic.
59+
60+
The effect of this is that instances created on either `controller` or `worker` can get a floating IP from the `public` network. Traffic using this floating IP is routed via `controller` and leaves externally via NAT.
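
As a rough illustration of the kind of NAT rule involved, the sketch below prints (but does not run) an `iptables` masquerade rule. The function name, the `172.24.4.0/24` range, and the `ens4` interface are assumptions for illustration; the rules actually installed on `controller` may differ.

```shell
# print_nat_rule FLOATING_RANGE EXTERNAL_INTERFACE
# Prints an iptables command that masquerades traffic from the floating IP
# range as it leaves through the external interface. Nothing is executed.
print_nat_rule() {
    echo "iptables -t nat -A POSTROUTING -s $1 -o $2 -j MASQUERADE"
}

# Hypothetical values, for illustration only.
print_nat_rule 172.24.4.0/24 ens4
```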
61+
62+
### Availability zones
63+
64+
We are running `nova compute` and `cinder volume` on each of `controller` and `worker`. The `nova compute` and `cinder volume` services on each host are configured to be in that host's own availability zone. The names of the availability zones are defined in `OPENSTACK_FAILURE_DOMAIN` and `OPENSTACK_FAILURE_DOMAIN_ALT` in `test/e2e/data/e2e_conf.yaml`, with the services running on `controller` being in `OPENSTACK_FAILURE_DOMAIN` and the services running on `worker` being in `OPENSTACK_FAILURE_DOMAIN_ALT`.
65+
66+
This configuration is intended only to allow the testing of functionality related to availability zones, and does not imply any robustness to failure.
67+
68+
Nova is configured (via `[DEFAULT]/default_schedule_zone`) to place all workloads on the controller unless they have an explicit availability zone. The intention is that `controller` should have the capacity to run all tests which are agnostic to availability zones. This means that the explicitly multi-AZ tests do not risk failure due to capacity issues.
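
One plausible way to express this setting in a DevStack `local.conf` is sketched below. The helper name and the `testaz1` zone name are hypothetical, and the real configuration lives in `hack/ci/cloud-init`, which may set the option differently.

```shell
# print_nova_az_config AVAILABILITY_ZONE
# Prints a DevStack local.conf fragment that sets Nova's default schedule zone.
print_nova_az_config() {
    cat <<EOF
[[post-config|\$NOVA_CONF]]
[DEFAULT]
default_schedule_zone = $1
EOF
}

# Hypothetical zone name, for illustration only.
print_nova_az_config testaz1
```

With such a fragment in place, a server created without an explicit availability zone is scheduled into the named zone, which is how AZ-agnostic tests end up on `controller`.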
69+
70+
However, this is not sufficient because by default [CAPI explicitly schedules the control plane across all discovered availability zones](https://github.com/kubernetes-sigs/cluster-api/blob/e7769d7a6b3a4eb32292938eed8c470b7018a8b3/controlplane/kubeadm/controllers/scale.go#L77-L82). Consequently we explicitly confine all clusters to `OPENSTACK_FAILURE_DOMAIN` (`controller`) in the test cluster definitions in `test/e2e/data/infrastructure-openstack`.
71+
72+
## Connecting to DevStack
73+
74+
The E2E tests running in Prow create a kind cluster, which also runs in Prow using Docker in Docker (DinD). The E2E tests use clusterctl to configure this cluster, which is where CAPO executes.
75+
76+
`create_devstack.sh` writes a `clouds.yaml` to the working directory, which is passed to CAPO via the cluster definitions in `test/e2e/data/infrastructure-openstack`. This `clouds.yaml` references the public, routable IP of `controller`. However, DevStack creates all the service endpoints using `controller`'s private IP, which is not publicly routable. In addition, the tests need to be able to SSH to the floating IP of the bastion. This floating IP is also allocated from a range which is not publicly routable.
77+
78+
To allow this access we run `sshuttle` from `create_devstack.sh`. This creates an SSH tunnel and routes traffic for `PRIVATE_NETWORK_CIDR` and `FLOATING_RANGE` over it.
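
The shape of such an invocation can be sketched as follows. The helper only prints the command. The `cloud` user comes from the cloud-init config, but the public IP and both CIDRs are placeholder values, and the real call in `create_devstack.sh` may pass additional options.

```shell
# print_sshuttle_cmd PUBLIC_IP CIDR [CIDR...]
# Prints an sshuttle command that tunnels the given CIDRs over SSH to
# the controller's public IP. Nothing is executed.
print_sshuttle_cmd() {
    public_ip=$1
    shift
    echo "sshuttle -r cloud@${public_ip} $*"
}

# Placeholder values standing in for the controller's public IP,
# PRIVATE_NETWORK_CIDR and FLOATING_RANGE.
print_sshuttle_cmd 203.0.113.10 10.0.2.0/24 172.24.4.0/24
```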
79+
80+
Note that the semantics of an `sshuttle` tunnel are problematic. They happen to work currently for DinD, but Podman runs the kind cluster in a separate network namespace, so kind running in Podman cannot route over an `sshuttle` running outside the kind cluster. This may also break in future versions of Docker.

docs/book/src/development/development.md

Lines changed: 7 additions & 0 deletions
@@ -7,6 +7,13 @@
77
- [Building and upload your own capi-openstack controller image](#building-and-upload-your-own-capi-openstack-controller-image)
88
- [Using your own capi-openstack controller image](#using-your-own-capi-openstack-controller-image)
99
- [Developing with Tilt](#developing-with-tilt)
10+
- [Running E2E tests locally](#running-e2e-tests-locally)
11+
- [Support for clouds using SSL](#support-for-clouds-using-ssl)
12+
- [Support for clouds with multiple external networks](#support-for-clouds-with-multiple-external-networks)
13+
- [OpenStack prerequisites](#openstack-prerequisites)
14+
- [Running E2E tests using rootless podman](#running-e2e-tests-using-rootless-podman)
15+
- [Host configuration](#host-configuration)
16+
- [Running podman system service to emulate docker daemon](#running-podman-system-service-to-emulate-docker-daemon)
1017

1118
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
1219

docs/book/src/getting-started.md

Lines changed: 8 additions & 0 deletions
@@ -1,3 +1,11 @@
1+
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
2+
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
3+
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
4+
5+
- [Getting Started](#getting-started)
6+
7+
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
8+
19
# Getting Started
210

311
{{#embed-github repo:"kubernetes-sigs/cluster-api" path:"docs/book/src/user/quick-start.md"}}

docs/book/src/topics/external-cloud-provider.md

Lines changed: 9 additions & 0 deletions
@@ -1,3 +1,12 @@
1+
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
2+
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
3+
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
4+
5+
- [External Cloud Provider](#external-cloud-provider)
6+
- [Steps](#steps)
7+
8+
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
9+
110
# External Cloud Provider
211

312
To deploy a cluster using [external cloud provider](https://github.com/kubernetes/cloud-provider-openstack), create a cluster configuration with the [external cloud provider template](https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/master/templates/cluster-template-external-cloud-provider.yaml).

docs/book/src/topics/mover.md

Lines changed: 2 additions & 3 deletions
@@ -3,9 +3,8 @@
33
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
44

55
- [Pre-condition](#pre-condition)
6-
- [Install openstack providers into target cluster](#install-openstack-providers-into-target-cluster)
7-
- [Move objects in `bootstrap` cluster into `target` cluster.](#move-objects-in-bootstrap-cluster-into-target-cluster)
8-
- [Create secret in `target` cluster](#create-secret-in-target-cluster)
6+
- [Install OpenStack Cluster API provider into target cluster](#install-openstack-cluster-api-provider-into-target-cluster)
7+
- [Move objects from `bootstrap` cluster into `target` cluster.](#move-objects-from-bootstrap-cluster-into-target-cluster)
98
- [Check cluster status](#check-cluster-status)
109

1110
<!-- END doctoc generated TOC please keep comment here to allow auto update -->

hack/ci/.gitignore

Lines changed: 0 additions & 2 deletions
This file was deleted.

hack/ci/aws-project.sh

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
#!/usr/bin/env bash
2+
3+
# Copyright 2021 The Kubernetes Authors.
4+
#
5+
# Licensed under the Apache License, Version 2.0 (the "License");
6+
# you may not use this file except in compliance with the License.
7+
# You may obtain a copy of the License at
8+
#
9+
# http://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# Unless required by applicable law or agreed to in writing, software
12+
# distributed under the License is distributed on an "AS IS" BASIS,
13+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
# See the License for the specific language governing permissions and
15+
# limitations under the License.
16+
17+
# hack script for preparing AWS to run cluster-api-provider-openstack e2e
18+
19+
set -x -o errexit -o nounset -o pipefail
20+
21+
function cloud_init {
22+
AWS_REGION=${AWS_REGION:-"eu-central-1"}
23+
AWS_ZONE=${AWS_ZONE:-"eu-central-1a"}
24+
# AMIs:
25+
# * capa-ami-ubuntu-20.04-1.20.4-00-1613898574 id: ami-0120656d38c206057
26+
# * ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210223 id: ami-0767046d1677be5a0
27+
AWS_AMI=${AWS_AMI:-"ami-0767046d1677be5a0"}
28+
# Choose via: https://eu-central-1.console.aws.amazon.com/ec2/v2/home?region=eu-central-1#InstanceTypes:
29+
AWS_MACHINE_TYPE=${AWS_MACHINE_TYPE:-"c5.metal"}
30+
AWS_NETWORK_NAME=${AWS_NETWORK_NAME:-"${CLUSTER_NAME}-mynetwork"}
31+
# prepare with:
32+
# * create key pair:
33+
# aws ec2 create-key-pair --key-name capo-e2e --query 'KeyMaterial' --region "${AWS_REGION}" --output text > ~/.ssh/aws-capo-e2e
34+
# * add to local agent and generate public key:
35+
# ssh-add ~/.ssh/aws-capo-e2e
36+
# ssh-keygen -y -f ~/.ssh/aws-capo-e2e > ~/.ssh/aws-capo-e2e.pub
37+
AWS_KEY_PAIR=${AWS_KEY_PAIR:-"capo-e2e"}
38+
# disable pagination of AWS cli
39+
export AWS_PAGER=""
40+
41+
echo "Using: AWS_REGION: ${AWS_REGION} AWS_NETWORK_NAME: ${AWS_NETWORK_NAME}"
42+
}
43+
44+
function init_infrastructure() {
45+
if [[ ${AWS_NETWORK_NAME} != "default" ]]; then
46+
if [[ $(aws ec2 describe-vpcs --filters Name=tag:Name,Values=capo-e2e-mynetwork --region="${AWS_REGION}" --query 'length(*[0])') = "0" ]];
47+
then
48+
aws ec2 create-vpc --cidr-block "$PRIVATE_NETWORK_CIDR" --tag-specifications "ResourceType=vpc,Tags=[{Key=Name,Value=${AWS_NETWORK_NAME}}]" --region="${AWS_REGION}"
49+
AWS_VPC_ID=$(aws ec2 describe-vpcs --filters Name=tag:Name,Values=capo-e2e-mynetwork --region "${AWS_REGION}" --query '*[0].VpcId' --output text)
50+
51+
aws ec2 create-subnet --cidr-block "$PRIVATE_NETWORK_CIDR" --vpc-id "${AWS_VPC_ID}" --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=${AWS_NETWORK_NAME}}]" --region "${AWS_REGION}" --availability-zone "${AWS_ZONE}"
52+
AWS_SUBNET_ID=$(aws ec2 describe-subnets --filters Name=tag:Name,Values=capo-e2e-mynetwork --region "${AWS_REGION}" --query '*[0].SubnetId' --output text)
53+
# It's also the route table of the VPC
54+
AWS_SUBNET_ROUTE_TABLE_ID=$(aws ec2 describe-route-tables --filters "Name=vpc-id,Values=${AWS_VPC_ID}" --region "${AWS_REGION}" --query '*[0].RouteTableId' --output text)
55+
56+
aws ec2 create-security-group --group-name "${AWS_NETWORK_NAME}" --description "${AWS_NETWORK_NAME}" --vpc-id "${AWS_VPC_ID}" --tag-specifications "ResourceType=security-group,Tags=[{Key=Name,Value=${AWS_NETWORK_NAME}}]" --region="${AWS_REGION}"
57+
AWS_SECURITY_GROUP_ID=$(aws ec2 describe-security-groups --filters Name=tag:Name,Values=capo-e2e-mynetwork --region "${AWS_REGION}" --query '*[0].GroupId' --output text)
58+
59+
aws ec2 authorize-security-group-ingress --group-id "${AWS_SECURITY_GROUP_ID}" --protocol tcp --port 22 --cidr 0.0.0.0/0 --region="${AWS_REGION}"
60+
61+
# Documentation to enable internet access for subnet:
62+
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html#TroubleshootingInstancesConnectionTimeout
63+
aws ec2 create-internet-gateway --tag-specifications "ResourceType=internet-gateway,Tags=[{Key=Name,Value=${AWS_NETWORK_NAME}}]" --region="${AWS_REGION}"
64+
AWS_INTERNET_GATEWAY_ID=$(aws ec2 describe-internet-gateways --filters Name=tag:Name,Values=capo-e2e-mynetwork --region "${AWS_REGION}" --query '*[0].InternetGatewayId' --output text)
65+
aws ec2 attach-internet-gateway --internet-gateway-id "${AWS_INTERNET_GATEWAY_ID}" --vpc-id "${AWS_VPC_ID}" --region="${AWS_REGION}"
66+
67+
aws ec2 create-route --route-table-id "${AWS_SUBNET_ROUTE_TABLE_ID}" --destination-cidr-block 0.0.0.0/0 --gateway-id "${AWS_INTERNET_GATEWAY_ID}" --region "${AWS_REGION}"
68+
aws ec2 create-route --route-table-id "${AWS_SUBNET_ROUTE_TABLE_ID}" --destination-ipv6-cidr-block ::/0 --gateway-id "${AWS_INTERNET_GATEWAY_ID}" --region "${AWS_REGION}"
69+
fi
70+
fi
71+
}
72+
73+
function create_vm {
74+
local name=$1 && shift
75+
local ip=$1 && shift
76+
local userdata=$1 && shift
77+
local public=$1 && shift # Unused by AWS
78+
79+
if [[ $(aws ec2 describe-instances --filters Name=tag:Name,Values="${name}" --region="${AWS_REGION}" --query 'length(*[0])') = "0" ]];
80+
then
81+
AWS_SUBNET_ID=$(aws ec2 describe-subnets --filters Name=tag:Name,Values=capo-e2e-mynetwork --region "${AWS_REGION}" --query '*[0].SubnetId' --output text)
82+
AWS_SECURITY_GROUP_ID=$(aws ec2 describe-security-groups --filters Name=tag:Name,Values=capo-e2e-mynetwork --region "${AWS_REGION}" --query '*[0].GroupId' --output text)
83+
84+
# /dev/sda1 is renamed to /dev/nvme0n1 by AWS
85+
aws ec2 run-instances --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=${name}}]" \
86+
--region "${AWS_REGION}" \
87+
--placement "AvailabilityZone=${AWS_ZONE}" \
88+
--image-id "${AWS_AMI}" \
89+
--instance-type "${AWS_MACHINE_TYPE}" \
90+
--block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=300}' \
91+
--subnet-id "${AWS_SUBNET_ID}" \
92+
--private-ip-address "${ip}" \
93+
--count 1 \
94+
--associate-public-ip-address \
95+
--security-group-ids "${AWS_SECURITY_GROUP_ID}" \
96+
--key-name "${AWS_KEY_PAIR}" \
97+
--user-data "file://${userdata}" \
98+
--no-paginate
99+
fi
100+
101+
# wait a bit so the server has time to get a public ip
102+
sleep 30
103+
}
104+
105+
function get_public_ip {
106+
aws ec2 describe-instances --filters "Name=tag:Name,Values=${CLUSTER_NAME}-controller" --region "${AWS_REGION}" \
107+
--query 'Reservations[*].Instances[*].PublicIpAddress' --output text
108+
}
109+
110+
function get_mtu {
111+
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html
112+
echo 1300
113+
}
114+
115+
function get_ssh_public_key {
116+
cat "${SSH_PUBLIC_KEY_FILE}"
117+
}
118+
119+
function get_ssh_private_key_file {
120+
echo "${SSH_PRIVATE_KEY_FILE}"
121+
}
122+
123+
function cloud_cleanup {
124+
echo Not implemented
125+
exit 1
126+
}

hack/ci/cloud-init/common.yaml.tpl

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
#cloud-config
2+
runcmd:
3+
- sysctl -p /etc/sysctl.d/devstack.conf
4+
- /root/devstack.sh
5+
final_message: "The system is finally up, after $UPTIME seconds"
6+
users:
7+
- name: cloud
8+
lock_passwd: true
9+
sudo: ALL=(ALL) NOPASSWD:ALL
10+
ssh_authorized_keys:
11+
- ${SSH_PUBLIC_KEY}
12+
# Infrastructure packages required:
13+
# python3 - required by sshuttle
14+
# git - required to obtain devstack
15+
# jq - required by devstack-common.sh
16+
packages:
17+
- python3
18+
- git
19+
- jq
20+
package_upgrade: true
21+
write_files:
22+
- path: /etc/sysctl.d/devstack.conf
23+
permissions: 0644
24+
content: |
25+
net.ipv4.ip_forward=1
26+
net.ipv4.conf.default.rp_filter=0
27+
net.ipv4.conf.all.rp_filter=0
28+
- path: /tmp/devstack-common.sh
29+
permissions: 0644
30+
content: |
31+
# ensure nested virtualization
32+
function ensure_kvm {
33+
sudo modprobe kvm-intel
34+
if [ ! -c /dev/kvm ]; then
35+
echo /dev/kvm is not present
36+
exit 1
37+
fi
38+
}
39+
40+
function run_devstack {
41+
su - stack -c "TERM=vt100 /opt/stack/devstack/stack.sh"
42+
}
43+
44+
function upload_images {
45+
# Add environment variables for auth/endpoints
46+
echo 'source /opt/stack/devstack/openrc admin admin' >> /opt/stack/.bashrc
47+
48+
# Upload the images so we don't have to upload them from Prow
49+
su - stack -c "source /opt/stack/devstack/openrc admin admin && /opt/stack/devstack/tools/upload_image.sh https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/ubuntu/2021-03-27/ubuntu-2004-kube-v1.18.15.qcow2"
50+
su - stack -c "source /opt/stack/devstack/openrc admin admin && /opt/stack/devstack/tools/upload_image.sh https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/cirros/2021-03-27/cirros-0.5.1-x86_64-disk.img"
51+
}
