
Commit 8176039

docs: Add Hacking CI
1 parent 60ae2d0 commit 8176039

File tree

2 files changed, +81 -0 lines changed


docs/book/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@
- [move from bootstrap](./topics/mover.md)
- [trouble shooting](./topics/troubleshooting.md)
- [Development](./development/development.md)
+ - [Hacking CI](./development/ci.md)

docs/book/src/development/ci.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*

- [Hacking CI for the E2E tests](#hacking-ci-for-the-e2e-tests)
  - [Prow](#prow)
  - [DevStack](#devstack)
    - [Configuration](#configuration)
    - [Build order](#build-order)
    - [Networking](#networking)
    - [Availability zones](#availability-zones)
  - [Connecting to DevStack](#connecting-to-devstack)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
# Hacking CI for the E2E tests

## Prow

CAPO tests are executed by Prow. They are defined in the [Kubernetes test-infra repository](https://github.com/kubernetes/test-infra/tree/master/config/jobs/kubernetes-sigs/cluster-api-provider-openstack). The E2E tests run as a presubmit. They run in a Docker container in the Prow infrastructure which contains a checkout of the CAPO tree under test. The entry point for the tests is `scripts/ci-e2e.sh`, which is specified in the Prow job definition.

## DevStack

The E2E tests require an OpenStack cloud to run against, which we provision during the test with DevStack. The project has access to capacity on GCP, so we provision DevStack on 2 GCP instances.

The entry point for the creation of the test DevStack is `hack/ci/create_devstack.sh`, which is executed by `scripts/ci-e2e.sh`. We create 2 instances: `controller` and `worker`. Each will provision itself via cloud-init using config defined in `hack/ci/cloud-init`.
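For illustration, booting one of these self-provisioning instances might look roughly like the following. The instance name and machine type come from this document, but the cloud-init file name is a placeholder and the exact `gcloud` invocation (project, image, networking flags) used by `hack/ci/create_devstack.sh` may differ.

```bash
# Hypothetical sketch only: boot a GCP instance that provisions itself via cloud-init.
# controller-cloud-init.yaml stands in for the rendered config from hack/ci/cloud-init.
gcloud compute instances create controller \
  --machine-type n2-standard-16 \
  --metadata-from-file user-data=controller-cloud-init.yaml
```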
### Configuration

We configure a 2-node DevStack. `controller` is running:

* All control plane services
* Nova: all services, including compute
* Glance: all services
* Octavia: all services
* Neutron: all services with ML2/OVS, including L3 agent
* Cinder: all services, including volume with default LVM/iSCSI backend

`worker` is running:

* Nova: compute only
* Neutron: agent only (not L3 agent)
* Cinder: volume only with default LVM/iSCSI backend

`controller` uses the `n2-standard-16` machine type with 16 vCPUs and 64 GB RAM. `worker` uses the `n2-standard-8` machine type with 8 vCPUs and 32 GB RAM. Each job has a quota limit of 24 vCPUs.
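As a rough illustration of this split, the worker's DevStack configuration contains something like the following in the `[[local|localrc]]` section of its `local.conf`. The exact service list and variables generated by the cloud-init in `hack/ci/cloud-init` may differ.

```bash
# Illustrative localrc fragment for the worker node (not the actual CI config).
# The controller enables the full service set; the worker only runs the compute
# service, the neutron agent and the cinder volume service, and points all other
# endpoints back at the controller.
HOST_IP=10.0.2.16
SERVICE_HOST=10.0.2.15
MYSQL_HOST=$SERVICE_HOST
RABBIT_HOST=$SERVICE_HOST
GLANCE_HOSTPORT=$SERVICE_HOST:9292
ENABLED_SERVICES=n-cpu,q-agt,c-vol,placement-client
```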
### Build order

We build `controller` first, and then `worker`. We let `worker` build asynchronously because tests which don't require a second AZ can run without it while it builds. A systemd job defined in the cloud-init of `controller` polls for `worker` coming up and automatically configures it.
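A minimal sketch of that kind of polling loop is shown below. The worker's address is taken from the Networking section; the post-setup step is a placeholder for whatever the real cloud-init unit runs once the worker is reachable.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the controller-side polling performed by the systemd unit.
set -euo pipefail

WORKER_IP=10.0.2.16

# Wait until the worker answers on SSH.
until nc -z -w 5 "${WORKER_IP}" 22; do
  sleep 10
done

# Placeholder for the real post-setup, e.g. registering the new compute host:
# nova-manage cell_v2 discover_hosts --verbose
```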
### Networking

Both instances share a common network which uses the CIDR defined in `PRIVATE_NETWORK_CIDR` in `hack/ci/create_devstack.sh`. Each instance has a single IP on this network:

* `controller`: `10.0.2.15`
* `worker`: `10.0.2.16`

In addition, DevStack will create a floating IP network using the CIDR defined in `FLOATING_RANGE` in `hack/ci/create_devstack.sh`. As the Neutron L3 agent is only running on the controller, all of this traffic is handled on the controller, even if the source is an instance running on the worker. The controller creates `iptables` rules to NAT this traffic.

The effect of this is that instances created on either `controller` or `worker` can get a floating IP from the `public` network. Traffic using this floating IP will be routed via `controller` and externally via NAT.
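For illustration, the NAT behaviour on the controller is conceptually similar to the rule below. The actual rules DevStack installs are more involved, and `FLOATING_RANGE` here simply stands for the value from `hack/ci/create_devstack.sh`.

```bash
# Illustrative only: masquerade traffic leaving the floating-IP range so that
# instances' floating IPs can reach the outside world via the controller.
sudo iptables -t nat -A POSTROUTING -s "${FLOATING_RANGE}" ! -d "${FLOATING_RANGE}" -j MASQUERADE
```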
### Availability zones

We are running `nova compute` and `cinder volume` on each of `controller` and `worker`. Each `nova compute` and `cinder volume` is configured to be in its own availability zone. The names of the availability zones are defined in `OPENSTACK_FAILURE_DOMAIN` and `OPENSTACK_FAILURE_DOMAIN_ALT` in `test/e2e/data/e2e_conf.yaml`, with the services running on `controller` being in `OPENSTACK_FAILURE_DOMAIN` and the services running on `worker` being in `OPENSTACK_FAILURE_DOMAIN_ALT`.
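With a working `clouds.yaml` (see [Connecting to DevStack](#connecting-to-devstack)), the resulting layout can be verified with the OpenStack CLI; the cloud name below is an assumption.

```bash
# List the compute and volume availability zones to confirm the two-AZ layout.
openstack --os-cloud devstack availability zone list --compute
openstack --os-cloud devstack availability zone list --volume
```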
This configuration is intended only to allow the testing of functionality related to availability zones, and does not imply any robustness to failure.

Nova is configured (via `[DEFAULT]/default_schedule_zone`) to place all workloads on the controller unless they have an explicit availability zone. The intention is that `controller` should have the capacity to run all tests which are agnostic to availability zones. This means that the explicitly multi-AZ tests do not risk failure due to capacity issues.
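A minimal sketch of setting that option, using DevStack's `iniset` helper; the zone name is whatever `OPENSTACK_FAILURE_DOMAIN` resolves to, and the mechanism actually used by the CI cloud-init may differ.

```bash
# Illustrative only: default all workloads without an explicit AZ to the controller's zone.
# (The relevant nova services then need a restart for this to take effect.)
iniset /etc/nova/nova.conf DEFAULT default_schedule_zone "${OPENSTACK_FAILURE_DOMAIN}"
```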
However, this is not sufficient because by default [CAPI explicitly schedules the control plane across all discovered availability zones](https://github.com/kubernetes-sigs/cluster-api/blob/e7769d7a6b3a4eb32292938eed8c470b7018a8b3/controlplane/kubeadm/controllers/scale.go#L77-L82). Consequently we explicitly confine all clusters to `OPENSTACK_FAILURE_DOMAIN` (`controller`) in the test cluster definitions in `test/e2e/data/infrastructure-openstack`.

## Connecting to DevStack

The E2E tests running in Prow create a kind cluster, which also runs in Prow using Docker in Docker. The E2E tests configure this cluster with clusterctl, and this is where CAPO executes.

`create_devstack.sh` writes a `clouds.yaml` to the working directory, which is passed to CAPO via the cluster definitions in `test/e2e/data/infrastructure-openstack`. This `clouds.yaml` references the public, routable IP of `controller`. However, DevStack creates all the service endpoints using `controller`'s private IP, which is not publicly routable. In addition, the tests need to be able to SSH to the floating IP of the Bastion. This floating IP is also allocated from a range which is not publicly routable.
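For illustration, the generated file has roughly this shape; the cloud name, credentials and IP below are placeholders, not the values used by `create_devstack.sh`.

```bash
# Hypothetical sketch of the generated clouds.yaml (all values are placeholders).
cat > clouds.yaml <<EOF
clouds:
  capo-e2e:
    auth:
      auth_url: http://203.0.113.10/identity
      username: admin
      password: secretadmin
      project_name: admin
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne
EOF
```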
To allow this access we run `sshuttle` from `create_devstack.sh`. This creates an SSH tunnel and routes traffic for `PRIVATE_NETWORK_CIDR` and `FLOATING_RANGE` over it.
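The invocation is conceptually similar to the following; the SSH user, the daemon flag and the `CONTROLLER_PUBLIC_IP` variable are illustrative, and the real command line lives in `create_devstack.sh`.

```bash
# Illustrative only: route the DevStack private network and the floating-IP range
# over an SSH tunnel to the controller's public IP.
sshuttle -r "ubuntu@${CONTROLLER_PUBLIC_IP}" "${PRIVATE_NETWORK_CIDR}" "${FLOATING_RANGE}" -D
```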
Note that the semantics of an `sshuttle` tunnel are problematic. While they happen to work currently for DinD, Podman runs the kind cluster in a separate network namespace. This means that kind running in Podman cannot route over `sshuttle` running outside the kind cluster. This may also break in future versions of Docker.
