
Commit 7810a40

Merge branch 'main' into vpc-only-backends
2 parents a0632ec + 6e6db1b commit 7810a40

38 files changed (+1321 / -373 lines changed)

.github/copilot-instructions.md

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
# Cluster API Provider Linode (CAPL) - AI Agent Instructions

## Project Overview
CAPL is a Kubernetes Cluster API infrastructure provider for Linode/Akamai cloud services. It enables declarative management of Kubernetes clusters on Linode infrastructure using native Kubernetes APIs and follows the Cluster API v1beta1 specification.

## Architecture Patterns

### Controller-Scope-Service Architecture
All infrastructure resources follow a three-layer pattern:
1. **Controllers** (`internal/controller/`) - Handle Kubernetes reconciliation events
2. **Scopes** (`cloud/scope/`) - Encapsulate the reconciliation context, including both the Kubernetes and Linode clients
3. **Services** (`cloud/services/`) - Abstract Linode API interactions (load balancers, domains, object storage)

### Standard Controller Structure
```go
func (r *LinodeResourceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the resource
	// 2. Create scope with clients: scope.NewResourceScope(ctx, r.LinodeClientConfig, params)
	// 3. Check pause conditions
	// 4. Call reconcile helper with proper defer for status updates
	// 5. Handle errors and events
}
```

### Scope Pattern Usage
Every resource controller creates a scope that manages:
- Kubernetes client for CRD operations
- Linode client for API calls
- Resource references and credentials
- Patch helpers for status updates

Example: `scope.NewClusterScope()` combines the `LinodeCluster` and CAPI `Cluster` resources.

## Key Resource Types

### Core Infrastructure
- **LinodeCluster**: Cluster networking, load balancers (NodeBalancer/DNS), VPC references, firewalls
- **LinodeMachine**: Compute instances with placement groups, disk configuration, networking
- **LinodeVPC**: Virtual Private Cloud with IPv4/IPv6 subnets
- **LinodeFirewall**: Cloud firewall rules with AddressSet references
- **LinodePlacementGroup**: Anti-affinity constraints for high availability

### API Design Conventions
- **Dual Reference Pattern**: Support both direct IDs (`vpcID: 123`) and Kubernetes object references (`vpcRef: {name: "vpc-1"}`)
- **Credential References**: All resources support `credentialsRef` for multi-tenancy
- **Immutable Fields**: Use `+kubebuilder:validation:XValidation:rule="self == oldSelf"` for fields such as region and type
- **Status Structure**: Always include `ready`, `failureReason`, `failureMessage`, and `conditions`

## Development Workflows

### Build & Test Commands
- `make generate` - Regenerate CRDs and mocks after API changes
- `make test` - Run unit tests with mocked clients
- `make e2e E2E_SELECTOR=quick` - Run specific E2E tests using Chainsaw
- `make lint` - Run golangci-lint with project-specific rules
- `make build` - Build the controller manager binary

### Adding New Resources
1. Define API types in `api/v1alpha2/` with proper validation markers
2. Implement the controller in `internal/controller/` following the standard pattern
3. Add a scope in `cloud/scope/` for client management
4. Add a validation webhook in `internal/webhook/v1alpha2/`
5. Add cloud services in `cloud/services/` if needed
6. Run `make generate` to update CRDs and mocks
7. Add E2E tests in `e2e/<resource>-controller/`

### Testing Patterns
- Unit tests use GoMock with `mock.MockLinodeClient` and `mock.MockK8sClient`
- Mock expectations follow the pattern `mockLinodeClient.EXPECT().Method().Return(result, error)`
- E2E tests use Chainsaw YAML manifests in `e2e/` directories, organized by controller and by flavor
- Service tests mock both success and error responses from the Linode API
- E2E resource names use dynamic identifiers built from Chainsaw expressions, e.g. `(join('-', ['e2e', 'feature', env('GIT_REF')]))`
- Table-driven tests define each case with `name`, `objects`, `expectedError`, `expectedResult`, and `expectations` fields

## Linode Platform Integration

### Load Balancer Types
- **NodeBalancer**: Linode's managed load balancer for cluster API endpoints
- **DNS**: Uses Linode or Akamai DNS for API endpoint resolution
- **External**: For existing external load balancers

### Networking Features
- **VPC**: Private networking with configurable IPv4/IPv6 subnets
- **Firewalls**: Cloud firewalls with inbound/outbound rules and AddressSet reuse
- **Placement Groups**: Anti-affinity for spreading instances across failure domains

### Bootstrap Integration
- Supports kubeadm, k3s, and rke2 bootstrap providers
- Uses cloud-init with Linode's metadata service
- Object storage integration for large bootstrap payloads via pre-signed URLs

## Common Patterns

### Standard Reconciliation Structure
```go
func (r *Controller) reconcile(ctx context.Context, scope *ScopeType) (res ctrl.Result, err error) {
	scope.Resource.Status.Ready = false
	scope.Resource.Status.FailureReason = nil
	scope.Resource.Status.FailureMessage = nil

	defer func() {
		if err != nil {
			scope.Resource.Status.FailureReason = util.Pointer("ReconcileError")
			scope.Resource.Status.FailureMessage = util.Pointer(err.Error())
		}
		// Always persist status; join any patch error with the reconcile error
		if patchErr := scope.Close(ctx); patchErr != nil {
			err = errors.Join(err, patchErr)
		}
	}()

	// Add finalizer, handle deletion, or ensure resource
	return res, nil
}
```

### Error Handling
```go
// Ignore specific HTTP errors (here 404 Not Found); any other error is returned
if util.IgnoreLinodeAPIError(err, http.StatusNotFound) != nil {
	return fmt.Errorf("failed to get resource: %w", err)
}

// Handle retryable vs terminal errors
if linodego.ErrHasStatus(err, http.StatusBadRequest) {
	// Terminal error: set the failure reason. Note that controller-runtime still
	// requeues with rate-limited backoff whenever Reconcile returns a non-nil error;
	// wrap it in reconcile.TerminalError (controller-runtime v0.15+) to stop retries.
	return ctrl.Result{}, fmt.Errorf("terminal error: %w", err)
}
// Retryable error: requeue after a fixed five-minute delay
return ctrl.Result{RequeueAfter: time.Minute * 5}, nil
```

### Finalizer Management
Add finalizers early in reconciliation; during deletion, remove them only after cleanup has succeeded.

### Credential Resolution
Controllers resolve credentials from the Secret named by `credentialsRef`, or fall back to the controller's default token (`LINODE_TOKEN`).
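Credential resolution can be sketched as a function that prefers the referenced Secret and otherwise uses a default token. `Secret`, `CredentialsRef`, `getSecretFunc`, `resolveToken`, and especially the `apiToken` data key are assumptions made for illustration; consult the CAPL documentation for the key the controllers actually read.

```go
package main

import (
	"errors"
	"fmt"
)

// Secret is a hypothetical simplification of a Kubernetes Secret.
type Secret struct {
	Data map[string][]byte
}

// CredentialsRef is a hypothetical mirror of a credentialsRef field.
type CredentialsRef struct {
	Namespace string
	Name      string
}

// getSecretFunc stands in for reading a Secret through the Kubernetes client.
type getSecretFunc func(namespace, name string) (*Secret, error)

// tokenKey is an assumed data key, not confirmed from the CAPL source.
const tokenKey = "apiToken"

// resolveToken uses the referenced Secret when credentialsRef is set and
// falls back to the controller's default token otherwise.
func resolveToken(ref *CredentialsRef, get getSecretFunc, defaultToken string) (string, error) {
	if ref == nil {
		if defaultToken == "" {
			return "", errors.New("no credentialsRef and no default token configured")
		}
		return defaultToken, nil
	}
	secret, err := get(ref.Namespace, ref.Name)
	if err != nil {
		return "", fmt.Errorf("get credentials secret %s/%s: %w", ref.Namespace, ref.Name, err)
	}
	token := secret.Data[tokenKey]
	if len(token) == 0 {
		return "", fmt.Errorf("secret %s/%s has no %q key", ref.Namespace, ref.Name, tokenKey)
	}
	return string(token), nil
}

func main() {
	get := func(namespace, name string) (*Secret, error) {
		return &Secret{Data: map[string][]byte{tokenKey: []byte("tenant-token")}}, nil
	}
	fmt.Println(resolveToken(nil, get, "default-token"))
	fmt.Println(resolveToken(&CredentialsRef{Namespace: "team-a", Name: "linode-creds"}, get, "default-token"))
}
```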

### Template System
Cluster templates in `templates/flavors/` define different configurations (VPC vs vpcless, dual-stack networking, etc.).

## Environment Variables

### Core Authentication & API
- `LINODE_TOKEN`: Primary Linode API authentication token (required)
- `LINODE_DNS_TOKEN`: Separate token for DNS operations (optional, defaults to `LINODE_TOKEN`)
- `LINODE_URL`: Custom Linode API endpoint (optional, for testing/dev environments)
- `LINODE_DNS_URL`: Custom DNS API endpoint (optional)
- `LINODE_DNS_CA`: Custom CA certificate for DNS API (optional)
- `LINODE_CA_BASE64`: Base64-encoded CA certificate for Linode API (optional)

### Akamai Integration
- `AKAMAI_HOST`: Akamai EdgeRC API hostname
- `AKAMAI_CLIENT_TOKEN`: Akamai EdgeRC client token
- `AKAMAI_CLIENT_SECRET`: Akamai EdgeRC client secret
- `AKAMAI_ACCESS_TOKEN`: Akamai EdgeRC access token

### Development & Debugging
- `CAPL_DEBUG`: Enable debug logging and OpenTelemetry tracing (`true`/`false`)
- `CAPL_MONITORING`: Enable Prometheus metrics and Grafana dashboards (`true`/`false`)
- `ENABLE_WEBHOOKS`: Enable/disable admission webhooks (`true`/`false`)
- `GZIP_COMPRESSION_ENABLED`: Enable gzip compression for metadata (`true`/`false`)
- `SKIP_DOCKER_BUILD`: Skip Docker build in Tilt development (`true`/`false`)
- `VERSION`: Build version override

### Provider Installation (Tilt Development)
- `INSTALL_KUBEADM_PROVIDER`: Install kubeadm bootstrap/control-plane providers (`true`/`false`, default: `true`)
- `INSTALL_HELM_PROVIDER`: Install Cluster API Addon Provider Helm (`true`/`false`, default: `true`)
- `INSTALL_K3S_PROVIDER`: Install K3s bootstrap/control-plane providers (`true`/`false`, default: `false`)
- `INSTALL_RKE2_PROVIDER`: Install RKE2 bootstrap/control-plane providers (`true`/`false`, default: `false`)

### Cluster Configuration (Templates)
- `CLUSTER_NAME`: Name for generated clusters
- `LINODE_REGION`: Default Linode region (e.g., `us-ord`, `us-sea`)
- `LINODE_CONTROL_PLANE_MACHINE_TYPE`: Instance type for control plane nodes (e.g., `g6-standard-2`)
- `LINODE_MACHINE_TYPE`: Instance type for worker nodes (e.g., `g6-standard-2`)
- `LINODE_SSH_PUBKEY`: SSH public key for cluster node access

### DNS LoadBalancer Configuration
- `DNS_ROOT_DOMAIN`: Root domain for DNS-based load balancing (e.g., `example.com`)
- `DNS_UNIQUE_ID`: Unique identifier for DNS records (e.g., `abc123`)

### Backup & Storage
- `OBJ_BUCKET_REGION`: Object storage region for etcd backups (e.g., `us-ord`)
- `ETCDBR_IMAGE`: Custom etcd backup/restore controller image
- `SSE_KEY`: Server-side encryption key for object storage

### E2E Testing
- `E2E_SELECTOR`: Chainsaw test selector (`quick`, `all`, `flavors`, `linodecluster`, etc.)
- `E2E_FLAGS`: Additional flags passed to Chainsaw (e.g., `--assert-timeout 10m0s`)
- `CLUSTER_AUTOSCALER_VERSION`: Version for cluster autoscaler tests (e.g., `v1.29.0`)

### OpenTelemetry Tracing
Standard OpenTelemetry environment variables are supported via the `autoexport` package:
- `OTEL_EXPORTER_OTLP_ENDPOINT`: OTLP endpoint URL
- `OTEL_EXPORTER_JAEGER_ENDPOINT`: Jaeger endpoint URL
- `OTEL_SERVICE_NAME`: Service name for traces
- `OTEL_RESOURCE_ATTRIBUTES`: Additional resource attributes

## Debugging Tips
- Check controller logs for reconciliation errors and API failures
- Verify Linode API permissions and regional capabilities
- Use `kubectl describe` on resources to see status conditions and events
- E2E test failures often indicate webhook validation or API compatibility issues
- Enable debug logging with `CAPL_DEBUG=true` for detailed tracing
- After API changes, run `make generate` to keep the CRDs current
- Whenever you change the `templates/` directory, run `make local-release` to regenerate the release files

.github/workflows/e2e-test.yaml

Lines changed: 1 addition & 0 deletions
```diff
@@ -135,6 +135,7 @@ jobs:
           LINODE_CONTROL_PLANE_MACHINE_TYPE: g6-standard-2
           LINODE_MACHINE_TYPE: g6-standard-2
           CLUSTERCTL_CONFIG: /home/runner/work/cluster-api-provider-linode/cluster-api-provider-linode/e2e/gha-clusterctl-config.yaml
+          LINODE_CLIENT_TIMEOUT: 30
         run: make e2etest
 
       - name: cleanup stale clusters
```

Tiltfile

Lines changed: 7 additions & 0 deletions
```diff
@@ -196,6 +196,13 @@ for resource in manager_yaml:
         resource["spec"]["template"]["spec"].pop("securityContext")
         for container in resource["spec"]["template"]["spec"]["containers"]:
             container.pop("securityContext")
+            timeout_value = os.getenv("LINODE_CLIENT_TIMEOUT")
+            if timeout_value:
+                env = container.setdefault("env", [])
+                env.append({
+                    "name": "LINODE_CLIENT_TIMEOUT",
+                    "value": timeout_value
+                })
 
 k8s_yaml(encode_yaml_stream(manager_yaml))
 
```

api/v1alpha2/linodemachine_types.go

Lines changed: 39 additions & 0 deletions
```diff
@@ -111,6 +111,45 @@ type LinodeMachineSpec struct {
 	// VPCID is the ID of an existing VPC in Linode. This allows using a VPC that is not managed by CAPL.
 	// +optional
 	VPCID *int `json:"vpcID,omitempty"`
+
+	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
+	// IPv6Options defines the IPv6 options for the instance.
+	// If not specified, IPv6 ranges won't be allocated to instance.
+	// +optional
+	IPv6Options *IPv6CreateOptions `json:"ipv6Options,omitempty"`
+
+	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
+	// +optional
+	// NetworkHelper is an option usually enabled on account level. It helps configure networking automatically for instances.
+	// You can use this to enable/disable the network helper for a specific instance.
+	// For more information, see https://techdocs.akamai.com/cloud-computing/docs/automatically-configure-networking
+	// Defaults to true.
+	NetworkHelper *bool `json:"networkHelper,omitempty"`
+}
+
+// IPv6CreateOptions defines the IPv6 options for the instance.
+type IPv6CreateOptions struct {
+	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
+	// EnableSLAAC is an option to enable SLAAC (Stateless Address Autoconfiguration) for the instance.
+	// This is useful for IPv6 addresses, allowing the instance to automatically configure its own IPv6 address.
+	// Defaults to false.
+	// +optional
+	EnableSLAAC *bool `json:"enableSLAAC,omitempty"`
+
+	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
+	// EnableRanges is an option to enable IPv6 ranges for the instance.
+	// If set to true, the instance will have a range of IPv6 addresses.
+	// This is useful for instances that require multiple IPv6 addresses.
+	// Defaults to false.
+	// +optional
+	EnableRanges *bool `json:"enableRanges,omitempty"`
+
+	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
+	// IsPublicIPv6 is an option to enable public IPv6 for the instance.
+	// If set to true, the instance will have a publicly routable IPv6 range.
+	// Defaults to false.
+	// +optional
+	IsPublicIPv6 *bool `json:"isPublicIPv6,omitempty"`
 }
 
 // InstanceDisk defines a list of disks to use for an instance
```

api/v1alpha2/zz_generated.deepcopy.go

Lines changed: 40 additions & 0 deletions
Generated file; diff not rendered.

cloud/services/domains.go

Lines changed: 5 additions & 1 deletion
```diff
@@ -174,7 +174,11 @@ func EnsureLinodeDNSEntries(ctx context.Context, cscope *scope.ClusterScope, ope
 	if err != nil {
 		return err
 	}
-	domainRecords, err := cscope.LinodeDomainsClient.ListDomainRecords(ctx, domainID, linodego.NewListOptions(0, string(filter)))
+
+	listOptions := linodego.NewListOptions(0, string(filter))
+	listOptions.PageSize = 500 // set a high page size to avoid multiple requests
+
+	domainRecords, err := cscope.LinodeDomainsClient.ListDomainRecords(ctx, domainID, listOptions)
 	if err != nil {
 		return err
 	}
```
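As a rough illustration of what the larger page size saves, the helper below counts list requests for a given number of records. It assumes the Linode API's documented default page size of 100 and maximum of 500, and that page 0 in `NewListOptions(0, ...)` makes linodego fetch every page; `requestsNeeded` is a hypothetical helper.

```go
package main

import "fmt"

// defaultPageSize is the assumed Linode API default page size.
const defaultPageSize = 100

// requestsNeeded returns how many list calls it takes to fetch total items
// when each page holds up to pageSize items (ceiling division).
func requestsNeeded(total, pageSize int) int {
	if pageSize <= 0 {
		pageSize = defaultPageSize
	}
	if total <= 0 {
		return 1 // an empty listing still costs one request
	}
	return (total + pageSize - 1) / pageSize
}

func main() {
	for _, size := range []int{defaultPageSize, 500} {
		fmt.Printf("page size %d: %d request(s) for 1200 records\n", size, requestsNeeded(1200, size))
	}
}
```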

cmd/main.go

Lines changed: 17 additions & 2 deletions
```diff
@@ -178,8 +178,23 @@ func validateEnvironment() (linodeConfig, dnsConfig scope.ClientConfig) {
 		linodeDNSToken = linodeToken
 	}
 
-	return scope.ClientConfig{Token: linodeToken},
-		scope.ClientConfig{Token: linodeDNSToken, BaseUrl: linodeDNSURL, RootCertificatePath: linodeDNSCA}
+	linodeClientTimeout := 0
+	if raw, ok := os.LookupEnv("LINODE_CLIENT_TIMEOUT"); ok {
+		if timeout, err := strconv.Atoi(raw); timeout > 0 && err == nil {
+			linodeClientTimeout = timeout
+			setupLog.Info("LINODE_CLIENT_TIMEOUT set", "timeout", linodeClientTimeout)
+		} else {
+			setupLog.Error(fmt.Errorf("invalid LINODE_CLIENT_TIMEOUT value: %s", raw), "using default timeout")
+		}
+	}
+
+	linodeConfig = scope.ClientConfig{Token: linodeToken}
+	dnsConfig = scope.ClientConfig{Token: linodeDNSToken, BaseUrl: linodeDNSURL, RootCertificatePath: linodeDNSCA}
+	if linodeClientTimeout > 0 {
+		linodeConfig.Timeout = time.Duration(linodeClientTimeout) * time.Second
+		dnsConfig.Timeout = time.Duration(linodeClientTimeout) * time.Second
+	}
+	return linodeConfig, dnsConfig
 }
 
 // setupManager initializes and returns a new manager instance with the provided configurations.
```
