Commit 052c9ea

Merge pull request #3365 from richardcase/launch_templates_caep
docs: proposal for using launch templates with machine pools
2 parents 0402f41 + 0af915f commit 052c9ea

1 file changed

+259
-0
lines changed

@@ -0,0 +1,259 @@
---
title: Launch Templates for Managed Machine Pools
authors:
  - "@richardcase"
reviewers:
  - "@sedefsavas"
  - "@richardchen331"
creation-date: 2021-12-10
last-updated: 2022-03-29
status: provisional
see-also: []
replaces: []
superseded-by: []
---

# Launch Templates for Managed Machine Pools

## Table of Contents

- [Launch Templates for Managed Machine Pools](#launch-templates-for-managed-machine-pools)
  - [Table of Contents](#table-of-contents)
  - [Glossary](#glossary)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
    - [Non-Goals/Future Work](#non-goalsfuture-work)
  - [Proposal](#proposal)
    - [User Stories](#user-stories)
      - [Story 1](#story-1)
      - [Story 2](#story-2)
      - [Story 3](#story-3)
      - [Story 4](#story-4)
      - [Story 5](#story-5)
    - [Requirements](#requirements)
      - [Functional Requirements](#functional-requirements)
      - [Non-Functional Requirements](#non-functional-requirements)
    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Security Model](#security-model)
    - [Risks and Mitigations](#risks-and-mitigations)
  - [Alternatives](#alternatives)
    - [New `AWSLaunchTemplate` CRD & Controller](#new-awslaunchtemplate-crd--controller)
      - [Benefits](#benefits)
      - [Downsides](#downsides)
      - [Decision](#decision)
  - [Upgrade Strategy](#upgrade-strategy)
  - [Additional Details](#additional-details)
    - [Test Plan](#test-plan)
    - [Graduation Criteria](#graduation-criteria)
  - [Implementation History](#implementation-history)

## Glossary

- [CAPA](https://cluster-api.sigs.k8s.io/reference/glossary.html#capa) - Cluster API Provider AWS.
- [CAPI](https://github.com/kubernetes-sigs/cluster-api) - Cluster API.
- [Launch Template](https://docs.aws.amazon.com/autoscaling/ec2/userguide/LaunchTemplates.html) - a configuration template that is used to configure an AWS EC2 instance when it is created.
- [ASG](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html) - an Auto Scaling group that represents a pool of EC2 instances that can scale up and down automatically.

## Summary

CAPA currently implements two varieties of **Machine Pools**: `AWSMachinePool` and `AWSManagedMachinePool`. Each variety has a different level of support for [launch templates](https://docs.aws.amazon.com/autoscaling/ec2/userguide/LaunchTemplates.html).

The `AWSMachinePool` is used to create an **ASG** whose EC2 instances are used as worker nodes for the Kubernetes cluster. The specification for `AWSMachinePool` exposes settings that are ultimately used to create an EC2 launch template (and subsequent versions of it) via the `AWSLaunchTemplate` field and struct:

```go
// AWSLaunchTemplate specifies the launch template and version to use when an instance is launched.
// +kubebuilder:validation:Required
AWSLaunchTemplate AWSLaunchTemplate `json:"awsLaunchTemplate"`
```

([source](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/exp/api/v1beta1/awsmachinepool_types.go#L67))

The `AWSManagedMachinePool` is used to create an [EKS managed node group](https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html), which results in an AWS managed **ASG** being created that uses AWS managed EC2 instances. The spec for `AWSManagedMachinePool` exposes details of the pool to create, but it doesn't support using a launch template, and CAPA doesn't automatically create launch templates for it (as it does for `AWSMachinePool`). A number of CAPA users have wanted to use `AWSManagedMachinePool` but found that it doesn't expose functionality that is only available through launch templates.

This proposal outlines changes to CAPA that introduce the ability to use launch templates with `AWSManagedMachinePool` and bring its functionality in line with `AWSMachinePool`.

## Motivation

We are increasingly hearing from CAPA users that a particular feature or configuration option isn't exposed by CAPA's implementation of managed machine pools (i.e. `AWSManagedMachinePool`), and on investigation the feature turns out to be available via a launch template (Nitro Enclaves or placement, for example). In some instances, users of CAPA have had to use unmanaged machine pools (i.e. `AWSMachinePool`) instead.

The motivation is to improve consistency between the two varieties of machine pools and to expose the features of launch templates to the user.

> Note: it may not be completely consistent in the initial implementation as we may need to deprecate some API definitions over time but the plan will be to be eventually consistent ;)

### Goals

- Consistent API to use launch templates for `AWSMachinePool` and `AWSManagedMachinePool`
- Single point of reconciliation of launch templates
- Guide to the deprecation of certain API elements in `AWSManagedMachinePool`

### Non-Goals/Future Work

- Add non-existent controller unit tests for `AWSMachinePool` and `AWSManagedMachinePool`

## Proposal

At a high level, the plan is to:

1. Add a new `AWSLaunchTemplate` field to [AWSManagedMachinePoolSpec](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/exp/api/v1beta1/awsmanagedmachinepool_types.go#L65) that uses the existing [AWSLaunchTemplate](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/ec057ad6e613a6578f67bf68a6c77fbe772af933/exp/api/v1beta1/types.go#L58) struct. For example:

   ```go
   // AWSLaunchTemplate specifies the launch template and version to use when an instance is launched. This field
   // will become mandatory in the future and it's recommended you use this over fields AMIType,AMIVersion,InstanceType,DiskSize,InstanceProfile.
   // +optional
   AWSLaunchTemplate AWSLaunchTemplate `json:"awsLaunchTemplate"`
   ```

2. Update the comments on the following fields of [AWSManagedMachinePoolSpec](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/9bc29570614aa7123d79f042b6e6efc2aaf3e490/exp/api/v1beta1/awsmanagedmachinepool_types.go#L65) to indicate that the fields are deprecated and that `AWSLaunchTemplate` should be used instead:
   - AMIVersion
   - AMIType
   - DiskSize
   - InstanceType
3. Add new `LaunchTemplateID` and `LaunchTemplateVersion` fields to [AWSManagedMachinePoolStatus](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/9bc29570614aa7123d79f042b6e6efc2aaf3e490/exp/api/v1beta1/awsmanagedmachinepool_types.go#L171) to store details of the launch template and version used.
4. Add a new `LaunchTemplateVersion` field to [AWSMachinePoolStatus](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/exp/api/v1beta1/awsmachinepool_types.go#L112) to store the version of the launch template used.
5. [Refactor the code](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/ec057ad6e613a6578f67bf68a6c77fbe772af933/exp/controllers/awsmachinepool_controller.go#L383) from the `AWSMachinePool` controller that reconciles `AWSLaunchTemplate` into a common location so that it can be shared.
6. Update the controller for `AWSManagedMachinePool` to use the `AWSLaunchTemplate` reconciliation logic.
7. Add checks in the `AWSManagedMachinePool` create/update validation webhooks that stop users from specifying `AWSLaunchTemplate` if any of the fields `AMIType,AMIVersion,InstanceType,DiskSize,InstanceProfile` are set.
8. Add warning logs to the `AWSManagedMachinePool` create/update validation webhooks if any of the fields `AMIType,AMIVersion,InstanceType,DiskSize,InstanceProfile` are set, stating that these fields will be deprecated in the future and that `AWSLaunchTemplate` should be used instead.
   > It is still undecided whether we should automatically convert the `AMIType,AMIVersion,InstanceType,DiskSize,InstanceProfile` fields, if specified, into an `AWSLaunchTemplate`. We should investigate this as part of implementation.
9. Update the cluster templates that use `AWSManagedMachinePool` so that they use `AWSLaunchTemplate`.
10. Update the API version roundtrip tests for v1alpha4<->v1beta1 conversions of `AWSManagedMachinePool`.
11. Update the EKS e2e tests to add a test step that creates an additional managed machine pool using `AWSLaunchTemplate`.
12. Update any relevant documentation.
13. The release note must mention that "action is required" in the future, as fields are being deprecated.
14. Ensure that we capture the field deprecations for future removal in an API version bump.

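
The webhook behaviour described in steps 7 and 8 can be sketched roughly as follows. This is a minimal illustration and not the CAPA implementation: `ManagedPoolSpec`, `LaunchTemplate`, `legacyFieldsSet` and `validate` are hypothetical names, and the struct is a heavily trimmed stand-in for `AWSManagedMachinePoolSpec` that assumes the legacy fields are optional pointers.

```go
package main

import "fmt"

// LaunchTemplate is a simplified, hypothetical stand-in for the AWSLaunchTemplate struct.
type LaunchTemplate struct {
	Name string
}

// ManagedPoolSpec is a hypothetical, trimmed-down view of AWSManagedMachinePoolSpec.
type ManagedPoolSpec struct {
	AMIType         *string
	AMIVersion      *string
	InstanceType    *string
	DiskSize        *int32
	InstanceProfile *string
	LaunchTemplate  *LaunchTemplate
}

// legacyFieldsSet returns the names of the to-be-deprecated fields that are populated.
func legacyFieldsSet(s ManagedPoolSpec) []string {
	var set []string
	if s.AMIType != nil {
		set = append(set, "AMIType")
	}
	if s.AMIVersion != nil {
		set = append(set, "AMIVersion")
	}
	if s.InstanceType != nil {
		set = append(set, "InstanceType")
	}
	if s.DiskSize != nil {
		set = append(set, "DiskSize")
	}
	if s.InstanceProfile != nil {
		set = append(set, "InstanceProfile")
	}
	return set
}

// validate rejects specs that mix a launch template with legacy fields (step 7)
// and returns one warning per legacy field that is still in use (step 8).
func validate(s ManagedPoolSpec) ([]string, error) {
	legacy := legacyFieldsSet(s)
	if s.LaunchTemplate != nil && len(legacy) > 0 {
		return nil, fmt.Errorf("awsLaunchTemplate cannot be combined with fields %v", legacy)
	}
	var warnings []string
	for _, f := range legacy {
		warnings = append(warnings, fmt.Sprintf("field %s will be deprecated; use awsLaunchTemplate instead", f))
	}
	return warnings, nil
}

func main() {
	instanceType := "t3.medium"
	mixed := ManagedPoolSpec{InstanceType: &instanceType, LaunchTemplate: &LaunchTemplate{Name: "pool-a"}}
	_, err := validate(mixed)
	fmt.Println("mixed spec rejected:", err != nil)

	warnings, _ := validate(ManagedPoolSpec{InstanceType: &instanceType})
	fmt.Println("warnings for legacy-only spec:", warnings)
}
```

Returning the warnings separately from the error keeps the two concerns apart: a legacy-only spec is still admitted, while a mixed spec is refused outright.
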
### User Stories

#### Story 1

As a CAPA user
I want to create a managed machine pool using a launch template
So that I can use functionality from the AWS launch template

#### Story 2

As a CAPA user
I want to have consistency between managed and unmanaged machine pools
So that I can choose which to use based on whether I want a managed pool, and not based on missing functionality

#### Story 3

As a CAPA user
I want to ensure that changes to the pool result in a new version of the launch template
So that I can see a history of the changes in the console

#### Story 4

As a CAPA user
I want the controller to clean up old launch templates / launch template versions
So that I don't have to worry about cleaning up old versions and so I don't exceed the AWS limits
(see [AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html) for limits)

#### Story 5

As a CAPA user
I want to be able to use the output of a bootstrap provider in my launch template
So that I can bootstrap Kubernetes on the nodes

### Requirements

#### Functional Requirements

**FR1:** CAPA MUST continue to support using launch templates with non-managed ASG based machine pools (i.e. `AWSMachinePool`).

**FR2:** CAPA MUST support using launch templates with EKS managed nodegroup based machine pools (i.e. `AWSManagedMachinePool`).

**FR3:** CAPA MUST provide a consistent declarative API to expose Launch Template configuration to the machine pool implementations.

**FR4:** CAPA MUST manage the lifecycle of a launch template in AWS based on its declaration.

**FR5:** CAPA MUST version launch templates in AWS.

**FR6:** CAPA MUST allow keeping a configurable number of previous versions of launch templates.

**FR7:** CAPA MUST validate the declarations for `AWSLaunchTemplate`.

#### Non-Functional Requirements

**NFR1:** CAPA MUST provide logging and tracing to expose the progress of reconciliation of `AWSLaunchTemplate`.

**NFR2:** CAPA MUST raise events at important milestones during reconciliation.

**NFR3:** CAPA MUST requeue where possible and not wait during reconciliation so as to free up the reconciliation loop.

**NFR4:** CAPA MUST have e2e tests that cover usage of launch templates with BOTH variants of machine pools.

### Implementation Details/Notes/Constraints

The code in [reconcileLaunchTemplate](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/ec057ad6e613a6578f67bf68a6c77fbe772af933/exp/controllers/awsmachinepool_controller.go#L383) must be refactored into a package that can be used by the `AWSManagedMachinePool` controller as well. We could think about shifting more of this functionality into the "ec2" service.

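
One possible shape for that refactoring is a small interface that both pool kinds satisfy, so a single function can reconcile the launch template for either. The sketch below is an assumption for illustration only: `launchTemplateScope`, `basePool`, `fakeEC2` and the `lt-<name>` identifier format are all hypothetical, and an in-memory map stands in for the real EC2 API.

```go
package main

import (
	"fmt"
	"strconv"
)

// LaunchTemplate is a simplified, hypothetical stand-in for AWSLaunchTemplate.
type LaunchTemplate struct {
	InstanceType string
	ImageID      string
}

// launchTemplateScope is the hypothetical contract the shared code depends on.
type launchTemplateScope interface {
	PoolName() string
	DesiredTemplate() LaunchTemplate
	SetTemplateStatus(id, version string)
}

// basePool holds what both pool kinds need in this toy example.
type basePool struct {
	name            string
	spec            LaunchTemplate
	templateID      string
	templateVersion string
}

func (p *basePool) PoolName() string                { return p.name }
func (p *basePool) DesiredTemplate() LaunchTemplate { return p.spec }
func (p *basePool) SetTemplateStatus(id, version string) {
	p.templateID, p.templateVersion = id, version
}

// unmanagedPool and managedPool stand in for AWSMachinePool and AWSManagedMachinePool.
type unmanagedPool struct{ basePool }
type managedPool struct{ basePool }

// fakeEC2 keeps launch template versions in memory instead of calling AWS.
type fakeEC2 struct {
	versions map[string][]LaunchTemplate
}

// ensure creates a new version only when the desired template differs from the latest one.
func (e *fakeEC2) ensure(name string, lt LaunchTemplate) (string, int) {
	vs := e.versions[name]
	if len(vs) == 0 || vs[len(vs)-1] != lt {
		vs = append(vs, lt)
		e.versions[name] = vs
	}
	return "lt-" + name, len(vs)
}

// reconcileLaunchTemplate is the single shared entry point used for both pool kinds.
func reconcileLaunchTemplate(ec2 *fakeEC2, s launchTemplateScope) {
	id, version := ec2.ensure(s.PoolName(), s.DesiredTemplate())
	s.SetTemplateStatus(id, strconv.Itoa(version))
}

func main() {
	ec2 := &fakeEC2{versions: map[string][]LaunchTemplate{}}
	pool := &managedPool{basePool: basePool{name: "workers", spec: LaunchTemplate{InstanceType: "t3.medium"}}}

	reconcileLaunchTemplate(ec2, pool)
	pool.spec.InstanceType = "t3.large" // a spec change should yield a new version
	reconcileLaunchTemplate(ec2, pool)

	fmt.Println(pool.templateID, pool.templateVersion)
}
```

Because the shared function only sees the interface, the managed and unmanaged controllers can record the resulting ID and version in their own status types.
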
Cleaning up old versions of launch templates is currently handled by [PruneLaunchTemplateVersions](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/ec057ad6e613a6578f67bf68a6c77fbe772af933/pkg/cloud/services/ec2/launchtemplate.go#L265) which is sufficient for this change. We may want to make the minimum number of versions to keep configurable in the future but this can be covered by a different change.

### Security Model

There are no changes required to the security model. Access to the required CRDs is already declared for the controllers and as we are not adding any new kinds this doesn't need to change.

No change is required to the AWS permissions the controller requires for reconciliation.

### Risks and Mitigations

The risk is that we are being constrained by the existing API definition used in unmanaged machine pools. This may raise unforeseen issues.

## Alternatives

### New `AWSLaunchTemplate` CRD & Controller

The idea is that an `AWSLaunchTemplate` CRD would be created with an associated controller. The controller would then be responsible for reconciling the definition and managing the lifecycle of launch templates on AWS.

#### Benefits

- Single point of reconciliation and lifecycle management of launch templates in AWS.
- Separate lifecycle per launch template, so settings such as the number of previous versions to keep could be changed per template.

#### Downsides

- Additional complexity of orchestrating the creation of the launch template with the bootstrap data. The machine pool reconcilers would need to wait for the bootstrap data and the launch template.
- Would require deprecation of fields in 2 CRDs (i.e. both machine pool varieties).

#### Decision

As `AWSMachinePool` already manages launch templates, it was felt that we should follow the same approach for consistency, and that it would be a smaller change.

We can revisit the idea of a separate launch template kind in the future. The change proposed here will not preclude implementing this alternative later.

## Upgrade Strategy

The changes we are making to `AWSManagedMachinePool` are optional. Therefore, current users do not have to use the new `AWSLaunchTemplate` field. On upgrading there will be a new log entry written that informs the user that certain fields will be deprecated in the future.

## Additional Details

### Test Plan

- There are currently no controller unit tests for the machine pools in CAPA. We do need to add tests, but this can be done as part of a separate change.
- The EKS e2e tests will need to be updated so that a managed machine pool is created with a launch template specified.

### Graduation Criteria

With this proposal, we are planning to deprecate a number of fields on `AWSManagedMachinePool`.

The current API version is **beta level** and this normally means:

- We must support the beta API for 9 months or 3 releases (whichever is longer). See [rule 4a](https://kubernetes.io/docs/reference/using-api/deprecation-policy/)

However, the machine pools feature is marked as experimental in CAPI/CAPA and as such it has to be explicitly enabled via a feature flag. Therefore, it's proposed that we remove the deprecated fields when we bump the API version from v1beta1. As part of the field removal we will update the API conversion functions to automatically populate `AWSLaunchTemplate` on create.

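
As a rough idea of what such a conversion could do, the sketch below copies legacy values into a launch template when they are present. It is an assumption for illustration only: `LegacySpec`, `LaunchTemplate`, `RootVolumeSizeGiB` and `toLaunchTemplate` are hypothetical names, and how `AMIType` and `AMIVersion` would map onto a launch template is deliberately left out because the proposal does not define it.

```go
package main

import "fmt"

// LegacySpec is a hypothetical subset of the deprecated AWSManagedMachinePool fields.
type LegacySpec struct {
	InstanceType *string
	DiskSize     *int32 // root disk size in GiB
}

// LaunchTemplate is a simplified, hypothetical stand-in for AWSLaunchTemplate.
type LaunchTemplate struct {
	InstanceType      string
	RootVolumeSizeGiB int32
}

// toLaunchTemplate builds a launch template from the legacy fields.
// It returns nil when no legacy field is set, so nothing is invented.
func toLaunchTemplate(old LegacySpec) *LaunchTemplate {
	if old.InstanceType == nil && old.DiskSize == nil {
		return nil
	}
	lt := &LaunchTemplate{}
	if old.InstanceType != nil {
		lt.InstanceType = *old.InstanceType
	}
	if old.DiskSize != nil {
		lt.RootVolumeSizeGiB = *old.DiskSize
	}
	return lt
}

func main() {
	instanceType := "t3.medium"
	diskSize := int32(50)
	lt := toLaunchTemplate(LegacySpec{InstanceType: &instanceType, DiskSize: &diskSize})
	fmt.Printf("%+v\n", *lt)
}
```
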
## Implementation History

- [x] 2021-12-10: Initial WIP proposal created
- [x] 2021-12-13: Discussed in [community meeting]
- [x] 2022-01-14: Discussions between richardcase and richardchen331 on Slack
- [x] 2022-02-04: Updated proposal based on discussions
- [x] 2022-02-05: Created proposal [discussion]
- [x] 2022-02-07: Presented proposal at a [community meeting]
- [x] 2022-02-05: Opened proposal PR
- [x] 2022-03-29: Updated based on review feedback

<!-- Links -->
[community meeting]: https://docs.google.com/document/d/1iW-kqcX-IhzVGFrRKTSPGBPOc-0aUvygOVoJ5ETfEZU/edit#
[discussion]: https://github.com/kubernetes-sigs/cluster-api-provider-aws/discussions/3154
