---
title: IRSA Support for Self-Managed Clusters
authors:
  - "@luthermonson"
reviewers:
  - "@richardcase"
  - "@Skarlso"
creation-date: 2023-03-17
last-updated: 2023-03-17
status: provisional
see-also: []
replaces: []
superseded-by: []
---

# Add Support for IRSA to Non-Managed Clusters

## Table of Contents

- [Add Support for IRSA to Non-Managed Clusters](#add-support-for-irsa-to-non-managed-clusters)
  - [Table of Contents](#table-of-contents)
  - [Glossary](#glossary)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
    - [Non-Goals/Future Work](#non-goalsfuture-work)
  - [Proposal](#proposal)
    - [User Stories](#user-stories)
      - [Story 1](#story-1)
    - [Requirements](#requirements)
      - [Functional Requirements](#functional-requirements)
      - [Non-Functional Requirements](#non-functional-requirements)
    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Security Model](#security-model)
    - [Risks and Mitigations](#risks-and-mitigations)
  - [Alternatives](#alternatives)
  - [Upgrade Strategy](#upgrade-strategy)
  - [Additional Details](#additional-details)
    - [Test Plan](#test-plan)
    - [Graduation Criteria](#graduation-criteria)
  - [Implementation History](#implementation-history)

## Glossary

- [CAPA](https://cluster-api.sigs.k8s.io/reference/glossary.html#capa) - Cluster API Provider AWS.
- [CAPI](https://github.com/kubernetes-sigs/cluster-api) - Cluster API.
- [IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) - IAM Roles for Service Accounts.
- [pod-identity-webhook](https://github.com/aws/amazon-eks-pod-identity-webhook) - Pod Identity Webhook repository.

## Summary

IAM Roles for Service Accounts (IRSA) takes the access control enabled by IAM and bridges the gap to Kubernetes by adding role-based access to service accounts. With this proposal, CAPA users running self-managed clusters can grant granular, role-based access to the AWS API at the pod level.

## Motivation

This functionality is already built into EKS: when the `AssociateOIDCProvider` boolean in the managed cluster API is set, CAPA builds an IAM OIDC provider for the cluster and creates a trust policy template in a ConfigMap to be used when creating IAM roles. Self-managed clusters can also use IRSA, but they require additional manual steps that managed clusters get automatically: patching kube-apiserver, creating an OIDC provider, and deploying the `pod-identity-webhook`. These steps are documented in the [self-hosted setup](https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/SELF_HOSTED_SETUP.md) walkthrough, but CAPA can perform them with its own ingredients: the management cluster, kubeadm config modification, and the built-in serving certs' OpenID Configuration API endpoints.

The pieces IRSA needs are easily created with the access CAPA already has. By adding `AssociateOIDCProvider` to `AWSCluster`, we can kick off a reconciliation process that generates everything necessary to use IRSA in a self-managed cluster.

### Goals

1. On cluster creation, add all components needed for self-managed clusters to use IAM Roles for Service Accounts.
2. On cluster deletion, remove all external dependencies from the AWS account.

### Non-Goals/Future Work

- Migrate all IAM work for managed clusters to the IAM service.
- The S3 bucket code currently dies when the bucket already exists; it should instead check whether the bucket exists and is writable, so one bucket can be reused for multiple clusters.
- The S3 bucket code creates a client locked to the region chosen for the cluster, but not all regions support S3. The code should be smarter; some options:
  - Add a region to the S3 bucket config and reconfigure the client when it is set, defaulting to the AWS default of us-east-1 when the string is empty.
  - S3-enabled regions are a finite list; take the cluster region, check whether S3 is enabled there, and default to us-east-1 if there is no match.
  - Force all buckets into the S3 default region, us-east-1.

## Proposal

- Create a boolean on `AWSCluster` called `AssociateOIDCProvider` to match the `AWSManagedCluster` API, with a default value of `false`.
- Migrate the status types for `OIDCProvider` out of the experimental EKS APIs and into the v1beta2 APIs.
- Build an IAM cloud service and add a reconciler to persist all components required for IRSA. The logic is as follows:
  1. Create a self-signed issuer for the workload cluster namespace, used to make the pod-identity-webhook serving cert.
  2. Generate the patch file and update the kubeadm configs to write the patch to disk for the control plane nodes.
  3. Create the identity provider in IAM, pointed at the S3 bucket.
  4. Pause the reconciler until the workload cluster is online, since we have created all the pieces we can without a working kube-apiserver. The `AWSMachine` controller has additional code that annotates the `AWSCluster` once a control plane node is up and the management cluster has a kubeconfig, which unpauses our reconciler.
  5. Copy the [JWKS](https://auth0.com/docs/secure/tokens/json-web-tokens/json-web-key-sets) and OpenID Configuration from the kube-apiserver to the S3 bucket.
  6. Create all the Kubernetes components in the workload cluster to run the pod-identity-webhook.
  7. Create the trust policy boilerplate ConfigMap in the workload cluster.
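
The kube-apiserver patch in step 2 amounts to pointing the API server's token issuer at the bucket-hosted discovery documents. A hypothetical `KubeadmControlPlane` fragment sketches the idea; the exact flag layout, bucket name, region, and cluster name here are illustrative assumptions, not the literal patch CAPA generates:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          # the issuer URL must match the IAM OIDC provider and the
          # discovery documents copied into the S3 bucket
          service-account-issuer: https://s3.us-west-2.amazonaws.com/capi-quickstart-1234/capi-quickstart
          service-account-jwks-uri: https://s3.us-west-2.amazonaws.com/capi-quickstart-1234/capi-quickstart/openid/v1/jwks
```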

Identical to the EKS implementation, a trust policy document boilerplate will reference the ARN for the identity provider created in step 3. The boilerplate can be used to generate IAM roles, and the ARNs for those roles can then be annotated on service accounts. The pod-identity-webhook watches all service accounts and pods; when it finds a pod using a service account with the annotation, it injects AWS STS tokens via environment variables generated from the role ARN.
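
As on EKS, wiring a role to a workload is then a matter of annotating the service account. A minimal sketch, with a hypothetical role name and account ID:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  namespace: default
  annotations:
    # pod-identity-webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE
    # into pods that use this service account
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-irsa-role
```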

### S3 Bucket

A previous implementation for Ignition support added an S3 bucket to hold the configuration needed for Ignition boots. The original functionality used two sub-folders, `control-plane` and `node`. These remain the same in this proposal, with the addition of a new folder matching the CAPA cluster name, giving a directory structure like the following.

```
unique-s3-bucket-name/
|-- cluster1
|   |-- .well-known
|   `-- openid
|       `-- v1
|-- cluster2
|   |-- .well-known
|   `-- openid
|       `-- v1
|-- control-plane
`-- node
```

**Note**: today the code does not support reusing an S3 bucket; it errors if the bucket already exists. Support can be added to catch the already-exists error, attempt a write to the bucket to confirm access, and reuse it for another cluster.
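
For reference, the OpenID Configuration copied to `<cluster>/.well-known/openid-configuration` in step 5 looks roughly like the following (issuer URL illustrative; the shape follows the pod-identity-webhook self-hosted setup walkthrough):

```json
{
  "issuer": "https://s3.us-west-2.amazonaws.com/capi-quickstart-1234/capi-quickstart",
  "jwks_uri": "https://s3.us-west-2.amazonaws.com/capi-quickstart-1234/capi-quickstart/openid/v1/jwks",
  "response_types_supported": ["id_token"],
  "subject_types_supported": ["public"],
  "id_token_signing_alg_values_supported": ["RS256"]
}
```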

### Sample YAML

To add IRSA support to a self-managed cluster, your `AWSCluster` YAML will look something like the following.

```yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: capi-quickstart
  namespace: default
spec:
  region: us-west-2
  sshKeyName: luther
  associateOIDCProvider: true
  s3Bucket:
    name: capi-quickstart-1234 # regionally unique, be careful of name clashes with other AWS users
    nodesIAMInstanceProfiles:
      - nodes.cluster-api-provider-aws.sigs.k8s.io
    controlPlaneIAMInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
```

### User Stories

#### Story 1

As an EKS cluster user who uses IRSA, I want to...

- Migrate to self-managed clusters and maintain the same AWS API access.

#### Story 2

As a self-managed cluster user, I want to...

- Give pods granular access to the AWS API based on IAM roles.

### Security Model

Access to the necessary CRDs is already declared for the controllers, and we are not adding any new kinds, so there is no change.

Since the JWKS and OpenID configuration need public access, the S3 bucket configuration will need to be modified to allow both private and public access to objects. This is done by setting `PublicAccessBlockConfiguration` to false and setting bucket ownership to `BucketOwnerPreferred`.

Additional permissions granted to the IAM policies are as follows.

**Controllers Policy**
- iam:CreateOpenIDConnectProvider
- iam:DeleteOpenIDConnectProvider
- iam:ListOpenIDConnectProviders
- iam:GetOpenIDConnectProvider
- iam:TagOpenIDConnectProvider
- s3:PutBucketOwnershipControls
- s3:PutObjectAcl
- s3:PutBucketPublicAccessBlock
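
The trust policy boilerplate stored in the ConfigMap follows the standard IRSA shape. A sketch, where the account ID, provider path, and the namespace/service-account pair in the condition are placeholder assumptions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/s3.us-west-2.amazonaws.com/capi-quickstart-1234/capi-quickstart"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "s3.us-west-2.amazonaws.com/capi-quickstart-1234/capi-quickstart:sub": "system:serviceaccount:default:s3-reader"
        }
      }
    }
  ]
}
```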

### Risks and Mitigations

## Alternatives

The process to install everything needed for IRSA is documented and could be done by hand if necessary, but CAPA has complete control over the pieces involved, and automating this through a reconciler would put the feature on par with the existing functionality for managed clusters.

#### Benefits

This approach makes IRSA in self-managed clusters relatively trivial. The kube-apiserver patch is tricky to manage by hand, and CAPA already has access to all the AWS infrastructure it needs to manage this automatically.

#### Downsides

- Might be too much for CAPA to manage and not worth the complexity.

#### Decision

## Upgrade Strategy

Moving the `OIDCProvider` type from the experimental EKS API to the v1beta2 API for both cluster types will include converters for upgrading and downgrading. Testing will confirm, but it should be possible to add IRSA to a cluster after the fact: CAPA will need to patch kube-apiserver and create new control plane nodes, and the upgrade process should make this seamless.

## Additional Details

### Test Plan

* Test creating a cluster: confirm all the pieces work, using a simple AWS CLI example with a service account attached to a pod and exec commands that successfully gain auth through STS tokens attached via environment variables.
* Test deleting a cluster: confirm all AWS components are removed (S3 bucket contents, management cluster ConfigMaps, etc.).
* Test upgrading a cluster without IRSA to add the feature: confirm all components deploy successfully, and run the AWS CLI example.
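
The CLI check in the first bullet could be as simple as the following hypothetical pod (service account name and image are illustrative); exec `aws sts get-caller-identity` inside it and confirm the assumed-role ARN comes back:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: irsa-check
  namespace: default
spec:
  serviceAccountName: s3-reader   # a service account annotated with an IAM role ARN
  containers:
    - name: awscli
      image: amazon/aws-cli:latest
      # keep the pod alive so we can `kubectl exec` into it and run
      # `aws sts get-caller-identity` against the injected STS credentials
      command: ["sleep", "infinity"]
```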

### Graduation Criteria

## Implementation History

- [x] 2023-03-22: Open proposal (PR)
- [x] 2023-02-22: WIP implementation [PR](https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/4094)

<!-- Links -->
[community meeting]: https://docs.google.com/document/d/1iW-kqcX-IhzVGFrRKTSPGBPOc-0aUvygOVoJ5ETfEZU/edit#
[discussion]: https://github.com/kubernetes-sigs/cluster-api-provider-aws/discussions/4153