| 1 | +--- |
| 2 | +title: IRSA Support for Self-Managed Clusters |
| 3 | +authors: |
| 4 | + - "@luthermonson" |
| 5 | +reviewers: |
| 6 | + - "@richardcase" |
| 7 | + - "@Skarlso" |
| 8 | +creation-date: 2023-03-17 |
| 9 | +last-updated: 2023-03-17 |
| 10 | +status: provisional |
| 11 | +see-also: [] |
| 12 | +replaces: [] |
| 13 | +superseded-by: [] |
| 14 | +--- |
| 15 | + |
| 16 | +# Add Support for IRSA to Non-Managed Clusters |
| 17 | + |
| 18 | +## Table of Contents |
| 19 | + |
| 20 | +- [Add Support for IRSA to Non-Managed Clusters](#add-support-for-irsa-to-non-managed-clusters) |
| 21 | + - [Table of Contents](#table-of-contents) |
| 22 | + - [Glossary](#glossary) |
| 23 | + - [Summary](#summary) |
| 24 | + - [Motivation](#motivation) |
| 25 | + - [Goals](#goals) |
| 26 | + - [Non-Goals/Future Work](#non-goalsfuture-work) |
| 27 | + - [Proposal](#proposal) |
| 28 | + - [User Stories](#user-stories) |
| 29 | + - [Story 1](#story-1) |
| 30 | + - [Requirements](#requirements) |
| 31 | + - [Functional Requirements](#functional-requirements) |
| 32 | + - [Non-Functional Requirements](#non-functional-requirements) |
| 33 | + - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints) |
| 34 | + - [Security Model](#security-model) |
| 35 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 36 | + - [Alternatives](#alternatives) |
| 37 | + - [Upgrade Strategy](#upgrade-strategy) |
| 38 | + - [Additional Details](#additional-details) |
| 39 | + - [Test Plan](#test-plan) |
| 40 | + - [Graduation Criteria](#graduation-criteria) |
| 41 | + - [Implementation History](#implementation-history) |
| 42 | + |
| 43 | +## Glossary |
| 44 | + |
| 45 | +- [CAPA](https://cluster-api.sigs.k8s.io/reference/glossary.html#capa) - Cluster API Provider AWS. |
| 46 | +- [CAPI](https://github.com/kubernetes-sigs/cluster-api) - Cluster API. |
| 47 | +- [IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) - IAM Roles for Service Accounts. |
| 48 | +- [pod-identity-webhook](https://github.com/aws/amazon-eks-pod-identity-webhook) - Amazon EKS Pod Identity Webhook repository. |
| 49 | + |
| 50 | +## Summary |
| 51 | +IAM Roles for Service Accounts (IRSA) takes the access control enabled by IAM and bridges the gap to Kubernetes by adding role-based access to service accounts. With this proposal, CAPA users of self-managed clusters can give pods granular, role-based access to the AWS API. |
| 52 | + |
| 53 | +## Motivation |
| 54 | +This functionality is currently built into EKS. With a simple boolean in the AWSManagedCluster API called `AssociateOIDCProvider`, CAPA builds an IAM OIDC provider for the cluster and creates a trust policy template in a config map to be used when creating IAM roles. Self-managed clusters can use IRSA, but they require additional manual steps that managed clusters already handle: patching kube-apiserver, creating an OIDC provider, and deploying the `pod-identity-webhook`. These steps are documented in the webhook's [self-hosted setup](https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/SELF_HOSTED_SETUP.md) walkthrough. This proposal follows the same steps with CAPA-style ingredients, such as the management cluster, kubeadm config modification, and the built-in serving certs' OpenID Configuration API endpoints. |
| 55 | + |
| 56 | +The pieces IRSA needs are easily created with the access CAPA already has. By adding `AssociateOIDCProvider` to `AWSCluster`, we can kick off a reconciliation process that generates everything necessary to use IRSA in a self-managed cluster. |
| 57 | + |
| 58 | +### Goals |
| 59 | + |
| 60 | +1. On cluster creation, add all components to self-managed clusters to use IAM Roles for Service Accounts. |
| 61 | +2. On cluster deletion, remove all external dependencies from the AWS account. |
| 62 | + |
| 63 | +### Non-Goals/Future Work |
| 64 | +- Migrate all IAM work for Managed clusters to the IAM service. |
| 65 | +- The S3 bucket code currently fails when the bucket already exists. It should instead check whether the bucket exists and is writable, so that one bucket can be reused for multiple clusters. |
| 66 | +- The S3 bucket code creates a client that is locked to the region chosen for the cluster. Not all regions support S3, so the code should be smarter. Some options: |
| 67 | + - Add a region to the S3 bucket config and reconfigure the client when it is set, defaulting to the AWS default of us-east-1 when it is an empty string. |
| 68 | + - S3-enabled regions are a finite list. We could check whether the cluster region is S3 enabled and default to us-east-1 if there is no match. |
| 69 | + - Force all buckets to the S3 default region, us-east-1. |
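To make the region options above concrete, here is a minimal Python sketch that combines the first two: an explicit bucket region wins, then the cluster region if it appears in a list of S3-enabled regions, then us-east-1. The function name, its parameters, and the idea of passing the region list in as an argument are all hypothetical and not part of CAPA.

```python
from typing import Iterable

# Fallback named in the options above.
DEFAULT_S3_REGION = "us-east-1"


def resolve_bucket_region(
    configured_region: str,
    cluster_region: str,
    s3_regions: Iterable[str],
) -> str:
    """Pick the region the S3 client should use (hypothetical helper)."""
    # Option 1: an explicit, non-empty bucket region always wins.
    if configured_region:
        return configured_region
    # Option 2: reuse the cluster region when it is known to support S3.
    if cluster_region in set(s3_regions):
        return cluster_region
    # Otherwise fall back to the default region.
    return DEFAULT_S3_REGION
```

The third option (always use us-east-1) would reduce this function to returning `DEFAULT_S3_REGION` unconditionally.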
| 70 | + |
| 71 | +## Proposal |
| 72 | +- Create a boolean on `AWSCluster` called `AssociateOIDCProvider` to match the `AWSManagedCluster` API, with a default value of `false`. |
| 73 | +- Migrate the status types for `OIDCProvider` out of the experimental EKS APIs and into the v1beta2 APIs. |
| 74 | +- Build an IAM cloud service and add a reconciler that creates and persists all components required for IRSA. The logic is as follows. |
| 75 | + 1. Create a self-signed issuer for the workload cluster namespace to be used to make the pod identity webhook serving cert. |
| 76 | + 2. Generate the patch file and update kubeadm configs to write the patch to disk for the control plane nodes. |
| 77 | + 3. Create the Identity Provider in IAM, pointing to the S3 bucket. |
| 78 | + 4. Pause the reconciler until the workload cluster is online, since at this point we have created everything we can without a working kube-apiserver. The `AWSMachine` controller has additional code that annotates the `AWSCluster` once a control plane node is up and the management cluster has a kubeconfig, which unpauses the reconciler. |
| 79 | + 5. Copy the [JWKS](https://auth0.com/docs/secure/tokens/json-web-tokens/json-web-key-sets) and OpenID Configuration from the kube-apiserver to the S3 bucket. |
| 80 | + 6. Create all Kubernetes components in the workload cluster needed to run the pod-identity-webhook. |
| 81 | + 7. Create the trust policy boilerplate configmap in the workload cluster. |
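The two-phase nature of the steps above (work that can happen before the workload API is reachable, then work that needs it) can be sketched as follows. This is only an illustration of the pause gate in step 4. The step labels, list names, and function are hypothetical and do not mirror the actual reconciler code.

```python
from typing import List

# Steps 1-3: need only AWS and the management cluster.
PRE_API_STEPS = [
    "create self-signed issuer",
    "patch kubeadm configs",
    "create IAM identity provider",
]

# Steps 5-7: need a reachable workload kube-apiserver.
POST_API_STEPS = [
    "copy JWKS and OpenID configuration to S3",
    "deploy pod-identity-webhook",
    "create trust policy configmap",
]


def runnable_steps(control_plane_node_up: bool, kubeconfig_available: bool) -> List[str]:
    """Return the steps the reconciler may run given the cluster state."""
    steps = list(PRE_API_STEPS)
    # Step 4: stay paused until a control plane node is up and the
    # management cluster holds a kubeconfig for the workload cluster.
    if control_plane_node_up and kubeconfig_available:
        steps += POST_API_STEPS
    return steps
```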
| 82 | + |
| 83 | +As in the EKS implementation, a boilerplate trust policy document references the ARN of the Identity Provider created in step 3. It can be used to create IAM roles, and the ARNs of those roles can be annotated on service accounts. The pod-identity-webhook works by watching all service accounts and pods. When it finds a pod using a service account with the annotation, it injects the role ARN and the path of a projected service account token into the pod via environment variables, and the AWS SDKs exchange that token for STS credentials. |
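As a rough illustration of what such a trust policy and annotated service account look like, the Python sketch below builds both as plain dictionaries. The trust policy follows the standard IAM shape for `sts:AssumeRoleWithWebIdentity`, and `eks.amazonaws.com/role-arn` is the webhook's default annotation. The function names and example values are hypothetical, and the exact contents of the configmap CAPA would generate are an assumption.

```python
def trust_policy(provider_arn: str, issuer: str, namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy for one service account.

    `issuer` is the OIDC issuer URL without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The Identity Provider created in IAM.
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Restrict the role to a single namespace/service account pair.
                "Condition": {
                    "StringEquals": {
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


def annotated_service_account(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest carrying the role ARN annotation."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }
```

Serializing the first dictionary with `json.dumps` gives a document that can be supplied as the role's assume-role policy. The second can be dumped to YAML and applied to the workload cluster.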
| 84 | + |
| 85 | +### S3 Bucket |
| 86 | +A previous implementation for ignition support added an S3 bucket to hold the configuration needed for ignition boots. The original functionality used two sub-folders, `control-plane` and `node`. These remain the same in this proposal, with the addition of a new folder per cluster that matches the CAPA cluster name, producing a directory structure like the following. |
| 87 | + |
| 88 | +``` |
| 89 | +unique-s3-bucket-name/ |
| 90 | +|-- cluster1 |
| 91 | +|   |-- .well-known |
| 92 | +|   `-- openid |
| 93 | +|       `-- v1 |
| 94 | +|-- cluster2 |
| 95 | +|   |-- .well-known |
| 96 | +|   `-- openid |
| 97 | +|       `-- v1 |
| 98 | +|-- control-plane |
| 99 | +`-- node |
| 100 | +``` |
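The layout above maps naturally onto the two standard Kubernetes OIDC discovery paths, `/.well-known/openid-configuration` and `/openid/v1/jwks`. The Python sketch below shows one way the object keys and a public issuer URL could be derived. The function names are hypothetical, and using the virtual-hosted S3 URL with the cluster name as a path prefix for the issuer is an assumption based on the directory structure, not a confirmed detail of the implementation.

```python
def oidc_object_keys(cluster_name: str) -> dict:
    """Map kube-apiserver discovery paths to S3 object keys for one cluster."""
    return {
        "/.well-known/openid-configuration": f"{cluster_name}/.well-known/openid-configuration",
        "/openid/v1/jwks": f"{cluster_name}/openid/v1/jwks",
    }


def issuer_url(bucket: str, region: str, cluster_name: str) -> str:
    """Assumed issuer URL: virtual-hosted S3 endpoint plus the cluster prefix."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{cluster_name}"
```

Because every key is prefixed with the cluster name, several clusters can share one bucket without their discovery documents colliding.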
| 101 | + |
| 102 | +**Note**: today the code does not support reusing an S3 bucket, because it errors if the bucket already exists. Support can be added by catching the already-exists error and attempting a write to the bucket to confirm access, so that the bucket can be reused for another cluster. |
| 103 | + |
| 104 | +### Sample YAML |
| 105 | +To add IRSA support to a self-managed cluster, your AWSCluster YAML will look something like the following. |
| 106 | + |
| 107 | +``` |
| 108 | +--- |
| 109 | +apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 |
| 110 | +kind: AWSCluster |
| 111 | +metadata: |
| 112 | +  name: capi-quickstart |
| 113 | +  namespace: default |
| 114 | +spec: |
| 115 | +  region: us-west-2 |
| 116 | +  sshKeyName: luther |
| 117 | +  associateOIDCProvider: true |
| 118 | +  s3Bucket: |
| 119 | +    name: capi-quickstart-1234 # globally unique, be careful of name clashes with other AWS users |
| 120 | +    nodesIAMInstanceProfiles: |
| 121 | +    - nodes.cluster-api-provider-aws.sigs.k8s.io |
| 122 | +    controlPlaneIAMInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io |
| 123 | +``` |
| 124 | + |
| 125 | +### User Stories |
| 126 | + |
| 127 | +Story 1: |
| 128 | +As an EKS cluster user who uses IRSA I want to... |
| 129 | +- Migrate to self-managed clusters and maintain the same AWS API access |
| 130 | + |
| 131 | +Story 2: |
| 132 | +As a self-managed cluster user I want to... |
| 133 | +- Give pods granular access to the AWS API based on IAM Roles |
| 134 | + |
| 135 | +### Security Model |
| 136 | + |
| 137 | +Access to the necessary CRDs is already declared for the controllers, and we are not adding any new kinds, so there is no change. |
| 138 | + |
| 139 | +Since the JWKS and OpenID configuration need public access, the S3 bucket config must be modified to allow both private and public access to objects. This is done by setting the flags in `PublicAccessBlockConfiguration` to false and setting bucket ownership to `BucketOwnerPreferred`. |
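For reference, the two bucket settings described above correspond to the request bodies of the S3 `PutPublicAccessBlock` and `PutBucketOwnershipControls` operations. The Python sketch below only builds those bodies. The function names are hypothetical, and setting all four public access flags to false is an assumption drawn from the sentence above.

```python
def public_access_block_config() -> dict:
    """Request body for PutPublicAccessBlock with every block disabled."""
    return {
        "BlockPublicAcls": False,
        "IgnorePublicAcls": False,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    }


def ownership_controls() -> dict:
    """Request body for PutBucketOwnershipControls."""
    return {"Rules": [{"ObjectOwnership": "BucketOwnerPreferred"}]}
```

With boto3, for example, these would be passed as the `PublicAccessBlockConfiguration` argument of `put_public_access_block` and the `OwnershipControls` argument of `put_bucket_ownership_controls`. Individual objects such as the JWKS would then still need a public-read ACL, which is why `s3:PutObjectAcl` appears in the permissions list.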
| 140 | + |
| 141 | +The following additional permissions are granted to the IAM policies: |
| 142 | + |
| 143 | +**Controllers Policy** |
| 144 | +- iam:CreateOpenIDConnectProvider |
| 145 | +- iam:DeleteOpenIDConnectProvider |
| 146 | +- iam:ListOpenIDConnectProviders |
| 147 | +- iam:GetOpenIDConnectProvider |
| 148 | +- iam:TagOpenIDConnectProvider |
| 149 | +- s3:PutBucketOwnershipControls |
| 150 | +- s3:PutObjectAcl |
| 151 | +- s3:PutBucketPublicAccessBlock |
| 152 | + |
| 153 | +### Risks and Mitigations |
| 154 | + |
| 155 | + |
| 156 | +## Alternatives |
| 157 | + |
| 158 | +The process to install everything needed for IRSA is documented and could be done by hand if necessary, but CAPA has complete control over the pieces needed, and automating this through a reconciler would bring the feature on par with the existing functionality for Managed Clusters. |
| 159 | + |
| 160 | +#### Benefits |
| 161 | + |
| 162 | +This approach makes IRSA in self-managed clusters relatively trivial. The kube-apiserver patch is tricky to manage by hand, and CAPA already has access to all the AWS infrastructure it needs to automate it. |
| 163 | + |
| 164 | +#### Downsides |
| 165 | + |
| 166 | +- Might be too much for CAPA to manage and not worth the complexity. |
| 167 | + |
| 168 | +#### Decision |
| 169 | + |
| 170 | +## Upgrade Strategy |
| 171 | +Moving the OIDCProvider type from the experimental EKS API to the v1beta2 API for both cluster types will include converters for upgrading and downgrading. Testing will need to confirm this, but it should be possible to add IRSA to an existing cluster: CAPA will need to patch kube-apiserver and create new control plane nodes, and the upgrade process should make this seamless. |
| 172 | + |
| 173 | +## Additional Details |
| 174 | + |
| 175 | +### Test Plan |
| 176 | +* Test creating a cluster, confirm all pieces work, and run a simple AWS CLI example in a pod with a service account attached, confirming that the exec'd commands authenticate successfully through STS using the values injected via environment variables. |
| 177 | +* Test deleting a cluster and confirm all AWS components are removed (S3 bucket contents, management cluster configmaps, etc.). |
| 178 | +* Test upgrading a cluster without IRSA to add the feature, confirm all components deploy successfully, and run the AWS CLI example. |
| 179 | + |
| 180 | +### Graduation Criteria |
| 181 | + |
| 182 | +## Implementation History |
| 183 | + |
| 184 | +- [x] 2023-03-22: Open proposal (PR) |
| 185 | +- [x] 2023-02-22: WIP Implementation [PR](https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/4094) |
| 186 | + |
| 187 | +<!-- Links --> |
| 188 | +[community meeting]: https://docs.google.com/document/d/1iW-kqcX-IhzVGFrRKTSPGBPOc-0aUvygOVoJ5ETfEZU/edit# |
| 189 | +[discussion]: https://github.com/kubernetes-sigs/cluster-api-provider-aws/discussions/4153 |