|
| 1 | +# Security Patches and CVEs in ACK |
| 2 | + |
| 3 | +## Background |
| 4 | + |
| 5 | +The AWS Controllers for Kubernetes (ACK) project plays a role in bridging the |
| 6 | +gap between Kubernetes clusters and various AWS services, offering users a way |
| 7 | +to manage and integrate their AWS resources within Kubernetes environments. |
| 8 | +However, like any software project, ACK is not immune to security vulnerabilities, |
| 9 | +particularly those reported by CVE scanners such as [quay][quay] and |
| 10 | +[trivy][trivy]. These vulnerabilities may arise from outdated |
| 11 | +dependencies, including libraries, base images, and compilers, leaving the |
| 12 | +controllers delivered to our users exposed to security risks. |
| 13 | + |
| 14 | +## Problem statement |
| 15 | + |
| 16 | +The core of our current issue lies in the fact that, until recently, ACK's |
| 17 | +images and components relied on outdated fundamental elements. For example, |
| 18 | +a significant number of reported CVEs pointed out the usage of an outdated |
| 19 | +`eks-distro-build-minimal` image and an older Go compiler (version `1.19`) |
| 20 | +during the compilation process. This situation persisted because ACK |
| 21 | +contributors had to manually update these dependencies. Given the rapid pace |
| 22 | +at which these dependencies were updated, it was easy to forget about them. |
| 23 | +This presented a substantial challenge to the project's security stance. |
| 24 | + |
| 25 | +To address these vulnerabilities and improve the security of ACK controllers, |
| 26 | +we took two actions. The first was upgrading the |
| 27 | +`eks-distro-build-minimal` image and the Go compiler to version `1.21` |
| 28 | +(the latest one as of September 2023). The second was re-releasing |
| 29 | +new controller versions using those up-to-date dependencies. While this |
| 30 | +solved an important portion of the security issues, it did not entirely |
| 31 | +mitigate the risk, as it remained necessary to proactively keep an eye on the |
| 32 | +newest dependencies and ship the necessary patches. |
| 33 | + |
| 34 | +## Purpose of this document |
| 35 | + |
| 36 | +Right now, the project faces the challenge of automating the process of updating |
| 37 | +the controllers "in-image-runtime" and dependencies to address security |
| 38 | +vulnerabilities without disrupting the already established release chain. This |
| 39 | +document explores the existing procedures for releasing new controllers, the |
| 40 | +current approach to handling CVEs, and proposes solutions to enhance those |
| 41 | +processes. |
| 42 | + |
| 43 | +In a nutshell, the objective is to ensure that the ACK project can proactively |
| 44 | +manage security patches, minimize CVE reports in its images, and keep its |
| 45 | +dependencies up-to-date. Achieving this goal requires careful consideration |
| 46 | +of the project's release workflows, security update procedures, and |
| 47 | +the intersection of these two aspects. The solutions proposed here aim to strike |
| 48 | +a balance between automation and control, facilitating the project's ability to |
| 49 | +rapidly respond to security vulnerabilities while maintaining a stable and |
| 50 | +reliable release pace. |
| 51 | + |
| 52 | +## Scope |
| 53 | + |
| 54 | +- Development and implementation of strategies for automating the detection |
| 55 | + and mitigation of security vulnerabilities. |
| 56 | +- Integration of security patching processes into the existing ACK release |
| 57 | + workflows and procedures. |
| 58 | +- Updates and changes to the project's code and infrastructure necessary to |
| 59 | + automate security patching. |
| 60 | +- Solutions that allow security patches to be managed proactively without |
| 61 | + compromising the reliability of the project's release process. |
| 62 | + |
| 63 | +## Out of scope |
| 64 | + |
| 65 | +- In-depth discussions or analysis of specific CVEs or individual security |
| 66 | + vulnerabilities. |
| 67 | +- Completely reworking the release chain of ACK controller containers and |
| 68 | + Helm charts. |
| 69 | +- Non-security-related updates or enhancements to the ACK project. |
| 70 | + |
| 71 | +## Description of the current release chain |
| 72 | + |
| 73 | +[//]: <> (move this section to test-infra repository) |
| 74 | + |
| 75 | +## Current release process |
| 76 | + |
| 77 | +In ACK, the release process is carefully structured and includes steps |
| 78 | +such as detecting code changes, testing, approvals, git tagging, Prow jobs, |
| 79 | +and documentation updates. The process ensures that customers have access |
| 80 | +to the latest controller functionality and improvements while maintaining |
| 81 | +compatibility with their Kubernetes environments. |
| 82 | + |
| 83 | +Keep in mind that ACK releases involve two primary deliverables that are shipped |
| 84 | +to users: |
| 85 | +- Controller container image. |
| 86 | +- Helm chart release. |
| 87 | + |
| 88 | +The release process follows a structured workflow that can be described |
| 89 | +as follows: |
| 90 | +- ACK maintainers and contributors make changes to a specific controller. |
| 91 | +- A release pull request (PR) is created, which includes version updates |
| 92 | + in the Helm chart (incrementing from the previous release, e.g., `0.0.6`). |
| 93 | +- E2E and unit tests have to pass in the raised PR. |
| 94 | +- One of the maintainers reviews and approves the PR, typically by adding |
| 95 | + an `/lgtm` comment. |
| 96 | +- Prow (Tide component) merges the PR into the main branch, incorporating the changes. |
| 97 | +- A Prow job detects the changes and tags the repository with the next |
| 98 | + patch release version, such as `0.0.6`. |
| 99 | +- A GitHub action automatically creates a GitHub release, detailing the |
| 100 | + changes since the last release. (add example here) |
| 101 | +- Multiple Prow jobs are triggered to release various artifacts: |
| 102 | + - A container image is tagged with the new release version, such as `0.0.6` (*1) |
| 103 | + - A Helm chart is tagged with the new release version (`0.0.6`) (out of scope) |
| 104 | + - Documentation updates are made in the community repository. (out of scope) |
| 105 | + - A pull request is raised for integration with Operator Lifecycle Manager |
| 106 | + (OLM) to ship artifacts to the operator hub platforms. (out of scope) |
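
The tagging step in the workflow above ("the next patch release version") can be illustrated with a minimal sketch. This is not the logic of the actual Prow job, which is not shown in this document; the function name `next_patch_version` and the `0.0.5` starting tag are assumptions made for illustration only.

```python
import re

# Matches MAJOR.MINOR.PATCH with an optional leading "v" (assumed tag format).
_SEMVER = re.compile(r"^(v?)(\d+)\.(\d+)\.(\d+)$")


def next_patch_version(tag: str) -> str:
    """Return the tag that follows `tag` by one patch increment."""
    match = _SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    prefix, major, minor, patch = match.groups()
    # Only the patch component changes; the prefix and other parts are kept.
    return f"{prefix}{major}.{minor}.{int(patch) + 1}"


# Hypothetical previous release tag.
print(next_patch_version("0.0.5"))  # prints 0.0.6
```

Comparing the components as integers rather than strings matters once a component reaches two digits: `1.2.9` is followed by `1.2.10`.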
| 107 | + |
| 108 | +[//]: <> (NOTES(a-hilaly): insert diagrams) |
| 109 | + |
| 110 | +(*1): The container image used in releases depends on a base image defined |
| 111 | +in the project's code, linked to the [code-generator][code-generator]'s |
| 112 | +Dockerfile and the environment variables and images in [test-infra][test-infra] |
| 113 | +Prow jobs. |
| 114 | + |
| 115 | +### CVE reports |
| 116 | + |
| 117 | +When a CVE is reported for a specific image, ACK maintainers face a series of |
| 118 | +tasks. The primary actions revolve around updating the project's components |
| 119 | +to mitigate the identified security risk. This typically involves five main |
| 120 | +steps: |
| 121 | + |
| 122 | +- Bumping the base image version in the main [Dockerfile][code-generator-dockerfile] |
| 123 | + used to build the controller images. |
| 124 | +- Bumping the Go compiler version in the environment variables of the Prow jobs |
| 125 | + responsible for releasing container images and Helm charts. |
| 126 | +- Bumping the ACK runtime dependencies. These bumps are generally made by dependabot, but ACK |
| 127 | + maintainers can also raise PRs to address similar issues ([example PR][runtime-deps-bump]). |
| 128 | +- Re-releasing the [code-generator][code-generator] to open PRs bumping the |
| 129 | + versions and dependencies for all the controllers ([example prowjob][code-generator-autogen]). |
| 130 | +- Monitoring the tests for each controller repository and merging |
| 131 | + the PRs as soon as they pass. |
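
The first step in the list above, bumping the base image, is a mechanical text edit, which is part of what makes it a candidate for automation. The sketch below rewrites the tag on a `FROM` line. The image names, tags, and the helper `bump_base_image` are hypothetical, and the sketch assumes the image is referenced directly on a `FROM <image>:<tag>` line, which may not match the layout of the real Dockerfile.

```python
import re


def bump_base_image(dockerfile: str, image: str, new_tag: str) -> str:
    """Rewrite every `FROM <image>:<tag>` line so that it uses `new_tag`."""
    pattern = re.compile(
        rf"^(FROM\s+{re.escape(image)}):(\S+)", flags=re.MULTILINE
    )
    # A function replacement avoids any escaping issues in `new_tag`.
    return pattern.sub(lambda m: f"{m.group(1)}:{new_tag}", dockerfile)


# Hypothetical Dockerfile content, for illustration only.
dockerfile = """FROM example.com/builder:1.19 AS builder
RUN make build
FROM example.com/base:2023-01-01
COPY --from=builder /bin/controller /bin/controller
"""

updated = bump_base_image(dockerfile, "example.com/base", "2023-09-06")
print(updated)
```

Only lines naming the requested image are touched, so the builder stage keeps its tag and its `AS builder` alias.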
| 132 | + |
| 133 | +[//]: <> (NOTES(a-hilaly): insert diagram) |
| 134 | + |
| 135 | +## Solutions |
| 136 | + |
| 137 | +The goal of this section is to find a balance between enhancing security and |
| 138 | +maintaining the efficiency and reliability of the current release process. |
| 139 | +The objective is to proactively address security vulnerabilities, particularly |
| 140 | +CVEs, while minimizing disruptions to the already-functioning release chain. |
| 141 | + |
| 142 | +### Solution 1: Periodic scanning and CVE detection (preferred) |
| 143 | + |
| 144 | +To address these challenges while ensuring timely security updates, we propose |
| 145 | +implementing a solution focused on detecting CVEs, newly released base images |
| 146 | +and Go releases periodically. Here's how it would work: |
| 147 | + |
| 148 | +- **Automated Detection**: automated scripts periodically scan for |
| 149 | + CVE reports related to ACK's dependencies, track updates to base images, |
| 150 | + and monitor Go releases. These scripts would run at regular intervals, |
| 151 | + ensuring that security vulnerabilities and dependency changes are quickly |
| 152 | + identified. |
| 153 | + |
| 154 | +- **Pull Request Generation**: when a security vulnerability, new base image |
| 155 | + or Go release is detected, an automated process opens pull requests against |
| 156 | + the concerned repositories. |
| 157 | + |
| 158 | +- **Notification to Maintainers**: in parallel, notifications are sent to |
| 159 | + project maintainers, providing them with essential information about the |
| 160 | + detected vulnerabilities, dependency updates, and a link to the generated |
| 161 | + PRs. Maintainers would have a clear overview of the actions required. |
| 162 | + |
| 163 | +- **Maintainer Actions**: maintainers review the generated PRs and take |
| 164 | + appropriate actions. This may involve temporarily reverting unfinished |
| 165 | + features, making necessary code adjustments, or completing unfinished |
| 166 | + features before merging the PRs. |
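
As a rough illustration of the detection step, the sketch below filters a scanner report down to the findings that a dependency bump could resolve. It assumes the report follows the JSON layout produced by `trivy image --format json` (a top-level `Results` list whose entries carry a `Vulnerabilities` list with `VulnerabilityID`, `PkgName`, `InstalledVersion`, `FixedVersion`, and `Severity` fields). The severity threshold, the helper name `actionable_findings`, and the sample data (including the CVE identifiers) are made up for the example.

```python
import json

# Assumed policy: only act on findings at these severities.
ACTIONABLE = {"HIGH", "CRITICAL"}


def actionable_findings(report_json: str) -> list[dict]:
    """Return HIGH/CRITICAL findings that already have a fixed version."""
    report = json.loads(report_json)
    findings = []
    # Both lists may be missing or null when nothing was found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in ACTIONABLE and vuln.get("FixedVersion"):
                findings.append({
                    "target": result.get("Target"),
                    "id": vuln.get("VulnerabilityID"),
                    "package": vuln.get("PkgName"),
                    "installed": vuln.get("InstalledVersion"),
                    "fixed": vuln.get("FixedVersion"),
                })
    return findings


# Made-up report for illustration; the identifiers are not real CVEs.
sample_report = json.dumps({
    "Results": [
        {"Target": "bin/controller", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "PkgName": "stdlib",
             "InstalledVersion": "1.19.0", "FixedVersion": "1.21.0",
             "Severity": "HIGH"},
            {"VulnerabilityID": "CVE-0000-0002", "PkgName": "example/lib",
             "InstalledVersion": "0.1.0", "Severity": "LOW"},
        ]},
        {"Target": "base-image", "Vulnerabilities": None},
    ]
})

for finding in actionable_findings(sample_report):
    print(finding["id"], finding["package"], "->", finding["fixed"])
```

Requiring a `FixedVersion` keeps the automation from opening PRs for vulnerabilities that no upgrade can resolve yet. Those findings would still be worth surfacing in the maintainer notification.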
| 167 | + |
| 168 | +#### Technical details: |
| 169 | + |
| 170 | +[//]: <> (NOTES(a-hilaly): develop each section with technical details on the approach and technologies that will be leveraged) |
| 171 | + |
| 172 | +### Solution 2: Tweaking the release process by introducing branch-based releases |
| 173 | + |
| 174 | +[//]: <> (NOTES(a-hilaly): add more solutions if needed) |
| 175 | + |
| 178 | +[quay]: https://github.com/quay/clair |
| 179 | +[trivy]: https://github.com/aquasecurity/trivy |
| 180 | +[code-generator]: https://github.com/aws-controllers-k8s/code-generator |
| 181 | +[code-generator-dockerfile]: https://github.com/aws-controllers-k8s/code-generator/blob/main/Dockerfile |
| 182 | +[code-generator-autogen]: https://prow.ack.aws.dev/view/s3/ack-prow-logs/logs/auto-generate-controllers/1702457878050246656 |
| 183 | +[runtime]: https://github.com/aws-controllers-k8s/runtime |
| 184 | +[runtime-deps-bump]: https://github.com/aws-controllers-k8s/runtime/pull/125 |
| 185 | +[test-infra]: https://github.com/aws-controllers-k8s/test-infra |
| 186 | +[test-infra-release-prowjob]: https://github.com/aws-controllers-k8s/test-infra/blob/main/prow/jobs/jinja/postsubmits/controller_release.jinja2#L2-L33 |