Commit 45285b9

[Design] Security Patches and CVE scanning for ACK (#1911)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
1 parent 603ae93 commit 45285b9

1 file changed

+186
-0
lines changed

@@ -0,0 +1,186 @@
# Security Patches and CVEs in ACK

## Background

The AWS Controllers for Kubernetes (ACK) project bridges the gap between
Kubernetes clusters and AWS services, offering users a way to manage and
integrate their AWS resources within Kubernetes environments. However, like
any software project, ACK is not immune to security vulnerabilities,
particularly those reported by CVE scanners such as [quay][quay] and
[trivy][trivy]. These vulnerabilities may arise from outdated dependencies,
including libraries, base images, and compilers, leaving the controllers
delivered to our users exposed to security risks.

## Problem statement

The core of our current issue lies in the fact that, until recently, ACK's
images and components relied on outdated fundamental elements. For example,
a significant number of reported CVEs pointed to the use of an outdated
`eks-distro-build-minimal` image and an older Go compiler (version `1.19`)
during the compilation process. This situation persisted because ACK
contributors had to update these dependencies manually. Given the rapid pace
at which these dependencies are released, it was easy to fall behind.
This presented a substantial challenge to the project's security posture.

To address these vulnerabilities and improve the security of ACK controllers,
we took two actions. The first was upgrading the `eks-distro-build-minimal`
image and the Go compiler to version `1.21` (the latest as of September 2023).
The second was re-releasing new controller versions built with those
up-to-date dependencies. While this solved a significant portion of the
security issues, it did not entirely mitigate the risk, as it remained
necessary to proactively keep an eye on the newest dependencies and ship the
necessary patches.

## Purpose of this document

Right now, the project faces the challenge of automating updates to the
controllers' "in-image-runtime" and dependencies to address security
vulnerabilities without disrupting the already established release chain. This
document explores the existing procedures for releasing new controllers and the
current approach to handling CVEs, and proposes solutions to enhance those
processes.

In a nutshell, the objective is to ensure that the ACK project can proactively
manage security patches, minimize CVE reports in its images, and keep its
dependencies up to date. Achieving this goal requires careful consideration
of the project's release workflows, its security update procedures, and
the intersection of the two. The solutions proposed here aim to strike
a balance between automation and control, enabling the project to
respond rapidly to security vulnerabilities while maintaining a stable and
reliable release pace.

## Scope

- Development and implementation of strategies for automating the detection
  and mitigation of security vulnerabilities.
- Integration of security patching processes into the existing ACK release
  workflows and procedures.
- Updates and changes to the project's code and infrastructure necessary to
  automate security patching.
- Solutions that manage security patches proactively without compromising
  the reliability of the project's release process.

## Out of scope

- In-depth discussions or analysis of specific CVEs or individual security
  vulnerabilities.
- Completely reworking the release chain of ACK controller containers and
  Helm charts.
- Non-security-related updates or enhancements to the ACK project.

## Description of the current release chain

[//]: <> (move this section to test-infra repository)

## Current release process

In ACK, the release process is carefully structured and includes steps
such as detecting code changes, testing, approvals, git tagging, Prow jobs,
and documentation updates. The process ensures that customers have access
to the latest controller functionality and improvements while maintaining
compatibility with Kubernetes environments.

Keep in mind that ACK releases involve two primary deliverables that are
shipped to users:
- Controller container image.
- Helm chart release.

The release process follows a structured workflow that can be described
as follows:
- ACK maintainers and contributors make changes to a specific controller.
- A release pull request (PR) is created, which includes version updates
  in the Helm chart (incremented from the previous release, e.g., to `0.0.6`).
- E2E and unit tests have to pass in the release PR.
- One of the maintainers reviews and approves the PR, typically by adding
  an `/lgtm` comment.
- Prow's Tide component merges the PR into the main branch.
- A Prow job detects the changes and tags the repository with the next
  patch release version, such as `0.0.6`.
- A GitHub action automatically creates a GitHub release, detailing the
  changes since the last release. (add example here)
- Multiple Prow jobs are triggered to release various artifacts:
  - A container image is tagged with the new release version, such as `0.0.6`. (*1)
  - A Helm chart is tagged with the new release version (`0.0.6`). (out of scope)
  - Documentation updates are made in the community repository. (out of scope)
  - A pull request is raised for integration with Operator Lifecycle Manager
    (OLM) to ship artifacts to the operator hub platforms. (out of scope)

[//]: <> (NOTES(a-hilaly): insert diagrams)

(*1): The container image used in releases depends on a base image defined
in the project's code, linked to the [code-generator][code-generator]'s
Dockerfile and the environment variables and images in [test-infra][test-infra]
Prow jobs.

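The tagging step in the workflow above amounts to incrementing the patch
component of the previous release tag. The following is a minimal sketch of
that arithmetic only. The function name `next_patch_version` is hypothetical,
and the optional `v` prefix is an assumption about tag format. It is not the
actual Prow job logic.

```python
import re

# Matches MAJOR.MINOR.PATCH with an optional leading "v" (assumed tag format).
_SEMVER = re.compile(r"^(v?)(\d+)\.(\d+)\.(\d+)$")


def next_patch_version(tag: str) -> str:
    """Return the tag that follows `tag` when only the patch number grows."""
    match = _SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    prefix, major, minor, patch = match.groups()
    return f"{prefix}{major}.{minor}.{int(patch) + 1}"


if __name__ == "__main__":
    # A repository whose latest tag is 0.0.5 would be tagged 0.0.6 next.
    print(next_patch_version("0.0.5"))
```

Because the patch number is parsed as an integer, `0.0.9` is followed by
`0.0.10` rather than rolling over into the minor component.
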
### CVE reports

When a CVE is reported for a specific image, ACK maintainers face a series of
tasks. The primary actions revolve around updating the project's components
to mitigate the identified security risk. This typically involves five main
steps:

- Bumping the base image version in the main [Dockerfile][code-generator-dockerfile]
  used to build the controller images.
- Bumping the Go compiler version in the environment variables of the
  [Prow jobs][test-infra-release-prowjob] responsible for releasing container
  images and Helm charts.
- Bumping the ACK [runtime][runtime] dependencies. These bumps are generally
  made by Dependabot, but ACK maintainers can also raise PRs to address
  similar issues ([example PR][runtime-deps-bump]).
- Re-releasing the [code-generator][code-generator] to open PRs bumping the
  versions and dependencies for all the controllers
  ([example prowjob][code-generator-autogen]).
- Monitoring the tests in each controller repository and merging the PRs as
  soon as they pass.

[//]: <> (NOTES(a-hilaly): insert diagram)

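The first two steps above are text edits to pinned versions, which is what
makes them candidates for automation. Below is a rough sketch of such an edit,
assuming the pinned values live in `ARG name=value` lines. The helper
`bump_build_arg`, the argument names `golang_version` and `base_image`, and the
registry path are all hypothetical and do not reflect the real contents of the
code-generator Dockerfile or the Prow job definitions.

```python
import re


def bump_build_arg(dockerfile: str, arg_name: str, new_value: str) -> str:
    """Replace the default value of `ARG <arg_name>=<value>` in Dockerfile text."""
    pattern = re.compile(rf"^(ARG\s+{re.escape(arg_name)}=)\S+$", re.MULTILINE)
    # A callable replacement avoids backslash/group surprises in new_value.
    updated, count = pattern.subn(lambda m: m.group(1) + new_value, dockerfile)
    if count == 0:
        raise KeyError(f"ARG {arg_name} not found")
    return updated


if __name__ == "__main__":
    # Hypothetical Dockerfile fragment, for illustration only.
    sample = (
        "ARG golang_version=1.19\n"
        "ARG base_image=registry.example/eks-distro-build-minimal:old\n"
        "FROM ${base_image}\n"
    )
    print(bump_build_arg(sample, "golang_version", "1.21"))
```

Raising an error when the argument is missing keeps a silent no-op from being
mistaken for a successful bump.
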
## Solutions

The goal of this section is to find a balance between enhancing security and
maintaining the efficiency and reliability of the current release process.
The objective is to proactively address security vulnerabilities, particularly
CVEs, while minimizing disruptions to the already-functioning release chain.

### Solution 1: Periodic scanning and CVE detection (preferred)

To address these challenges while ensuring timely security updates, we propose
implementing a solution focused on periodically detecting CVEs, newly released
base images, and Go releases. Here's how it would work:

- **Automated Detection**: automated scripts periodically scan for
  CVE reports related to ACK's dependencies, track updates to base images,
  and monitor Go releases. These scripts run at regular intervals,
  ensuring that security vulnerabilities and dependency changes are quickly
  identified.

- **Pull Request Generation**: when a security vulnerability, new base image,
  or Go release is detected, an automated process opens pull requests against
  the affected repositories.

- **Notification to Maintainers**: in parallel, notifications are sent to
  project maintainers, providing them with essential information about the
  detected vulnerabilities, the dependency updates, and a link to the generated
  PRs. Maintainers have a clear overview of the actions required.

- **Maintainer Actions**: maintainers review the generated PRs and take
  appropriate actions. This may involve temporarily reverting unfinished
  features, making necessary code adjustments, or completing unfinished
  features before merging the PRs.

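The detection and pull request generation steps above can be sketched as a
comparison between pinned and latest versions. The sketch below assumes Go
releases are read from a JSON listing shaped like the one served at
`https://go.dev/dl/?mode=json` (objects with `version` and `stable` fields) and
that versions are dotted numbers. The function names and the PR title format
are hypothetical, and no network call or PR creation is performed here.

```python
import json


def version_key(version: str) -> tuple:
    """Turn a dotted numeric version such as "1.21.1" into a sortable tuple."""
    return tuple(int(part) for part in version.split("."))


def latest_stable_go(releases_json: str) -> str:
    """Pick the newest stable release from a go.dev/dl-style JSON listing."""
    releases = json.loads(releases_json)
    stable = [
        release["version"].removeprefix("go")
        for release in releases
        if release.get("stable")
    ]
    if not stable:
        raise ValueError("no stable Go release in listing")
    return max(stable, key=version_key)


def pending_updates(pinned: dict, latest: dict) -> dict:
    """Map each outdated dependency to its (pinned, latest) version pair."""
    return {
        name: (current, latest[name])
        for name, current in pinned.items()
        if name in latest and version_key(latest[name]) > version_key(current)
    }


def pull_request_title(updates: dict) -> str:
    """Build a one-line PR title summarising the detected bumps."""
    bumps = ", ".join(
        f"{name} {old} -> {new}" for name, (old, new) in sorted(updates.items())
    )
    return f"Security patch: bump {bumps}"


if __name__ == "__main__":
    # Inline sample standing in for a fetched release listing.
    listing = json.dumps([
        {"version": "go1.22rc1", "stable": False},
        {"version": "go1.21.1", "stable": True},
        {"version": "go1.20.8", "stable": True},
    ])
    updates = pending_updates({"go": "1.19"}, {"go": latest_stable_go(listing)})
    if updates:
        print(pull_request_title(updates))
```

Keeping detection as pure functions over already-fetched data means the
periodic job only has to supply the inputs and act on a non-empty result, for
example by opening a PR and notifying maintainers.
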
#### Technical details:

[//]: <> (NOTES(a-hilaly): develop each section with technical details on the approach and technologies that will be leveraged)

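As one possible building block for the scanning step, the sketch below filters
a scanner report down to findings that already have a fix available. It
assumes the report is the JSON produced by `trivy image --format json`, with a
top-level `Results` list whose entries carry a `Vulnerabilities` list. The
function name `fixable_findings`, the severity threshold, and the sample
identifiers are hypothetical and not part of the proposal itself.

```python
import json


def fixable_findings(report_json: str, severities=("HIGH", "CRITICAL")) -> list:
    """Return vulnerabilities of the given severities that have a fixed version."""
    report = json.loads(report_json)
    findings = []
    # "Results" or "Vulnerabilities" may be missing or null in a clean report.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in severities and vuln.get("FixedVersion"):
                findings.append({
                    "target": result.get("Target"),
                    "id": vuln.get("VulnerabilityID"),
                    "package": vuln.get("PkgName"),
                    "installed": vuln.get("InstalledVersion"),
                    "fixed": vuln.get("FixedVersion"),
                })
    return findings


if __name__ == "__main__":
    # Made-up report with placeholder identifiers, for illustration only.
    sample = json.dumps({
        "Results": [
            {
                "Target": "bin/controller",
                "Vulnerabilities": [
                    {"VulnerabilityID": "CVE-0000-0001", "PkgName": "stdlib",
                     "InstalledVersion": "1.19", "FixedVersion": "1.21",
                     "Severity": "HIGH"},
                    {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libfoo",
                     "InstalledVersion": "1.0", "Severity": "CRITICAL"},
                ],
            },
            {"Target": "clean-layer", "Vulnerabilities": None},
        ]
    })
    for finding in fixable_findings(sample):
        print(finding["id"], finding["package"], finding["fixed"])
```

Restricting the output to findings with a fixed version keeps the generated
PRs actionable, since a dependency bump cannot resolve a CVE that has no
released fix.
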
### Solution 2: Tweaking the release process by introducing branch-based releases

[//]: <> (NOTES(a-hilaly): add more solutions if needed)

[quay]: https://github.com/quay/clair
[trivy]: https://github.com/aquasecurity/trivy
[code-generator]: https://github.com/aws-controllers-k8s/code-generator
[code-generator-dockerfile]: https://github.com/aws-controllers-k8s/code-generator/blob/main/Dockerfile
[code-generator-autogen]: https://prow.ack.aws.dev/view/s3/ack-prow-logs/logs/auto-generate-controllers/1702457878050246656
[runtime]: https://github.com/aws-controllers-k8s/runtime
[runtime-deps-bump]: https://github.com/aws-controllers-k8s/runtime/pull/125
[test-infra]: https://github.com/aws-controllers-k8s/test-infra
[test-infra-release-prowjob]: https://github.com/aws-controllers-k8s/test-infra/blob/main/prow/jobs/jinja/postsubmits/controller_release.jinja2#L2-L33
