Harden CI Flow and Checks #2040

Open
aauren wants to merge 15 commits into master from harden_ci_flow_and_checks

@aauren aauren commented Mar 22, 2026

CI Security Hardening Pull Request

What type of PR is this?

feature

What this PR does / why we need it:

Hardens the CI pipeline against supply chain attacks and improves release artifact trust. I tried to balance best practices against the size of our small maintainer team and the medium size of this repo.

This is also necessary / timely because of upcoming EU CRA compliance, which requires SBOMs for software used in the EU. While kube-router isn't a sold product, it's still probably a good practice to begin adopting.

This PR is chained off of #2035 and is meant to be merged after #2035 is merged and a rebase has been performed.

CI refactor:

  • Splits the monolithic ci.yml into an orchestrating caller (ci.yml) and three reusable
    workflow_call workflows (ci-checks.yml, ci-container.yml, ci-release.yml), plus a local
    composite action for the repeated checkout + setup-go steps. All jobs remain sequential via
    needs: and PR status checks are fully preserved.
  • ci-unicode-check has been added to check for malicious content that might be sent with PRs but is otherwise hidden from reviewers using special Unicode characters.
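
As a rough illustration of what a hidden-character check can look like, the sketch below flags files that contain Unicode bidirectional control characters (U+202A to U+202E and U+2066 to U+2069). The function name and the exact character set are assumptions made for this example; the real ci-unicode-check workflow may look for a different or larger set.

```shell
# Hypothetical helper, not taken from the PR: succeeds (exit 0) when the
# given file contains a Unicode bidirectional control character.
has_bidi_controls() {
  # UTF-8 byte sequences for U+202A..U+202E and U+2066..U+2069, one fixed
  # string per line, written with octal escapes so plain sh printf works.
  _bidi_patterns=$(printf '%b' \
    '\0342\0200\0252\n\0342\0200\0253\n\0342\0200\0254\n' \
    '\0342\0200\0255\n\0342\0200\0256\n' \
    '\0342\0201\0246\n\0342\0201\0247\n\0342\0201\0250\n\0342\0201\0251')
  # LC_ALL=C makes grep compare raw bytes regardless of the caller's locale.
  LC_ALL=C grep -q -F -e "$_bidi_patterns" -- "$1"
}
```

A caller could loop over the files changed in a PR and fail the job on the first file for which `has_bidi_controls` succeeds.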

Supply chain hardening:

  • All 14 third-party GitHub Actions pinned by commit SHA with version comments. Existing Dependabot
    github-actions config will maintain these automatically.
  • Base images (golang:1.25.7-alpine3.23, alpine:3.23) pinned by digest in both ci.yml and
    the Makefile to ensure identical builds locally and in CI.
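
To make the SHA-pinning rule concrete, here is a minimal sketch that lists `uses:` lines not pinned to a full 40-character commit SHA. The function name is hypothetical, and skipping local `./` actions is an assumption of this example rather than something the PR specifies.

```shell
# Hypothetical helper, not taken from the PR: prints every `uses:` line under
# the given directory that is not pinned to a 40-hex-character commit SHA.
# Exits non-zero when nothing unpinned is found.
find_unpinned_actions() {
  grep -rnE '^[[:space:]-]*uses:[[:space:]]' "$1" \
    | grep -vE 'uses:[[:space:]]+\./' \
    | grep -vE '@[0-9a-f]{40}([[:space:]]|$)'
}
```

Running `find_unpinned_actions .github/workflows` prints offending lines with their file name and line number, which makes it easy to spot a tag-only reference that slipped in.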

Code quality fixes:

  • CodeQL given an explicit languages: go to prevent silent scan failures if autobuild heuristics
    fail

Release artifact trust:

  • Container images are keyless-signed with cosign (Sigstore) on all tag pushes; signatures logged
    in Rekor
  • SPDX-JSON SBOMs generated for container images and attested to DockerHub via cosign attest,
    verifiable directly from the image without visiting GitHub
  • CycloneDX-JSON SBOM for release binaries attached to each GitHub release
  • SLSA Build Level 2 provenance for release binaries via actions/attest
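
For consumers, verification might look roughly like the sketch below. The image reference, the identity regular expression, and the `spdxjson` predicate type are assumptions made for illustration; check them against the actual release workflow before relying on them.

```shell
# Assumed image reference; the PR only states that images go to DockerHub.
IMAGE="${IMAGE:-docker.io/cloudnativelabs/kube-router}"
# OIDC issuer for GitHub Actions keyless signing.
OIDC_ISSUER="https://token.actions.githubusercontent.com"
# Assumed signer identity: any workflow file in the kube-router repository.
IDENTITY_REGEXP='^https://github\.com/cloudnativelabs/kube-router/\.github/workflows/'

# Verify the keyless signature on a given image tag.
verify_signature() {
  cosign verify \
    --certificate-oidc-issuer "$OIDC_ISSUER" \
    --certificate-identity-regexp "$IDENTITY_REGEXP" \
    "$IMAGE:$1"
}

# Verify the SBOM attestation attached to a given image tag.
verify_sbom_attestation() {
  cosign verify-attestation \
    --type spdxjson \
    --certificate-oidc-issuer "$OIDC_ISSUER" \
    --certificate-identity-regexp "$IDENTITY_REGEXP" \
    "$IMAGE:$1"
}
```

Both functions take a release tag as their only argument and need cosign installed plus network access to the registry.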

CVE scanning:

  • make scan added to the Makefile (Docker-first, local fallback, same BUILD_IN_DOCKER pattern
    as all other targets) using Grype to scan the locally-built container image
  • make prep-release now includes scan as its final step
  • .grype.yaml configures only-fixed: true (suppresses Alpine CVEs with no upstream patch) and
    ignores the self-referential kube-router finding
  • Grype was not added to CI after evaluating the trade-offs: Alpine CVEs with no fix available,
    newly published transitive dependency CVEs, and complex conditional logic for bugfix vs
    new-release branches would create uncontrollable CI failures for a small maintainer team. CodeQL
    and Dependabot continue to provide automated coverage. OpenSSF Scorecard is unaffected — no
    Scorecard check evaluates whether container image CVE scanning runs in CI.
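
The Docker-first, local-fallback pattern described above can be sketched as follows. This is not the Makefile target from the PR: the function name, the `DRY_RUN` switch, the `true` default for BUILD_IN_DOCKER, and the unpinned `anchore/grype:latest` image are all assumptions for illustration, and `--only-fixed` is passed on the command line here instead of through .grype.yaml.

```shell
# Hypothetical helper, not taken from the PR: scans a locally built image
# with Grype, inside Docker when BUILD_IN_DOCKER is "true" (assumed default)
# and with a locally installed grype binary otherwise.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
scan_image() {
  if [ "${BUILD_IN_DOCKER:-true}" = "true" ]; then
    # The Docker socket is mounted so Grype can read the local image.
    ${DRY_RUN:+echo} docker run --rm \
      -v /var/run/docker.sock:/var/run/docker.sock \
      anchore/grype:latest --only-fixed "$1"
  else
    ${DRY_RUN:+echo} grype --only-fixed "$1"
  fi
}
```

For example, setting `DRY_RUN=1` and calling `scan_image kube-router:local` shows which command would run without needing Docker or Grype installed.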

OpenSSF Scorecard:

  • New scorecard.yml workflow runs on push to master, weekly, and on branch protection rule changes
  • Results published to the public Scorecard API and surfaced as code scanning alerts in the Security
    tab
  • README badge added

gotestsum pinned:

  • GOTESTSUM_VERSION=v1.13.0 added alongside other tool version constants; both @latest
    references replaced

Which issue(s) this PR is related to:

N/A

Was AI used during the creation of this PR?

  • What tool was used: Claude (claude-sonnet-4-6) via OpenCode CLI
  • To what extent was the tool used? The AI drafted and implemented the entire PR. The human
    directed the work, made all architectural decisions, reviewed every phase before committing, and
    pushed back on several proposals (e.g. removing Grype from CI, workflow_run vs workflow_call,
    env var vs hardcoded versions).
  • How detailed of a plan? Very detailed — a multi-phase plan with rationale for each tool
    selection (including upstream health evaluation of alternatives) was created and reviewed before
    implementation began. The plan lives in .plans/CiSecurityHardening/PLAN.md.
  • Human in the loop? Yes — each phase was paused for human review and an explicit commit before
    proceeding. Several AI proposals were rejected or revised based on human judgment.

What, if any, amount of integration testing was done with this change in a Kubernetes environment?

No Kubernetes integration testing — this PR touches only CI workflows, the Makefile, and repository
configuration files. No kube-router runtime behaviour is changed. make scan was validated locally
against an existing built image.

Does this PR introduce a breaking change?

NONE

Anything else the reviewer should know that wasn't already covered?

Permissions model: workflow_call requires permissions to be granted in the caller (ci.yml) —
called workflows cannot self-elevate. The release job explicitly grants contents: write,
id-token: write, and attestations: write. The container job grants id-token: write and
attestations: write. Both are commented in the file.

Digest pins will drift: The base image digests in ci.yml and the Makefile will go stale as
Alpine and Go release patches. These are intentionally pinned for build reproducibility and CVE scan
consistency between local and CI environments. They should be updated as part of normal dependency
maintenance via make update-deps. Dependabot does not currently track env-var image references,
so this is a manual step for now.
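
Because this step is manual, a small guard can at least confirm that an image reference still carries a digest. The sketch below is a hypothetical helper, not part of the PR, and it only checks the shape of the reference, not whether the digest is current.

```shell
# Hypothetical helper, not taken from the PR: succeeds when the image
# reference ends in an @sha256:<64 hex characters> digest.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}
```

A reference such as `alpine:3.23` fails the check, while the same tag followed by `@sha256:` and a full digest passes. Looking up the current digest for a tag is a separate step, for example with `docker buildx imagetools inspect`.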

First Scorecard run: The score will be zero/unavailable until the workflow runs on master for
the first time. Several checks (e.g. Branch-Protection, Code-Review) depend on repository
settings rather than code and may require separate configuration to improve.

aauren added 7 commits March 22, 2026 16:06
Previously, this was done manually by humans and was therefore not
always done consistently. Sometimes dependencies would be missed,
other times dependencies would not be updated at all.

Additionally, we only used tags which, while good from a release point
of view, were not proof against supply chain attacks. This automates the
process to hopefully bring in a sense of consistency and allow us to
leverage SHA sums to guard against supply chain attacks.
Ensure that the go version (and others) is the same across all points of
reference. In the case of golang, we start by deriving the available go
version from our distro of choice (Alpine) to ensure that it is used the
same everywhere.
@aauren aauren requested review from catherinetcai and mrueg March 22, 2026 23:41
@aauren aauren force-pushed the harden_ci_flow_and_checks branch from 087e338 to 74383fa Compare March 22, 2026 23:49
aauren added 7 commits March 22, 2026 18:56
Attempts to bound the context a bit when people have to look at these
files by splitting them across multiple files and making each one
a logical part of the CI lifecycle.
With the prevalence of recent supply chain attacks, this helps avert
dependency tampering with re-released versions by pinning to specific
SHA sums.

This is fully compliant with dependabot as it will update both the SHA
and the commented version when it does its updates.

This also helps prepare for OpenSSF integration by hardening the CI
process.
When this is not explicitly set, codeql still works, but if anything
ever changes (with autodetection) in the future, it will just silently
succeed without producing results. This corrects that by explicitly
saying that we want it to look for golang.
Adds a scan target which is automatically added to the prep-release
target that checks for grype vulnerabilities during the release
preparation flow.
@aauren aauren force-pushed the harden_ci_flow_and_checks branch from 74383fa to ac1d5fb Compare March 22, 2026 23:58
@aauren aauren force-pushed the harden_ci_flow_and_checks branch from 3738670 to 03fd533 Compare March 23, 2026 01:21
-w /go/src/github.com/cloudnativelabs/kube-router $(DOCKER_BUILD_IMAGE) \
sh -c \
-'go install gotest.tools/gotestsum@latest && CGO_ENABLED=0 gotestsum --format gotestdox -- -timeout 30s github.com/cloudnativelabs/kube-router/v2/cmd/kube-router/ github.com/cloudnativelabs/kube-router/v2/...'
+'go install gotest.tools/gotestsum@$(GOTESTSUM_VERSION) && CGO_ENABLED=0 gotestsum --format gotestdox -- -timeout 30s github.com/cloudnativelabs/kube-router/v2/cmd/kube-router/ github.com/cloudnativelabs/kube-router/v2/...'
We should probably use go tool / go mod tool for this instead of go install. See also: https://tip.golang.org/doc/modules/managing-dependencies#tools
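
A minimal sketch of what that suggestion could look like, assuming Go 1.24 or newer (where `go get -tool` records a tool dependency in go.mod and `go tool` runs it). The function names are hypothetical and this is not a change made in the PR.

```shell
# Same pinned version as the Makefile constant introduced in this PR.
GOTESTSUM_VERSION="${GOTESTSUM_VERSION:-v1.13.0}"

# One-time step: record gotestsum as a tool dependency in go.mod.
add_gotestsum_tool() {
  go get -tool "gotest.tools/gotestsum@${GOTESTSUM_VERSION}"
}

# Run the test suite through the module-tracked tool instead of go install.
run_tests() {
  CGO_ENABLED=0 go tool gotestsum --format gotestdox -- -timeout 30s \
    github.com/cloudnativelabs/kube-router/v2/cmd/kube-router/ \
    github.com/cloudnativelabs/kube-router/v2/...
}
```

With this approach the tool version lives in go.mod and go.sum alongside the other dependencies, so the separate version constant would only be needed for the initial `go get -tool`.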
