
cluster-api-provider-aws-build-docker is frequently OOM killed #5576

@mdbooth

/kind bug

Examples from the last few days:

In all cases the build log stops during goreleaser, which is what you'd expect. For example:

hack/tools/bin/goreleaser build --config .goreleaser.yaml --snapshot --clean
  • starting build...
  • loading                                          path=.goreleaser.yaml
  • skipping validate...
  • loading environment variables
  • getting and validating git state
    • ignoring errors because this is a snapshot     error=couldn't get remote URL: fatal: No remote configured to list refs from.
    • git state                                      commit=none branch=none current_tag=v0.0.0 previous_tag=<unknown> dirty=false
    • pipe skipped                                   reason=disabled during snapshot mode
  • parsing tag
  • setting defaults
  • snapshotting
    • building snapshot...                           version=0.0.0-SNAPSHOT-none
  • checking distribution directory
  • loading go mod information
  • build prerequisites
  • writing effective config file
    • writing                                        config=dist/config.yaml
  • building binaries
    • building                                       binary=dist/clusterctl-aws_windows_arm64/bin/clusterctl-aws.exe
    • building                                       binary=dist/clusterctl-aws_darwin_arm64/bin/clusterctl-aws
    • building                                       binary=dist/clusterctl-aws_darwin_amd64_v1/bin/clusterctl-aws
    • building                                       binary=dist/clusterctl-aws_windows_amd64_v1/bin/clusterctl-aws.exe
    • building                                       binary=dist/clusterctl-aws_linux_amd64_v1/bin/clusterctl-aws
    • building                                       binary=dist/clusterctl-aws_linux_arm64/bin/clusterctl-aws

We can see the resource usage of these jobs in Grafana here:

https://monitoring-eks.prow.k8s.io/d/96Q8oOOZk/builds?var-org=kubernetes-sigs&var-repo=cluster-api-provider-aws&var-job=pull-cluster-api-provider-aws-build-docker&orgId=1&from=now-24h&to=now

We can see that the jobs are coming perilously close to their 12G limit, which is already very high. The memory limit is defined here:

https://github.com/kubernetes/test-infra/blob/42c25d98a67da245e2bdf8612766f4c85103fe8c/config/jobs/kubernetes-sigs/cluster-api-provider-aws/cluster-api-provider-aws-presubmits.yaml#L91-L97
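
For reference, prow sets a job's memory ceiling through the container resources stanza in that presubmit definition. A minimal sketch of its shape, assuming the usual prow job conventions (the 12Gi figure is the limit discussed in this issue; the CPU value is illustrative, not copied from the linked file):

  resources:
    requests:
      cpu: "2"
      memory: "12Gi"
    limits:
      memory: "12Gi"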

Rather than increasing the memory limit still further, I propose restricting the parallelism of these build jobs to bring the memory usage down.
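
The log above shows where the memory goes: goreleaser compiles the six clusterctl-aws target platforms concurrently, by default running as many build tasks in parallel as there are CPUs, and each cross-compilation is a full Go build with its own working set. goreleaser's --parallelism flag caps the number of concurrent tasks, so a minimal sketch of the proposed change, assuming the Makefile target invokes goreleaser exactly as shown in the log (the value 2 is illustrative and would need tuning against the Grafana data):

  hack/tools/bin/goreleaser build --config .goreleaser.yaml --snapshot --clean --parallelism 2

If the same target is also used for real release builds, the value could be threaded through a Makefile variable so that only the CI job gets the lower setting.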

Labels: kind/bug, needs-priority, needs-triage
