Hello,
I'm opening this issue to raise awareness of one potential impact of the recent glibc upgrade within Bottlerocket (2.42->2.43).
After upgrading to Bottlerocket 1.57, we observed widespread container initialization failures where runc was OOMKilled. This affected containers with memory requests/limits below ~14MiB, and only on aarch64.
Example symptoms:
- In dmesg:
  memory cgroup out of memory: Killed process 5338 (runc:[2:INIT])
- In pod events:
  Warning Failed 3s (x2 over 4s) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?)
We have attributed this issue to glibc 2.43 enabling 2MiB Transparent Huge Pages (THP) by default in malloc on aarch64. In our experiments, with THP disabled (echo never > /sys/kernel/mm/transparent_hugepage/enabled), runc's memory usage was similar to what we see on Bottlerocket 1.56 aarch64 (where glibc is 2.42) or Bottlerocket 1.57 amd64 (where glibc is 2.43, but 2MiB THP is not enabled).
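For anyone who wants to compare nodes before and after the upgrade, here is a minimal sketch for reading the active THP mode. It assumes the standard sysfs path used in the command above. The helper name thp_mode and its optional path argument are hypothetical and exist only for illustration. It only reads the setting and changes nothing.

```shell
#!/bin/sh
# Print the active THP mode, i.e. the bracketed word in the sysfs file,
# whose content looks like: "always [madvise] never".
# thp_mode is a hypothetical helper name (not part of any tool); the
# optional argument points it at another file, e.g. a copy saved from a node.
thp_mode() {
    file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    # Keep only the text between the square brackets.
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
}
```

Calling thp_mode with no argument on a node prints whichever mode is currently selected there.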
I don't think this is necessarily actionable by Bottlerocket, but perhaps there should be a more prominent warning about this change or this particular situation.
More broadly, I feel k8s should have some mechanism to guarantee that runc (or an equivalent runtime) is given the resources it needs to function, rejecting up front any configuration that would otherwise fail only at runtime.
runc may also wish to call attention to an increase in baseline resource requirements on aarch64.