glibc upgrade (2.42->2.43) affects memory utilization on aarch64 #893

@JacobHenner

Hello,

I'm opening this issue to raise awareness of one potential impact of the recent glibc upgrade within Bottlerocket (2.42->2.43).

After upgrading to Bottlerocket 1.57, we observed widespread container initialization failures where runc was OOMKilled. This affected containers with memory requests/limits below ~14MiB, and only on aarch64.
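For anyone wanting to check whether their own workloads fall in that range, here is a minimal sketch that flags memory limits below the approximate 14MiB threshold we observed. The helper names (`parse_memory`, `is_at_risk`) are hypothetical, the threshold is only our rough observation, and the parser is a simplification that handles plain byte counts and the common Kubernetes suffixes rather than the full quantity grammar.

```python
# Simplified parser for Kubernetes-style memory quantities (assumption:
# only plain numbers and the suffixes below are handled).
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

# Approximate threshold from our observations; not an exact boundary.
THRESHOLD_BYTES = 14 * 2**20


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '10Mi' or '8M' to a number of bytes."""
    q = quantity.strip()
    for suffix, factor in {**BINARY, **DECIMAL}.items():
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * factor)
    return int(float(q))  # no suffix: value is already in bytes


def is_at_risk(limit: str) -> bool:
    """True if the limit is below the approximate 14MiB threshold."""
    return parse_memory(limit) < THRESHOLD_BYTES


if __name__ == "__main__":
    for limit in ["10Mi", "14Mi", "8M", "64Mi"]:
        print(limit, "at risk" if is_at_risk(limit) else "ok")
```

The limits themselves would come from each container's `resources.limits.memory` in the pod spec; how those are collected is left out here.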

Example symptoms:

  • In dmesg: memory cgroup out of memory: Killed process 5338 (runc:[2:INIT])
  • In pod events: Warning Failed 3s (x2 over 4s) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?)

We have attributed this issue to glibc 2.43 enabling 2MiB Transparent Huge Pages (THP) by default in malloc on aarch64. In our experiments, with THP disabled (echo never > /sys/kernel/mm/transparent_hugepage/enabled), runc's memory utilization was similar to what we see on Bottlerocket 1.56 aarch64 (where glibc is 2.42) or Bottlerocket 1.57 amd64 (where glibc is 2.43, but 2MiB THP is not enabled).
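To confirm which THP mode a node is actually running in before and after such a change, the same sysfs file can be read back. The kernel marks the active mode with square brackets (for example `always [madvise] never`). Below is a minimal sketch of that check; the function names are hypothetical, and it assumes the sysfs path above is readable on the host.

```python
from pathlib import Path
from typing import Optional

# Same sysfs file that the `echo never > ...` command above writes to.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from e.g. 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def read_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the active THP mode, or None if the file cannot be read."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_thp_mode())
```

A result of `never` corresponds to the configuration in which we saw runc's memory utilization return to its earlier level.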

I don't think this is necessarily actionable by Bottlerocket, but perhaps there should be a more prominent warning about this change or this particular situation.

More broadly, I feel that k8s should have some mechanism to guarantee that runc (or equivalent) is given the resources it needs to function, rejecting up front any configuration that would otherwise fail only at runtime.

runc may also wish to call attention to an increase in baseline resource requirements on aarch64.
