
containerd 2.1.3: image pull hangs silently on certain registries (range-request not handled) #5404

@nathanael-h

Bug Description

On MicroK8s 1.35 with containerd 2.1.3, pulling images from certain OCI-compliant registries hangs indefinitely with no error output. The TCP connection is established and TLS completes successfully, but no data flows — containerd simply stalls waiting for a response that never arrives.

Environment

  • MicroK8s: 1.35
  • containerd: 2.1.3
  • Nodes: multi-node cluster (control plane + worker nodes)
  • Issue appears on worker nodes pulling from external registries

Symptoms

  • `crictl pull` or `microk8s ctr images pull` hangs indefinitely with no error
  • The OCI index resolves successfully ("already exists"), but platform manifest fetches stay stuck at "waiting"
  • `ss` confirms the TCP connection to the registry is established
  • `curl` against the same registry endpoint works correctly, returning proper HTTP responses
  • No timeout, no error — just silence in the containerd logs
  • Kubernetes Pods using an image from the affected registry stay Pending forever. I also hit this with Jobs whose init containers never complete, leading to `context deadline exceeded` errors in Helm pre-install hooks

Root Cause

containerd 2.1 introduced a multipart layer fetch feature that sends `Range: bytes=0-N` HTTP headers to enable parallel downloads. Some registries respond with HTTP 200 (full content) rather than 206 (Partial Content) when they do not support, or choose to ignore, range requests.

containerd 2.1.3 does not handle this case — the fetch goroutines hang indefinitely waiting for a partial-content response that will never come. This is tracked upstream as containerd/containerd#11864.

Three fixes in the upstream release/2.1 branch are relevant:

| Upstream commit | Description | First included in |
| --- | --- | --- |
| 34a1cb1dd | Deadlock: semaphore not released on error in `dockerFetcher.open()` | v2.1.4 |
| add2dcf86 | Fetcher doesn't always close response body and call `Release()` | v2.1.4 |
| ca3de4fe7 | Range-get request ignored by registry not surfaced as `errContentRangeIgnored` | v2.1.6 |

The third fix (ca3de4fe7) is the most directly relevant: it ensures that when a registry ignores the `Range` header and returns a full 200 response, containerd detects this and falls back gracefully rather than hanging.

Workaround (confirmed working)

Create a per-host config for the affected registry under `$SNAP_DATA/args/certs.d/`, at `/var/snap/microk8s/current/args/certs.d/<registry-hostname>/hosts.toml`:

```toml
server = "https://<registry-hostname>"

[host."https://<registry-hostname>"]
  capabilities = ["pull", "resolve"]
  dial_timeout = "30s"
```

Then restart containerd:

```bash
sudo snap restart microk8s.daemon-containerd
```

Suggested Fix

Bump the containerd version in `build-scripts/components/containerd/version.sh` from v2.1.3 to v2.1.6 (released 2025-12-17):

```diff
-echo "v2.1.3"
+echo "v2.1.6"
```

No patch changes are needed. The existing `patches/v2.1.3/` directory is automatically selected by the version selector in `build-scripts/print-patches-for.py` for any target version ≥ v2.1.3, and the sideload patch applies cleanly to v2.1.6 (it only adds new files, with no conflicts).

v2.1.6 also includes an update to the vendored `golang.org/x/net/http2` transport (196 lines changed), which may further improve HTTP/2 reliability with various registries.
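Assuming the selector simply picks the newest `patches/vX.Y.Z/` directory whose version does not exceed the target build version (an assumption about how `print-patches-for.py` behaves, not its verbatim logic), the idea can be sketched as:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parse turns "v2.1.3" into [2, 1, 3].
func parse(v string) []int {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	out := make([]int, len(parts))
	for i, p := range parts {
		out[i], _ = strconv.Atoi(p)
	}
	return out
}

// less reports whether version a sorts before version b.
func less(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := 0; i < len(pa) && i < len(pb); i++ {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return len(pa) < len(pb)
}

// patchesFor picks the newest patch directory whose version does not
// exceed the target build version, or "" if none qualifies.
func patchesFor(target string, dirs []string) string {
	sort.Slice(dirs, func(i, j int) bool { return less(dirs[i], dirs[j]) })
	best := ""
	for _, d := range dirs {
		if !less(target, d) { // d <= target
			best = d
		}
	}
	return best
}

func main() {
	dirs := []string{"v2.0.0", "v2.1.3"}
	fmt.Println(patchesFor("v2.1.6", dirs)) // prints "v2.1.3"
}
```

Under this model, bumping the target to v2.1.6 keeps selecting `patches/v2.1.3/`, which is why no patch-directory changes accompany the version bump.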
