@antonipp

Description

While trying out checkpoint restoration with checkpointctl and containerd, I found a bug: checkpointctl doesn't write the checkpoint annotations where containerd looks for them.

To confirm that an image is a checkpoint, containerd looks for annotations on the OCI image index (application/vnd.oci.image.index.v1+json):
https://github.com/containerd/containerd/blob/2bc9bdbcc0d0337772ff6cde0198bb60ba381ad3/internal/cri/server/container_checkpoint_linux.go#L100

However, checkpointctl was writing image manifest (application/vnd.oci.image.manifest.v1+json) annotations instead:

_, err := runBuildahCommand("config", "--annotation", fmt.Sprintf("%s=%s", key, value), newContainer)

# buildah config --help | grep annotation
  -a, --annotation annotation               add annotation e.g. annotation=value, for the target image (default [])

As a result, containerd did not recognize the checkpoint images. Example:

# ./checkpointctl build /var/lib/kubelet/checkpoints/checkpoint-anton-test-2-5f95bb8f88-m9jz4_anton-test-anton-test-2025-11-27T13:50:33Z.tar localhost/my-checkpoint:broken
Added annotation: org.criu.checkpoint.rootfsImageUserRequested=image-here:latest
Added annotation: org.criu.checkpoint.rootfsImageName=image-here:latest
Added annotation: org.criu.checkpoint.rootfsImageID=image-here@sha256:9d8e0158085bd492fdd4925c651b9c8cbdebda479962f334aaa813daf1f0e6c6
Added annotation: org.criu.checkpoint.runtime.name=io.containerd.runc.v2
Added annotation: org.criu.checkpoint.engine.name=containerd
Added annotation: org.criu.checkpoint.container.name=anton-test
Added annotation: org.criu.checkpoint.pod.name=anton-test-2-5f95bb8f88-m9jz4
Added annotation: org.criu.checkpoint.pod.namespace=anton-test
2025/11/27 15:11:31 Image 'localhost/my-checkpoint:broken' created successfully from checkpoint '/var/lib/kubelet/checkpoints/checkpoint-anton-test-2-5f95bb8f88-m9jz4_anton-test-anton-test-2025-11-27T13:50:33Z.tar'

The image has manifest annotations:

# buildah inspect localhost/my-checkpoint:broken | jq '.ImageAnnotations'
{
  "org.criu.checkpoint.container.name": "anton-test",
  "org.criu.checkpoint.engine.name": "containerd",
  "org.criu.checkpoint.pod.name": "anton-test-2-5f95bb8f88-m9jz4",
  "org.criu.checkpoint.pod.namespace": "anton-test",
  "org.criu.checkpoint.rootfsImageID": "image-here@sha256:9d8e0158085bd492fdd4925c651b9c8cbdebda479962f334aaa813daf1f0e6c6",
  "org.criu.checkpoint.rootfsImageName": "image-here:latest",
  "org.criu.checkpoint.rootfsImageUserRequested": "image-here:latest",
  "org.criu.checkpoint.runtime.name": "io.containerd.runc.v2",
  "org.opencontainers.image.base.digest": "",
  "org.opencontainers.image.base.name": "",
  "org.opencontainers.image.created": "2025-11-27T15:11:31.132325093Z"
}

And there is no image index (== manifest list) (I think the terminology is a bit confusing here btw 🫤):

# buildah manifest inspect localhost/my-checkpoint:broken | jq '.annotations'
Error: tried manifest is of type application/vnd.oci.image.manifest.v1+json (not a list type): reading image "docker://localhost/my-checkpoint:broken": pinging container registry localhost: Get "https://localhost/v2/": dial tcp 127.0.0.1:443: connect: connection refused

After my fix, the images still have the same annotations as before (for backwards compatibility):

# ./checkpointctl build /var/lib/kubelet/checkpoints/checkpoint-anton-test-2-5f95bb8f88-m9jz4_anton-test-anton-test-2025-11-27T13:50:33Z.tar localhost/my-checkpoint:latest
[...]

# buildah inspect localhost/my-checkpoint:latest | jq '.ImageAnnotations'
{
  "org.criu.checkpoint.container.name": "anton-test",
  "org.criu.checkpoint.engine.name": "containerd",
  "org.criu.checkpoint.pod.name": "anton-test-2-5f95bb8f88-m9jz4",
  "org.criu.checkpoint.pod.namespace": "anton-test",
  "org.criu.checkpoint.rootfsImageID": "image-here@sha256:9d8e0158085bd492fdd4925c651b9c8cbdebda479962f334aaa813daf1f0e6c6",
  "org.criu.checkpoint.rootfsImageName": "image-here:latest",
  "org.criu.checkpoint.rootfsImageUserRequested": "image-here:latest",
  "org.criu.checkpoint.runtime.name": "io.containerd.runc.v2",
  "org.opencontainers.image.base.digest": "",
  "org.opencontainers.image.base.name": "",
  "org.opencontainers.image.created": "2025-11-27T15:12:14.992107826Z"
}

And there are now manifest list annotations to satisfy containerd:

# buildah manifest inspect localhost/my-checkpoint:latest | jq '.annotations'
{
  "org.criu.checkpoint.container.name": "anton-test",
  "org.criu.checkpoint.engine.name": "containerd",
  "org.criu.checkpoint.pod.name": "anton-test-2-5f95bb8f88-m9jz4",
  "org.criu.checkpoint.pod.namespace": "anton-test",
  "org.criu.checkpoint.rootfsImageID": "image-here@sha256:9d8e0158085bd492fdd4925c651b9c8cbdebda479962f334aaa813daf1f0e6c6",
  "org.criu.checkpoint.rootfsImageName": "image-here:latest",
  "org.criu.checkpoint.rootfsImageUserRequested": "image-here:latest",
  "org.criu.checkpoint.runtime.name": "io.containerd.runc.v2"
}
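
The difference between the two annotation locations can be sketched in Go. This is only an illustration, not containerd's or checkpointctl's actual code: the helper name hasIndexCheckpointAnnotation is hypothetical, the choice of org.criu.checkpoint.container.name as the key to test is an assumption, and the sketch relies on the mediaType field, which a real index.json may omit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	mediaTypeIndex = "application/vnd.oci.image.index.v1+json"
	// Assumption: this key is used only as an example of a checkpoint annotation.
	checkpointKey = "org.criu.checkpoint.container.name"
)

// annotated holds the two fields that image indexes and image manifests share
// and that matter for this check.
type annotated struct {
	MediaType   string            `json:"mediaType"`
	Annotations map[string]string `json:"annotations"`
}

// hasIndexCheckpointAnnotation (hypothetical helper) reports whether doc is an
// OCI image index that carries the checkpoint annotation at the index level.
// Annotations on a plain image manifest do not count.
func hasIndexCheckpointAnnotation(doc []byte) (bool, error) {
	var a annotated
	if err := json.Unmarshal(doc, &a); err != nil {
		return false, err
	}
	if a.MediaType != mediaTypeIndex {
		return false, nil
	}
	_, ok := a.Annotations[checkpointKey]
	return ok, nil
}

func main() {
	docs := []struct {
		name string
		json string
	}{
		{"manifest annotations only", `{"mediaType":"application/vnd.oci.image.manifest.v1+json","annotations":{"org.criu.checkpoint.container.name":"anton-test"}}`},
		{"index annotations", `{"mediaType":"application/vnd.oci.image.index.v1+json","annotations":{"org.criu.checkpoint.container.name":"anton-test"}}`},
	}
	for _, d := range docs {
		ok, err := hasIndexCheckpointAnnotation([]byte(d.json))
		if err != nil {
			fmt.Println(d.name, "error:", err)
			continue
		}
		fmt.Printf("%s: recognized=%v\n", d.name, ok)
	}
}
```

The same key and value appear in both documents; only the level at which they are stored differs, and that is what separates the ":broken" image from the ":latest" one above.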

Notes

  • This could potentially be fixed in containerd as well; I don't feel strongly either way.
  • This requires buildah >= 1.35 because the --index flag of buildah manifest annotate was added in commit containers/buildah@aca884a. I tested with buildah 1.42 on my machine.
  • Maybe this could be gated behind a flag as well? I'm not sure if enforcing manifest list creation could have side-effects
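
If the index annotations were gated on the buildah version, the check could look roughly like the sketch below. It is not part of this PR: supportsIndexAnnotate is a hypothetical helper, and it assumes the caller has already extracted a plain "major.minor.patch" string from buildah's version output. The 1.35 threshold is the one stated in the notes above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsIndexAnnotate (hypothetical helper) reports whether a buildah
// version string such as "1.42.0" is at least 1.35, the minimum version
// named in the notes for "manifest annotate --index".
func supportsIndexAnnotate(version string) (bool, error) {
	parts := strings.SplitN(strings.TrimSpace(version), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version format %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	return major > 1 || (major == 1 && minor >= 35), nil
}

func main() {
	for _, v := range []string{"1.34.2", "1.35.0", "1.42.0"} {
		ok, err := supportsIndexAnnotate(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("buildah %s supports index annotations: %v\n", v, ok)
	}
}
```

With a check like this, older buildah versions could fall back to writing only the manifest annotations instead of failing the build.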

Signed-off-by: Anton Ippolitov <[email protected]>
@antonipp antonipp force-pushed the ai/fix-containerd-annotations branch from 0a0843b to a5ba66a Compare November 27, 2025 16:18
@adrianreber
Member

Interesting. Thanks for the PR.

I was using this script https://github.com/containerd/containerd/blob/2bc9bdbcc0d0337772ff6cde0198bb60ba381ad3/contrib/checkpoint/checkpoint-restore-cri-test.sh while developing the code for containerd. Do you know why I don't need the manifest for my test code?

@github-actions

Test Results

61 tests  ±0   61 ✅ ±0   2s ⏱️ ±0s
 1 suites ±0    0 💤 ±0 
 1 files   ±0    0 ❌ ±0 

Results for commit a5ba66a. ± Comparison against base commit fd18316.

@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.55%. Comparing base (fd18316) to head (a5ba66a).

Files with missing lines Patch % Lines
internal/oci_image_build.go 0.00% 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #174      +/-   ##
==========================================
- Coverage   74.14%   73.55%   -0.59%     
==========================================
  Files          13       13              
  Lines        1257     1267      +10     
==========================================
  Hits          932      932              
- Misses        251      261      +10     
  Partials       74       74              


@antonipp
Author

Hmm I didn't know this script existed 🤔 I tried only the buildah / podman commands and I noticed that podman does create a manifest here:
https://github.com/containerd/containerd/blob/2bc9bdbcc0d0337772ff6cde0198bb60ba381ad3/contrib/checkpoint/checkpoint-restore-cri-test.sh#L167
However, it doesn't have the expected annotation, so I'm not really sure why it would work (and I haven't tried the script fully yet):

# buildah from scratch
working-container

# buildah add working-container /var/lib/kubelet/checkpoints/checkpoint-anton-test-2-5f95bb8f88-m9jz4_anton-test-anton-test-2025-11-27T13:50:33Z.tar /
b7d8361a1cf7f651362fcc1da996d162130d4e18b3a9ab0849d80dd795439ef4

# buildah config --annotation "org.criu.checkpoint.container.name=anton-test" working-container

# buildah commit working-container localhost/my-checkpoint:test
Getting image source signatures
Copying blob 56fa5bf502b8 skipped: already exists
Copying config 659ca1ade9 done   |
Writing manifest to image destination
659ca1ade93cda871a45f587b7f4ccea58faadb9b42ec529d66b9f2a1a89bf68

# podman save --format oci-archive -o /tmp/checkpoint.tar localhost/my-checkpoint:test
Copying blob 56fa5bf502b8 done
Copying config 659ca1ade9 done
Writing manifest to image destination
Storing signatures

# mkdir -p /tmp/checkpoint-extracted
# tar -xf /tmp/checkpoint.tar -C /tmp/checkpoint-extracted
# cat /tmp/checkpoint-extracted/index.json | jq
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:c63ea80930bd79695b87c8d935fd2f5e8105164d32486a437cef2c9c62aa1065",
      "size": 621,
      "annotations": {
        "org.opencontainers.image.ref.name": "localhost/my-checkpoint:test"
      }
    }
  ]
}

# DIGEST=$(cat /tmp/checkpoint-extracted/index.json | jq -r '.manifests[0].digest' | sed 's/sha256://')
# cat /tmp/checkpoint-extracted/blobs/sha256/$DIGEST | jq '.annotations'
{
  "org.criu.checkpoint.container.name": "anton-test",
  "org.opencontainers.image.base.digest": "",
  "org.opencontainers.image.base.name": "",
  "org.opencontainers.image.created": "2025-11-27T17:22:29.022158466Z"
}

@adrianreber
Member

Ah, so you are exporting a podman container and trying to import it into containerd via CRI? I am pretty sure nobody has tried that and it will not work. The images are using different metadata all over the place. It should be possible to convert it, but it needs to be done at many more locations.

@rst0git
Member

rst0git commented Nov 28, 2025

The images are using different metadata all over the place. It should be possible to convert it, but it needs to be done at many more locations.

We briefly discussed this functionality in the following issue: #130

@antonipp
Author

Ah, so you are exporting a podman container and trying to import it into containerd via CRI? I am pretty sure nobody has tried that and it will not work. The images are using different metadata all over the place. It should be possible to convert it, but it needs to be done at many more locations.

Not at all, all my tests were done with regular runc containers and containerd 2.1.

I only mentioned podman because it was in the script you shared. Maybe you misunderstood "I tried only the buildah / podman commands": I was referring to the script. I just ran those commands from the script, and the script uses podman.
