[pull] main from kata-containers:main#30

Open
pull[bot] wants to merge 6500 commits into bergwolf:main from kata-containers:main

@pull pull bot commented Oct 10, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Oct 10, 2023
Apokleos and others added 29 commits March 6, 2026 09:28
The vhost-user-net tests could hang in CI because
VhostUserNet::new_server() blocks indefinitely on listener.accept()
when the slave fails to connect in time
(e.g. due to scheduler delays or flaky socket paths). This also caused
panics when connect_slave() returned None and the test unwrapped it.

Fix the tests by:
- using an absolute, unique unix socket path under `/tmp` per test run
- retrying slave connect with a deadline
- running new_server() in a separate thread and waiting via
  recv_timeout() to ensure the test never blocks indefinitely
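The three steps above can be sketched with the standard library alone. This is a minimal illustration and not the code from the commit: `unique_socket_path`, `connect_with_deadline` and `accept_with_timeout` are hypothetical helper names, and a plain `UnixListener` stands in for `VhostUserNet::new_server()`.

```rust
use std::io;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Process-wide counter so that every call yields a different path.
static NEXT_SOCKET_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: absolute socket path under /tmp, unique per call.
fn unique_socket_path(tag: &str) -> PathBuf {
    let id = NEXT_SOCKET_ID.fetch_add(1, Ordering::Relaxed);
    Path::new("/tmp").join(format!("{}-{}-{}.sock", tag, std::process::id(), id))
}

/// Hypothetical helper: retry connecting until `deadline` has passed.
fn connect_with_deadline(path: &Path, deadline: Instant) -> io::Result<UnixStream> {
    loop {
        match UnixStream::connect(path) {
            Ok(stream) => return Ok(stream),
            Err(err) if Instant::now() >= deadline => return Err(err),
            Err(_) => thread::sleep(Duration::from_millis(10)),
        }
    }
}

/// Hypothetical helper: run the blocking accept() on a helper thread and
/// wait for its result with a bound, so the caller never blocks forever.
fn accept_with_timeout(listener: UnixListener, timeout: Duration) -> Option<UnixStream> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver already gave up, the send error is ignored.
        let _ = tx.send(listener.accept().map(|(stream, _addr)| stream));
    });
    // None on timeout or on an accept() error.
    rx.recv_timeout(timeout).ok()?.ok()
}
```

On timeout the helper thread stays parked in `accept()` until the process exits, which is acceptable in a test but would need an explicit shutdown path elsewhere.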

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Fix warnings from unformatted code.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Fix warnings of unused imports as below:
```
warning: unused imports: `DEVICE_ACKNOWLEDGE`, `DEVICE_DRIVER_OK`,
`DEVICE_DRIVER`, `DEVICE_FEATURES_OK`, and `DEVICE_INIT`
    --> src/dragonball/dbs_pci/src/virtio_pci.rs:1177:9
     |
1177 |         DEVICE_ACKNOWLEDGE, DEVICE_DRIVER, DEVICE_DRIVER_OK,
DEVICE_FEATURES_OK, DEVICE_INIT,
     |         ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
     |
     = note: `#[warn(unused_imports)]` (part of `#[warn(unused)]`) on by
default
```

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
warning: unnecessary parentheses around type
   --> src/dragonball/dbs_legacy_devices/src/serial.rs:245:39
    |
245 |         let out: Arc<Mutex<Option<Box<(dyn std::io::Write + Send +
'static)>>>> =
    |                                       ^
^
    |
    = note: `#[warn(unused_parens)]` (part of `#[warn(unused)]`) on by
default
help: remove these parentheses

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The signal_handler test was intermittently failing because it used
kill(pid, sig), which sends signals asynchronously to the process.
This created a race condition where the child thread could exit and
be joined before the signal was delivered or processed.

This fix:
1. Replaces `kill` with `libc::raise` to ensure signals are delivered
   synchronously to the calling thread.
2. Reorders triggers to verify standard signals before installing
   seccomp filters.
3. Guarantees that metrics are incremented before the child thread
   terminates and is joined by the main thread.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
ci: keep mktemp output suffix stable with .yaml
The vhost-kern net unit test used a fixed TAP interface name
("test_vhosttap"). When tests run in parallel or a previous run
leaves the interface behind, TAP creation can fail with
EBUSY ("Resource busy"), making CI flaky.

Introduce a unique_tap_name() helper in the tests and use it to
generate a per-test TAP name (based on pid/thread/counter),
avoiding name collisions and stabilizing CI.
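A helper of this kind might look like the sketch below. It is an assumption about the shape of `unique_tap_name()`, not the code from the commit: the `vt` prefix and field widths are made up, and the thread id is left out because a process-wide atomic counter already distinguishes threads within one process. The name is kept to 15 bytes because Linux interface names must fit in `IFNAMSIZ` (16 bytes including the trailing NUL).

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Process-wide counter; each call gets a fresh value, across all threads.
static TAP_COUNTER: AtomicU32 = AtomicU32::new(0);

/// Sketch of a per-test TAP name: "vt" + 6 hex digits of the pid
/// + 7 hex digits of a counter = 15 bytes, the longest name Linux accepts.
fn unique_tap_name() -> String {
    let pid = std::process::id() & 0x00ff_ffff; // truncated to 6 hex digits
    let seq = TAP_COUNTER.fetch_add(1, Ordering::Relaxed) & 0x0fff_ffff; // 7 hex digits
    format!("vt{:06x}{:07x}", pid, seq)
}
```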

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To simplify analyzing failures, let's print the link to the job result
next to the status.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This should produce a table of failed/running jobs along with
links to them. On pass it should only produce a simple line with how many
jobs passed.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
runtime-rs: Fix dragonball's flaky unit tests
Bump the builder image and versions to resolve CVEs:
- GO-2026-4601
- GO-2026-4602
- GO-2026-4603

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some VMMs support plugging a disk as an image file instead of a block device,
so we adapt the runtime to support that.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Co-authored-by: Aurélien Bombo <abombo@microsoft.com>
 * Introduces the `emptydir_mode` config flag to allow instructing the runtime
   to create a block device for emptyDir volumes.
 * The block device is created in the original emptyDir folder on the host
   so that Kubelet can monitor its disk usage and evict the pod if it exceeds
   its sizeLimit. This matches runc and virtio-fs.
 * The block device's disk image file is sparse to minimize host disk
   footprint.

Fixes: #10560

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Handles block-based emptyDirs plugged via virtio-blk and virtio-scsi by
encrypting and formatting them.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
 * Introduces a new cluster_config setting encrypted_emptydir defaulting to true.
 * Adapts genpolicy for encrypted emptyDirs.

Crucially, the rules.rego change checks that the mount and the storage are
well-formed together:

 * i_storage.source matches a known regex.
 * i_storage.mount_point == $(spath)/BASE64(i_storage.source)
 * i_storage.mount_point == p_storage.mount_point
 * i_storage.mount_point == i_mount.source

Note that policy enforcement is necessary to prevent rogue device injection.
E.g. the agent could not blindly encrypt all block devices as some use cases
only need dm-verity.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This tests the feature on CoCo machines.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The design moved away from the CSI driver, so stop deploying it.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
It can be useful to set these variables during local testing:

 * AZ_REGION: Region for the cluster.
 * AZ_NODEPOOL_TAGS: Node pool tags for the cluster.
 * GENPOLICY_BINARY: Path to the genpolicy binary.
 * GENPOLICY_SETTINGS_DIR: Directory holding the genpolicy settings.

I've also made it so that tests_common.sh modifies the duplicated
genpolicy-settings.json (used for testing) instead of the original git-tracked
one.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Use modern test semantics to ease debugging.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This adds various files that are generated during development.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
k0s uses /var/lib/k0s/kubelet instead of /var/lib/kubelet as its
kubelet data directory. Introduce get_kubelet_data_dir() in
tests_common.sh and use it in k8s-trusted-ephemeral-data-storage.bats
instead of hardcoding /var/lib/kubelet.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
1. Reduce the complexity of the new allow_mount rules for emptyDir.

2. Reverse the order of the two allow_mount versions, as a hint to the
   rego engine that the first version is more often matching the input.

3. Remove `p_mount.source != ""` from mount_source_allows, because:
 - Policy rules typically test the values from input, not values read
   from Policy.
 - mount_source_allows is no longer called for emptyDir mounts after
   these changes, so p_mount.source is not empty.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
coco: Implement trusted ephemeral data storage
These are two changes following a Copilot review on #10559:

1. Restore the p_storage.driver != "blk" check in allow_storage_options():
   - An early version of #10559 hardcoded p_storage.driver to "blk".
   - Hence that check needed to be removed to validate "blk" storage options.
   - The final version of #10559 hardcodes p_storage.driver to "" to
     account for both "blk" and "scsi", and checks storage options in
     allow_block_storage().
   - Hence that check should be restored to preserve the original behavior.

#10559 (comment)

2. Don't use a regex to validate emptyDir storage mount points:
   - It's risky to use a regex to validate a path that has base64-encoded
     components.
   - We can infer the exact path anyway so the regex is redundant.

#10559 (comment)

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Before this change, `make test` for runtime-rs used to test all crates
in the root workspace (due to the `--all` flag). This was not intended
but happened to be mostly working. However, genpolicy needs additional
steps before it can build, so this behavior blocks adding genpolicy to
the root workspace.

The solution here is to only build the intended packages. For the build
and run commands, this is the runtime-rs crate itself. For testing, we
need to include the sub-crates, too, which needs a bit of cargo metadata
scraping.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
This commit adds the genpolicy utility to the root workspace. For now,
only dependencies that are already in the root workspace are consumed
from there, the genpolicy-specific ones should be added later.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
The older version we used transitively depends on an unmaintained crate.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
The yaml-rust dependency is unmaintained, but no suitable alternatives
exist. We log an exception for this now and will revisit the topic after
some time.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
Moving the genpolicy crate into the root workspace causes the build
outputs to go into the root workspace's target directory, instead of
src/tools/genpolicy/target, invalidating assumptions made by the
kata-deploy-binaries script.

This commit adds a special case for the lookup path of the genpolicy
binary, and fixes two bugs that made identifying this problem harder.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
fidencio and others added 30 commits April 1, 2026 15:20
When a kata-deploy DaemonSet pod is restarted (e.g. due to a label
change or rolling update), the SIGTERM handler runs cleanup which
unconditionally removes kata artifacts and restarts containerd. This
causes containerd to lose the kata shim binary, crashing all running
kata pods on the node.

Fix this by implementing a three-stage cleanup decision:

1. If this pod's owning DaemonSet still exists (exact name match via
   DAEMONSET_NAME env var), this is a pod restart — skip all cleanup.
   The replacement pod will re-run install, which is idempotent.

2. If this DaemonSet is gone but other kata-deploy DaemonSets still
   exist (multi-install scenario), perform instance-specific cleanup
   only (snapshotters, CRI config, artifacts) but skip shared
   resources (node label removal, CRI restart) to avoid disrupting
   the other instances.

3. If no kata-deploy DaemonSets remain, perform full cleanup including
   node label removal and CRI restart.

The Helm chart injects a DAEMONSET_NAME environment variable with the
exact DaemonSet name (including any multi-install suffix), ensuring
instance-aware lookup rather than broadly matching any DaemonSet
containing "kata-deploy".
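The three-stage decision can be expressed as a small pure function. This is a sketch under stated assumptions: `CleanupAction` and `decide_cleanup` are hypothetical names, and the list of existing kata-deploy DaemonSets is passed in rather than queried from the cluster.

```rust
/// What the SIGTERM handler should do (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
enum CleanupAction {
    /// Pod restart: the owning DaemonSet still exists, leave everything alone.
    Skip,
    /// Other instances remain: clean only this instance's resources.
    InstanceOnly,
    /// Last instance gone: also remove the node label and restart the CRI.
    Full,
}

/// `own_daemonset` is the value of the DAEMONSET_NAME env var;
/// `existing` lists the kata-deploy DaemonSets currently in the cluster.
fn decide_cleanup(own_daemonset: &str, existing: &[&str]) -> CleanupAction {
    // Exact name match, not a substring match on "kata-deploy".
    if existing.iter().any(|name| *name == own_daemonset) {
        CleanupAction::Skip
    } else if !existing.is_empty() {
        CleanupAction::InstanceOnly
    } else {
        CleanupAction::Full
    }
}
```

Keeping the decision separate from the cluster lookup makes each of the three stages easy to unit test.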

Fixes: #12761

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
…anup

Add functional tests that cover two previously untested kata-deploy
behaviors:

1. Restart resilience (regression test for #12761): deploys a
   long-running kata pod, triggers a kata-deploy DaemonSet restart via
   rollout restart, and verifies the kata pod survives with the same
   UID and zero additional container restarts.

2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses
   are removed, the kata-runtime node label is cleared, /opt/kata is
   gone from the host filesystem, and containerd remains healthy.

3. Artifact presence: after install, verifies /opt/kata and the shim
   binary exist on the host, RuntimeClasses are created, and the node
   is labeled.

Host filesystem checks use a short-lived privileged pod with a
hostPath mount to inspect the node directly.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
A FC update caused bad requests for the runtime-rs runtime when
specifying the vcpu count and block rate limiter fields.

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
This fix applies the config file value as a fallback when block_device_cache_direct annotation is not explicitly set on the pod.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
Replace the deprecated CAA deployment with the helm one. Note that this also
installs the CAA mutating webhook, which wasn't installed before.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
To run all the tests that are running in CI, we need to enable external
tests. This can be a bit tricky, so add it to our documentation.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
ci.ocp: Use helm deployment for peer-pods
…pods-to-crash-after-containerd-restart

kata-deploy: Fix kata-deploy pods crashing if containerd restarts
workflows: Update actions/checkout version
After pod runAsUser triggers passwd-based GID resolution, genpolicy
clears AdditionalGids and inserts only the primary GID.
PodSecurityContext fsGroup and supplementalGroups get cleared, so
policy enforcement would deny CreateContainer when the runtime
includes those when specified.

This change applies fsGroup/supplementalGroups once in
get_container_process via apply_pod_fs_group_and_supplemental_groups.
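The intended effect can be illustrated with a simplified sketch. It does not reproduce genpolicy's actual types or the body of `apply_pod_fs_group_and_supplemental_groups`; the function name `merge_pod_groups` and the use of a `BTreeSet<u32>` for the additional GIDs are assumptions made for this example.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in: add the pod-level fsGroup and supplementalGroups
/// to a set of additional GIDs that currently holds only the primary GID.
fn merge_pod_groups(
    additional_gids: &mut BTreeSet<u32>,
    fs_group: Option<u32>,
    supplemental_groups: &[u32],
) {
    if let Some(gid) = fs_group {
        additional_gids.insert(gid);
    }
    // The set ignores duplicates, so applying the same groups twice is harmless.
    additional_gids.extend(supplemental_groups.iter().copied());
}
```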

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The shim uses Storage.fs_group on block/scsi encrypted emptyDir while
genpolicy used fsgid= in options and null fs_group, leading to
denying CreateContainerRequest when using block-encrypted emptyDir in
combination with fsGroup. Thus, emit fs_group in that scenario and keep
fsgid= for the existing shared-fs/local emptyDir behavior.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The logic in the k8s-empty-dirs.bats file failed to add a security
policy for the pod-empty-dir-fsgroup.yaml manifest. With this change,
we add the policy annotation.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Do not run the NIM containers with elevated privileges. Note that
using hostPath requires proper host folder permissions, and that
using emptyDir requires a proper fsGroup ID.
Once issue 11162 is resolved, we can further refine the securityContext
fields for the TEE manifests.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Onboard a test case for deploying a NIM service using the NIM
operator. We install the operator helm chart on the fly as this is
a fast operation, spinning up a single operand. Once a NIM service
is scheduled, the operator creates a deployment with a single pod.

For now, the TEE-based flow uses an allow-all policy. In future
work, we strive to support generating pod security policies for the
scenario where NIM services are deployed and the pod manifest is
being generated on the fly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
runtime-rs: Fix FC API fields
Docker 26+ configures container networking (veth pair, IP addresses,
routes) after task creation rather than before. Kata's endpoint scan
runs during CreateSandbox, before the interfaces exist, resulting in
VMs starting without network connectivity (no -netdev passed to QEMU).

Add RescanNetwork() which runs asynchronously after the Start RPC.
It polls the network namespace until Docker's interfaces appear, then
hotplugs them to QEMU and informs the guest agent to configure them
inside the VM.

Additional fixes:
- mountinfo parser: find fs type dynamically instead of hardcoded
  field index, fixing parsing with optional mount tags (shared:,
  master:)
- IsDockerContainer: check CreateRuntime hooks for Docker 26+
- DockerNetnsPath: extract netns path from libnetwork-setkey hook
  args with path traversal protection
- detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline
  to guard against PID recycling
- startVM guard: rescan when len(endpoints)==0 after VM start
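The mountinfo fix relies on the layout of `/proc/<pid>/mountinfo`: six fixed fields, then zero or more optional fields such as `shared:N` or `master:N`, then a single `-` separator, then the filesystem type. The actual change is in the Go runtime; the sketch below only illustrates the parsing idea, in Rust for consistency with the other examples here, and `mountinfo_fs_type` is a hypothetical name.

```rust
/// Return the filesystem type from one mountinfo line, locating it via the
/// "-" separator instead of a fixed field index.
fn mountinfo_fs_type(line: &str) -> Option<&str> {
    // Skip the six fixed fields: mount ID, parent ID, major:minor,
    // root, mount point, mount options.
    let mut fields = line.split_whitespace().skip(6);
    // Consume any optional fields up to and including the separator.
    fields.find(|field| *field == "-")?;
    // The field right after the separator is the filesystem type.
    fields.next()
}
```

With a fixed index, a line carrying `shared:` or `master:` tags shifts every later field, which is the failure mode the commit describes.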

Fixes: #9340

Signed-off-by: llink5 <llink5@users.noreply.github.com>
v0.1.4 has a bugfix for nvrc.log=trace which is now
optional.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
workflows: Create workflow to stale issues based on date
There's a typo in the error message shown when an
unsupported share_fs is configured. Fixed shred -> shared.

Signed-off-by: Yuting Nie <yuting.nie@spacemit.com>
runtime-rs: Fix typo in share_fs error message
runtime: fix Docker 26+ networking by rescanning after Start
nvrc: Bump to the latest Release
runtime-rs: fix setting directio via config file
When TDX confidential guest support is enabled, set `kernel_irqchip=split`
for TDX CVM:
...
-machine \
   q35,accel=kvm,kernel_irqchip=split,confidential-guest-support=tdx \
...

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
runtime-rs: Enhance TDX in qemu
2ba0cb0d4a7 did the ground work for using OVMF even for the
qemu-nvidia-gpu, but missed actually setting the OVMF path to be used,
which we're fixing now.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
k3s and rke2 ship containerd 2.2.2, which requires the OCI 1.3.0
drop-in overlay. Move them from the separate OCI 1.2.1 branch into
the OCI 1.3.0 condition alongside nvidia-gpu, qemu-snp, qemu-tdx,
and custom container engine versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've switched to nydus there, but never did for the values.yaml.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
…olicy

genpolicy: adjust GID after passwd GID handling and set fs_group for encrypted emptyDir volumes