[pull] main from kata-containers:main #30
Open
pull[bot] wants to merge 6500 commits into bergwolf:main from kata-containers:main
The vhost-user-net tests could hang in CI because VhostUserNet::new_server() blocks indefinitely on listener.accept() when the slave fails to connect in time (e.g. due to scheduler delays or flaky socket paths). This also caused panics when connect_slave() returned None and the test unwrapped it.
Fix the tests by:
- using an absolute, unique unix socket path under `/tmp` per test run
- retrying slave connect with a deadline
- running new_server() in a separate thread and waiting via recv_timeout() to ensure the test never blocks indefinitely
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
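The three measures described above can be sketched with the standard library alone. This is a minimal illustration, not the test code from the commit: `unique_socket_path`, `connect_with_deadline` and `accept_with_timeout` are hypothetical helper names, and a plain `UnixListener` stands in for `VhostUserNet::new_server()`.

```rust
use std::io;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::{Path, PathBuf};
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: absolute socket path under /tmp, unique per call
/// (pid plus a nanosecond timestamp).
fn unique_socket_path(tag: &str) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    PathBuf::from(format!("/tmp/{}-{}-{}.sock", tag, std::process::id(), nanos))
}

/// Hypothetical helper: retry the connect until it succeeds or the deadline passes.
fn connect_with_deadline(path: &Path, deadline: Instant) -> io::Result<UnixStream> {
    loop {
        match UnixStream::connect(path) {
            Ok(stream) => return Ok(stream),
            Err(e) if Instant::now() >= deadline => return Err(e),
            Err(_) => thread::sleep(Duration::from_millis(10)),
        }
    }
}

/// Hypothetical helper: run the blocking accept() on a helper thread and wait
/// for its result with recv_timeout(), so the caller never blocks forever.
/// On timeout the helper thread is left blocked in accept() and is reclaimed
/// only when the process exits.
fn accept_with_timeout(listener: UnixListener, timeout: Duration) -> Option<UnixStream> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(listener.accept().map(|(stream, _addr)| stream));
    });
    match rx.recv_timeout(timeout) {
        Ok(Ok(stream)) => Some(stream),
        _ => None,
    }
}

fn main() {
    let path = unique_socket_path("vhost-user-net-demo");
    let listener = UnixListener::bind(&path).expect("bind unix socket");

    let client_path = path.clone();
    let client = thread::spawn(move || {
        connect_with_deadline(&client_path, Instant::now() + Duration::from_secs(2))
    });

    let accepted = accept_with_timeout(listener, Duration::from_secs(2));
    println!("accepted: {}", accepted.is_some());

    let _ = client.join();
    let _ = std::fs::remove_file(&path);
}
```

A test built this way fails with a clear timeout rather than hanging the CI job when the peer never connects.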
Fix warnings from unformatted code. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Fix warnings of unused imports as below:
```
warning: unused imports: `DEVICE_ACKNOWLEDGE`, `DEVICE_DRIVER_OK`,
`DEVICE_DRIVER`, `DEVICE_FEATURES_OK`, and `DEVICE_INIT`
--> src/dragonball/dbs_pci/src/virtio_pci.rs:1177:9
|
1177 | DEVICE_ACKNOWLEDGE, DEVICE_DRIVER, DEVICE_DRIVER_OK,
DEVICE_FEATURES_OK, DEVICE_INIT,
| ^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` (part of `#[warn(unused)]`) on by
default
```
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
warning: unnecessary parentheses around type
--> src/dragonball/dbs_legacy_devices/src/serial.rs:245:39
|
245 | let out: Arc<Mutex<Option<Box<(dyn std::io::Write + Send +
'static)>>>> =
| ^
^
|
= note: `#[warn(unused_parens)]` (part of `#[warn(unused)]`) on by
default
help: remove these parentheses
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The signal_handler test was intermittently failing because it used kill(pid, sig), which sends signals asynchronously to the process. This created a race condition where the child thread could exit and be joined before the signal was delivered or processed.
This fix:
1. Replaces `kill` with `libc::raise` to ensure signals are delivered synchronously to the calling thread.
2. Reorders triggers to verify standard signals before installing seccomp filters.
3. Guarantees that metrics are incremented before the child thread terminates and is joined by the main thread.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
ci: keep mktemp output suffix stable with .yaml
The vhost-kern net unit test used a fixed TAP interface name
("test_vhosttap"). When tests run in parallel or a previous run
leaves the interface behind, TAP creation can fail with
EBUSY ("Resource busy"), making CI flaky.
Introduce a unique_tap_name() helper in the tests and use it to
generate a per-test TAP name (based on pid/thread/counter),
avoiding name collisions and stabilizing CI.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
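A helper along these lines could look like the sketch below. It is an assumption about the shape of `unique_tap_name()`, not the code from the commit: the `tap` prefix, the hashing scheme and the `TAP_COUNTER` static are illustrative. Linux interface names are limited to 15 bytes (IFNAMSIZ is 16 including the trailing NUL), so the sketch hashes pid, thread id and a counter into a fixed-width suffix.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Process-wide counter so repeated calls on one thread still differ.
static TAP_COUNTER: AtomicU32 = AtomicU32::new(0);

/// Linux IFNAMSIZ is 16 bytes including the trailing NUL.
const MAX_IFNAME_LEN: usize = 15;

/// Hypothetical sketch: derive a TAP name from pid, thread id and a counter.
/// The 48-bit hash suffix makes collisions very unlikely, not impossible.
fn unique_tap_name() -> String {
    let mut hasher = DefaultHasher::new();
    std::process::id().hash(&mut hasher);
    thread::current().id().hash(&mut hasher);
    TAP_COUNTER.fetch_add(1, Ordering::Relaxed).hash(&mut hasher);

    // "tap" (3 chars) + 12 hex digits = 15 chars.
    let name = format!("tap{:012x}", hasher.finish() & 0xffff_ffff_ffff);
    debug_assert!(name.len() <= MAX_IFNAME_LEN);
    name
}

fn main() {
    println!("{}", unique_tap_name());
    println!("{}", unique_tap_name());
}
```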
To simplify analyzing failures, print the link to the job result next to the status. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This should produce a table of failed/running jobs along with links to them. On pass it should only produce a simple line stating how many jobs passed. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
runtime-rs: Fix dragonball's flaky unit tests
Bump the builder image and versions to resolve CVEs: - GO-2026-4601 - GO-2026-4602 - GO-2026-4603 Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some VMMs support plugging a disk as an image file instead of a block device, so we adapt the runtime to support that. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Aurélien Bombo <abombo@microsoft.com> Co-authored-by: Aurélien Bombo <abombo@microsoft.com>
* Introduces the `emptydir_mode` config flag to allow instructing the runtime to create a block device for emptyDir volumes.
* The block device is created in the original emptyDir folder on the host so that Kubelet can monitor its disk usage and evict the pod if it exceeds its sizeLimit. This matches runc and virtio-fs.
* The block device's disk image file is sparse to minimize host disk footprint.
Fixes: #10560
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
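As an illustration of the sparse image point, the sketch below creates a file whose apparent size is set with `File::set_len` without writing any data. On filesystems that support holes this allocates few or no blocks until the guest actually writes. The function name `create_sparse_image` and the file name are hypothetical, and the runtime's real implementation may differ.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical sketch: create a disk image of `size_bytes` without writing data.
/// Extending a new file with set_len() leaves a hole on filesystems that
/// support sparse files, so host disk usage stays small.
fn create_sparse_image(path: &Path, size_bytes: u64) -> io::Result<()> {
    let file = File::create(path)?;
    file.set_len(size_bytes)?;
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("emptydir-demo-{}.img", std::process::id()));
    create_sparse_image(&path, 64 * 1024 * 1024)?;

    let meta = std::fs::metadata(&path)?;
    // len() is the apparent size; blocks() counts allocated 512-byte units.
    println!("apparent size: {} bytes", meta.len());
    println!("allocated: {} bytes", meta.blocks() * 512);

    std::fs::remove_file(&path)
}
```

Because the image lives in the original emptyDir folder, disk usage accounting that looks at allocated blocks sees only what the guest has written.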
Handles block-based emptyDirs plugged via virtio-blk and virtio-scsi by encrypting and formatting them. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
* Introduces a new cluster_config setting encrypted_emptydir defaulting to true.
* Adapts genpolicy for encrypted emptyDirs.
Crucially, the rules.rego change checks that the mount and the storage are well-formed together:
* i_storage.source matches a known regex.
* i_storage.mount_point == $(spath)/BASE64(i_storage.source)
* i_storage.mount_point == p_storage.mount_point
* i_storage.mount_point == i_mount.source
Note that policy enforcement is necessary to prevent rogue device injection. E.g. the agent could not blindly encrypt all block devices as some use cases only need dm-verity.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This tests the feature on CoCo machines. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The design moved away from CSI driver so stop deploying that. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
It can be useful to set these variables during local testing:
* AZ_REGION: Region for the cluster.
* AZ_NODEPOOL_TAGS: Node pool tags for the cluster.
* GENPOLICY_BINARY: Path to the genpolicy binary.
* GENPOLICY_SETTINGS_DIR: Directory holding the genpolicy settings.
I've also made it so that tests_common.sh modifies the duplicated genpolicy-settings.json (used for testing) instead of the original git-tracked one.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Use modern test semantics to ease debugging. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This adds various files that are generated during development. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
k0s uses /var/lib/k0s/kubelet instead of /var/lib/kubelet as its kubelet data directory. Introduce get_kubelet_data_dir() in tests_common.sh and use it in k8s-trusted-ephemeral-data-storage.bats instead of hardcoding /var/lib/kubelet. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
1. Reduce the complexity of the new allow_mount rules for emptyDir.
2. Reverse the order of the two allow_mount versions, as a hint to the rego engine that the first version matches the input more often.
3. Remove `p_mount.source != ""` from mount_source_allows, because:
   - Policy rules typically test the values from input, not values read from Policy.
   - mount_source_allows is no longer called for emptyDir mounts after these changes, so p_mount.source is not empty.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
coco: Implement trusted ephemeral data storage
These are two changes following a Copilot review on #10559:
1. Restore the p_storage.driver != "blk" check in allow_storage_options():
   - An early version of #10599 hardcoded p_storage.driver to "blk".
   - Hence that check needed to be removed to validate "blk" storage options.
   - The final version of #10599 hardcodes p_storage.driver to "" to account for both "blk" and "scsi", and checks storage options in allow_block_storage().
   - Hence that check should be restored to preserve the original behavior.
   #10559 (comment)
2. Don't use a regex to validate emptyDir storage mount points:
   - It's risky to use a regex to validate a path that has base64-encoded components.
   - We can infer the exact path anyway so the regex is redundant.
   #10559 (comment)
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Before this change, `make test` for runtime-rs used to test all crates in the root workspace (due to the `--all` flag). This was not intended but happened to be mostly working. However, genpolicy needs additional steps before it can build, so this behavior blocks adding genpolicy to the root workspace. The solution here is to only build the intended packages. For the build and run commands, this is the runtime-rs crate itself. For testing, we need to include the sub-crates, too, which needs a bit of cargo metadata scraping. Signed-off-by: Markus Rudy <mr@edgeless.systems>
This commit adds the genpolicy utility to the root workspace. For now, only dependencies that are already in the root workspace are consumed from there, the genpolicy-specific ones should be added later. Signed-off-by: Markus Rudy <mr@edgeless.systems>
The older version we used transitively depends on an unmaintained crate. Signed-off-by: Markus Rudy <mr@edgeless.systems>
The yaml-rust dependency is unmaintained, but no suitable alternatives exist. We log an exception for this now and will revisit the topic after some time. Signed-off-by: Markus Rudy <mr@edgeless.systems>
Moving the genpolicy crate into the root workspace causes the build outputs to go into the root workspace's target directory, instead of src/tools/genpolicy/target, invalidating assumptions made by the kata-deploy-binaries script. This commit adds a special case for the lookup path of the genpolicy binary, and fixes two bugs that made identifying this problem harder. Signed-off-by: Markus Rudy <mr@edgeless.systems>
When a kata-deploy DaemonSet pod is restarted (e.g. due to a label change or rolling update), the SIGTERM handler runs cleanup which unconditionally removes kata artifacts and restarts containerd. This causes containerd to lose the kata shim binary, crashing all running kata pods on the node.
Fix this by implementing a three-stage cleanup decision:
1. If this pod's owning DaemonSet still exists (exact name match via DAEMONSET_NAME env var), this is a pod restart, so skip all cleanup. The replacement pod will re-run install, which is idempotent.
2. If this DaemonSet is gone but other kata-deploy DaemonSets still exist (multi-install scenario), perform instance-specific cleanup only (snapshotters, CRI config, artifacts) but skip shared resources (node label removal, CRI restart) to avoid disrupting the other instances.
3. If no kata-deploy DaemonSets remain, perform full cleanup including node label removal and CRI restart.
The Helm chart injects a DAEMONSET_NAME environment variable with the exact DaemonSet name (including any multi-install suffix), ensuring instance-aware lookup rather than broadly matching any DaemonSet containing "kata-deploy".
Fixes: #12761
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
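The three-stage decision can be expressed as a small pure function. The sketch below is illustrative only: `CleanupAction` and `decide_cleanup` are hypothetical names, and it assumes the caller has already listed the names of the kata-deploy DaemonSets that still exist in the cluster.

```rust
/// Hypothetical outcome of the three-stage decision described above.
#[derive(Debug, PartialEq, Eq)]
enum CleanupAction {
    /// Owning DaemonSet still exists: pod restart, leave everything in place.
    Skip,
    /// Other kata-deploy DaemonSets remain: clean only this instance's resources.
    InstanceOnly,
    /// No kata-deploy DaemonSets remain: also remove node labels and restart the CRI.
    Full,
}

/// `own_daemonset` is the exact name (e.g. taken from DAEMONSET_NAME).
/// `existing` is assumed to hold the names of all kata-deploy DaemonSets
/// that currently exist.
fn decide_cleanup(own_daemonset: &str, existing: &[&str]) -> CleanupAction {
    if existing.contains(&own_daemonset) {
        CleanupAction::Skip
    } else if !existing.is_empty() {
        CleanupAction::InstanceOnly
    } else {
        CleanupAction::Full
    }
}

fn main() {
    println!("{:?}", decide_cleanup("kata-deploy", &["kata-deploy"]));
    println!("{:?}", decide_cleanup("kata-deploy", &["kata-deploy-other"]));
    println!("{:?}", decide_cleanup("kata-deploy", &[]));
}
```

Matching on the exact name is what keeps a multi-install sibling such as a suffixed DaemonSet from being mistaken for the pod's own owner.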
…anup
Add functional tests that cover three previously untested kata-deploy behaviors:
1. Restart resilience (regression test for #12761): deploys a long-running kata pod, triggers a kata-deploy DaemonSet restart via rollout restart, and verifies the kata pod survives with the same UID and zero additional container restarts.
2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses are removed, the kata-runtime node label is cleared, /opt/kata is gone from the host filesystem, and containerd remains healthy.
3. Artifact presence: after install, verifies /opt/kata and the shim binary exist on the host, RuntimeClasses are created, and the node is labeled.
Host filesystem checks use a short-lived privileged pod with a hostPath mount to inspect the node directly.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
An FC update caused bad requests for the runtime-rs runtime when specifying the vcpu count and block rate limiter fields. Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
This fix applies the config file value as a fallback when block_device_cache_direct annotation is not explicitly set on the pod. Signed-off-by: PiotrProkop <pprokop@nvidia.com>
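The fallback order can be sketched as follows. This is a hedged illustration, not the runtime-rs code: the function name `resolve_cache_direct` is hypothetical, the annotation key is assumed to follow the usual `io.katacontainers.config.hypervisor.*` naming, and treating an unparsable value as "not set" is a choice made for the sketch.

```rust
use std::collections::HashMap;

/// Assumed annotation key; verify against the runtime's annotation definitions.
const CACHE_DIRECT_ANNOTATION: &str =
    "io.katacontainers.config.hypervisor.block_device_cache_direct";

/// Hypothetical sketch: an explicit pod annotation wins, otherwise the value
/// from the configuration file is used.
fn resolve_cache_direct(annotations: &HashMap<String, String>, config_default: bool) -> bool {
    annotations
        .get(CACHE_DIRECT_ANNOTATION)
        // An unparsable value is treated as "not set" in this sketch.
        .and_then(|value| value.trim().parse::<bool>().ok())
        .unwrap_or(config_default)
}

fn main() {
    let mut annotations = HashMap::new();
    println!("unset: {}", resolve_cache_direct(&annotations, true));

    annotations.insert(CACHE_DIRECT_ANNOTATION.to_string(), "false".to_string());
    println!("annotated: {}", resolve_cache_direct(&annotations, true));
}
```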
Replace the deprecated CAA deployment with the Helm one. Note that this also installs the CAA mutating webhook, which wasn't installed before. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
To run all the tests that run in CI we need to enable external tests. This can be a bit tricky, so add it to our documentation. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
ci.ocp: Use helm deployment for peer-pods
…pods-to-crash-after-containerd-restart kata-deploy: Fix kata-deploy pods crashing if containerd restarts
workflows: Update actions/checkout version
After pod runAsUser triggers passwd-based GID resolution, genpolicy clears AdditionalGids and inserts only the primary GID. PodSecurityContext fsGroup and supplementalGroups get cleared, so policy enforcement would deny CreateContainer when the runtime includes those when specified. This change applies fsGroup/supplementalGroups once in get_container_process via apply_pod_fs_group_and_supplemental_groups. Signed-off-by: Manuel Huber <manuelh@nvidia.com>
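The intended end state can be illustrated with a small sketch: after the passwd-based step has reduced the additional GIDs to the primary GID, the pod-level fsGroup and supplementalGroups are merged back in exactly once. The function `merge_pod_groups` below is a hypothetical stand-in and does not reproduce genpolicy's `apply_pod_fs_group_and_supplemental_groups`.

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch: compute the additional GIDs a policy should expect,
/// starting from the primary GID and adding the pod-level fsGroup and
/// supplementalGroups. A set keeps the result free of duplicates.
fn merge_pod_groups(
    primary_gid: u32,
    fs_group: Option<u32>,
    supplemental_groups: &[u32],
) -> BTreeSet<u32> {
    let mut gids = BTreeSet::new();
    gids.insert(primary_gid);
    // Option<u32> iterates over zero or one element.
    gids.extend(fs_group);
    gids.extend(supplemental_groups.iter().copied());
    gids
}

fn main() {
    let gids = merge_pod_groups(1000, Some(2000), &[3000, 1000]);
    println!("{:?}", gids);
}
```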
The shim uses Storage.fs_group on block/scsi encrypted emptyDir while genpolicy used fsgid= in options and null fs_group, leading to denying CreateContainerRequest when using block-encrypted emptyDir in combination with fsGroup. Thus, emit fs_group in that scenario and keep fsgid= for the existing shared-fs/local emptyDir behavior. Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The logic in the k8s-empty-dirs.bats file did not add a security policy for the pod-empty-dir-fsgroup.yaml manifest. With this change, we add the policy annotation. Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Do not run the NIM containers with elevated privileges. Note that using hostPath requires proper host folder permissions, and that using emptyDir requires a proper fsGroup ID. Once issue 11162 is resolved, we can further refine the securityContext fields for the TEE manifests. Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Onboard a test case for deploying a NIM service using the NIM operator. We install the operator helm chart on the fly as this is a fast operation, spinning up a single operand. Once a NIM service is scheduled, the operator creates a deployment with a single pod. For now, the TEE-based flow uses an allow-all policy. In future work, we strive to support generating pod security policies for the scenario where NIM services are deployed and the pod manifest is being generated on the fly. Signed-off-by: Manuel Huber <manuelh@nvidia.com>
runtime-rs: Fix FC API fields
Docker 26+ configures container networking (veth pair, IP addresses, routes) after task creation rather than before. Kata's endpoint scan runs during CreateSandbox, before the interfaces exist, resulting in VMs starting without network connectivity (no -netdev passed to QEMU).
Add RescanNetwork() which runs asynchronously after the Start RPC. It polls the network namespace until Docker's interfaces appear, then hotplugs them to QEMU and informs the guest agent to configure them inside the VM.
Additional fixes:
- mountinfo parser: find fs type dynamically instead of hardcoded field index, fixing parsing with optional mount tags (shared:, master:)
- IsDockerContainer: check CreateRuntime hooks for Docker 26+
- DockerNetnsPath: extract netns path from libnetwork-setkey hook args with path traversal protection
- detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline to guard against PID recycling
- startVM guard: rescan when len(endpoints)==0 after VM start
Fixes: #9340
Signed-off-by: llink5 <llink5@users.noreply.github.com>
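The mountinfo fix can be illustrated independently of the runtime. In `/proc/<pid>/mountinfo`, the six fixed leading fields are followed by zero or more optional fields (such as `shared:N` or `master:N`), then a lone `-` separator, then the filesystem type. A hardcoded index therefore breaks as soon as optional fields are present. The sketch below is written in Rust for illustration and is not the parser from the commit; `fs_type_from_mountinfo` is a hypothetical name.

```rust
/// Hypothetical sketch: return the filesystem type from one mountinfo line.
/// Layout: six fixed fields, zero or more optional fields, a "-" separator,
/// then fs type, mount source and super options.
fn fs_type_from_mountinfo(line: &str) -> Option<&str> {
    // Skip the six fixed fields so only optional fields and the separator remain.
    let mut fields = line.split_whitespace().skip(6);
    // Consume optional fields up to and including the "-" separator.
    fields.find(|field| *field == "-")?;
    // The field right after the separator is the filesystem type.
    fields.next()
}

fn main() {
    let with_tags = "36 35 98:0 /mnt1 /mnt2 rw,noatime shared:1 master:2 - ext3 /dev/root rw";
    let without_tags = "22 1 0:21 / /proc rw,relatime - proc proc rw";
    println!("{:?}", fs_type_from_mountinfo(with_tags));
    println!("{:?}", fs_type_from_mountinfo(without_tags));
}
```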
v0.1.4 has a bugfix for nvrc.log=trace which is now optional. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
workflows: Create workflow to stale issues based on date
There's a typo in the error message which gets prompted when an unsupported share_fs was configured. Fixed shred -> shared. Signed-off-by: Yuting Nie <yuting.nie@spacemit.com>
runtime-rs: Fix typo in share_fs error message
runtime: fix Docker 26+ networking by rescanning after Start
nvrc: Bump to the latest Release
runtime-rs: fix setting directio via config file
When TDX confidential guest support is enabled, set `kernel_irqchip=split` for the TDX CVM:
    ...
    -machine \
    q35,accel=kvm,kernel_irqchip=split,confidential-guest-support=tdx \
    ...
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
runtime-rs: Enhance TDX in qemu
2ba0cb0d4a7 did the groundwork for using OVMF even for the qemu-nvidia-gpu, but missed actually setting the OVMF path to be used, which we're fixing now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
runtime-rs: cleanup config
k3s and rke2 ship containerd 2.2.2, which requires the OCI 1.3.0 drop-in overlay. Move them from the separate OCI 1.2.1 branch into the OCI 1.3.0 condition alongside nvidia-gpu, qemu-snp, qemu-tdx, and custom container engine versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've switched to nydus there, but never did for the values.yaml. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
…olicy genpolicy: adjust GID after passwd GID handling and set fs_group for encrypted emptyDir volumes
See Commits and Changes for more details.
Created by
pull[bot]