@maryamtahhan maryamtahhan commented Nov 25, 2025

depends on #75

TODO:

  • unit tests
  • tests with cosign v3
  • check that the status from 'kyverno.io/verify-images' is a pass
  • Support a configuration with no image verification; this may need to be the default for compatibility with KServe
  • Separate cosign v2 and v3 examples for GKMCache Resources
  • Support cosign v2 and v3 verification in the ClusterGKMCache Webhook
  • update to Go version 1.25

Copilot AI left a comment

Pull request overview

This PR integrates Kyverno for image signature verification and digest mutation, replacing the internal cosign-based verification logic. It also adds support for building without GPU packages for development environments.

  • Replaces internal image signature verification with Kyverno policy-based verification
  • Adds webhook reinvocationPolicy to ensure GKM webhook runs after Kyverno mutations
  • Introduces NO_GPU_BUILD flag for building agent images without ROCm packages

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/utils/contants.go | Removes unused mutation signature annotations, adds Kyverno annotation constant |
| examples/namespace/00-kyverno.yaml | Adds sample Kyverno policy for GKMCache with imageExtractors configuration |
| config/webhook/webhook_timeout_patch.yaml | Adds reinvocationPolicy to ensure webhooks run after Kyverno |
| config/webhook/README.md | Documents the manual reinvocationPolicy requirement |
| config/kyverno/verify-clustergkmcache-images.yaml | Kyverno policy for ClusterGKMCache image verification |
| config/kyverno/values.yaml | Helm values for deploying Kyverno with GPU tolerations |
| config/kyverno/README.md | Documentation for Kyverno integration and usage |
| config/configMap/kustomization.yaml | Adds gkm.nogpu configuration option |
| api/v1alpha1/gkmcache_webhook.go | Removes cosign verification, extracts digest from Kyverno-mutated images |
| api/v1alpha1/clustergkmcache_webhook.go | Same changes as gkmcache_webhook.go for cluster-scoped resources |
| README.md | Documents NO_GPU_BUILD option for KIND deployments |
| Makefile | Adds Kyverno deployment targets and NO_GPU_BUILD support, removes webhook secret handling |
| Containerfile.gkm-agent | Conditionally installs ROCm packages based on NO_GPU flag |

Comments suppressed due to low confidence (3)
Comments suppressed due to low confidence (3)

api/v1alpha1/gkmcache_webhook.go:1

  • When extractedDigest is empty (no digest found in image), the variable digest remains uninitialized. Line 66 then compares an uninitialized digest with resolvedDigest, and line 73 attempts to set the annotation to an empty string. This could cause issues if Kyverno hasn't mutated the image yet. Consider returning an error or waiting for Kyverno mutation if no digest is found.
package v1alpha1

api/v1alpha1/clustergkmcache_webhook.go:1

  • Same issue as in gkmcache_webhook.go: when extractedDigest is empty (no digest found in image), the variable digest remains uninitialized. Line 64 then compares an uninitialized digest with resolvedDigest, and line 71 attempts to set the annotation to an empty string. This could cause issues if Kyverno hasn't mutated the image yet. Consider returning an error or waiting for Kyverno mutation if no digest is found.
package v1alpha1
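
The suggestion in the two webhook comments above (return an error when no digest is found, rather than carrying an empty string forward) could look roughly like the sketch below. It is not the code from this PR: `extractDigest` and `ErrNoDigest` are hypothetical names, and the sketch assumes a digest-pinned reference has the form `name@sha256:...`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNoDigest is a hypothetical sentinel error for references without a digest.
var ErrNoDigest = errors.New("image reference has no digest")

// extractDigest returns the part after "@" in a digest-pinned reference such as
// "quay.io/org/img@sha256:abc". It returns an error instead of an empty digest,
// so the caller can reject the request when the image was never mutated.
func extractDigest(image string) (string, error) {
	_, digest, found := strings.Cut(image, "@")
	if !found || digest == "" {
		return "", fmt.Errorf("%q: %w", image, ErrNoDigest)
	}
	return digest, nil
}

func main() {
	for _, ref := range []string{
		"quay.io/example/cache@sha256:0123abcd", // digest-pinned (illustrative value)
		"quay.io/example/cache:latest",          // tag only, no digest
	} {
		digest, err := extractDigest(ref)
		if err != nil {
			fmt.Println("rejecting:", err)
			continue
		}
		fmt.Println("digest:", digest)
	}
}
```

Failing early keeps the later comparison against the resolved digest from ever running with an empty value.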

Makefile:1

  • The path 0config/secret/mutation.env appears to have a typo - it should be config/secret/mutation.env (without the leading 0). However, this target is being removed in the diff, so this is already fixed.
# VERSION defines the project version for the bundle.


@maryamtahhan maryamtahhan force-pushed the test-kyverno branch 2 times, most recently from 254dd10 to 6845261 Compare December 1, 2025 12:50
@maryamtahhan maryamtahhan force-pushed the test-kyverno branch 2 times, most recently from 85658aa to 3093737 Compare December 1, 2025 16:22
@maryamtahhan
Collaborator Author

Kyverno fails with Cosign v3-signed images

kubectl create -f examples/namespace/11-gkmcache-2.yaml 
Error from server: error when creating "examples/namespace/11-gkmcache-2.yaml": admission webhook "mutate.kyverno.svc-fail" denied the request: 

resource GKMCache/gkm-test-ns-scoped-1/cosign-v3-cache-1 was blocked due to the following policies 

verify-gkmcache-images:
  verify-and-mutate-image: 'failed to verify image quay.io/mtahhan/vllm-flash-attention:rocm:
    .attestors[0].entries[0].keyless: no signatures found'

@maryamtahhan
Collaborator Author

This is fixed with the latest examples. The Cosign v2 and v3 examples have both been tested with GKMCache resources.

@maryamtahhan maryamtahhan force-pushed the test-kyverno branch 3 times, most recently from ca505fb to 47e4eef Compare January 6, 2026 13:19
maryamtahhan and others added 8 commits January 7, 2026 13:16
Add Kyverno policy engine deployment with GPU node tolerations for the
kind-gpu-sim cluster. This enables image verification and policy
enforcement for ClusterGKMCache resources.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
TODO:
- unit tests
- tests with cosign v3
- check that the status from 'kyverno.io/verify-images'
  is a pass

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Webhooks now check the kyverno.io/verify-images annotation to ensure:
- Image signature verification status is 'pass'
- The verified SHA digest matches the expected digest

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
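
A minimal sketch of the two checks described in the commit message above, not the webhook code from this PR. It assumes the `kyverno.io/verify-images` annotation value is a JSON object mapping digest-pinned image references to a status string; `checkKyvernoAnnotation` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// verifyImagesAnnotation is the annotation key named in the commit message.
const verifyImagesAnnotation = "kyverno.io/verify-images"

// checkKyvernoAnnotation returns nil only when the annotation lists an image
// pinned to expectedDigest with status "pass".
// Assumption: the annotation value is a JSON object of image reference -> status.
func checkKyvernoAnnotation(annotations map[string]string, expectedDigest string) error {
	raw, ok := annotations[verifyImagesAnnotation]
	if !ok {
		return fmt.Errorf("annotation %s is missing", verifyImagesAnnotation)
	}
	statuses := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &statuses); err != nil {
		return fmt.Errorf("parsing %s: %w", verifyImagesAnnotation, err)
	}
	for ref, status := range statuses {
		if !strings.HasSuffix(ref, "@"+expectedDigest) {
			continue // a different image or digest
		}
		if status != "pass" {
			return fmt.Errorf("image %s has verification status %q", ref, status)
		}
		return nil
	}
	return fmt.Errorf("no verified image with digest %s", expectedDigest)
}

func main() {
	annotations := map[string]string{
		verifyImagesAnnotation: `{"quay.io/example/cache@sha256:0123abcd":"pass"}`,
	}
	fmt.Println(checkKyvernoAnnotation(annotations, "sha256:0123abcd"))
	fmt.Println(checkKyvernoAnnotation(annotations, "sha256:ffff"))
}
```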
Added KYVERNO_ENABLED Makefile variable (defaults to true) to control
Kyverno verification at deployment time, following the same pattern
as NO_GPU. Usage examples:
  make deploy                          # Kyverno enabled (default)
  make deploy KYVERNO_ENABLED=false    # Kyverno disabled
  make deploy-on-kind KYVERNO_ENABLED=false  # Disable on Kind cluster

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Added KYVERNO_VERIFICATION_ENABLED environment variable to control
signature verification behavior:
- When enabled (default): Full verification with Kyverno
  annotation checking
- When disabled: Skip signature verification, only resolve image
  digest (for development/testing)

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
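
One way the enabled-by-default toggle described above could be read, as a sketch only. The variable name comes from the commit message; the helper `verificationEnabled`, its injected lookup function, and the choice to treat unparseable values as enabled are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// verificationEnvVar is the variable named in the commit message.
const verificationEnvVar = "KYVERNO_VERIFICATION_ENABLED"

// verificationEnabled reports whether signature verification should run.
// The lookup function is injected so the logic can be exercised without
// touching the process environment; pass os.LookupEnv in production.
// Assumption: unset or unparseable values keep the secure default (enabled).
func verificationEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(verificationEnvVar)
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return enabled
}

func main() {
	if verificationEnabled(os.LookupEnv) {
		fmt.Println("full verification: checking the Kyverno annotation")
	} else {
		fmt.Println("verification skipped: only resolving the image digest")
	}
}
```

Falling back to enabled on bad input means a typo in the deployment cannot silently turn verification off.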
Centralized Kyverno policy management and improved deployment workflow:

- **Policy Organization**:
  - Moved ClusterPolicy definitions from examples/ to config/kyverno/policies/
  - Created kustomization.yaml for unified policy deployment
  - Added MOVED.md files in examples/ directories to guide users

- **Makefile Enhancements**:
  - Added deploy-kyverno-policies and undeploy-kyverno-policies targets
  - Enhanced deploy-kyverno with conditional NO_GPU flag support
  - Modified run-on-kind to automatically deploy Kyverno when KYVERNO_ENABLED=true
  - Fixed CRD discovery timing: deploy GKM first, then Kyverno, then restart

- **Documentation**:
  - Rewrote config/kyverno/README.md with clearer deployment instructions
  - Documented automatic deployment workflow with run-on-kind
  - Added details about KYVERNO_VERIFICATION_ENABLED environment variable

This change ensures Kyverno's webhook controller discovers GKM CRDs correctly
and provides a cleaner separation between example resources and policy definitions.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Replaced individual MOVED.md files with a comprehensive guide:

- Removed examples/cluster/00-kyverno-MOVED.md
- Removed examples/namespace/00-kyverno-MOVED.md
- Created docs/examples/kyverno-policies.md with:
  - Policy location migration details
  - Deployment instructions
  - Policy verification requirements
  - Image mutation examples
  - Runtime control documentation

Provides a single source of truth for Kyverno policy documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…pport

This commit introduces a label-based approach to support both Cosign v2
(legacy) and v3 (bundle) signature formats, allowing GKM to work with
images signed using either method.

Key changes:
- Split Kyverno policy into v2 and v3 variants with label selectors
- Added gkm.io/signature-format label to control verification method
- Updated webhook naming (z- prefix) to ensure Kyverno runs before GKM
- Added reinvocationPolicy=Never to GKM mutating webhook
- Reorganized policy files with clear v2/v3 naming
- Created comprehensive image verification documentation
- Updated examples with appropriate labels for each format

Webhook execution order:
1. Kyverno (mutate.kyverno.svc-fail) - verifies and mutates images
2. GKM (z-mgkmcache.kb.io) - validates Kyverno annotation and processes

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
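
In this commit the label is matched by Kyverno policy selectors rather than by Go code. The sketch below only illustrates the mapping from the `gkm.io/signature-format` label to a verification method. The values `cosign-v2` and `cosign-v3` are the ones named in a later commit in this PR; the type and function names are hypothetical, and rejecting a missing label is an assumption.

```go
package main

import "fmt"

// signatureFormatLabel is the label named in the commit message.
const signatureFormatLabel = "gkm.io/signature-format"

// SignatureFormat is a hypothetical type used only in this sketch.
type SignatureFormat string

const (
	FormatCosignV2 SignatureFormat = "cosign-v2" // legacy signature format
	FormatCosignV3 SignatureFormat = "cosign-v3" // bundle signature format
)

// signatureFormat reads the label and rejects missing or unknown values,
// so that exactly one verification path applies to each resource.
func signatureFormat(labels map[string]string) (SignatureFormat, error) {
	switch f := SignatureFormat(labels[signatureFormatLabel]); f {
	case FormatCosignV2, FormatCosignV3:
		return f, nil
	case "":
		return "", fmt.Errorf("label %s is not set", signatureFormatLabel)
	default:
		return "", fmt.Errorf("unsupported %s value %q", signatureFormatLabel, f)
	}
}

func main() {
	labels := map[string]string{signatureFormatLabel: "cosign-v3"}
	format, err := signatureFormat(labels)
	fmt.Println(format, err)
}
```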
@maryamtahhan maryamtahhan force-pushed the test-kyverno branch 2 times, most recently from ce4e9c6 to 4d04ae8 Compare January 7, 2026 13:25
@maryamtahhan maryamtahhan marked this pull request as ready for review January 7, 2026 13:25
…v2/v3 support

This commit removes ClusterGKMCache's dependency on Kyverno and implements
built-in signature verification directly in the admission webhook with
automatic Cosign v2 and v3 format detection.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
@@ -1,5 +1,5 @@
 # Build the agent binary
-FROM public.ecr.aws/docker/library/golang:1.24.4 AS builder
+FROM public.ecr.aws/docker/library/golang:1.25.0 AS builder
Collaborator

Should we use golang:1.25 or stick with golang:1.25.0 for known consistent builds?

Collaborator Author

the automatic updates might be nice - not sure, what do you think?


 var (
-	gkmcachelog = logf.Log.WithName("webhook-ns")
+	gkmcacheLog = logf.Log.WithName("webhook-ns")
Collaborator

Weird spacing, is the linter doing it?

Collaborator Author

not sure

This commit addresses GLIBC compatibility issues and corrects Kyverno
Cosign v3 policy configuration for proper image signature verification.

Container Updates:
- Upgrade gkm-agent and gkm-operator base images from Ubuntu 22.04 to
  24.04 to resolve GLIBC 2.38 compatibility issues
- Update ROCm repository URL from 'jammy' to 'noble' for Ubuntu 24.04
- Fixes gkm-agent CrashLoopBackOff caused by GLIBC version mismatch

Kyverno Policy Updates:
- Fix Cosign v3 policy issuer: use GitHub Actions token issuer
  (https://token.actions.githubusercontent.com) instead of OAuth issuer
- Replace specific subject email with regex pattern to match any GitHub
  workflow (subjectRegExp: "https://github.com/.*")
- Aligns with actual signatures generated by GitHub Actions workflows

Documentation Updates:
- Add new Documentation section to main README with links to config docs
- Document gkm.io/signature-format label usage (cosign-v2 and cosign-v3)
- Add example showing how to use signature format labels
- Update Kyverno v3 policy documentation with correct issuer/subject
- Add links to Kyverno and webhook configuration READMEs

Example Updates:
- Update example files to use correct signature format labels
- Ensure consistency across namespace and cluster examples

These changes enable successful verification of Cosign v3 bundle format
signatures and resolve runtime issues in KIND deployments.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
maryamtahhan and others added 3 commits January 23, 2026 13:25
Replace hardcoded 30*time.Second timeout values with named constants
to improve code maintainability and make timeout values easier to
configure in the future.

Changes:
- Add ImageVerificationTimeout constant in clustergkmcache_webhook.go
  for ClusterGKMCache webhook operations
- Add DefaultVerificationTimeout constant in pkg/cosign/verify.go
  for general image verification operations
- Update all usages to reference the new constants
- Remove unused time import from verify_test.go

Both constants are set to 30 seconds to accommodate Cosign v3 bundle
verification which can take 15-20 seconds.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
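
A sketch of how a named timeout constant like the one described above can bound a verification call through a context deadline. `ImageVerificationTimeout` and its 30-second value come from the commit message; `verifyWithTimeout`, `slowVerify`, and `simulate` are hypothetical helpers, and the real verification call is replaced by a timer.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ImageVerificationTimeout mirrors the constant named in the commit message.
const ImageVerificationTimeout = 30 * time.Second

// verifyWithTimeout runs verify under a deadline derived from timeout.
// verify is a stand-in for the real signature verification call.
func verifyWithTimeout(parent context.Context, timeout time.Duration, verify func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return verify(ctx)
}

// slowVerify simulates a verification that takes d unless the context ends first.
func slowVerify(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// simulate runs a fake verification lasting work under the given timeout.
func simulate(timeout, work time.Duration) error {
	return verifyWithTimeout(context.Background(), timeout, slowVerify(work))
}

func main() {
	// Finishes well inside the 30-second budget.
	fmt.Println("within budget:", simulate(ImageVerificationTimeout, 10*time.Millisecond))

	// A deliberately tiny budget shows the deadline error path.
	err := simulate(5*time.Millisecond, time.Second)
	fmt.Println("deadline exceeded:", errors.Is(err, context.DeadlineExceeded))
}
```

Keeping the duration in one constant means a later change to the budget touches a single line instead of every call site.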
…gurable

Separate Kyverno Helm values into environment-specific files to support
both production GPU environments and Kind/simulated GPU clusters.

Changes:
- Split values.yaml into two files:
  - values.yaml: Default minimal configuration for production GPU environments
  - values-no-gpu.yaml: Configuration with GPU nodeSelector and tolerations
    for Kind/simulated GPU clusters
- Update Makefile deploy-kyverno target to select appropriate values file
  based on NO_GPU environment variable
- Update config/kyverno/README.md with detailed documentation about:
  - Configuration file purposes and usage
  - Automatic file selection based on NO_GPU variable
  - When to use each configuration

Benefits:
- Production deployments no longer require GPU nodeSelector/tolerations
- Kind/simulated GPU deployments maintain necessary scheduling constraints
- Clear separation of concerns between environments
- Better documentation for configuration choices

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>