feat(autoconfig): inject Agent discovery augmentations #1218
Conversation
Implements data structures and algorithms for cryostatio/cryostat-agent#477. Matches Cryostat's DiscoveryNode model with recursive owner reference traversal.
Implements complete hierarchy building from Namespace down to Pod. Matches Cryostat's discovery model without Realm wrapper (server adds that). Includes 5 new tests covering full chains, standalone pods, and label preservation.
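For illustration, here is a rough Go sketch of that recursive owner-reference walk; the `DiscoveryNode` shape mirrors the `hierarchy.json` examples later in this conversation, while the function and its client calls are simplified assumptions rather than the PR's actual code.

```go
package discovery // hypothetical package name

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// DiscoveryNode mirrors Cryostat's discovery model: name, nodeType, labels, children.
type DiscoveryNode struct {
	Name     string            `json:"name"`
	NodeType string            `json:"nodeType"`
	Labels   map[string]string `json:"labels"`
	Children []DiscoveryNode   `json:"children,omitempty"`
}

// ownerChain walks controller owner references upward from obj (e.g. Pod ->
// ReplicaSet -> Deployment), returning one node per object, leaf first. The
// caller would then nest these into Children and prepend the Namespace node.
func ownerChain(ctx context.Context, c client.Client, ns string, obj client.Object) ([]DiscoveryNode, error) {
	chain := []DiscoveryNode{{
		Name:     obj.GetName(),
		NodeType: obj.GetObjectKind().GroupVersionKind().Kind,
		Labels:   obj.GetLabels(),
	}}
	if ref := metav1.GetControllerOf(obj); ref != nil {
		owner := &unstructured.Unstructured{}
		owner.SetAPIVersion(ref.APIVersion)
		owner.SetKind(ref.Kind)
		if err := c.Get(ctx, types.NamespacedName{Namespace: ns, Name: ref.Name}, owner); err != nil {
			return nil, err
		}
		// Recurse on the owner to continue up toward the Namespace.
		parents, err := ownerChain(ctx, c, ns, owner)
		if err != nil {
			return nil, err
		}
		chain = append(chain, parents...)
	}
	return chain, nil
}
```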
Agent will apply these to its internal cryostat and platform maps. Reverts to simpler structure per clarification on metadata.json format.
Creates ConfigMap with hierarchy.json and metadata.json for Agent to read. ConfigMap is owned by Pod for automatic cleanup. Includes 5 tests with deep hierarchy validation and comprehensive assertions.
- Mount discovery ConfigMap as volume at /opt/cryostat.d/conf.d/discovery
- Create ConfigMap without owner reference (Pod doesn't exist yet)
- Update test expectations to include discovery volume and mount

- Change from /opt/cryostat.d/conf.d/discovery to /tmp/cryostat-agent/discovery
- Matches cryostat-agent's default discovery path
- Update documentation to reflect correct path

- Create DiscoveryConfigMapReconciler to watch discovery ConfigMaps
- Automatically add owner references linking ConfigMaps to their Pods
- Enables automatic cleanup when Pods are deleted
When pod.Name is empty during webhook mutation, the ConfigMap name was being generated as 'cryostat-agent-discovery-' (ending with a hyphen), which violates Kubernetes RFC 1123 naming rules. This fix:
- Uses pod.Name when available
- Falls back to pod.GenerateName with a random suffix when pod.Name is empty
- Returns an error if neither Name nor GenerateName is set
- Ensures truncated names don't end with hyphens or dots
- Reuses common.DefaultOSUtils.GenPasswd() for random string generation

Fixes the runtime error: ConfigMap "cryostat-agent-discovery-" is invalid: metadata.name: Invalid value: "cryostat-agent-discovery-": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character
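A minimal sketch of the naming fallback described in this commit; `randSuffix` stands in for `common.DefaultOSUtils.GenPasswd()`, and the prefix and length handling are illustrative rather than the PR's exact code.

```go
package webhook // hypothetical package name

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

const discoveryNamePrefix = "cryostat-agent-discovery-" // assumed prefix

// discoveryConfigMapName prefers pod.Name, falls back to pod.GenerateName plus a
// random suffix, and errors out if neither is set. The result is truncated to the
// RFC 1123 subdomain limit and trimmed so it never ends with '-' or '.'.
func discoveryConfigMapName(pod *corev1.Pod, randSuffix func(int) string) (string, error) {
	base := pod.Name
	if base == "" {
		if pod.GenerateName == "" {
			return "", fmt.Errorf("pod has neither Name nor GenerateName set")
		}
		base = pod.GenerateName + randSuffix(6)
	}
	name := discoveryNamePrefix + base
	if len(name) > 253 {
		name = name[:253]
	}
	return strings.TrimRight(name, "-."), nil
}
```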
What are your thoughts about storing this information in Pod annotation(s) instead of a new ConfigMap? I'm just concerned about the number of ConfigMap objects this would create. Annotations would also be less complex for the operator to manage.
This just creates one ConfigMap for each autoconfig'd Pod, but placing the information into annotations would eliminate the complexity of creating the ConfigMap first and then asynchronously watching for the corresponding Pod to set its owner reference, so that would be nice. This didn't even occur to me as it seemed like the node hierarchy data would be too much to fit into annotations, but: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
The Kubernetes docs don't specify a size limit, but the implementation does have a limit of 256KB: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/api/validation/objectmeta.go#L44-L67 . That's quite large and I don't think in practice this feature should run into that. But it would be possible, especially if the user or other tooling in the cluster sets other large metadata on the Pod or any of the other nodes up to the Namespace, since the hierarchy file includes a JSON representation of all of those nodes including their metadata. I'll give that a try, since Pod metadata can be exposed as volumes by the Downward API: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/ . At the very least this can simplify the autoconfig metadata, since the existing metadata can just be mounted as a file in the Pod without needing to marshal the labels into a `metadata.json` ourselves.
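For reference, a minimal Go sketch (volume and file names are illustrative, not from this PR) of exposing a Pod's own labels and annotations as files through a Downward API volume:

```go
package webhook // hypothetical package name

import corev1 "k8s.io/api/core/v1"

// downwardMetadataVolume exposes the Pod's own labels and annotations as files
// (e.g. /etc/podinfo/labels and /etc/podinfo/annotations once mounted).
func downwardMetadataVolume() corev1.Volume {
	return corev1.Volume{
		Name: "pod-metadata",
		VolumeSource: corev1.VolumeSource{
			DownwardAPI: &corev1.DownwardAPIVolumeSource{
				Items: []corev1.DownwardAPIVolumeFile{
					{Path: "labels", FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.labels"}},
					{Path: "annotations", FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.annotations"}},
				},
			},
		},
	}
}
```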
Actually this might not be so simple - the Agent needs it to be JSON because it doesn't have the Jackson module for parsing YAML. Mounting the Pod metadata labels/annotations via Downward API is going to result in YAML files. So it looks like the Operator would need to read the metadata labels/annotations and marshal that to JSON to supply to the Agent. But then we have a circular problem where we want to read the Pod's metadata YAML, marshal that to JSON, then store that back into the Pod metadata to mount as a file via Downward API. I think we'll end up in an infinite loop here of embedding the updated metadata back into a new update. Hmm...

Or else we can add the dependency to the Agent so that it can parse YAML, and it should become possible to just use the existing Downward API to mount the Pod metadata as files that the Agent can read. If the hierarchy information is included in the Pod annotations (set there by the Operator webhook) then this would all be in one metadata YAML file for the Agent to process. Do you think that sounds about right @ebaron?
It looks like the label/annotations files from the Downward API aren't YAML, but simple key-value pairs separated by line.
For annotations, which can include large and structured data, what would happen? The value of an annotation can itself be stringified JSON, for example, and might even include newlines:

```yaml
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.217.1.147/23"],"mac_address":"0a:58:0a:d9:01:93","gateway_ips":["10.217.0.1"],"routes":[{"dest":"10.217.0.0/22","nextHop":"10.217.0.1"},{"dest":"10.217.4.0/23","nextHop":"10.217.0.1"},{"dest":"169.254.0.5/32","nextHop":"10.217.0.1"},{"dest":"100.64.0.0/16","nextHop":"10.217.0.1"}],"ip_address":"10.217.1.147/23","gateway_ip":"10.217.0.1","role":"primary"}}'
      k8s.v1.cni.cncf.io/network-status: |-
        [{
            "name": "ovn-kubernetes",
            "interface": "eth0",
            "ips": [
                "10.217.1.147"
            ],
            "mac": "0a:58:0a:d9:01:93",
            "default": true,
            "dns": {}
        }]
```

(this is real data taken from a running Pod)

Maybe I need to just try it out and see how this kind of annotation gets written, but since annotations are documented as potentially containing structured data I think that means the value side is allowed to have significant whitespace, i.e. the annotation value could itself be a YAML document. In that case, I don't think it's possible that this can be exposed as a simple key-value file.
https://kubernetes.io/docs/concepts/workloads/pods/downward-api/#downwardapi-fieldRef
I wonder what the escape format is here. We could just leave it escaped, but then the annotations are going to appear escaped within the Cryostat API and UI, so they won't match up directly with the k8s view of things, which could be annoying/frustrating for the user.
If I read the code correctly, it just wraps the values in double quotes.
Ah, it's a little more than double quotes: https://pkg.go.dev/strconv#Quote |
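To make the escaping concrete, a small Go example of what `strconv.Quote` does to a hypothetical multi-line annotation value:

```go
package main

import (
	"fmt"
	"strconv"
)

func main() {
	// A hypothetical multi-line annotation value, e.g. pretty-printed JSON.
	value := `[{
  "name": "ovn-kubernetes",
  "default": true
}]`
	// strconv.Quote escapes newlines, quotes, and non-printable characters,
	// collapsing the value into a single-line, double-quoted string literal.
	fmt.Println(strconv.Quote(value))
	// Prints: "[{\n  \"name\": \"ovn-kubernetes\",\n  \"default\": true\n}]"
}
```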
I could just go ahead with it anyway, but I'm still worried that in the future either we or some user are going to have a reason to want to process these bits of metadata, and it'll be annoying that it's strconv escaped when most of everything else involved is Java-based. |
I'm just pretty concerned about the one ConfigMap per Pod. I worry this could upset users, especially those with many replicas, some of whom might have quotas in place for the number of objects of a specific type: https://kubernetes.io/docs/concepts/policy/resource-quotas/#quota-on-object-count. Maybe I'm overthinking it, what do you think?

An alternative implementation I had in mind was to use the Downward API to get the pod name and namespace, send those to Cryostat as part of the registration process, and then Cryostat can use that information to run through the existing Kubernetes discovery code path to fill in the ownership hierarchy and labels/annotations.
That's fair, it's possible this could be an issue too.
That's an interesting idea too, and it definitely could be done that way. How would Cryostat know that it needs to fill in the k8s ownership hierarchy for these particular Agents? Would we just hardcode in some assumption that if a discovery plugin's published tree is a list of Pod-wrapped nodes that Cryostat should apply the hierarchy-filling to those nodes? Conceptually I'd prefer if discovery plugins published the complete hierarchy themselves, but for the Agent specifically we also want to avoid additional dependencies and complexity, so these things are kind of at odds with each other.
Here's another idea - what if the Operator created one ConfigMap per Namespace (or one per NS which contains at least one autoconfig'd Deployment), and then updated that ConfigMap's contents whenever the set of autoconfig'd Pods changed? Then it could mount the ConfigMap to the Pods, but only the particular relevant files within the ConfigMap - so maybe the files would have a naming scheme including the Pod's name to make them unique. The data within the ConfigMap is probably not that large per Pod, so this one ConfigMap itself shouldn't end up being too large unless there are really a lot of replicas or some node in the hierarchy has a lot of metadata (ex. very large annotations).
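A rough Go sketch of that mounting scheme, with assumed ConfigMap and key names (one shared ConfigMap per Namespace, per-Pod file name prefixes):

```go
package webhook // hypothetical package name

import corev1 "k8s.io/api/core/v1"

// discoveryVolumeFor mounts only this Pod's entries out of a shared per-Namespace
// discovery ConfigMap, so the Agent still sees plain hierarchy.json/metadata.json.
func discoveryVolumeFor(podName string) corev1.Volume {
	return corev1.Volume{
		Name: "cryostat-agent-discovery",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{
					Name: "cryostat-agent-discovery", // assumed: one ConfigMap per Namespace
				},
				// Project only this Pod's keys into the mounted directory.
				Items: []corev1.KeyToPath{
					{Key: podName + "-hierarchy.json", Path: "hierarchy.json"},
					{Key: podName + "-metadata.json", Path: "metadata.json"},
				},
			},
		},
	}
}
```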
Or another refinement to that idea: split the shared Namespace/Deployment nodes out of each per-Pod hierarchy file and store them only once within the ConfigMap, so each Pod contributes only its own small file.
This results in a few more files within the ConfigMap, but reduces the duplication of data across each file so the overall ConfigMap size will be smaller. In practice ConfigMap sizes are limited to 1MB by etcd, so that should scale up to a large number of replicas just fine unless there is some really large metadata attached to some nodes. For example, this hierarchy from Pod to Namespace is only 514 bytes (without prettified formatting):

```json
{
  "name": "production-namespace",
  "nodeType": "Namespace",
  "labels": {
    "kubernetes.io/metadata.name": "production",
    "environment": "production"
  },
  "children": [
    {
      "name": "my-app-deployment",
      "nodeType": "Deployment",
      "labels": {
        "app": "my-application",
        "app.kubernetes.io/name": "my-app",
        "app.kubernetes.io/version": "1.0.0",
        "app.kubernetes.io/component": "backend"
      },
      "children": [
        {
          "name": "my-app-pod-abc123",
          "nodeType": "Pod",
          "labels": {
            "app": "my-application",
            "pod-template-hash": "abc123",
            "controller-revision-hash": "my-app-7d9f8c6b5"
          }
        }
      ]
    }
  ]
}
```

So even if we assume in practice there is more metadata in the hierarchy that pushes it up to 1KB per Pod, the ConfigMap can still support up to 1000 autoconfig'd Pods per Namespace, and even more than that if we split it up and deduplicate the Namespace/Deployment from each hierarchy file.
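A small Go sketch of what that deduplicated layout could look like, assuming for simplicity a single Deployment per Namespace; all key names and the helper itself are illustrative assumptions, not part of the PR:

```go
package webhook // hypothetical package name

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podFiles holds the per-Pod portions of the discovery data (illustrative).
type podFiles struct {
	HierarchyJSON string // just this Pod's node, not the full chain
	MetadataJSON  string // this Pod's labels/annotations
}

// sharedDiscoveryConfigMap sketches one ConfigMap per Namespace with the shared
// Namespace/Deployment nodes stored once and small per-Pod entries added on top.
func sharedDiscoveryConfigMap(ns, namespaceJSON, deploymentJSON string, pods map[string]podFiles) *corev1.ConfigMap {
	data := map[string]string{
		"namespace.json":  namespaceJSON,  // shared, stored once
		"deployment.json": deploymentJSON, // shared, stored once
	}
	for podName, files := range pods {
		data[podName+"-hierarchy.json"] = files.HierarchyJSON
		data[podName+"-metadata.json"] = files.MetadataJSON
	}
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "cryostat-agent-discovery", Namespace: ns},
		Data:       data,
	}
}
```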
An example `hierarchy.json`:

```json
{
  "name": "apps1",
  "nodeType": "Namespace",
  "labels": {
    "kubernetes.io/metadata.name": "apps1",
    "pod-security.kubernetes.io/audit": "restricted",
    "pod-security.kubernetes.io/audit-version": "latest",
    "pod-security.kubernetes.io/warn": "restricted",
    "pod-security.kubernetes.io/warn-version": "latest"
  },
  "children": [
    {
      "name": "quarkus-test",
      "nodeType": "Deployment",
      "labels": {
        "app": "quarkus-test"
      },
      "children": [
        {
          "name": "quarkus-test-7df7856846",
          "nodeType": "ReplicaSet",
          "labels": {
            "app": "quarkus-test",
            "cryostat.io/name": "cryostat-sample",
            "cryostat.io/namespace": "cryostat",
            "pod-template-hash": "7df7856846"
          },
          "children": [
            {
              "name": "quarkus-test-7df7856846",
              "nodeType": "Pod",
              "labels": {
                "app": "quarkus-test",
                "cryostat.io/name": "cryostat-sample",
                "cryostat.io/namespace": "cryostat",
                "pod-template-hash": "7df7856846"
              },
              "children": []
            }
          ]
        }
      ]
    }
  ]
}
```

Without prettified formatting this is 804 bytes. The associated `metadata.json`:

```json
{
  "labels": {
    "app": "quarkus-test",
    "cryostat.io/name": "cryostat-sample",
    "cryostat.io/namespace": "cryostat",
    "pod-template-hash": "7df7856846"
  },
  "annotations": {
    "openshift.io/scc": "restricted-v2",
    "seccomp.security.alpha.kubernetes.io/pod": "runtime/default",
    "security.openshift.io/validated-scc-subject-type": "user"
  }
}
```

So even without deduplicating the Namespace/Deployment, if we used a single ConfigMap it would support just under 1000 autoconfig'd Pods per Namespace with that scheme.
> That's an interesting idea too, and it definitely could be done that way. How would Cryostat know that it needs to fill in the k8s ownership hierarchy for these particular Agents? Would we just hardcode in some assumption that if a discovery plugin's published tree is a list of Pod-wrapped nodes that Cryostat should apply the hierarchy-filling to those nodes? Conceptually I'd prefer if discovery plugins published the complete hierarchy themselves, but for the Agent specifically we also want to avoid additional dependencies and complexity, so these things are kind of at odds with each other.

I guess we could also have the Agent set a query parameter or HTTP header when it publishes its discovery subtree, and Cryostat can read the value of this additional bit of info and use that to perform an additional fill-in step. For now the k8s lookup fill-in would be the only implementation, but this allows it to be generic and extensible for other cases in the future. The Agent could know to set that query/header based on some bit of info passed down from the Operator via env var or something.
I was thinking the pod name and namespace could be an extra bit of information sent in the registration data. Cryostat would know that this information corresponds to a Pod name and namespace, and would then have what it needs to query the ownership/metadata.
I'll give that a try. I've been thinking about the header or query param approach and I don't really like either option, so I'll try creating a new publication endpoint that accepts this additional contextual information as part of the POST body.
Welcome to Cryostat! 👋
Before contributing, make sure you have:
- Based your work on the latest upstream `main` branch
- Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
- Signed your commits: `git commit -S -m "YOUR_COMMIT_MESSAGE"`

Disclaimer: This PR was written with LLM assistance
Fixes: #1068
See cryostatio/cryostat#635 (comment)
Depends on cryostatio/cryostat-agent#779
Description of the change:
Adds additional capability to the Agent autoconfig pod mutation webhook so that the Pod's ownership lineage up to the Namespace is determined (following the same algorithm as https://github.com/cryostatio/cryostat/blob/main/src/main/java/io/cryostat/discovery/KubeEndpointSlicesDiscovery.java) and written into a JSON object `hierarchy.json` in a new ConfigMap. The Pod labels are also written into a `metadata.json` in the same ConfigMap. This ConfigMap is then mounted to the autoconfigured Pod so that the Agent can pick up this augmented Discovery information.

Since the ConfigMap is created before the Pod it cannot have an OwnerReference on the Pod initially, but it only makes sense that the Pod's lifecycle defines the ConfigMap's lifecycle too, so a new controller watch is installed that watches for ConfigMaps created by the webhook and attempts to assign the owner reference to the associated Pod once it's ready.
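For context, a rough sketch of what such a ConfigMap-to-Pod owner-reference reconciler could look like; the label key and lookup convention are assumptions, not the PR's implementation:

```go
package controllers // hypothetical package name

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// DiscoveryConfigMapReconciler watches webhook-created discovery ConfigMaps and,
// once the associated Pod exists, sets the Pod as the ConfigMap's owner so the
// ConfigMap is garbage-collected along with the Pod.
type DiscoveryConfigMapReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

func (r *DiscoveryConfigMapReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	cm := &corev1.ConfigMap{}
	if err := r.Get(ctx, req.NamespacedName, cm); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Assumed convention: the webhook records the target Pod's name in a label.
	podName, ok := cm.Labels["cryostat.io/pod-name"]
	if !ok {
		return ctrl.Result{}, nil
	}
	pod := &corev1.Pod{}
	if err := r.Get(ctx, client.ObjectKey{Namespace: cm.Namespace, Name: podName}, pod); err != nil {
		// The Pod may not exist yet; a later watch event will retry.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if err := controllerutil.SetControllerReference(pod, cm, r.Scheme); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, r.Update(ctx, cm)
}
```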
Motivation for the change:
See cryostatio/cryostat-agent#779
How to manually test: