
Conversation

@andrewazores
Member

@andrewazores andrewazores commented Jan 16, 2026

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PRs, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your PR branch on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits: git commit -S -m "YOUR_COMMIT_MESSAGE"

Disclaimer: This PR was written with LLM assistance

Fixes: #1068
See cryostatio/cryostat#635 (comment)
Depends on cryostatio/cryostat-agent#779

Description of the change:

Adds additional capability to the Agent autoconfig pod mutation webhook: the Pod's ownership lineage up to the Namespace is determined (following the same algorithm as https://github.com/cryostatio/cryostat/blob/main/src/main/java/io/cryostat/discovery/KubeEndpointSlicesDiscovery.java) and written as a JSON object to a hierarchy.json file in a new ConfigMap. The Pod labels are also written to a metadata.json file in the same ConfigMap. This ConfigMap is then mounted into the autoconfigured Pod so that the Agent can pick up this augmented Discovery information.

Since the ConfigMap is created before the Pod, it cannot initially carry an OwnerReference to the Pod. But it only makes sense for the Pod's lifecycle to define the ConfigMap's lifecycle too, so a new controller watch is installed that watches for ConfigMaps created by the webhook and sets an owner reference to the associated Pod once it exists.
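
For illustration, a minimal sketch of how such a watch could set the owner reference once the Pod exists (the type name, label key, and requeue interval here are assumptions for the sketch, not the actual implementation):

package controllers

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// DiscoveryConfigMapReconciler is a sketch, not the actual implementation.
type DiscoveryConfigMapReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

func (r *DiscoveryConfigMapReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	cm := &corev1.ConfigMap{}
	if err := r.Get(ctx, req.NamespacedName, cm); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Assumption: the webhook records the target Pod's name in a label.
	podName, ok := cm.Labels["cryostat.io/discovery-pod"]
	if !ok {
		return ctrl.Result{}, nil
	}

	pod := &corev1.Pod{}
	if err := r.Get(ctx, types.NamespacedName{Namespace: cm.Namespace, Name: podName}, pod); err != nil {
		if apierrors.IsNotFound(err) {
			// The Pod may not have been created yet; try again shortly.
			return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
		}
		return ctrl.Result{}, err
	}

	// Tie the ConfigMap's lifecycle to the Pod so the garbage collector
	// removes the ConfigMap when the Pod is deleted.
	if err := controllerutil.SetOwnerReference(pod, cm, r.Scheme); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, r.Update(ctx, cm)
}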

Motivation for the change:

See cryostatio/cryostat-agent#779

How to manually test:

  1. Check out and build the Agent PR and init image, the Cryostat server with web frontend, and the Operator locally, or use prebuilt images:
$ export OPERATOR_IMG=quay.io/andrewazores/cryostat-operator:4.2.0-k8s-ergo-16
$ export CORE_IMG=quay.io/andrewazores/cryostat:4.2.0-k8s-ergo-5
$ export AGENT_INIT_IMG=quay.io/andrewazores/cryostat-agent-init:0.7.0-k8s-ergo-7
  2. Build the Operator locally, optionally in OCP bundle mode
  3. Deploy the Operator, create a Cryostat CR, and deploy a sample application
  4. Edit the sample application's Pod template metadata labels for Agent autoconfig, or, in OCP bundle mode on OpenShift, use the Console Plugin UI
  5. Verify the results; they should look like the screenshots in cryostat#1249 (feat(discovery): handle hierarchical plugin publication, implement querying of merged realms)

Implements data structures and algorithms for cryostatio/cryostat-agent#477.
Matches Cryostat's DiscoveryNode model with recursive owner reference traversal.
Implements complete hierarchy building from Namespace down to Pod.
Matches Cryostat's discovery model without Realm wrapper (server adds that).
Includes 5 new tests covering full chains, standalone pods, and label preservation.
Agent will apply these to its internal cryostat and platform maps.
Reverts to simpler structure per clarification on metadata.json format.
Creates ConfigMap with hierarchy.json and metadata.json for Agent to read.
ConfigMap is owned by Pod for automatic cleanup.
Includes 5 tests with deep hierarchy validation and comprehensive assertions.
- Mount discovery ConfigMap as volume at /opt/cryostat.d/conf.d/discovery
- Create ConfigMap without owner reference (Pod doesn't exist yet)
- Update test expectations to include discovery volume and mount
- Change from /opt/cryostat.d/conf.d/discovery to /tmp/cryostat-agent/discovery
- Matches cryostat-agent's default discovery path
- Update documentation to reflect correct path
- Create DiscoveryConfigMapReconciler to watch discovery ConfigMaps
- Automatically add owner references linking ConfigMaps to their Pods
- Enables automatic cleanup when Pods are deleted
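
As a rough sketch of the volume mount described in the items above (the helper and volume names are assumptions, not the actual implementation):

package webhooks

import corev1 "k8s.io/api/core/v1"

// addDiscoveryVolume sketches mounting the webhook-created discovery ConfigMap
// into a container at the Agent's default discovery path. cmName is the
// ConfigMap generated for this Pod; the volume name is an assumption.
func addDiscoveryVolume(pod *corev1.Pod, containerIdx int, cmName string) {
	pod.Spec.Volumes = append(pod.Spec.Volumes, corev1.Volume{
		Name: "cryostat-agent-discovery",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: cmName},
			},
		},
	})
	c := &pod.Spec.Containers[containerIdx]
	c.VolumeMounts = append(c.VolumeMounts, corev1.VolumeMount{
		Name:      "cryostat-agent-discovery",
		ReadOnly:  true,
		MountPath: "/tmp/cryostat-agent/discovery", // cryostat-agent's default discovery path
	})
}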
When pod.Name is empty during webhook mutation, the ConfigMap name
was being generated as 'cryostat-agent-discovery-' (ending with a
hyphen), which violates Kubernetes RFC 1123 naming rules.

This fix:
- Uses pod.Name when available
- Falls back to pod.GenerateName with a random suffix when pod.Name is empty
- Returns an error if neither Name nor GenerateName is set
- Ensures truncated names don't end with hyphens or dots
- Reuses common.DefaultOSUtils.GenPasswd() for random string generation

Fixes the runtime error:
ConfigMap "cryostat-agent-discovery-" is invalid: metadata.name:
Invalid value: "cryostat-agent-discovery-": a lowercase RFC 1123
subdomain must consist of lower case alphanumeric characters, '-' or
'.', and must start and end with an alphanumeric character
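
A minimal sketch of the naming rules described above (not the actual implementation; the real fix reuses common.DefaultOSUtils.GenPasswd() for the random suffix, approximated here with apimachinery's rand.String):

package webhooks

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/rand"
)

const (
	namePrefix = "cryostat-agent-discovery-"
	maxNameLen = 253 // RFC 1123 subdomain length limit
)

// discoveryConfigMapName sketches the naming fallback rules described above.
func discoveryConfigMapName(pod *corev1.Pod) (string, error) {
	base := pod.Name
	if base == "" {
		if pod.GenerateName == "" {
			return "", fmt.Errorf("pod has neither Name nor GenerateName set")
		}
		// GenerateName conventionally ends with a hyphen, so append a suffix.
		base = pod.GenerateName + rand.String(5)
	}
	name := namePrefix + base
	if len(name) > maxNameLen {
		name = name[:maxNameLen]
	}
	// A valid name must end with an alphanumeric character, so strip any
	// trailing hyphens or dots left over from truncation.
	return strings.TrimRight(name, "-."), nil
}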
@github-actions

@andrewazores andrewazores force-pushed the agent-autoconfig-k8s-ergo branch from 495dd63 to b506367 on January 16, 2026 19:12
@andrewazores
Member Author

/build_test

@github-actions

/build_test completed successfully ✅.
View Actions Run.

@andrewazores andrewazores marked this pull request as ready for review January 16, 2026 20:26
@andrewazores andrewazores requested review from a team and ebaron January 16, 2026 20:29
Member

@ebaron ebaron left a comment


What are your thoughts about storing this information in pod annotations instead of a new config map? I'm just concerned about the number of config map objects this would create. Annotations would be less complex for the operator to manage as well.

@andrewazores
Member Author

andrewazores commented Jan 16, 2026

This just creates one ConfigMap for each autoconfig'd Pod, but placing the information into annotations would eliminate the complexity of creating the ConfigMap first and then asynchronously watching for the corresponding Pod to set its owner reference, so that would be nice. This didn't even occur to me as it seemed like the node hierarchy data would be too much to fit into annotations, but:

https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/

> The metadata in an annotation can be small or large, structured or unstructured, and can include characters not permitted by labels.

I can't find any documentation about what "large" annotation metadata would mean or whether there is a concrete limit here. The Kubernetes docs don't specify a size limit, but the implementation does enforce a limit of 256KB: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/api/validation/objectmeta.go#L44-L67 . That's quite large and I don't think this feature should run into it in practice. But it would be possible, especially if the user or other tooling in the cluster sets other large metadata on the Pod or any of the other nodes up to the Namespace, since the hierarchy file includes a JSON representation of all of those nodes including their metadata.

I'll give that a try, since Pod metadata can be exposed as volumes by the Downward API: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/ . At the very least this can simplify the autoconfig metadata, since the existing metadata can just be mounted as a file in the Pod without needing to marshal the metadata.json and put that into the ConfigMap.
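
For illustration, a minimal sketch of exposing the Pod's own labels and annotations through a Downward API volume (the volume and file names here are just placeholders):

package webhooks

import corev1 "k8s.io/api/core/v1"

// podMetadataVolume sketches exposing the Pod's own labels and annotations as
// files via the Downward API.
func podMetadataVolume() corev1.Volume {
	return corev1.Volume{
		Name: "pod-metadata",
		VolumeSource: corev1.VolumeSource{
			DownwardAPI: &corev1.DownwardAPIVolumeSource{
				Items: []corev1.DownwardAPIVolumeFile{
					{Path: "labels", FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.labels"}},
					{Path: "annotations", FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.annotations"}},
				},
			},
		},
	}
}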

@andrewazores
Member Author

andrewazores commented Jan 16, 2026

> At the very least this can simplify the autoconfig metadata, since the existing metadata can just be mounted as a file in the Pod without needing to marshal the metadata.json and put that into the ConfigMap.

Actually this might not be so simple - the Agent needs it to be JSON because it doesn't have the Jackson module for parsing YAML. Mounting the Pod metadata labels/annotations via Downward API is going to result in YAML files. So it looks like the Operator would need to read the metadata labels/annotations and marshal that to JSON to supply to the Agent.

But then we have a circular problem where we want to read the Pod's metadata YAML, marshal that to JSON, then store that back into the Pod metadata to mount as a file via Downward API. I think we'll end up in an infinite loop here of embedding the updated metadata back into a new update.

Hmm...

Or else we can add the dependency to the Agent so that it can parse YAML, and it should become possible to just use the existing Downward API to mount the Pod metadata as files that the Agent can read. If the hierarchy information is included in the Pod annotations (set there by the Operator webhook) then this would all be in one metadata YAML file for the Agent to process. Do you think that sounds about right @ebaron ?

@ebaron
Member

ebaron commented Jan 16, 2026

It looks like the label/annotations files from the Downward API aren't YAML but simple key-value pairs, one per line:
https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/
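
For example, a Pod labels file produced this way contains lines roughly like (values quoted with Go-style escaping):

app="quarkus-test"
pod-template-hash="7df7856846"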

@andrewazores
Member Author

andrewazores commented Jan 16, 2026

For annotations, which can include large and structured data, what would happen? The value of an annotation can itself be stringified JSON, for example, and might even include newlines:

- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.217.1.147/23"],"mac_address":"0a:58:0a:d9:01:93","gateway_ips":["10.217.0.1"],"routes":[{"dest":"10.217.0.0/22","nextHop":"10.217.0.1"},{"dest":"10.217.4.0/23","nextHop":"10.217.0.1"},{"dest":"169.254.0.5/32","nextHop":"10.217.0.1"},{"dest":"100.64.0.0/16","nextHop":"10.217.0.1"}],"ip_address":"10.217.1.147/23","gateway_ip":"10.217.0.1","role":"primary"}}'
      k8s.v1.cni.cncf.io/network-status: |-
        [{
            "name": "ovn-kubernetes",
            "interface": "eth0",
            "ips": [
                "10.217.1.147"
            ],
            "mac": "0a:58:0a:d9:01:93",
            "default": true,
            "dns": {}
        }]

(this is real data taken from an oc get -o yaml invocation - the first annotation has no newlines, the second one does)

Maybe I need to just try it out and see how this kind of annotation gets written, but since annotations are documented as potentially containing structured data, I think that means the value side is allowed to have significant whitespace, i.e. the annotation value could itself be a YAML document. In that case, I don't think it's possible for this to be exposed as a simple key=value file format.

@andrewazores
Member Author

andrewazores commented Jan 16, 2026

https://kubernetes.io/docs/concepts/workloads/pods/downward-api/#downwardapi-fieldRef

> metadata.annotations
> all of the pod's annotations, formatted as annotation-key="escaped-annotation-value" with one annotation per line

I wonder what the escape format is here. We could just leave it escaped, but then the annotations are going to appear escaped within the Cryostat API and UI, so they won't match up directly with the k8s view of things, which could be annoying/frustrating for the user.

@ebaron
Member

ebaron commented Jan 16, 2026

> https://kubernetes.io/docs/concepts/workloads/pods/downward-api/#downwardapi-fieldRef
>
> metadata.annotations
> all of the pod's annotations, formatted as annotation-key="escaped-annotation-value" with one annotation per line
>
> I wonder what the escape format is here. We could just leave it escaped, but then the annotations are going to appear escaped within the Cryostat API and UI, so they won't match up directly with the k8s view of things, which could be annoying/frustrating for the user.

If I read the code correctly, it uses strconv.AppendQuote on the annotation value:
https://github.com/kubernetes/kubernetes/blob/6817bea700feddae1fb77c5ef60b557c3d1349b8/pkg/fieldpath/fieldpath.go#L48
Looks like that only escapes double quotes? https://pkg.go.dev/strconv#example-AppendQuote

@ebaron
Member

ebaron commented Jan 16, 2026

Ah, it's a little more than double quotes: https://pkg.go.dev/strconv#Quote
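
For illustration, a standalone Go snippet (not taken from the kubelet code) showing what that escaping does to a structured, multi-line annotation value:

package main

import (
	"fmt"
	"strconv"
)

func main() {
	// A multi-line JSON annotation value, similar to the
	// k8s.v1.cni.cncf.io/network-status example above.
	value := "[{\n    \"name\": \"ovn-kubernetes\",\n    \"default\": true\n}]"
	fmt.Println("network-status=" + strconv.Quote(value))
	// Output (one line; newlines and quotes escaped):
	// network-status="[{\n    \"name\": \"ovn-kubernetes\",\n    \"default\": true\n}]"
}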

@andrewazores
Copy link
Member Author

I could just go ahead with it anyway, but I'm still worried that in the future either we or some user will have a reason to want to process these bits of metadata, and it'll be annoying that it's strconv-escaped when almost everything else involved is Java-based.

@ebaron
Member

ebaron commented Jan 19, 2026

> I could just go ahead with it anyway, but I'm still worried that in the future either we or some user will have a reason to want to process these bits of metadata, and it'll be annoying that it's strconv-escaped when almost everything else involved is Java-based.

I'm just pretty concerned about the one config map per pod. I worry this could upset users, especially those with many replicas, some of whom might have quotas in place on the number of objects of a specific type: https://kubernetes.io/docs/concepts/policy/resource-quotas/#quota-on-object-count. Maybe I'm overthinking it, what do you think?

An alternative implementation I had in mind was to use the downward API to get the pod name and namespace, send those to Cryostat as part of the registration process and then Cryostat can use that information to run through the existing Kubernetes discovery code path to fill in the ownership hierarchy and labels/annotations.
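
For reference, a minimal sketch of how the Downward API could feed those two values to the Agent container (the env var names here are hypothetical, not an agreed interface):

package webhooks

import corev1 "k8s.io/api/core/v1"

// podIdentityEnvVars sketches passing the Pod's own name and namespace to the
// Agent via the Downward API, so it could forward them at registration time.
func podIdentityEnvVars() []corev1.EnvVar {
	return []corev1.EnvVar{
		{
			Name: "CRYOSTAT_AGENT_POD_NAME", // hypothetical variable name
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.name"},
			},
		},
		{
			Name: "CRYOSTAT_AGENT_POD_NAMESPACE", // hypothetical variable name
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.namespace"},
			},
		},
	}
}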

@andrewazores
Member Author

andrewazores commented Jan 19, 2026

> I'm just pretty concerned about the one config map per pod. I worry this could upset users, especially those with many replicas, some of whom might have quotas in place on the number of objects of a specific type: https://kubernetes.io/docs/concepts/policy/resource-quotas/#quota-on-object-count. Maybe I'm overthinking it, what do you think?

That's fair, it's possible this could be an issue too.

> An alternative implementation I had in mind was to use the downward API to get the pod name and namespace, send those to Cryostat as part of the registration process and then Cryostat can use that information to run through the existing Kubernetes discovery code path to fill in the ownership hierarchy and labels/annotations.

That's an interesting idea too, and it definitely could be done that way. How would Cryostat know that it needs to fill in the k8s ownership hierarchy for these particular Agents? Would we just hardcode in some assumption that if a discovery plugin's published tree is a list of Pod-wrapped nodes that Cryostat should apply the hierarchy-filling to those nodes? Conceptually I'd prefer if discovery plugins published the complete hierarchy themselves, but for the Agent specifically we also want to avoid additional dependencies and complexity, so these things are kind of at odds with each other.

@andrewazores
Member Author

Here's another idea - what if the Operator created one ConfigMap per Namespace (or one per NS which contains at least one autoconfig'd Deployment), and then updated that ConfigMap's contents whenever the set of autoconfig'd Pods changed? Then it could mount the ConfigMap to the Pods, but only the particular relevant files within the ConfigMap - so maybe the files would have a naming scheme including the Pod's name to make them unique. The data within the ConfigMap is probably not that large per Pod, so this one ConfigMap itself shouldn't end up being too large unless there are really a lot of replicas or some node in the hierarchy has a lot of metadata (ex. very large annotations).

@andrewazores
Member Author

andrewazores commented Jan 19, 2026

Or another refinement to that idea:

  • ConfigMap contains a deployment-xx.json hierarchy file for each autoconfig'd Deployment, which basically just has a model of the Deployment node as a child of the Namespace node. Both would contain all metadata etc.
  • ConfigMap also contains a 'pod-xx.json' for each autoconfig'd Pod resulting from those Deployments, which has a model of the Pod and whatever node(s) are between it and the parent Deployment (ex. ReplicaSet).
  • and maybe we ditch the metadata.json, since in practice this just contains another copy of the metadata already present on the Pod node. The Agent side of the implementation can be updated to merge its Target self-node definition with the innermost hierarchy node, rather than adding the self-node as a child of the innermost node, and the Operator can fill in a stub of that Target node containing this metadata.

This results in a few more files within the ConfigMap, but reduces the duplication of data across each file so the overall ConfigMap size will be smaller. In practice ConfigMap sizes are limited to 1MB by etcd, so that should scale up to a large number of replicas just fine unless there is some really large metadata attached to some nodes.

For example, this hierarchy from Pod to Namespace is only 514 bytes (without prettified formatting):

{
  "name": "production-namespace",
  "nodeType": "Namespace",
  "labels": {
    "kubernetes.io/metadata.name": "production",
    "environment": "production"
  },
  "children": [
    {
      "name": "my-app-deployment",
      "nodeType": "Deployment",
      "labels": {
        "app": "my-application",
        "app.kubernetes.io/name": "my-app",
        "app.kubernetes.io/version": "1.0.0",
        "app.kubernetes.io/component": "backend"
      },
      "children": [
        {
          "name": "my-app-pod-abc123",
          "nodeType": "Pod",
          "labels": {
            "app": "my-application",
            "pod-template-hash": "abc123",
            "controller-revision-hash": "my-app-7d9f8c6b5"
          }
        }
      ]
    }
  ]
}

so even if we assume in practice there is more metadata in the hierarchy that pushes it up to 1KB per Pod, the ConfigMap can still support up to 1000 autoconfig'd Pods per Namespace, and even more than that if we split it up and deduplicate the Namespace/Deployment from each hierarchy file.

@andrewazores
Member Author

A hierarchy.json taken from the current implementation on a simple sample application:

{
  "name": "apps1",
  "nodeType": "Namespace",
  "labels": {
    "kubernetes.io/metadata.name": "apps1",
    "pod-security.kubernetes.io/audit": "restricted",
    "pod-security.kubernetes.io/audit-version": "latest",
    "pod-security.kubernetes.io/warn": "restricted",
    "pod-security.kubernetes.io/warn-version": "latest"
  },
  "children": [
    {
      "name": "quarkus-test",
      "nodeType": "Deployment",
      "labels": {
        "app": "quarkus-test"
      },
      "children": [
        {
          "name": "quarkus-test-7df7856846",
          "nodeType": "ReplicaSet",
          "labels": {
            "app": "quarkus-test",
            "cryostat.io/name": "cryostat-sample",
            "cryostat.io/namespace": "cryostat",
            "pod-template-hash": "7df7856846"
          },
          "children": [
            {
              "name": "quarkus-test-7df7856846",
              "nodeType": "Pod",
              "labels": {
                "app": "quarkus-test",
                "cryostat.io/name": "cryostat-sample",
                "cryostat.io/namespace": "cryostat",
                "pod-template-hash": "7df7856846"
              },
              "children": []
            }
          ]
        }
      ]
    }
  ]
}

Without prettified formatting this is 804 bytes. The associated metadata.json is 308 bytes:

{
  "labels": {
    "app": "quarkus-test",
    "cryostat.io/name": "cryostat-sample",
    "cryostat.io/namespace": "cryostat",
    "pod-template-hash": "7df7856846"
  },
  "annotations": {
    "openshift.io/scc": "restricted-v2",
    "seccomp.security.alpha.kubernetes.io/pod": "runtime/default",
    "security.openshift.io/validated-scc-subject-type": "user"
  }
}

So even without deduplicating the Namespace/Deployment, if we used a single ConfigMap it would support just under 1000 autoconfig'd Pods per Namespace with that scheme.
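(Roughly: 804 + 308 = 1112 bytes per Pod, and 1 MB / 1112 bytes ≈ 900 Pods.)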

@andrewazores
Member Author

> That's an interesting idea too, and it definitely could be done that way. How would Cryostat know that it needs to fill in the k8s ownership hierarchy for these particular Agents? Would we just hardcode in some assumption that if a discovery plugin's published tree is a list of Pod-wrapped nodes that Cryostat should apply the hierarchy-filling to those nodes? Conceptually I'd prefer if discovery plugins published the complete hierarchy themselves, but for the Agent specifically we also want to avoid additional dependencies and complexity, so these things are kind of at odds with each other.

I guess we could also have the Agent set a query parameter or HTTP header when it publishes its discovery subtree, and Cryostat can read the value of this additional bit of info and use that to perform an additional fill-in step. For now the k8s lookup fill-in would be the only implementation, but this allows it to be generic and extensible for other cases in the future. The Agent could know to set that query/header based on some bit of info passed down from the Operator via env var or something.

@ebaron
Member

ebaron commented Jan 19, 2026

I was thinking the pod name and namespace could be an extra bit of information sent in the registration data. Cryostat would know that this information corresponds to a pod name and namespace, and would then have what it needs to query the ownership/metadata.

@andrewazores
Member Author

I'll give that a try. I've been thinking about the header or query param approach and I don't really like either option, so I'll try creating a new publication endpoint that accepts this additional contextual information as part of the POST body.


Labels

dependent, feat (New feature or request), safe-to-test


Development

Successfully merging this pull request may close these issues.

[Request] Agent autoconfig Downward API properties

2 participants