feat(k8s): implement wildcard All Namespaces discovery #725

Merged
andrewazores merged 8 commits into cryostatio:main from andrewazores:k8s-all-namespaces
Jun 12, 2025
Conversation

@andrewazores
Member

@andrewazores andrewazores commented Nov 26, 2024

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PR's, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits using a GPG signature

To recreate commits with a GPG signature: git fetch upstream && git rebase --force --gpg-sign upstream/main


Fixes: #722
Depends on #689
Based on #689
Depends on #740

Description of the change:

Handles the * value in the Kubernetes discovery namespaces as a wildcard meaning "All Namespaces". When the wildcard is present, Cryostat creates a single Informer for Endpoints in any namespace in the cluster, rather than one Endpoints Informer per target namespace. Some supporting logic is modified to suit this change, e.g. reading the client-side cache (held in Cryostat application memory) and filtering the Endpoints objects found in that cluster-wide Informer by namespace, rather than relying on each Informer's cache already being segmented by namespace.
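
The per-namespace filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: EndpointsRef and NamespaceFilterSketch are hypothetical stand-ins for the Kubernetes client's Endpoints type and for Cryostat's discovery class, and the cache is modelled as a plain list.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a Kubernetes Endpoints object. The real type comes
// from the Kubernetes client library and carries far more metadata.
record EndpointsRef(String namespace, String name) {}

final class NamespaceFilterSketch {
    // With one cluster-wide cache, a per-namespace view is produced by
    // filtering on each object's namespace instead of reading a cache that
    // is already scoped to a single namespace.
    static List<EndpointsRef> inNamespace(List<EndpointsRef> clusterWideCache, String namespace) {
        return clusterWideCache.stream()
                .filter(e -> namespace.equals(e.namespace()))
                .collect(Collectors.toList());
    }

    // Groups the whole cache by namespace in one pass, which is convenient
    // when every namespace needs to be processed.
    static Map<String, List<EndpointsRef>> byNamespace(List<EndpointsRef> clusterWideCache) {
        return clusterWideCache.stream()
                .collect(Collectors.groupingBy(EndpointsRef::namespace));
    }

    public static void main(String[] args) {
        List<EndpointsRef> cache = List.of(
                new EndpointsRef("apps1", "quarkus-test"),
                new EndpointsRef("apps2", "quarkus-test"),
                new EndpointsRef("monitoring", "prometheus"));
        System.out.println(inNamespace(cache, "apps1"));
        System.out.println(byNamespace(cache).keySet());
    }
}
```

The trade-off is that filtering happens in application memory on every lookup, in exchange for maintaining only one watch connection to the API server.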

Motivation for the change:

When suitably deployed in a Kubernetes cluster, with an accompanying ClusterRole(Binding) that allows the serviceaccount to discover Endpoints cluster-wide, this allows the user to install a single Cryostat instance and have visibility of applications in any namespace. This includes namespaces that are created or deleted within Cryostat's lifetime. Users should carefully select the port names/numbers that will be deemed compatible targets, since Cryostat will see every Endpoints object in the cluster. Cryostat may also receive a large number of update events from Kubernetes if the cluster is large and active with many Endpoints changes, which may be resource intensive.
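
To illustrate why the port selection matters in a cluster-wide watch, here is a minimal sketch of a name-or-number port filter. PortRef and PortFilterSketch are hypothetical names, the matching rule (name OR number) is an assumption about how such a filter might behave, and the values in main are examples only (9097 is the port seen in the test deployment discussed in this PR).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for one port of an Endpoints object.
// The name may be null because port names are optional.
record PortRef(String name, int number) {}

final class PortFilterSketch {
    // A port is treated as a compatible target if either its name or its
    // number is in the configured allow-lists (assumed semantics).
    static boolean isCompatible(PortRef port, Set<String> names, Set<Integer> numbers) {
        // Guard against null: immutable sets reject contains(null).
        boolean nameMatch = port.name() != null && names.contains(port.name());
        return nameMatch || numbers.contains(port.number());
    }

    static List<PortRef> compatiblePorts(
            List<PortRef> ports, Set<String> names, Set<Integer> numbers) {
        return ports.stream()
                .filter(p -> isCompatible(p, names, numbers))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PortRef> ports = List.of(
                new PortRef("http", 8080),
                new PortRef("jfr-jmx", 9091),
                new PortRef(null, 9097));
        // Example allow-lists only; real values depend on the deployment.
        System.out.println(compatiblePorts(ports, Set.of("jfr-jmx"), Set.of(9097)));
    }
}
```

A broad allow-list (for example a common port such as 8080) would match many unrelated workloads once every namespace is visible, which is the risk the paragraph above warns about.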

How to manually test:

  1. See feat(discovery): implement All Namespaces discovery cryostat-helm#213

@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 11/28/2024, 3:10:23 PM. View Actions Run.

@github-actions

No OpenAPI schema changes detected.

@github-actions

No GraphQL schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/12074936405

Member

@tthvo tthvo left a comment

Just have some questions while checking cryostatio/cryostat-helm#213 :D

Comment on lines +161 to +162
// TODO we should not need to force manual re-syncs this way - the Informer is already
// supposed to resync itself.
Member

Just curious here: is it now safe to remove this manual resync? If not, in the all-namespaces case, should we also list the namespaces and resync them too?

Member Author

It shouldn't be needed, but I have it there just in case, since I have seen problems with Informers or Watchers failing to update in the past (e.g. dropping the WebSocket and failing to ever reconnect), and we don't expose any mechanism to restart or force-resync those other than restarting the entire Cryostat container.

Member

Ah right, the new changes look good to me :D Thanks!

.inAnyNamespace()
.inform(
KubeApiDiscovery.this,
informerResyncPeriod.toMillis()));
Member

Maybe add a logger.debugv here?

 logger.debugv(
        "Started Endpoints SharedInformer for all namespaces with resync period {0}",
        informerResyncPeriod);

Comment on lines +145 to +147
private boolean watchAllNamespaces() {
    return kubeConfig.getWatchNamespaces().stream().anyMatch(ns -> ALL_NAMESPACES.equals(ns));
}
Member

nit: would it make sense to move this method and ALL_NAMESPACES into the KubeConfig class?
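
A rough sketch of what that suggestion could look like, assuming the config class simply holds the configured namespace list. KubeConfigSketch is a hypothetical, simplified stand-in and not the real KubeConfig class; only the ALL_NAMESPACES name and the wildcard value * come from this PR.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the config class. The real class
// reads its namespace list from application configuration.
final class KubeConfigSketch {
    // Wildcard value meaning "All Namespaces".
    static final String ALL_NAMESPACES = "*";

    private final List<String> watchNamespaces;

    KubeConfigSketch(List<String> watchNamespaces) {
        this.watchNamespaces = List.copyOf(watchNamespaces);
    }

    List<String> getWatchNamespaces() {
        return watchNamespaces;
    }

    // Keeping the wildcard check next to the data it inspects means callers
    // do not need to know the wildcard's literal value.
    boolean watchAllNamespaces() {
        return watchNamespaces.contains(ALL_NAMESPACES);
    }

    public static void main(String[] args) {
        System.out.println(new KubeConfigSketch(List.of("apps1", "*")).watchAllNamespaces());
        System.out.println(new KubeConfigSketch(List.of("apps1", "apps2")).watchAllNamespaces());
    }
}
```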

@Josh-Matsuoka
Contributor

/build_test

@github-actions

Workflow started at 4/17/2025, 4:27:32 PM. View Actions Run.

@github-actions

No GraphQL schema changes detected.

@github-actions

No OpenAPI schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/14524233066

@andrewazores andrewazores force-pushed the k8s-all-namespaces branch from e60a357 to a0361ac Compare May 7, 2025 19:44
@andrewazores
Member Author

Seems like I have some kind of bug here:

Screenshot_2025-05-08_10-04-51
Screenshot_2025-05-08_10-04-33

Notice that the connection URLs correctly indicate that these two applications are in apps1 and apps2, but the discovery tree placed them both within an apps2 namespace node.

@andrewazores
Member Author

{
  "id": 1,
  "name": "Universe",
  "nodeType": "Universe",
  "labels": [],
  "children": [
    {
      "id": 2,
      "name": "Custom Targets",
      "nodeType": "Realm",
      "labels": [],
      "children": []
    },
    {
      "id": 3,
      "name": "KubernetesApi",
      "nodeType": "Realm",
      "labels": [],
      "children": [
        {
          "id": 65,
          "name": "apps2",
          "nodeType": "Namespace",
          "labels": [],
          "children": [
            {
              "id": 63,
              "name": "quarkus-test",
              "nodeType": "Deployment",
              "labels": [
                {
                  "key": "app",
                  "value": "quarkus-test"
                },
                {
                  "key": "discovery.cryostat.io/namespace",
                  "value": "apps1"
                }
              ],
              "children": [
                {
                  "id": 61,
                  "name": "quarkus-test-855b45956d",
                  "nodeType": "ReplicaSet",
                  "labels": [
                    {
                      "key": "app",
                      "value": "quarkus-test"
                    },
                    {
                      "key": "pod-template-hash",
                      "value": "855b45956d"
                    },
                    {
                      "key": "discovery.cryostat.io/namespace",
                      "value": "apps1"
                    }
                  ],
                  "children": [
                    {
                      "id": 56,
                      "name": "quarkus-test-855b45956d-85kls",
                      "nodeType": "Pod",
                      "labels": [
                        {
                          "key": "app",
                          "value": "quarkus-test"
                        },
                        {
                          "key": "pod-template-hash",
                          "value": "855b45956d"
                        },
                        {
                          "key": "discovery.cryostat.io/namespace",
                          "value": "apps1"
                        }
                      ],
                      "children": [
                        {
                          "id": 59,
                          "name": "service:jmx:rmi:///jndi/rmi://10-217-0-98.apps1.pod:9097/jmxrmi",
                          "nodeType": "Endpoint",
                          "labels": [
                            {
                              "key": "app",
                              "value": "quarkus-test"
                            },
                            {
                              "key": "pod-template-hash",
                              "value": "855b45956d"
                            }
                          ],
                          "children": [],
                          "target": {
                            "id": 1,
                            "connectUrl": "service:jmx:rmi:///jndi/rmi://10-217-0-98.apps1.pod:9097/jmxrmi",
                            "alias": "quarkus-test-855b45956d-85kls",
                            "jvmId": "MYowwoqm-7CRUmPDGFE_xG_amGUPz946h3ecny71o1o=",
                            "labels": [
                              {
                                "key": "app",
                                "value": "quarkus-test"
                              },
                              {
                                "key": "pod-template-hash",
                                "value": "855b45956d"
                              }
                            ],
                            "annotations": {
                              "platform": [
                                {
                                  "key": "openshift.io/scc",
                                  "value": "restricted-v2"
                                },
                                {
                                  "key": "k8s.ovn.org/pod-networks",
                                  "value": "{\"default\":{\"ip_addresses\":[\"10.217.0.98/23\"],\"mac_address\":\"0a:58:0a:d9:00:62\",\"gateway_ips\":[\"10.217.0.1\"],\"routes\":[{\"dest\":\"10.217.0.0/22\",\"nextHop\":\"10.217.0.1\"},{\"dest\":\"10.217.4.0/23\",\"nextHop\":\"10.217.0.1\"},{\"dest\":\"169.254.0.5/32\",\"nextHop\":\"10.217.0.1\"},{\"dest\":\"100.64.0.0/16\",\"nextHop\":\"10.217.0.1\"}],\"ip_address\":\"10.217.0.98/23\",\"gateway_ip\":\"10.217.0.1\",\"role\":\"primary\"}}"
                                },
                                {
                                  "key": "k8s.v1.cni.cncf.io/network-status",
                                  "value": "[{\n    \"name\": \"ovn-kubernetes\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.217.0.98\"\n    ],\n    \"mac\": \"0a:58:0a:d9:00:62\",\n    \"default\": true,\n    \"dns\": {}\n}]"
                                },
                                {
                                  "key": "seccomp.security.alpha.kubernetes.io/pod",
                                  "value": "runtime/default"
                                }
                              ],
                              "cryostat": [
                                {
                                  "key": "HOST",
                                  "value": "10.217.0.98"
                                },
                                {
                                  "key": "PORT",
                                  "value": "9097"
                                },
                                {
                                  "key": "REALM",
                                  "value": "KubernetesApi"
                                },
                                {
                                  "key": "POD_NAME",
                                  "value": "quarkus-test-855b45956d-85kls"
                                },
                                {
                                  "key": "NAMESPACE",
                                  "value": "apps1"
                                }
                              ]
                            },
                            "agent": false
                          }
                        }
                      ]
                    },
                    {
                      "id": 58,
                      "name": "quarkus-test-855b45956d-m7vwp",
                      "nodeType": "Pod",
                      "labels": [
                        {
                          "key": "app",
                          "value": "quarkus-test"
                        },
                        {
                          "key": "pod-template-hash",
                          "value": "855b45956d"
                        },
                        {
                          "key": "discovery.cryostat.io/namespace",
                          "value": "apps2"
                        }
                      ],
                      "children": [
                        {
                          "id": 66,
                          "name": "service:jmx:rmi:///jndi/rmi://10-217-0-99.apps2.pod:9097/jmxrmi",
                          "nodeType": "Endpoint",
                          "labels": [
                            {
                              "key": "app",
                              "value": "quarkus-test"
                            },
                            {
                              "key": "pod-template-hash",
                              "value": "855b45956d"
                            }
                          ],
                          "children": [],
                          "target": {
                            "id": 2,
                            "connectUrl": "service:jmx:rmi:///jndi/rmi://10-217-0-99.apps2.pod:9097/jmxrmi",
                            "alias": "quarkus-test-855b45956d-m7vwp",
                            "jvmId": "R8iKtq1BsOUE9LwC9id_uQEBi3wfsyNnLmFt_FQYFlc=",
                            "labels": [
                              {
                                "key": "app",
                                "value": "quarkus-test"
                              },
                              {
                                "key": "pod-template-hash",
                                "value": "855b45956d"
                              }
                            ],
                            "annotations": {
                              "platform": [
                                {
                                  "key": "openshift.io/scc",
                                  "value": "restricted-v2"
                                },
                                {
                                  "key": "k8s.ovn.org/pod-networks",
                                  "value": "{\"default\":{\"ip_addresses\":[\"10.217.0.99/23\"],\"mac_address\":\"0a:58:0a:d9:00:63\",\"gateway_ips\":[\"10.217.0.1\"],\"routes\":[{\"dest\":\"10.217.0.0/22\",\"nextHop\":\"10.217.0.1\"},{\"dest\":\"10.217.4.0/23\",\"nextHop\":\"10.217.0.1\"},{\"dest\":\"169.254.0.5/32\",\"nextHop\":\"10.217.0.1\"},{\"dest\":\"100.64.0.0/16\",\"nextHop\":\"10.217.0.1\"}],\"ip_address\":\"10.217.0.99/23\",\"gateway_ip\":\"10.217.0.1\",\"role\":\"primary\"}}"
                                },
                                {
                                  "key": "k8s.v1.cni.cncf.io/network-status",
                                  "value": "[{\n    \"name\": \"ovn-kubernetes\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.217.0.99\"\n    ],\n    \"mac\": \"0a:58:0a:d9:00:63\",\n    \"default\": true,\n    \"dns\": {}\n}]"
                                },
                                {
                                  "key": "seccomp.security.alpha.kubernetes.io/pod",
                                  "value": "runtime/default"
                                }
                              ],
                              "cryostat": [
                                {
                                  "key": "HOST",
                                  "value": "10.217.0.99"
                                },
                                {
                                  "key": "PORT",
                                  "value": "9097"
                                },
                                {
                                  "key": "REALM",
                                  "value": "KubernetesApi"
                                },
                                {
                                  "key": "POD_NAME",
                                  "value": "quarkus-test-855b45956d-m7vwp"
                                },
                                {
                                  "key": "NAMESPACE",
                                  "value": "apps2"
                                }
                              ]
                            },
                            "agent": false
                          }
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "id": 4,
      "name": "JDP",
      "nodeType": "Realm",
      "labels": [],
      "children": []
    },
    {
      "id": 5,
      "name": "Podman",
      "nodeType": "Realm",
      "labels": [],
      "children": []
    },
    {
      "id": 6,
      "name": "Docker",
      "nodeType": "Realm",
      "labels": [],
      "children": []
    }
  ]
}

It is an actual data issue too, not just a UI rendering bug or something.
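
A check like the following could flag this kind of mismatch in a discovery tree dump. It is only a sketch under simplifying assumptions: TreeNode and TreeCheckSketch are hypothetical names, and the target's NAMESPACE annotation is flattened into a single nullable field instead of being parsed from the nested JSON above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, flattened view of a discovery node. targetNamespace is the
// value of the target's NAMESPACE annotation, or null for non-target nodes.
record TreeNode(String name, String nodeType, String targetNamespace, List<TreeNode> children) {}

final class TreeCheckSketch {
    // Walks the tree and returns the names of nodes whose NAMESPACE
    // annotation disagrees with the enclosing Namespace node.
    static List<String> misplacedTargets(TreeNode node, String enclosingNamespace) {
        List<String> misplaced = new ArrayList<>();
        String ns = "Namespace".equals(node.nodeType()) ? node.name() : enclosingNamespace;
        if (node.targetNamespace() != null && ns != null && !ns.equals(node.targetNamespace())) {
            misplaced.add(node.name());
        }
        for (TreeNode child : node.children()) {
            misplaced.addAll(misplacedTargets(child, ns));
        }
        return misplaced;
    }

    public static void main(String[] args) {
        // Mirrors the shape of the dump: both targets sit under "apps2".
        TreeNode a = new TreeNode("target-in-apps1", "Endpoint", "apps1", List.of());
        TreeNode b = new TreeNode("target-in-apps2", "Endpoint", "apps2", List.of());
        TreeNode nsNode = new TreeNode("apps2", "Namespace", null, List.of(a, b));
        System.out.println(misplacedTargets(nsNode, null));
    }
}
```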

@andrewazores
Member Author

Uh-oh. Same result if I deploy (with the Helm chart) using --set core.discovery.kubernetes.namespaces='{apps1,apps2}' instead of --set core.discovery.kubernetes.allowAllNamespaces=true, so it seems the bug doesn't actually originate from this PR.

image

I suspect something went wrong in the refactoring in #870, or maybe #689 (much less likely since that was so long ago).

@andrewazores
Member Author

Backing out #870 does fix the bug. I'll dissect the specific issue and open a separate PR to fix it.

@andrewazores
Member Author

^ #901

@andrewazores
Member Author

/build_test

@github-actions

github-actions bot commented May 8, 2025

Workflow started at 5/8/2025, 11:39:53 AM. View Actions Run.

@github-actions

github-actions bot commented May 8, 2025

No GraphQL schema changes detected.

@github-actions

github-actions bot commented May 8, 2025

No OpenAPI schema changes detected.

@github-actions

github-actions bot commented May 8, 2025

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/14910402264

Josh-Matsuoka
Josh-Matsuoka previously approved these changes Jun 11, 2025
@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 6/12/2025, 11:24:35 AM. View Actions Run.

@github-actions

CI build and push: At least one test failed ❌
https://github.com/cryostatio/cryostat/actions/runs/15614665593

@andrewazores
Member Author

/build_test

@github-actions

Workflow started at 6/12/2025, 11:28:34 AM. View Actions Run.

@github-actions

No GraphQL schema changes detected.

@github-actions

No OpenAPI schema changes detected.

@github-actions

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/15614758154

@andrewazores andrewazores merged commit 40efbd7 into cryostatio:main Jun 12, 2025
9 checks passed
@andrewazores andrewazores deleted the k8s-all-namespaces branch June 12, 2025 15:53
Labels

feat New feature or request safe-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Task] All Namespaces k8s discovery mode

3 participants