-
Notifications
You must be signed in to change notification settings - Fork 434
[dcgm-exporter] Support exposing metrics on hostNetwork #1962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dcgm-exporter] Support exposing metrics on hostNetwork #1962
Conversation
e12adfc to
9c1b323
Compare
9c1b323 to
d858d24
Compare
|
/ok-to-test d858d24 |
|
@nikhaild, thanks for your contribution. In general, we like to ensure that our generated YAML manifests are aligned with the upstream helm chart of dcgm-exporter. It would be good if you can open a PR/issue in NVIDIA/dcgm-exporter and get the maintainers of dcgm-exporter to weigh in on this first. |
Thanks @tariq1890 for taking a look! Upstream dcgm-exporter helm chart already supports "templatizing" this It's just not set explicitly to a default value in dcgm-exporter helm chart values.yaml, but looks like some folks use it already (ref issue#495). Mind clarifying what exactly should I ask from dcgm-exporter maintainer folks? Would appreciate if you could elaborate a bit. |
|
Hi, |
|
Please add a unit test as well |
2a33799 to
30b19e2
Compare
|
Thanks @tariq1890 for the review!
I've updated the unit tests in I also saw there was no unit test for "case for DCGM running on the host itself(DGX BaseOS)" existing scenario/code-path: gpu-operator/controllers/object_controls.go Lines 1694 to 1698 in 1de9839
have added a unit test case for it too here: https://github.com/NVIDIA/gpu-operator/pull/1962/changes#diff-97f4959bc802c8178bd5dd8118463f0f937130ec0cb9dbfb74d3ec01f2793f9aR1372-R1406 Let me know please if these tests look sufficient! |
|
Thanks @nikhaild the test cases look sufficient to me ! Can you squash your commit history and enable signoff to satisfy the DCO check? |
30b19e2 to
af2e516
Compare
Add support to toggle `hostNetwork` field of pod spec for dcgm-exporter daemonset pod spec, allowing dcgm-exporter pods to be scraped by say prometheus-server that runs outside of the bounds of the k8s cluster overlay network (and still be able to reach dcgm-exporter pods). Fixes NVIDIA#1086 Signed-off-by: Nikhil R Deshpande <[email protected]>
af2e516 to
95255ef
Compare
Squashed, also rebased on upstream/main! |
|
/ok to test 95255ef |
rahulait
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Add support to toggle
hostNetworkfield of pod spec for dcgm-exporter daemonset pod spec,allowing dcgm-exporter pods to be scraped by say prometheus-server that runs outside of the
bounds of the k8s cluster overlay network (and still be able to reach dcgm-exporter pods,
i.e. scraping each daemonset pod's port on the node - this is easier to reason and deal with
compared to say scraping a NodePort service ports on each node).
Fixes #1086