add ServiceMonitor Configuration #1486
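- Add ServiceMonitor templates for the scheduler and device-plugin components
- Add servicemonitor configuration options to values.yaml
- Support custom monitoring interval, labels, and annotations
- Enable the ServiceMonitor feature by default
Signed-off-by: clcc2019 <[email protected]>

- Add ServiceMonitor configuration parameter documentation for the scheduler component
- Add ServiceMonitor configuration parameter documentation for the device-plugin component
- Cover the enabled, labels, annotations, interval, and honorLabels parameters
- Provide default values and detailed descriptions
Signed-off-by: clcc2019 <[email protected]>

- Update values.yaml to set servicemonitor.enabled to false for both the scheduler and devicePlugin components, disabling the ServiceMonitor feature by default
Signed-off-by: clcc2019 <[email protected]>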
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: clcc2019 The full list of commands accepted by this bot can be found here.
Summary of Changes
Hello @clcc2019, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the observability of the HAMi project by integrating Prometheus ServiceMonitors for its core components: the scheduler and the device plugin. It provides new configuration options within the Helm chart to manage these ServiceMonitors, allowing for flexible and customizable monitoring setups. The changes include adding the necessary Kubernetes manifest files and updating the documentation to reflect these new capabilities.
Code Review
This pull request introduces ServiceMonitor configurations for both the scheduler and the device-plugin, which is a valuable addition for enabling Prometheus-based monitoring. The changes to values.yaml and the README.md are well-executed. However, I've identified a critical copy-paste error in the device-plugin's ServiceMonitor template that incorrectly links its creation to the scheduler's configuration. Additionally, there are a few minor areas for cleanup, such as removing a leftover code comment and ensuring files end with a newline.
@@ -0,0 +1,33 @@
{{- if .Values.scheduler.servicemonitor.enabled }}
There appears to be a copy-paste error in this conditional check. The creation of the device-plugin ServiceMonitor should be controlled by its own configuration flag, .Values.devicePlugin.monitor.servicemonitor.enabled, not the scheduler's flag. As it is, this ServiceMonitor would be incorrectly created only when the scheduler's monitor is enabled.
{{- if .Values.devicePlugin.monitor.servicemonitor.enabled }}

matchLabels:
app.kubernetes.io/component: hami-device-plugin
{{- include "hami-vgpu.labels" . | nindent 6 }}
{{- if .Values.devicePlugin.service.labels }} # Use devicePlugin instead of scheduler
{{- if .Values.devicePlugin.service.labels }} # Use devicePlugin instead of scheduler
{{ toYaml .Values.devicePlugin.service.labels | indent 6 }}
{{- end }}
{{- end }}
(No newline at end of file)
{{- if .Values.scheduler.service.labels }}
{{ toYaml .Values.scheduler.service.labels | indent 6 }}
{{- end }}
{{- end }}
(No newline at end of file)
Codecov Report
✅ All modified and coverable lines are covered by tests.
- Fix device-plugin servicemonitor to use correct Values path
- Add missing newline at end of servicemonitor files
Signed-off-by: clcc2019 <[email protected]>
name: {{ include "hami-vgpu.device-plugin" . }}
namespace: {{ include "hami-vgpu.namespace" . }}
labels:
{{- include "hami-vgpu.labels" . | nindent 4 }}
Consider adding a default label such as release: prometheus so that Prometheus can select this ServiceMonitor.
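For illustration, a label like that could be supplied through the chart's values. This is a minimal sketch, assuming the scheduler options live under `scheduler.servicemonitor` (the path used in the template condition) and that the Prometheus instance selects ServiceMonitors on `release: prometheus`; the label value depends on how Prometheus was installed, so treat it as a placeholder.

```yaml
# values.yaml override (illustrative only).
# "release: prometheus" is an example label; use whatever label the
# serviceMonitorSelector of your Prometheus CR actually matches.
scheduler:
  servicemonitor:
    enabled: true
    labels:
      release: prometheus
```

With kube-prometheus-stack in its default configuration, the selector typically matches `release: <helm release name>`, which is why a single hard-coded default may not fit every cluster.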
#1255 did the same work, can you try to merge it?
The |
I think a label is required, since the ServiceMonitor can't work without one. But it may work with the default label.
I don't think it makes sense to set interval in the ServiceMonitors by default. It should only be set if the user defines it. This chart should not dictate the default: the Prometheus CR from prometheus-operator already has a global default, which is what I'd expect to apply when interval is not set.
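A minimal sketch of how the template could emit interval only when the user sets it. The resource name, the component label value, and the port name `monitor` are assumptions made for this example, not values taken from the chart.

```yaml
{{- if .Values.scheduler.servicemonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # Placeholder name for this sketch.
  name: hami-scheduler
  namespace: {{ include "hami-vgpu.namespace" . }}
spec:
  selector:
    matchLabels:
      # Assumed label value for the scheduler Service.
      app.kubernetes.io/component: hami-scheduler
  endpoints:
    # Assumed Service port name.
    - port: monitor
      {{- with .Values.scheduler.servicemonitor.interval }}
      interval: {{ . }}
      {{- end }}
{{- end }}
```

When `scheduler.servicemonitor.interval` is unset or empty, the `with` block renders nothing, so the scrape interval falls back to whatever the Prometheus CR defines globally.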
HAMi's ServiceMonitors should also support other endpoint settings, such as scrapeTimeout and relabelings. As stated above, if the user does not define these, they should not be set at all, in favour of the Prometheus default for scrapeTimeout.
If you want to future-proof this without necessarily supporting every option the CRD offers, you could also add a generic servicemonitor.extraEndpointProperties (or something similar, such as servicemonitor.extraEndpointConfig).
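As a rough sketch, the endpoint section of the scheduler template could handle these optional settings as follows. The values keys `scrapeTimeout`, `relabelings`, and `extraEndpointConfig` are hypothetical names for this example, and the port name is likewise assumed.

```yaml
  # Fragment of a ServiceMonitor spec; indentation assumes it sits under "spec:".
  endpoints:
    # Assumed Service port name.
    - port: monitor
      {{- with .Values.scheduler.servicemonitor.scrapeTimeout }}
      scrapeTimeout: {{ . }}
      {{- end }}
      {{- with .Values.scheduler.servicemonitor.relabelings }}
      relabelings:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.scheduler.servicemonitor.extraEndpointConfig }}
      # Free-form map merged into the endpoint as-is (hypothetical key).
      {{- toYaml . | nindent 6 }}
      {{- end }}
```

Each `with` block renders only when its value is set, so nothing is forced onto users who rely on the Prometheus defaults. The free-form map trades validation for flexibility: the chart does not check that its keys are valid endpoint fields.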
One last thing, regarding the namespace where these get created: I think it makes sense to create them by default in the same namespace where HAMi is deployed. However, most of the community charts I've seen over the past years offer the option to override that and deploy them in a custom namespace.
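One possible shape for such an override, sketched under the assumption of a hypothetical `scheduler.servicemonitor.namespace` value. Because a ServiceMonitor only discovers Services in its own namespace unless told otherwise, the sketch also pins `namespaceSelector` to the namespace where HAMi runs.

```yaml
# Fragment of the ServiceMonitor template (illustrative).
metadata:
  # Falls back to the chart namespace when the hypothetical override is unset.
  namespace: {{ .Values.scheduler.servicemonitor.namespace | default (include "hami-vgpu.namespace" .) }}
spec:
  # Keep targeting the HAMi Services even if the ServiceMonitor lives elsewhere.
  namespaceSelector:
    matchNames:
      - {{ include "hami-vgpu.namespace" . }}
```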
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: