-
Notifications
You must be signed in to change notification settings - Fork 3
feat: migrate OTEL Collector from Deployment to DaemonSet architecture #1141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Convert otel-collector from Deployment to DaemonSet for per-node collectors - Add hostPort (4317/4318) to expose OTEL endpoints on each node - Add tolerations for control-plane/master nodes - Reduce resource limits for distributed deployment (512Mi/500m per node) - Inject NODE_IP via Kubernetes Downward API into microservices - Update OTEL endpoints in values.yaml to use $(NODE_IP) instead of service name Services updated: budgateway, budmetrics, budprompt, budcluster 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Summary of ChangesHello @vsraccubits, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the OpenTelemetry (OTEL) collection strategy within the Kubernetes environment. By transitioning the OTEL Collector from a standard Deployment to a DaemonSet, it ensures that a dedicated collector runs on each node, optimizing local telemetry data capture. The change also involves exposing collector endpoints directly via host ports, adjusting resource allocations for efficiency, and dynamically configuring microservices to send their telemetry data to the local collector using the node's IP address, thereby streamlining the observability pipeline. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly migrates the OpenTelemetry collector from a Deployment to a DaemonSet, which is a good architectural improvement for collecting telemetry on a per-node basis. The changes to inject the node IP into application pods and update the OTEL endpoint accordingly are implemented correctly across most services. My review includes a few points: a high-severity security concern regarding the use of hostPort, a medium-severity suggestion to remove a now-redundant Kubernetes Service, and a medium-severity comment about an incomplete configuration for the budmetrics service.
| containerPort: 4317 | ||
| hostPort: 4317 | ||
| protocol: TCP | ||
| - name: otlp-http | ||
| containerPort: 4318 | ||
| hostPort: 4318 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hostPort exposes the OpenTelemetry collector ports on the node's IP address. If the nodes have publicly routable IP addresses, this could expose the collector to the internet, which can be a security risk. It is highly recommended to use NetworkPolicies to restrict access to these ports, allowing traffic only from within the cluster or trusted sources.
| --- | ||
| apiVersion: apps/v1 | ||
| kind: Deployment | ||
| kind: DaemonSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the collector now running as a DaemonSet and services configured to send telemetry to the collector on the same node via $(NODE_IP), the ClusterIP service for otel-collector defined later in this file (lines 178-201) appears to be redundant. If this service is no longer used by any component, consider removing it to simplify the configuration.
| - name: NODE_IP | ||
| valueFrom: | ||
| fieldRef: | ||
| fieldPath: status.hostIP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NODE_IP environment variable is being injected into the budmetrics container, but it doesn't appear to be used. The values.yaml file does not define an OTEL_..._ENDPOINT variable for the budmetrics service that would consume this NODE_IP. Please verify if budmetrics is intended to send telemetry to the OpenTelemetry collector and update values.yaml accordingly. If it's not needed, this environment variable injection should be removed.
Services updated: budgateway, budmetrics, budprompt, budcluster