-
Notifications
You must be signed in to change notification settings - Fork 137
[platform] Split telemetry between operator and controller #1869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughRemoves the cozystack-version flag from controller and operator, eliminates CozystackVersion from telemetry config, adds telemetry support to the operator, refactors the collector to aggregate ApplicationDefinitions and HelmReleases data, introduces OperatorCollector, implements version injection via ldflags, and updates build tooling. Changes
Sequence Diagram(s)sequenceDiagram
actor Startup
participant Manager as ControllerManager
participant TelemetryConfig as Telemetry Config
participant OperatorCollector as OperatorCollector
participant Cluster as Kubernetes Cluster
participant Endpoint as Telemetry Endpoint
Startup->>Manager: Start manager
Startup->>TelemetryConfig: Parse telemetry flags
TelemetryConfig->>OperatorCollector: NewOperatorCollector(config)
OperatorCollector->>OperatorCollector: Initialize with client & config
Manager->>OperatorCollector: Attach to manager
Startup->>OperatorCollector: Start(ctx)
OperatorCollector->>OperatorCollector: Run initial collect()
OperatorCollector->>Cluster: List nodes, PVs, services, packages
Cluster-->>OperatorCollector: Cluster metadata
OperatorCollector->>OperatorCollector: Aggregate metrics
OperatorCollector->>Endpoint: POST metrics
Endpoint-->>OperatorCollector: 200 OK
OperatorCollector->>OperatorCollector: Wait on ticker
OperatorCollector->>Cluster: Periodic collect()
Cluster-->>OperatorCollector: Updated metadata
OperatorCollector->>Endpoint: POST metrics
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @kvaps, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the telemetry system within Cozystack to provide a more granular and accurate view of both cluster infrastructure and application deployments. By separating telemetry responsibilities between the operator and controller, the system can now collect more relevant data for each component. Additionally, the change streamlines version reporting through build-time injection and updates several metric definitions to better reflect the system's state. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively refactors the telemetry collection by splitting responsibilities between the operator and the controller, which improves modularity. The change to inject the version at build time using ldflags is a good improvement over runtime flags. My review includes a high-severity suggestion to address significant code duplication between the new telemetry collectors, which would improve long-term maintainability. I've also included medium-severity comments to encourage the use of structured logging for better observability. Overall, this is a great set of changes that improves the telemetry architecture.
| func (c *OperatorCollector) Start(ctx context.Context) error { | ||
| if c.config.Disabled { | ||
| return nil | ||
| } | ||
|
|
||
| c.ticker = time.NewTicker(c.config.Interval) | ||
| c.stopCh = make(chan struct{}) | ||
|
|
||
| // Initial collection | ||
| c.collect(ctx) | ||
|
|
||
| for { | ||
| select { | ||
| case <-ctx.Done(): | ||
| c.ticker.Stop() | ||
| close(c.stopCh) | ||
| return nil | ||
| case <-c.ticker.C: | ||
| c.collect(ctx) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is significant code duplication between OperatorCollector in this file and Collector in internal/telemetry/collector.go. The Start, NeedLeaderElection, and sendMetrics methods are identical, and the struct definitions are very similar. This makes the code harder to maintain.
I recommend refactoring to use a common baseCollector that can be embedded by both OperatorCollector and Collector to hold the shared logic. This would centralize the common functionality, making it easier to manage.
| // Get cluster ID from kube-system namespace | ||
| var kubeSystemNS corev1.Namespace | ||
| if err := c.client.Get(ctx, types.NamespacedName{Name: "kube-system"}, &kubeSystemNS); err != nil { | ||
| logger.Info(fmt.Sprintf("Failed to get kube-system namespace: %v", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throughout this file and internal/telemetry/collector.go, log messages are constructed using fmt.Sprintf. It is a best practice to use structured logging with key-value pairs for variable data. This makes logs more machine-readable and easier to query.
For example, this line could be improved.
| logger.Info(fmt.Sprintf("Failed to get kube-system namespace: %v", err)) | |
| logger.Info("Failed to get kube-system namespace", "error", err) |
dbefee7 to
e185bfe
Compare
|
/retest |
1 similar comment
|
/retest |
Split telemetry collection between cozystack-operator and cozystack-controller: cozystack-operator now collects cluster-level metrics: - cozy_cluster_info (cozystack_version, kubernetes_version) - cozy_nodes_count (os, kernel) - cozy_cluster_capacity (cpu, memory, nvidia.com/* resources) - cozy_loadbalancers_count - cozy_pvs_count (driver, size) - cozy_package_info (name, variant) cozystack-controller now collects application-level metrics: - cozy_application_count (kind) - counts HelmReleases per ApplicationDefinition Other changes: - Add pkg/version for build-time version injection via ldflags - Remove --cozystack-version flag (version now embedded at build time) - Remove bundle/oidc configuration from telemetry (replaced by package_info) - Remove cozy_workloads_count metric (replaced by cozy_application_count) Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
e185bfe to
987a74a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/telemetry/collector.go (1)
138-158: HTTP request lacks timeout, risking indefinite blocking.Using
http.DefaultClientwithout a timeout can cause the goroutine to hang indefinitely if the telemetry endpoint is unresponsive. This could affect the collector's ticker-based scheduling.Consider using
http.NewRequestWithContextwith a timeout context, or create a dedicatedhttp.Clientwith a configuredTimeout.🛠️ Suggested fix using context with timeout
// sendMetrics sends collected metrics to the configured endpoint -func (c *Collector) sendMetrics(clusterID, metrics string) error { - req, err := http.NewRequest("POST", c.config.Endpoint, bytes.NewBufferString(metrics)) +func (c *Collector) sendMetrics(ctx context.Context, clusterID, metrics string) error { + ctx, cancel := context.WithTimeout(ctx, 30*time.Second) + defer cancel() + + req, err := http.NewRequestWithContext(ctx, "POST", c.config.Endpoint, bytes.NewBufferString(metrics)) if err != nil { return fmt.Errorf("failed to create request: %w", err) }Then update the call site at line 132:
- if err := c.sendMetrics(clusterID, metrics.String()); err != nil { + if err := c.sendMetrics(ctx, clusterID, metrics.String()); err != nil {cmd/cozystack-controller/main.go (1)
246-251: Duplicate error logging on the same failure path.Lines 248 and 249 both log an error for the same failure case.
setupLog.Errorwill already log the error, and thensetupLog.V(1).Errorlogs it again at a different verbosity level. This appears to be unintentional.🛠️ Suggested fix
if collector != nil { if err := mgr.Add(collector); err != nil { - setupLog.Error(err, "unable to set up telemetry collector") setupLog.V(1).Error(err, "unable to set up telemetry collector, continuing without telemetry") } }Or use
setupLog.Infowith the error as a field if you want it at V(1) level:if collector != nil { if err := mgr.Add(collector); err != nil { - setupLog.Error(err, "unable to set up telemetry collector") - setupLog.V(1).Error(err, "unable to set up telemetry collector, continuing without telemetry") + setupLog.V(1).Info("unable to set up telemetry collector, continuing without telemetry", "error", err) } }
🤖 Fix all issues with AI agents
In `@internal/telemetry/collector.go`:
- Around line 22-27: Replace the duplicated ApplicationKindLabel constant with
the canonical constant from the API types package: remove the local
ApplicationKindLabel declaration in the telemetry collector and add an import
for the API package (the package that defines ApplicationKindLabel, e.g.,
appsv1alpha1) then reference appsv1alpha1.ApplicationKindLabel wherever the
local constant was used; keep the local UILabel constant as-is and update any
existing references to use the imported symbol to avoid divergence.
In `@internal/telemetry/operator_collector.go`:
- Around line 261-282: In sendMetrics, avoid using http.DefaultClient without a
timeout by creating a context with a deadline (e.g., via context.WithTimeout)
and use http.NewRequestWithContext to build the POST to c.config.Endpoint, then
perform the request with an http.Client that enforces the timeout (or rely on
the context cancel) instead of http.DefaultClient.Do; ensure you call the cancel
function and preserve existing headers (Content-Type and X-Cluster-ID) and error
handling in OperatorCollector.sendMetrics.
In `@packages/core/installer/Makefile`:
- Line 18: The Makefile currently defines targets like image, pre-checks,
image-operator, and image-packages as real targets which can be skipped if files
with those names exist; add a .PHONY declaration listing these targets (e.g.
.PHONY: image pre-checks image-operator image-packages) so Make always executes
them regardless of same-named files, placing the .PHONY line near the top of the
Makefile where other target metadata is defined.
In `@packages/system/cozystack-controller/Makefile`:
- Line 7: Makefile declares the target "image" (and also a "helper" target) but
doesn't mark them .PHONY, so if a file named "image" exists Make will treat the
target as up-to-date; add a .PHONY declaration for these targets by adding a
line like ".PHONY: image helper" near the top of the Makefile so Make always
runs the "image" and "helper" targets (reference the existing "image:
image-cozystack-controller" target to locate where to add the .PHONY).
♻️ Duplicate comments (2)
internal/telemetry/operator_collector.go (2)
23-30: Significant code duplication withCollectortype.The
OperatorCollectorduplicates substantial logic fromCollectorincollector.go:
Startmethod (lines 46-67) is nearly identicalsendMetricsmethod (lines 262-282) is identical- Struct fields (
ticker,stopCh,config) are duplicatedConsider extracting common functionality into a shared
baseCollectortype that both collectors can embed.
109-114: Use structured logging with key-value pairs.Throughout this file, log messages use
fmt.Sprintffor variable data. Structured logging with key-value pairs improves machine readability and query-ability.Example fix
- logger.Info(fmt.Sprintf("Failed to get kube-system namespace: %v", err)) + logger.Info("Failed to get kube-system namespace", "error", err)
🧹 Nitpick comments (1)
internal/telemetry/collector.go (1)
30-35:stopChfield is unused.The
stopChchannel is created (line 52) and closed (line 61) but is never read from in anyselectstatement. This appears to be dead code, possibly left over from a previous implementation.Consider removing
stopChfrom the struct and the relatedmake/closecalls if it's not needed.
| const ( | ||
| // ApplicationKindLabel is the label used to identify application kind on HelmReleases | ||
| ApplicationKindLabel = "apps.cozystack.io/application.kind" | ||
| // UILabel is the label used to identify UI-managed HelmReleases | ||
| UILabel = "cozystack.io/ui" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find all definitions of ApplicationKindLabel
rg -n "ApplicationKindLabel\s*=" --type=goRepository: cozystack/cozystack
Length of output: 355
🏁 Script executed:
# Check the collector.go file to see if it imports ApplicationKindLabel or defines it
head -30 internal/telemetry/collector.go | cat -nRepository: cozystack/cozystack
Length of output: 1000
🏁 Script executed:
# Check what's in rest.go around line 75
sed -n '70,80p' pkg/registry/apps/application/rest.go | cat -nRepository: cozystack/cozystack
Length of output: 472
🏁 Script executed:
# Search for where appsv1alpha1.ApplicationKindLabel is defined (could be in types)
rg -n "ApplicationKindLabel\s*=" --type=go pkg/apis/Repository: cozystack/cozystack
Length of output: 159
Centralize ApplicationKindLabel by importing from the API types package.
ApplicationKindLabel is defined in three places: pkg/apis/apps/v1alpha1/types.go (source), pkg/registry/apps/application/rest.go:76 (correctly imports from appsv1alpha1), and here in internal/telemetry/collector.go:24 (duplicates the string literal). Import from the source of truth instead to prevent inconsistencies if the constant is updated.
🤖 Prompt for AI Agents
In `@internal/telemetry/collector.go` around lines 22 - 27, Replace the duplicated
ApplicationKindLabel constant with the canonical constant from the API types
package: remove the local ApplicationKindLabel declaration in the telemetry
collector and add an import for the API package (the package that defines
ApplicationKindLabel, e.g., appsv1alpha1) then reference
appsv1alpha1.ApplicationKindLabel wherever the local constant was used; keep the
local UILabel constant as-is and update any existing references to use the
imported symbol to avoid divergence.
| // sendMetrics sends collected metrics to the configured endpoint | ||
| func (c *OperatorCollector) sendMetrics(clusterID, metrics string) error { | ||
| req, err := http.NewRequest("POST", c.config.Endpoint, bytes.NewBufferString(metrics)) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to create request: %w", err) | ||
| } | ||
|
|
||
| req.Header.Set("Content-Type", "text/plain") | ||
| req.Header.Set("X-Cluster-ID", clusterID) | ||
|
|
||
| resp, err := http.DefaultClient.Do(req) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to send request: %w", err) | ||
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode != http.StatusOK { | ||
| return fmt.Errorf("unexpected status code: %d", resp.StatusCode) | ||
| } | ||
|
|
||
| return nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTP request lacks timeout.
Same issue as in collector.go: using http.DefaultClient without a timeout can cause indefinite blocking. Apply the same fix using http.NewRequestWithContext with a timeout.
🤖 Prompt for AI Agents
In `@internal/telemetry/operator_collector.go` around lines 261 - 282, In
sendMetrics, avoid using http.DefaultClient without a timeout by creating a
context with a deadline (e.g., via context.WithTimeout) and use
http.NewRequestWithContext to build the POST to c.config.Endpoint, then perform
the request with an http.Client that enforces the timeout (or rely on the
context cancel) instead of http.DefaultClient.Do; ensure you call the cancel
function and preserve existing headers (Content-Type and X-Cluster-ID) and error
handling in OperatorCollector.sendMetrics.
| cozyhr show --namespace $(NAMESPACE) $(NAME) --plain | kubectl diff --filename - | ||
| cozyhr show --namespace $(NAMESPACE) $(NAME) --plain | kubectl diff -f - | ||
|
|
||
| image: pre-checks image-operator image-packages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare image (and related) targets as PHONY.
This avoids Make skipping the target if a file named image exists. Line 18.
🔧 Proposed fix
+.PHONY: pre-checks show apply diff image image-operator image-packages
image: pre-checks image-operator image-packages📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| image: pre-checks image-operator image-packages | |
| .PHONY: pre-checks show apply diff image image-operator image-packages | |
| image: pre-checks image-operator image-packages |
🧰 Tools
🪛 checkmake (0.2.2)
[warning] 18-18: Target "image" should be declared PHONY.
(phonydeclared)
🤖 Prompt for AI Agents
In `@packages/core/installer/Makefile` at line 18, The Makefile currently defines
targets like image, pre-checks, image-operator, and image-packages as real
targets which can be skipped if files with those names exist; add a .PHONY
declaration listing these targets (e.g. .PHONY: image pre-checks image-operator
image-packages) so Make always executes them regardless of same-named files,
placing the .PHONY line near the top of the Makefile where other target metadata
is defined.
| include ../../../hack/package.mk | ||
|
|
||
| image: image-cozystack-controller update-version | ||
| image: image-cozystack-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare image (and helper) targets as PHONY.
If a file named image exists, Make will skip this target. Add a .PHONY declaration to avoid that. Line 7.
🔧 Proposed fix
+.PHONY: image image-cozystack-controller
image: image-cozystack-controller🧰 Tools
🪛 checkmake (0.2.2)
[warning] 7-7: Target "image" should be declared PHONY.
(phonydeclared)
🤖 Prompt for AI Agents
In `@packages/system/cozystack-controller/Makefile` at line 7, Makefile declares
the target "image" (and also a "helper" target) but doesn't mark them .PHONY, so
if a file named "image" exists Make will treat the target as up-to-date; add a
.PHONY declaration for these targets by adding a line like ".PHONY: image
helper" near the top of the Makefile so Make always runs the "image" and
"helper" targets (reference the existing "image: image-cozystack-controller"
target to locate where to add the .PHONY).
|
/retest |
What this PR does
Splits telemetry collection between cozystack-operator and cozystack-controller for better separation of concerns.
Release note
Summary by CodeRabbit
New Features
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.