U-6624: Kubernetes service discovery #7

Merged
curusarn merged 11 commits into main from sl/kubernetes_service_discovery
Jul 29, 2025
@curusarn curusarn commented Jul 25, 2025

  • Run Kubernetes service discovery if the kubernetes_discovery_* wildcard is used in the Vector config.
  • Validate generated Kubernetes discovery Vector configs against a minimal main config -> reject a generated config if validation fails, so that Kubernetes discovery cannot break the entire config.
  • Validate the upstream Vector config after download, using a minimal Kubernetes discovery config -> keep the downloaded version if validation fails; update the symlink to the latest valid version if validation passes (send any validation errors to Better Stack).
  • Validate the final upstream Vector config together with the actual generated configs -> promote the valid config and send SIGHUP to Vector; if validation fails, send the errors to Better Stack.
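
For illustration only, the trigger in the first bullet could be sketched as below. This is not the code from this PR: the helper name `uses_kubernetes_discovery` and the regex are assumptions, and Python is used purely as a neutral sketch language.

```python
import re

# Hypothetical pattern: the kubernetes_discovery_ prefix, optionally followed
# by more identifier characters, ending in a literal "*" wildcard.
DISCOVERY_GLOB = re.compile(r"\bkubernetes_discovery_\w*\*")


def uses_kubernetes_discovery(config_text: str) -> bool:
    """Return True if the config text references a kubernetes_discovery_* glob."""
    return DISCOVERY_GLOB.search(config_text) is not None
```

Discovery would then run only when this check is true for the current upstream config.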

curusarn and others added 10 commits July 25, 2025 20:23
- Extract workload type and name from pod ownerReferences
- Add as label (e.g. workload: "daemonset/cilium")
- Works for both service-based and direct pod discovery
- Helps identify the source workload type in metrics

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
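
A minimal sketch of the label described in this commit, assuming the pod is available as a plain dict shaped like the Kubernetes API object (`metadata.ownerReferences` with `kind`, `name`, and `controller`). The function name `workload_label` is hypothetical and Python is used only for illustration; it is not the implementation from this PR.

```python
from typing import Optional


def workload_label(pod: dict) -> Optional[str]:
    """Build a "kind/name" label from the pod's ownerReferences, if any."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    if not owners:
        # Bare pods have no owner, so there is no workload to report.
        return None
    # Prefer the owner flagged as controller; otherwise use the first entry.
    owner = next((o for o in owners if o.get("controller")), owners[0])
    return f'{owner["kind"].lower()}/{owner["name"]}'
```

For a pod owned by a DaemonSet named cilium, this yields the `daemonset/cilium` form mentioned above.
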
- Add get_workload_from_pod helper method
- For ReplicaSets, follow chain to find parent Deployment
- Shows "deployment/name" instead of "replicaset/name" in workload label
- Other workload types (DaemonSet, StatefulSet, Job) remain unchanged

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
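
The ownership chain in this commit can be sketched as follows. This is an illustrative Python sketch, not the PR's `get_workload_from_pod`: the name `resolve_workload` and the injected `get_replicaset(namespace, name)` lookup are assumptions standing in for whatever API client the collector uses.

```python
from typing import Callable, Optional

# Hypothetical lookup: returns the ReplicaSet object as a dict, or None.
ReplicaSetLookup = Callable[[str, str], Optional[dict]]


def resolve_workload(pod: dict, get_replicaset: ReplicaSetLookup) -> Optional[str]:
    """Return "kind/name" for the pod's workload, mapping ReplicaSets to their Deployment."""
    metadata = pod.get("metadata", {})
    owners = metadata.get("ownerReferences") or []
    if not owners:
        return None
    kind, name = owners[0]["kind"], owners[0]["name"]
    if kind == "ReplicaSet":
        # Follow the chain one level up: ReplicaSet -> Deployment.
        replicaset = get_replicaset(metadata.get("namespace", "default"), name)
        parents = (replicaset or {}).get("metadata", {}).get("ownerReferences") or []
        for parent in parents:
            if parent.get("kind") == "Deployment":
                kind, name = "Deployment", parent["name"]
                break
    # DaemonSet, StatefulSet, Job, and standalone ReplicaSets pass through unchanged.
    return f"{kind.lower()}/{name}"
```

Injecting the lookup keeps the sketch testable without a cluster; a standalone ReplicaSet with no Deployment parent still reports as `replicaset/name`.
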
- Add VectorConfigTest for all vector configuration methods
- Add KubernetesDiscoveryTest for discovery functionality
- Add BetterStackClientPingTest for ping workflow
- Test workload ownership chain resolution
- Fix broken tests after refactoring
- Remove obsolete test methods

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- KubernetesDiscoveryIntegrationTest: Complex workload types, node filtering, deduplication
- VectorConfigEdgeCasesTest: Malicious commands, symlinks, race conditions
- BetterStackClientErrorHandlingTest: Network errors, partial failures, security
- UtilsEdgeCasesTest: Invalid inputs, binary content, unicode handling

Tests cover error scenarios, security concerns, and concurrent operations.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@paweljw paweljw self-requested a review July 28, 2025 08:42
curusarn commented Jul 28, 2025

For context:

We have an upstream Vector config coming from Better Stack.
This PR adds a way to give Vector additional configs for services (pods) discovered in the cluster, based on standard Prometheus labels.
We'll reference those sources via a kubernetes_discovery_* glob in the main Vector config.

Problems this PR addresses that came up during development:
Vector errors on configs that contain globs that match nothing, e.g. no_sources_with_this_prefix_* will cause Vector to fail.

  • This is problematic as we can no longer download upstream config and validate it separately. -> We validate with a minimal dummy service discovery config.
  • Another problem is if there are no services discovered. -> Again, we solve this with minimal empty service discovery configs.
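
To make the failure mode concrete, here is a rough sketch that reports globs with no matching source. It approximates glob matching with Python's `fnmatch`, which is an assumption and not Vector's actual matcher; the function name `unmatched_globs` and the placeholder source name are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def unmatched_globs(inputs: Iterable[str], source_names: Iterable[str]) -> List[str]:
    """Return the wildcard inputs that match none of the defined source names."""
    names = list(source_names)
    return [
        pattern
        for pattern in inputs
        if "*" in pattern and not any(fnmatchcase(name, pattern) for name in names)
    ]
```

With no discovered services, `kubernetes_discovery_*` would show up in the result; adding one placeholder source such as `kubernetes_discovery_placeholder` makes the list empty, which is the role the minimal discovery config plays here.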

Under no circumstances should Vector crash because of a broken upstream config, broken dynamic discovery configs, or some odd combination of both.
We triple validate the vector config:

  • Upstream config is only used if it passes validation with minimal service discovery config.
  • Dynamic configs are deleted if they don't pass validation with minimal base vector config.
  • Both separately validated configs are validated together and only used if they pass validation.
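
The three stages above could be sketched roughly like this. It is an illustrative Python sketch under stated assumptions, not the PR's code: `select_configs` and all parameter names are hypothetical, configs are represented as paths, and validation is injected as a `validate(paths) -> bool` callable rather than tied to any particular command.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical validator: True when the given config files validate together.
Validator = Callable[[Sequence[str]], bool]


def select_configs(
    upstream: str,
    previous_upstream: str,
    discovered: Sequence[str],
    minimal_base: str,
    minimal_discovery: str,
    validate: Validator,
) -> Optional[List[str]]:
    """Return the config set to promote, or None if the final check fails."""
    # Stage 1: the new upstream must validate with the minimal discovery config;
    # otherwise keep using the last valid upstream version.
    if not validate([upstream, minimal_discovery]):
        upstream = previous_upstream
    # Stage 2: drop every generated config that fails against the minimal base.
    valid_discovered = [c for c in discovered if validate([minimal_base, c])]
    # Stage 3: validate the combination; fall back to the minimal discovery
    # config when nothing was discovered so the glob still matches a source.
    candidate = [upstream, *(valid_discovered or [minimal_discovery])]
    return candidate if validate(candidate) else None
```

A `None` result means nothing is promoted and the running configuration stays as it is; in the flow described here, that is also the point where errors would be reported.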

Any errors in the upstream config or in the final combination of configs are sent to the Better Stack UI via "ping".
Errors in dynamic configs are logged but not surfaced to the user.

Kubernetes discovery:

@paweljw paweljw left a comment

Thanks! Looks good to me 🙏

@curusarn curusarn merged commit 30f6f4f into main Jul 29, 2025
2 checks passed
@curusarn curusarn deleted the sl/kubernetes_service_discovery branch July 29, 2025 21:34