-
Notifications
You must be signed in to change notification settings - Fork 290
Support proxy-based Prometheus discovery for Holmes #1977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Support proxy-based Prometheus discovery for Holmes #1977
Conversation
Signed-off-by: Codex <[email protected]>
WalkthroughThis refactoring introduces a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/robusta/integrations/prometheus/utils.py (1)
178-188: Inconsistent return type annotation forfind_vm_url.
find_vm_urlreturnsOptional[str]but the parentfind_urlmethod now returnsOptional[DiscoveredServiceUrl]. This is a type mismatch that will cause issues when unpacking at line 66.🔎 Proposed fix
@classmethod - def find_vm_url(cls) -> Optional[str]: + def find_vm_url(cls) -> Optional[DiscoveredServiceUrl]: return super().find_url( selectors=[ "app.kubernetes.io/name=vmsingle",
🧹 Nitpick comments (1)
src/robusta/utils/service_discovery.py (1)
84-86: Headers fetched unconditionally even if proxy URL build fails.If
_should_use_proxy()returnsTruebut_build_proxy_url()returnsNone(e.g., no host configured),proxy_urlisNone, soheadersbecomesNonecorrectly. However, this means we silently fall back to cluster-local URL when proxy is expected but unavailable. Consider logging a warning in this case.🔎 Suggested improvement
cluster_local_url = f"{scheme}://{name}.{namespace}.svc.{CLUSTER_DOMAIN}:{port}" - proxy_url = _build_proxy_url(name, namespace, port, scheme) if _should_use_proxy() else None + use_proxy = _should_use_proxy() + proxy_url = _build_proxy_url(name, namespace, port, scheme) if use_proxy else None + if use_proxy and not proxy_url: + logging.warning("Proxy mode detected but failed to build proxy URL, falling back to cluster-local URL") headers = _get_kube_proxy_headers() if proxy_url else None final_url = proxy_url or cluster_local_url
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/robusta/integrations/prometheus/utils.pysrc/robusta/utils/service_discovery.py
🧰 Additional context used
🪛 Ruff (0.14.10)
src/robusta/utils/service_discovery.py
45-45: Consider moving this statement to an else block
(TRY300)
46-46: Do not catch blind exception: Exception
(BLE001)
54-54: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run_tests
- GitHub Check: run_tests
🔇 Additional comments (11)
src/robusta/utils/service_discovery.py (5)
13-19: LGTM! Clean dataclass for encapsulating discovery results.The
DiscoveredServiceUrldataclass nicely encapsulates the URL and optional headers. The__str__method returning the URL is a good choice for backward compatibility when the object is used in string contexts.
22-23: Consider a more descriptive variable name or documentation.The logic correctly detects out-of-cluster execution by checking for the absence of
KUBERNETES_SERVICE_HOST. This is a standard approach since that env var is automatically set inside pods.
26-33: Scheme derivation logic looks correct.The function properly checks port name,
app_protocol, and port number (443) to determine HTTPS. Usinggetattrforapp_protocolhandles cases where the attribute might not exist on older API versions.
36-48: Broad exception handling is acceptable here for resilience.The static analyzer flags the bare
Exceptioncatch (BLE001), but in this context it's appropriate—we want to gracefully fall back toNoneif any unexpected error occurs during header extraction rather than crash the discovery flow. The debug logging ensures visibility.
88-92: Good observability with logging.The info-level log clearly indicates the discovered URL and whether a proxy was used, which aids debugging.
src/robusta/integrations/prometheus/utils.py (6)
36-41: Clean helper for unpacking discovery results.This helper nicely handles the
Nonecase and extracts the URL and headers tuple.
44-52: Header merging logic is correct.The function properly handles all combinations of
Noneinputs and returnsNonewhen no headers exist. Note thatadditional_headerstakes precedence overdiscovered_headerswhen keys conflict, which is the expected behavior for user-provided overrides.
57-67: Logic correctly integrates discovered headers.The flow properly unpacks discovery results and preserves headers from whichever discovery method succeeds.
79-81: Good: User-provided headers override discovered headers.The merge order ensures user-provided
prometheus_additional_headerscan override any discovered proxy headers if needed.
139-155: Cache now storesDiscoveredServiceUrlobjects.The caching correctly stores and returns the full
DiscoveredServiceUrlobject, preserving headers across cached lookups.
214-216: Wrapping user-provided URL correctly.When
holmes_urlis provided directly, it's wrapped inDiscoveredServiceUrlwithout headers, which is correct since no proxy is needed for explicitly configured URLs.
| def _build_proxy_url(name: str, namespace: str, port: int, scheme: str) -> Optional[str]: | ||
| try: | ||
| configuration = client.Configuration.get_default_copy() | ||
| except Exception as e: | ||
| logging.debug(f"Unable to load Kubernetes configuration for proxy url: {e}") | ||
| return None | ||
|
|
||
| host = (configuration.host or "").rstrip("/") | ||
| if not host: | ||
| return None | ||
|
|
||
| path = f"/api/v1/namespaces/{namespace}/services/{scheme}:{name}:{port}/proxy" | ||
| return f"{host}{path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
Kubernetes API proxy URL format services /api/v1/namespaces services proxy
💡 Result:
Kubernetes API server proxy URL patterns (service-level)
-
Base form:
/api/v1/namespaces/{namespace}/services/{service-ident}/proxy{/path...}.[1][2] -
{service-ident} supported forms:
- {service-name} → proxy to service’s default/unnamed port over http.
- {service-name}:{port} → proxy to the named numeric port (or port-name) over http.
- https:{service-name}: → proxy to service’s default/unnamed port over HTTPS (note the leading "https:" and trailing colon).
- https:{service-name}:{port} → proxy to the specified port over HTTPS.
Examples: - /api/v1/namespaces/default/services/web-frontend/proxy/
- /api/v1/namespaces/default/services/web-frontend:80/proxy/
- /api/v1/namespaces/kube-system/services/https:kube-ui:/proxy/ (HTTPS to default port). [2][4]
-
You can append additional path/query after /proxy, e.g. /proxy/_search?q=foo or /proxy/api/v1/items. [1][3]
Notes:
- The API also exposes verb-specific endpoints (GET/POST/HEAD/PUT) at the same path. [1]
- When using kubectl proxy, call the same path against the local proxy (e.g. http://localhost:8001/api/v1/...). [3]
Sources: Kubernetes API reference and docs; examples/discussion. [1] Kubernetes API reference — Proxy operations. [2] kubectl cluster-info / proxy examples and formats. [3] kubectl proxy docs. [4] community examples/Q&A showing https: prefix.
Fix proxy path format for HTTP services.
The proxy URL format is incorrect for HTTP services. Per Kubernetes API documentation, the service identifier should be:
- HTTP with port:
{name}:{port}(no scheme prefix) - HTTPS with port:
https:{name}:{port}(with scheme prefix)
The current code uses {scheme}:{name}:{port} for all schemes, which generates invalid URLs for HTTP (e.g., http:my-service:8080 instead of my-service:8080). Conditional logic is needed: omit the scheme prefix for HTTP, include it only for HTTPS.
🧰 Tools
🪛 Ruff (0.14.10)
54-54: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In src/robusta/utils/service_discovery.py around lines 51 to 63, the proxy path
uses "{scheme}:{name}:{port}" for all services which produces invalid proxy
paths for HTTP; change the path construction so that for scheme == "https" the
service identifier is "https:{name}:{port}" but for "http" (and empty/default)
it is just "{name}:{port}". Keep the rest of the function intact (load
configuration, validate host) and return f"{host}{path}" as before.
| port_obj = svc.spec.ports[0] | ||
| port = port_obj.port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential IndexError if service has no ports.
If svc.spec.ports is empty or None, accessing svc.spec.ports[0] will raise an IndexError. Consider adding a guard.
🔎 Proposed fix
name = svc.metadata.name
namespace = svc.metadata.namespace
+ if not svc.spec.ports:
+ logging.warning(f"Service {name} in {namespace} has no ports defined")
+ return None
port_obj = svc.spec.ports[0]
port = port_obj.portCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/robusta/utils/service_discovery.py around lines 79-80, the code
unconditionally accesses svc.spec.ports[0] which will raise an IndexError if
svc.spec.ports is None or empty; add a guard that checks if svc.spec and
svc.spec.ports are truthy and that len(svc.spec.ports) > 0 before indexing, and
handle the empty case by logging/raising a clear error or skipping this service
(returning None or continuing) so the function fails gracefully instead of
throwing an IndexError.
Summary
Testing
Codex Task