allow configuration of the metrics server opts #240
Signed-off-by: Alex Price <[email protected]>
📝 Walkthrough

Adds a variadic `MetricsServerOpts` serve option that is forwarded to `grpcprometheus.NewServerMetrics`, plus a test covering the new behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller as Serve() initializer
    participant ServeOptions as ServeOptions
    participant Metrics as NewServerMetrics
    participant Server as gRPC Server
    Note over Caller,ServeOptions: Serve is called with options
    Caller->>ServeOptions: build options (may include WithMetricsServerOpts)
    alt Metrics enabled
        ServeOptions->>Metrics: call NewServerMetrics(registry, addr, MetricsServerOpts...)
        Metrics-->>ServeOptions: metrics server instance
        ServeOptions->>Server: register metrics interceptors
    else Metrics disabled
        Caller->>Server: start without metrics
    end
    Server-->>Caller: server running
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Mixed change: small API addition plus a new test covering runtime behavior; review requires verifying the API surface, call-site forwarding, and test correctness.

Possibly related PRs
Suggested reviewers
Thanks — would you like an additional diagram showing the test flow (mock server → gRPC call → metrics scrape) as well?

Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdk_test.go (1)
362-449: Two default-registry servers in one process can panic; the fixed port 8080 also risks conflicts.

This test starts another server using the default/global registry and MustRegister, after the previous default-registry test. That combination typically triggers a duplicate-registration panic. It also binds port 8080 again without shutting down the server from the prior test.
Isolate this test by using a custom registry and a free port:
```diff
 func TestMetricsServer_WithCustomMetricsServerOpts(t *testing.T) {
@@
-	// Should use default metrics port 8080
-	metricsPort := 8080
+	metricsPort := getAvailablePort(t)
@@
 	go func() {
 		err := Serve(mockServer,
 			Listen("tcp", fmt.Sprintf(":%d", grpcPort)),
 			Insecure(true),
+			WithMetricsServer(fmt.Sprintf(":%d", metricsPort)),
+			WithMetricsRegistry(prometheus.NewRegistry()),
 			WithMetricsServerOpts(
 				grpcprometheus.WithServerHandlingTimeHistogram(),
 			),
 		)
 		serverDone <- err
 	}()
@@
-	// Wait for server to start
-	time.Sleep(3 * time.Second)
+	// Poll for readiness instead of sleeping.
+	deadline := time.Now().Add(5 * time.Second)
+	for {
+		if time.Now().After(deadline) {
+			t.Fatalf("metrics endpoint did not become ready at :%d", metricsPort)
+		}
+		req, _ := http.NewRequestWithContext(context.Background(), http.MethodGet,
+			fmt.Sprintf("http://localhost:%d/metrics", metricsPort), nil)
+		if resp, err := http.DefaultClient.Do(req); err == nil {
+			resp.Body.Close()
+			break
+		}
+		time.Sleep(100 * time.Millisecond)
+	}
@@
 	if !strings.Contains(metricsContent, "grpc_server_handling_seconds_bucket") {
 		t.Error("Expected grpc_server_handling_seconds_bucket metric to be present")
 	}
```

Optional polish:
- Rename to PascalCase (no underscores) per repo test style: TestMetricsServerWithCustomMetricsServerOpts.
- Consider asserting resp.StatusCode == http.StatusOK and removing the unused serverDone channel.
Would you prefer we collapse both “default registry” scenarios into a single test to avoid global state reuse altogether?
🧹 Nitpick comments (3)
sdk.go (3)
66-67: Good addition; consider cumulative semantics for options.Exposing MetricsServerOpts is useful and non-breaking. One tweak: allow multiple WithMetricsServerOpts calls to accumulate rather than replace, matching common options patterns.
Apply in WithMetricsServerOpts:
```diff
-	o.MetricsServerOpts = opts
+	o.MetricsServerOpts = append(o.MetricsServerOpts, opts...)
```

Do you expect callers to invoke this option more than once (e.g., from separate wiring layers)?
177-185: WithMetricsServerOpts overwrites; prefer append for composability.Current implementation replaces any previously-set options, which can surprise users composing options.
Apply:
```diff
 func WithMetricsServerOpts(opts ...grpcprometheus.ServerMetricsOption) ServeOption {
 	return func(o *ServeOptions) error {
-		o.MetricsServerOpts = opts
+		o.MetricsServerOpts = append(o.MetricsServerOpts, opts...)
 		return nil
 	}
 }
```
191-197: Minor: the zero value is fine; drop the explicit make.

Variadic spread handles a nil slice. Keeping it nil reduces allocations and matches the other fields' zero-value defaults.

```diff
 so := &ServeOptions{
 	Network:         DefaultNetwork,
 	Address:         DefaultAddress,
 	MaxRecvMsgSize:  DefaultMaxRecvMsgSize,
 	MetricsAddress:  DefaultMetricsAddress,
 	MetricsRegistry: prometheus.DefaultRegisterer.(*prometheus.Registry), // Use default registry
-	MetricsServerOpts: make([]grpcprometheus.ServerMetricsOption, 0),
 }
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
sdk.go (3 hunks), sdk_test.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/!(*_test).go
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
Breaking Changes: Do not remove, rename, or change the signature/behavior of any exported Go functions, types, methods, or fields without adding the 'breaking-change' label; fail the PR otherwise (library repo: all exported APIs are public).
Files:
sdk.go
**/*.go
⚙️ CodeRabbit configuration file
**/*.go: Enforce Crossplane-specific patterns: Use function-sdk-go/errors
for wrapping. Check variable naming (short for local scope, descriptive
for wider scope). Ensure 'return early' pattern. Verify error scoping
(declare in conditionals when possible). For nolint directives, require
specific linter names and explanations. CRITICAL: Ensure all error
messages are meaningful to end users, not just developers - avoid
technical jargon, include context about what the user was trying to do,
and suggest next steps when possible.
Files:
sdk.go, sdk_test.go
**/*_test.go
⚙️ CodeRabbit configuration file
**/*_test.go: Enforce table-driven test structure: PascalCase test names (no
underscores), args/want pattern, use cmp.Diff with
cmpopts.EquateErrors() for error testing. Check for proper test case
naming and reason fields. Ensure no third-party test frameworks (no
Ginkgo, Gomega, Testify).
Files:
sdk_test.go
🧬 Code graph analysis (1)
sdk_test.go (2)
proto/v1/run_function_grpc.pb.go (1)
NewFunctionRunnerServiceClient (61-63)
sdk.go (4)
Serve (188-268), Listen (74-80), Insecure (127-134), WithMetricsServerOpts (179-184)
🔍 Remote MCP
I encountered authorization issues with the Context7 tools. Let me search for relevant information about the libraries and patterns involved in this PR using web search instead.

Additional Context for PR Review
Prometheus Metrics Configuration in gRPC
NewServerMetrics accepts ServerMetricsOption parameters to configure metrics behavior. Key configuration options relevant to this PR include:
- WithServerHandlingTimeHistogram enables recording of RPC handling time, but histogram metrics can be very expensive for Prometheus to retain and query
- WithServerCounterOptions sets counter options
Metrics Collected by the Server
The standard gRPC metrics that the PR's tests validate include:
- grpc_server_started_total: Total number of RPCs started on the server
- grpc_server_handled_total: Total number of RPCs completed on the server, regardless of success or failure
- grpc_server_handling_seconds: Histogram of response latency (seconds) of gRPC that had been application-level handled by the server (when histogram option is enabled)
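For orientation, a scrape of the metrics endpoint with the histogram option enabled would contain Prometheus exposition-format lines shaped like the following (label and bucket values here are illustrative, not taken from this PR):

```text
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="RunFunction",grpc_type="unary"} 1
# TYPE grpc_server_handling_seconds histogram
grpc_server_handling_seconds_bucket{grpc_method="RunFunction",grpc_type="unary",le="0.005"} 1
grpc_server_handling_seconds_sum{grpc_method="RunFunction",grpc_type="unary"} 0.0012
grpc_server_handling_seconds_count{grpc_method="RunFunction",grpc_type="unary"} 1
```

The `_bucket` lines are what make histograms expensive to retain: one time series per bucket boundary per label combination.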
Design Pattern Validation
The PR follows the standard Go pattern for metrics configuration. NewServerMetrics accepts variadic ServerMetricsOption parameters like WithServerHandlingTimeHistogram with histogram bucket customization options, which is exactly what the PR implements through the new WithMetricsServerOpts function.
Test Coverage Validation
The PR's test cases appropriately verify:
- Default metrics endpoint exposure in Prometheus format
- Custom port and registry configurations
- Optional histogram metrics when enabled via ServerMetricsOption
- The presence of expected metrics like grpc_server_started_total and grpc_server_handling_seconds_bucket
🔇 Additional comments (2)
sdk.go (1)
228-236: LGTM on wiring the server metrics options.

Passing so.MetricsServerOpts... into grpcprometheus.NewServerMetrics and placing the metrics interceptor first is correct (it captures full latency, including subsequent interceptors).
One caveat: using MustRegister on the default/global registry will panic on duplicate registrations if multiple servers start in the same process. Tests later start multiple servers; please confirm they always use separate registries in that case or run sequentially with a single default-registry server.
sdk_test.go (1)
31-31: Import looks good.

Brings in grpcprometheus only where needed, for the histogram option in tests.
```go
t.Run("MetricsServerTest On DefaultPort With DefaultRegistry", func(t *testing.T) {
	// Test gRPC connection
	conn, err := grpc.NewClient(fmt.Sprintf("localhost:%d", grpcPort),
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		t.Fatalf("Failed to connect: %v", err)
	}
	defer conn.Close()

	client := v1.NewFunctionRunnerServiceClient(conn)

	// Make the request
	req := &v1.RunFunctionRequest{
		Meta: &v1.RequestMeta{Tag: "default-metrics-test"},
	}

	_, err = client.RunFunction(context.Background(), req)
	if err != nil {
		t.Errorf("Request failed: %v", err)
	}

	// Wait for metrics to be collected
	time.Sleep(2 * time.Second)

	// Verify metrics endpoint is accessible
	metricsURL := fmt.Sprintf("http://localhost:%d/metrics", metricsPort)
	httpReq, err := http.NewRequestWithContext(context.Background(), http.MethodGet, metricsURL, nil)
	if err != nil {
		t.Fatalf("Failed to create request: %v", err)
	}
	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		t.Fatalf("Failed to get metrics: %v", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		t.Fatalf("Failed to read metrics: %v", err)
	}

	metricsContent := string(body)

	// Verify metrics are present
	if !strings.Contains(metricsContent, "# HELP") {
		t.Error("Expected Prometheus format")
	}

	// Verify gRPC metrics are present
	if !strings.Contains(metricsContent, "grpc_server_started_total") {
		t.Error("Expected grpc_server_started_total metric to be present")
	}
})
```
Default port test is flaky (port 8080 contention) and relies on sleeps.
Two concerns:
- Port 8080 may be in use on CI/dev machines; the test will intermittently fail.
- Fixed sleeps slow tests and are brittle vs. startup latency.
Recommend either (a) preflight-check 8080 and skip if busy, plus readiness polling, or (b) use a random free port via WithMetricsServer while keeping a single default-port test elsewhere.
Minimal changes to improve stability:
@@
- // Should use default metrics port 8080
- metricsPort := 8080
+ // Default metrics port is 8080; skip if unavailable to avoid flakiness.
+ metricsPort := 8080
+ if ln, err := net.Listen("tcp", fmt.Sprintf(":%d", metricsPort)); err != nil {
+ t.Skipf("port %d not available, skipping default-port test: %v", metricsPort, err)
+ } else {
+ ln.Close()
+ }
@@
- // Wait for server to start
- time.Sleep(3 * time.Second)
+ // Wait for metrics endpoint to become ready (up to 5s).
+ deadline := time.Now().Add(5 * time.Second)
+ for {
+ if time.Now().After(deadline) {
+ t.Fatalf("metrics endpoint did not become ready at :%d", metricsPort)
+ }
+ req, _ := http.NewRequestWithContext(context.Background(), http.MethodGet,
+ fmt.Sprintf("http://localhost:%d/metrics", metricsPort), nil)
+ if resp, err := http.DefaultClient.Do(req); err == nil {
+ resp.Body.Close()
+ break
+ }
+ time.Sleep(100 * time.Millisecond)
+ }
@@
- resp, err := http.DefaultClient.Do(httpReq)
+ resp, err := http.DefaultClient.Do(httpReq)
if err != nil {
t.Fatalf("Failed to get metrics: %v", err)
}
defer resp.Body.Close()
+ if resp.StatusCode != http.StatusOK {
+ t.Fatalf("unexpected status code: %d", resp.StatusCode)
+ }

Nit: s/DefaultRegisrty/DefaultRegistry in the subtest name.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In sdk_test.go around lines 307 to 360, the subtest "MetricsServerTest On
DefaultPort With DefaultRegistry" is flaky because it assumes port 8080 and uses
fixed sleeps; change it to avoid port contention and brittle sleeps by (1)
obtaining a free ephemeral port for the metrics server (or preflight-check port
8080 and skip the test if in use) and wire that port into the test via the
server creation helper (e.g., WithMetricsServer(port)), (2) replace time.Sleep
with a readiness polling loop that repeatedly GETs the /metrics endpoint with a
short timeout until it returns 200 or a deadline is reached, and (3) fix the
subtest name typo to "DefaultRegistry". Ensure the test defers closing
connections and uses context timeouts for RPC/HTTP calls.
This pattern exists before this PR, so I wouldn't force us to change it here
@jbw976, understood! You're right that this is pre-existing behavior, and it makes sense not to expand the scope of this PR to address it.
If you'd like, I can open a follow-up issue to track improving the test stability (handling port contention and replacing fixed sleeps with readiness polling) for future work. Just let me know!
🧠 Learnings used
Learnt from: CR
Repo: crossplane/function-sdk-go PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-10-19T23:33:14.658Z
Learning: Applies to **/!(*_test).go : Breaking Changes: Do not remove, rename, or change the signature/behavior of any exported Go functions, types, methods, or fields without adding the 'breaking-change' label; fail the PR otherwise (library repo: all exported APIs are public).
Signed-off-by: Alex Price <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (1)
sdk_test.go (1)
362-450: Consider improving test reliability and reducing duplication.

The test correctly validates that custom metrics server options produce the expected histogram metrics. However, it shares the same reliability concerns flagged in the previous review comment for the other metrics tests:
- Fixed sleeps: Lines 390 and 414 use time.Sleep for synchronization, making tests slow and potentially flaky if the server takes longer to start.
- Unused channel: serverDone (line 375) is never read, which could cause a goroutine leak if the server encounters an error.
- No cleanup: The server continues running after the test completes, which could cause port conflicts in subsequent tests.
- Code duplication: This test duplicates significant portions of the other two metrics tests.
Would you consider extracting a helper function for the common metrics test setup/verification logic? That would make these tests more maintainable and easier to enhance with proper readiness polling and cleanup.
Example approach for readiness polling:
```go
// Helper to wait for the metrics endpoint to be ready.
func waitForMetricsEndpoint(t *testing.T, port int, timeout time.Duration) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(fmt.Sprintf("http://localhost:%d/metrics", port))
		if err == nil {
			resp.Body.Close()
			return
		}
		time.Sleep(100 * time.Millisecond)
	}
	t.Fatalf("metrics endpoint at :%d did not become ready within %v", port, timeout)
}
```

And for the subtest name on line 392, consider matching the pattern used in the other tests: "MetricsServerTest With CustomMetricsServerOpts" (capital W, not lowercase "with").
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
sdk.go (3 hunks), sdk_test.go (3 hunks)
🧬 Code graph analysis (1)
sdk_test.go (1)
sdk.go (6)
Serve (188-267), Listen (74-80), Insecure (127-134), WithMetricsServer (161-166), WithMetricsRegistry (170-175), WithMetricsServerOpts (179-184)
🔇 Additional comments (4)
sdk.go (3)
66-66: LGTM! Clean addition of metrics server options.

The new MetricsServerOpts field is a non-breaking addition that enables custom configuration of the metrics server. The type is appropriate and follows Go conventions.
177-184: LGTM! Idiomatic ServeOption constructor.

The WithMetricsServerOpts function follows the established pattern of the other ServeOption constructors. Using append allows multiple calls to accumulate options, which is a nice touch. The documentation clearly states the prerequisite that metrics collection requires a non-empty MetricsAddress.
227-227: LGTM! Options correctly forwarded.

The custom metrics server options are properly spread into the NewServerMetrics call, enabling the feature as intended.

sdk_test.go (1)

307-307: LGTM! Typo fixed.

Good catch on the "DefaultRegisrty" → "DefaultRegistry" typo.
jbw976
left a comment
This seems like a reasonable extension to the recent added metrics support. Thanks for this @awprice!
```go
}

// TestMetricsServer_WithCustomMetricsServerOpts verifies that the metrics server uses custom metrics server opts.
func TestMetricsServer_WithCustomMetricsServerOpts(t *testing.T) {
```
we'll probably want to refactor these tests at some point to reduce the duplication, but it's fine for now 🤓
Description of your changes

Allows configuration of the metrics server options by adding WithMetricsServerOpts, which allows passing custom options. An example use case for this is to enable histogram metrics:
I have:
- Run make reviewable to ensure this PR is ready for review.

How has this code been tested
Unit tests, using the sdk