Skip to content

Conversation

@skeeey
Copy link
Member

@skeeey skeeey commented Nov 12, 2025

Summary

introduce grpc transport v2 implementation, it decouple the ce-sdk protocol and implement the subscribe request, so that we know the subscribe action is success or not

Related issue(s)

Fixes #

Summary by CodeRabbit

  • New Features

    • Transports expose a Subscribe(ctx) API; added v2 gRPC transport/options and a subscription-ID header/constant.
  • Bug Fixes

    • Improved reconnect/resubscribe flow: context-aware lifecycle, non-blocking reconnection signaling, safer channel cleanup, and more robust receiver restart logic.
  • API Changes

    • Receive handlers now accept a context parameter.
  • Refactor

    • Broker subscription flow moved to per-subscription identifiers with updated registration, logging, and metrics.
  • Tests

    • Expanded unit and integration coverage for gRPC v2, heartbeats, reconnections, subscriptions, and lifecycle.

@openshift-ci openshift-ci bot requested review from deads2k and qiujian16 November 12, 2025 09:10
@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

Walkthrough

Adds context-aware Subscribe/Receive APIs to CloudEvents transports, refactors base client reconnection and non-blocking reconnected signaling, implements a v2 gRPC transport with constructors and tests, and moves broker subscription handling to subID-based headers; tests and builders updated to use v2 constructors.

Changes

Cohort / File(s) Summary
Base client reconnection/subscription
pkg/cloudevents/generic/clients/baseclient.go
Use caller ctx for receiver lifecycle; call transport.Subscribe(ctx) before Receive; add subscribed resubscribe flow and restart logic; make sendReconnectedSignal(ctx) non-blocking; add closeChannels() and getReceiverChan() with locking; adjust channel closing on context cancellation.
CloudEvent transport API & handler
pkg/cloudevents/generic/options/options.go
Change ReceiveHandlerFn from func(cloudevents.Event) to func(context.Context, cloudevents.Event) and add Subscribe(ctx context.Context) error to CloudEventTransport interface.
v2 gRPC constructors
pkg/cloudevents/generic/options/v2/grpc/options.go
Add NewAgentOptions and NewSourceOptions building CloudEvents options backed by the v2 gRPC transport and SubscriptionRequest closures.
v2 gRPC transport & tests
pkg/cloudevents/generic/options/v2/grpc/transport.go, pkg/cloudevents/generic/options/v2/grpc/transport_test.go
New unexported grpcTransport implementation: Connect/Send/Subscribe/Receive/ErrorChan/Close, connection-state monitoring, heartbeat handling, subscription header handling; extensive in-memory tests for send/subscribe/receive/heartbeat/errors/lifecycle.
v1 transports: Subscribe stubs & fake changes
pkg/cloudevents/generic/options/grpc/*, pkg/cloudevents/generic/options/mqtt/*, pkg/cloudevents/generic/options/fake/fakeoptions.go
Add no-op Subscribe(ctx context.Context) error implementations; update fake EventChan.Receive to call handler as fn(ctx, e).
Builder updates & tests
pkg/cloudevents/generic/options/builder/optionsbuilder.go, pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
Switch builder calls to v2 constructors (grpcv2.NewSourceOptions / grpcv2.NewAgentOptions); tests updated to assert transport type via reflection and include expected transport type.
Broker: subID-based subscription & handler signature
pkg/cloudevents/server/grpc/broker.go, pkg/cloudevents/server/grpc/broker_test.go
GRPCBroker.register signature changed to accept subReq/subServer and return (string, error); resourceHandler now receives subID (func(ctx, subID, *Event) error); per-subscriber registry/dispatch uses subID; tests updated to call Subscribe(ctx) and adapt Receive handler signature.
Constants & server header emission
pkg/cloudevents/constants/constants.go, test/integration/cloudevents/server/grpc.go
Add exported GRPCSubscriptionIDKey = "subscription-id"; server Subscribe now sends gRPC header with that key and fails fast if header send errors.
Integration/test wiring
test/integration/cloudevents/certrotation_grpc_test.go, test/integration/cloudevents/cloudevetns_grpc_v2_test.go
Tests switched to v2 constructors; added helper exposing v2 source options; integration tests updated to use new APIs.
Misc tests updated for handler signature
pkg/cloudevents/generic/clients/*_test.go, pkg/cloudevents/server/grpc/broker_test.go, pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
Update Receive handler call sites to accept context.Context as first parameter and adjust test flows accordingly.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Files needing extra attention:
    • pkg/cloudevents/generic/options/v2/grpc/transport.go — concurrency, stream lifecycle, connection-state monitoring, heartbeat/health checks.
    • pkg/cloudevents/generic/clients/baseclient.go — context propagation, resubscribe logic, non-blocking signaling, channel lifecycle.
    • pkg/cloudevents/server/grpc/broker.go — signature changes, per-subscription routing by subID, header emission and error handling.
    • pkg/cloudevents/generic/options/v2/grpc/transport_test.go — timing/race conditions and test determinism.

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • qiujian16
  • deads2k

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.26% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title ':seedling: grpc transport v2' is specific and directly related to the main change—implementing a gRPC transport v2.
Description check ✅ Passed The description includes a Summary section explaining the main objective (grpc transport v2 implementation with decouple protocol and subscribe request), but lacks detail on implementation scope and missing a Related issue number.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd704c3 and c4f58d5.

📒 Files selected for processing (20)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/agentclient_test.go (3 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (7 hunks)
  • pkg/cloudevents/generic/clients/sourceclient_test.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (2 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (13 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (3 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • test/integration/cloudevents/certrotation_grpc_test.go
  • pkg/cloudevents/constants/constants.go
  • pkg/cloudevents/generic/clients/sourceclient_test.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • test/integration/cloudevents/server/grpc.go
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/server/grpc/broker.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/clients/agentclient_test.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-08-27T03:21:36.789Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/server/grpc/metrics/metrics.go:230-236
Timestamp: 2025-08-27T03:21:36.789Z
Learning: In the gRPC metrics implementation for open-cluster-management.io/sdk-go, the grpc_server_ce_called_total metric intentionally has different semantics for unary vs stream calls: unary calls increment it once per RPC invocation, while stream calls increment it once per received message. This is deliberate design to provide granular visibility into stream message activity.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
🧬 Code graph analysis (6)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (3)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewSourceOptions (28-39)
  • NewAgentOptions (10-26)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/server/grpc/broker_test.go (4)
pkg/cloudevents/generic/options/fake/fakeoptions.go (1)
  • NewAgentOptions (20-26)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/mqtt/agentoptions.go (1)
  • NewAgentOptions (27-40)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-26)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (2)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/server/grpc/broker.go (6)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (1)
  • CloudEventService_SubscribeServer (146-149)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (28-39)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (8)
pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (1)

5-5: LGTM! Test infrastructure correctly updated for v2 transport type checking.

The shift from JSON-based assertions to reflection-based type verification is appropriate and more direct for validating that builders wire the correct transport implementations.

Also applies to: 33-37, 59-59, 76-76, 112-116

test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1)

19-23: LGTM! Integration test correctly wired for v2 gRPC transport.

The function properly constructs v2 source options and the unused context parameter issue was resolved by using the blank identifier.

pkg/cloudevents/generic/clients/agentclient_test.go (1)

75-75: LGTM! Test callbacks updated for context-aware Receive API.

All three test scenarios correctly updated to match the new ReceiveHandlerFn signature that accepts context.Context as the first parameter.

Also applies to: 158-158, 328-328

pkg/cloudevents/server/grpc/broker_test.go (1)

13-13: LGTM! Broker test correctly migrated to v2 transport API.

The test properly exercises the v2 lifecycle: Connect → Subscribe → Receive with context-aware handlers. All changes align with the v2 gRPC transport semantics.

Also applies to: 93-93, 98-100, 113-113

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

1-1334: LGTM! Comprehensive test coverage for v2 gRPC transport.

The test suite thoroughly exercises:

  • Lifecycle operations (Connect, Subscribe, Receive, Close)
  • Error handling and edge cases
  • Heartbeat filtering and health checks
  • Connection state monitoring
  • Integration scenarios with mixed events

Past review concerns about subscription error testing were addressed in commit de876e6.

pkg/cloudevents/generic/options/v2/grpc/transport.go (1)

45-290: LGTM! V2 gRPC transport implementation is solid.

The implementation properly addresses all past review concerns:

  • ✅ Nil-safe Close with guard against double-close (lines 180-186)
  • getCloseChan() protects concurrent reads with RLock (lines 242-247)
  • Subscribe uses subscribed flag to prevent multiple subscriptions (lines 105-107)
  • ✅ Atomic header emission under lock in Subscribe (lines 98-131)
  • Receive validates subscription state before starting goroutines (lines 136-141)
  • ✅ Proper lifecycle management with context cancellation and close channel coordination

The transport follows proper Go concurrency patterns and integrates well with the v2 API surface.

pkg/cloudevents/server/grpc/broker.go (2)

125-152: LGTM! Broker correctly implements v2 subscription lifecycle with subID-aware handlers.

Key improvements that address past review concerns:

  • Atomic registration and header emission: SendHeader is now called under bkr.mu lock inside register (lines 143-147), eliminating the race where events could be sent before headers
  • Subscription-scoped context: Handler now uses subCtx.Done() (line 240) instead of the handler's context, preventing indefinite blocking when the stream closes
  • SubID propagation: Handlers receive subID parameter for improved traceability and logging

The implementation properly coordinates subscription lifecycle with error handling and cleanup.

Also applies to: 181-272


10-10: LGTM! Logging and dispatch improvements enhance observability.

The additions properly enhance debugging capabilities:

  • Imports for constants and metadata support header-based subID signaling
  • Expanded logging with eventContext throughout resync and dispatch paths
  • SubID-aware iteration and dispatch in handleRes (lines 392-403)

All changes align with the v2 transport's subscription model.

Also applies to: 21-21, 95-95, 307-307, 314-314, 325-325, 332-332, 359-359, 392-403


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
pkg/cloudevents/generic/options/options.go (1)

30-33: Clarify Subscribe failure handling semantics.

The Subscribe method documentation states it's non-blocking and should be called after Connect and before Receive, but the expected behavior when Subscribe returns an error is unclear. Should callers retry? Can Receive be called if Subscribe fails?

Consider enhancing the documentation to specify:

  • Whether Subscribe failures are transient or fatal
  • Expected caller behavior on error (retry, abort, etc.)
  • What happens if Receive is called without a successful Subscribe
 	// Subscribe sends a subscription request to the transport to subscribe topics/services.
 	// This is a non-blocking method that should be called after Connect and before Receive.
-	// Returns an error if the subscription request cannot be sent.
+	// Returns an error if the subscription request cannot be sent. On error, callers should
+	// typically retry or abort before calling Receive, as the transport may not receive events
+	// without a successful subscription.
 	Subscribe(ctx context.Context) error
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)

23-24: Consider adding deprecation notice for consistency.

NewAgentOptions in pkg/cloudevents/generic/options/grpc/agentoptions.go (line 23) is marked deprecated in favor of v2, but NewSourceOptions here is not. If both are being replaced by v2 implementations (as evidenced by pkg/cloudevents/generic/options/builder/optionsbuilder.go using grpcv2.NewSourceOptions), consider marking this deprecated as well for consistency.

+// Deprecated: using v2.grpc.NewSourceOptions instead
 func NewSourceOptions(gRPCOptions *GRPCOptions,
 	sourceID string, dataType types.CloudEventsDataType) *options.CloudEventsSourceOptions {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8266e17 and cd60e5f.

📒 Files selected for processing (13)
  • pkg/cloudevents/generic/clients/baseclient.go (4 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.291Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-11-11T13:27:36.291Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.291Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • test/integration/cloudevents/certrotation_grpc_test.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
🧬 Code graph analysis (6)
pkg/cloudevents/generic/options/v2/grpc/transport.go (7)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (5)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)
  • NewSourceOptions (26-38)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewSourceOptions (33-49)
  • NewAgentOptions (10-31)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/mqtt/agentoptions.go (1)
  • NewAgentOptions (27-40)
pkg/cloudevents/generic/options/v2/grpc/options.go (5)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/options.go (2)
  • CloudEventsAgentOptions (76-89)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (33-49)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
test/integration/cloudevents/certrotation_grpc_test.go (3)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsAgentOptions (76-89)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-31)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (4)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (2)
  • CloudEventService_SubscribeServer (146-149)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-31)
  • NewSourceOptions (33-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (11)
pkg/cloudevents/generic/options/fake/fakeoptions.go (1)

46-48: LGTM! No-op Subscribe implementation is appropriate for the fake transport.

The no-op Subscribe method correctly satisfies the CloudEventTransport interface for test scenarios where subscription mechanics are not needed.

pkg/cloudevents/generic/options/mqtt/agentoptions.go (1)

141-147: LGTM! No-op Subscribe is appropriate for MQTT transport.

The no-op implementation with clear documentation is correct. MQTT subscriptions are configured in Connect (lines 89-128) and activated when the CloudEvents client calls StartReceiver. The TODO about native subscription in v2 provides helpful context for future refactoring.

pkg/cloudevents/generic/options/grpc/agentoptions.go (2)

23-23: LGTM! Deprecation notice provides clear migration path.

The deprecation comment properly directs users to the v2 grpc implementation while maintaining backward compatibility.


86-90: LGTM! No-op Subscribe maintains existing behavior.

The no-op implementation correctly preserves the current behavior where subscription is handled implicitly by the CloudEvents SDK protocol layer during receiver startup.

pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)

135-141: LGTM! No-op Subscribe aligns with MQTT subscription lifecycle.

Consistent with the agent transport, subscriptions are configured in Connect (lines 75-121) and managed by the CloudEvents SDK. The TODO provides helpful context for future decoupling.

test/integration/cloudevents/certrotation_grpc_test.go (2)

14-14: LGTM! Clear import alias for v2 grpc options.

The grpcv2 alias clearly distinguishes the v2 implementation from the deprecated v1 grpc package.


22-22: LGTM! Migration to v2 gRPC options completed.

The test correctly migrates from the deprecated grpc.NewAgentOptions to grpcv2.NewAgentOptions with the same signature, ensuring cert rotation behavior is tested with the new v2 transport implementation.

pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)

81-85: LGTM! No-op Subscribe maintains existing behavior.

The no-op implementation is consistent with the agent transport, correctly preserving the behavior where subscription is managed by the protocol layer.

pkg/cloudevents/generic/options/builder/optionsbuilder.go (3)

10-10: LGTM! Clear import alias for v2 grpc options.

The grpcv2 alias provides clear namespace separation between the deprecated grpc package and the new v2 implementation.


61-61: LGTM! Builder migrated to v2 source options.

The builder correctly uses the v2 constructor while maintaining the same signature and config type, ensuring transparent migration for callers.


74-74: LGTM! Builder migrated to v2 agent options.

Consistent with the source options migration, the builder now uses the v2 constructor to provide access to the new Subscribe-capable transport implementation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1)

19-23: Fix unused ctx parameter to resolve compilation error.

The ctx parameter is declared but never used, causing a Go compilation error.

Apply this diff:

-func GetGRPCSourceOptionsV2(ctx context.Context, sourceID string) (*options.CloudEventsSourceOptions, string) {
+func GetGRPCSourceOptionsV2(_ context.Context, sourceID string) (*options.CloudEventsSourceOptions, string) {
🧹 Nitpick comments (2)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (2)

101-101: Consider reducing timing dependencies.

Multiple tests use time.Sleep() for synchronization (lines 101, 398, 691, 993, 1073, 1104, 1149, 1196). While common in testing, these can cause flakiness on slower CI systems. Consider using explicit synchronization channels or condition variables where possible.

Also applies to: 398-398, 691-691, 993-993, 1073-1073, 1104-1104, 1149-1149, 1196-1196


566-647: Potential test duplication.

TestGrpcTransport_Receive_HeartbeatFiltering (lines 566-647) and TestGrpcTransport_HeartbeatFiltering (lines 1018-1081) appear to test similar functionality—both verify that heartbeat events are filtered and not passed to the handler. Consider consolidating these or differentiating their test scenarios more explicitly.

Also applies to: 1018-1081

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd60e5f and 139492a.

📒 Files selected for processing (15)
  • pkg/cloudevents/generic/clients/baseclient.go (4 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (2 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • test/integration/cloudevents/certrotation_grpc_test.go
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/server/grpc/broker_test.go
🧬 Code graph analysis (5)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (33-49)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
pkg/cloudevents/generic/options/v2/grpc/options.go (5)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/options.go (2)
  • CloudEventsAgentOptions (76-89)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (6)
pkg/cloudevents/generic/options/fake/fakeoptions.go (2)
  • NewSourceOptions (28-33)
  • NewAgentOptions (20-26)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)
  • NewSourceOptions (26-38)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewSourceOptions (33-49)
  • NewAgentOptions (10-31)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/mqtt/agentoptions.go (1)
  • NewAgentOptions (27-40)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (4)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-31)
  • NewSourceOptions (33-49)
pkg/cloudevents/server/grpc/broker_test.go (2)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (16)
pkg/cloudevents/generic/options/options.go (1)

30-33: LGTM! Clear interface extension for subscription management.

The Subscribe method addition is well-documented and properly integrates with the transport lifecycle (Connect → Subscribe → Receive). The method signature is appropriate for asynchronous subscription requests.

pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)

135-141: LGTM! Appropriate no-op implementation with clear documentation.

The Subscribe method correctly implements the CloudEventTransport interface with a no-op that delegates subscription to the CloudEvents client during receiver startup. The TODO provides clear guidance for future v2 improvements.

pkg/cloudevents/generic/options/builder/optionsbuilder.go (1)

10-10: LGTM! Clean migration to v2 gRPC options.

The import alias and subsequent usage in NewSourceOptions (line 61) and NewAgentOptions (line 74) properly transition the builder to use the v2 gRPC transport constructors.

pkg/cloudevents/server/grpc/broker_test.go (1)

98-100: LGTM! Test correctly demonstrates the new subscription flow.

The Subscribe call after Connect and before Receive/Send operations properly exercises the new subscription lifecycle introduced in the CloudEventTransport interface.

pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (1)

112-116: LGTM! More robust type assertion using reflection.

Switching from JSON-based comparison to direct type checking via reflect.TypeOf provides clearer validation of the transport implementation without relying on serialization behavior.

pkg/cloudevents/generic/clients/baseclient.go (3)

248-258: LGTM! Non-blocking send prevents deadlock.

The non-blocking send using select with a default case ensures that sendReconnectedSignal never blocks when no receiver is listening, preventing potential deadlocks. The context parameter addition enables proper logging context.


179-179: LGTM! Context properly propagated to receiver.

Replacing the TODO with the provided ctx ensures proper cancellation propagation to the receiver goroutine.


181-193: LGTM! Proper subscription lifecycle management.

The subscribed flag and resubscription logic correctly handle the reconnection scenario: after a restart signal, the receiver resubscribes before resuming event reception, then signals reconnection via sendReconnectedSignal.

pkg/cloudevents/generic/options/v2/grpc/options.go (2)

10-31: LGTM! Well-structured v2 agent options constructor.

The NewAgentOptions function properly initializes the grpcTransport with all necessary fields. The TODO at lines 19-22 appropriately documents that the agent client subscription source determination needs future refinement, with types.SourceAll serving as a placeholder.


33-49: LGTM! Clean v2 source options constructor.

The NewSourceOptions function correctly initializes the grpcTransport with source-specific subscription details, using the provided sourceID in the subscription request.

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (6)

27-122: LGTM! Well-designed test infrastructure.

The mock service implementation and setup helper are well-structured with good configurability. The use of bufconn for in-memory gRPC testing is a best practice, and atomic operations ensure thread safety. The 10ms sleep on line 101 is a minor timing dependency but acceptable for test stability.


124-246: LGTM! Comprehensive constructor validation.

Both constructor tests thoroughly verify options wiring, channel initialization, and subscription request composition. The tests properly validate the closure behavior for getSubscriptionRequest, which is critical for the v2 transport implementation.


274-356: LGTM! Good coverage of send paths.

Both success and error cases are tested with proper thread safety (mutex protection) and appropriate assertions. The tests verify both the publish call and the event data integrity.


358-456: LGTM! Appropriate subscribe testing.

The tests properly verify subscription request composition and acknowledge the asynchronous nature of stream errors (lines 424-426). Good coverage of both successful subscription and error scenarios.


458-769: LGTM! Comprehensive receive operation testing.

The receive tests provide excellent coverage across multiple scenarios: normal event reception, heartbeat filtering, graceful closure via channel, and stream error handling. Proper use of synchronization primitives and timeout handling throughout.


850-1336: LGTM! Excellent integration and edge case coverage.

The integration tests provide comprehensive validation of the complete transport lifecycle. Particularly strong:

  • Table-driven heartbeat integration testing with multiple scenarios (lines 897-1016)
  • Connection state monitoring edge cases including server shutdown, context cancellation, and error channel behavior
  • Mixed event stream handling with proper heartbeat filtering verification

The test suite thoroughly exercises the v2 gRPC transport implementation across normal operation and edge cases.

@skeeey
Copy link
Member Author

skeeey commented Nov 13, 2025

/hold

@skeeey skeeey force-pushed the sub branch 2 times, most recently from 053ec57 to b6f1477 Compare November 13, 2025 11:01
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/cloudevents/generic/options/v2/grpc/transport.go (1)

37-58: Consider moving Dial outside the critical section.

The Connect method holds the mutex while calling t.opts.Dialer.Dial() (line 42), which performs blocking network I/O. If multiple goroutines attempt operations that need this lock, they could be blocked unnecessarily during the potentially slow dial operation.

Consider restructuring to release the lock before dialing, or evaluate whether the lock is needed for the entire Connect operation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 053ec57 and b6f1477.

📒 Files selected for processing (16)
  • pkg/cloudevents/generic/clients/baseclient.go (4 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (9 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (2 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/server/grpc/broker_test.go
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/clients/baseclient.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/clients/baseclient.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
🧬 Code graph analysis (6)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (4)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-30)
  • NewSourceOptions (32-47)
pkg/cloudevents/generic/options/v2/grpc/transport.go (7)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/generic/options/v2/grpc/options.go (5)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/options.go (2)
  • CloudEventsAgentOptions (76-89)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (32-47)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
pkg/cloudevents/server/grpc/broker.go (3)
pkg/cloudevents/generic/types/types.go (1)
  • CloudEventsDataType (165-169)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
test/integration/cloudevents/certrotation_grpc_test.go (3)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsAgentOptions (76-89)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (25)
pkg/cloudevents/generic/options/grpc/agentoptions.go (2)

23-23: LGTM! Clear deprecation notice.

The deprecation comment properly guides users to the v2 API.


86-90: LGTM! Appropriate no-op implementation.

The Subscribe method correctly implements the new interface requirement while maintaining backward compatibility for v1 transports.

pkg/cloudevents/generic/options/v2/grpc/transport.go (5)

60-71: LGTM! Clean Send implementation.

The Send method correctly converts CloudEvents to protobuf and publishes them.


73-84: LGTM! Clean Subscribe implementation.

The Subscribe method properly calls the gRPC Subscribe RPC and stores the subscription client for later use.


86-110: LGTM! Proper lifecycle management.

The Receive method correctly manages goroutine lifecycle with context cancellation and proper cleanup.


133-174: LGTM! Robust event receiving logic.

The startEventsReceiver correctly handles heartbeat filtering, event conversion, error reporting, and cancellation signals.


176-220: LGTM! Well-documented connection state monitoring.

The monitorConnectionState method correctly monitors gRPC connection state changes with clear comments explaining each state's significance.

pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)

81-85: LGTM! Consistent no-op implementation.

The Subscribe method matches the pattern used in agentoptions.go, maintaining consistency across v1 transport implementations.

test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1)

19-23: LGTM! Clean test helper.

The GetGRPCSourceOptionsV2 helper correctly constructs v2 gRPC source options using the new API.

pkg/cloudevents/generic/clients/baseclient.go (4)

99-101: LGTM! Proper channel cleanup.

The reconnectedChan is now properly closed when the context is cancelled, preventing potential goroutine leaks.


169-173: Verify that Subscribe is non-blocking.

The transport.Subscribe(ctx) call is made while holding the mutex (locked at line 159). If Subscribe performs blocking network I/O, this could cause contention. Ensure that Subscribe returns quickly or consider moving it outside the critical section as originally suggested in the past review.


181-193: LGTM! Clear reconnection logic.

The subscribed flag and resubscribe flow properly handles reconnection scenarios, ensuring subscription is renewed after transport restart.


248-257: LGTM! Non-blocking signal with good observability.

The non-blocking send pattern with logging prevents potential deadlocks while providing visibility when reconnection signals are dropped.

pkg/cloudevents/generic/options/v2/grpc/options.go (2)

10-30: LGTM! Clean constructor with noted placeholder.

The NewAgentOptions constructor is well-structured. The TODO comment appropriately flags that Source is currently set to SourceAll as a placeholder.


32-47: LGTM! Clean source options constructor.

The NewSourceOptions constructor correctly constructs subscription requests for source clients without unnecessary fields.

pkg/cloudevents/server/grpc/broker.go (5)

123-141: LGTM! Simplified error handling.

The register method signature change removes the error channel, simplifying the subscription flow. Errors are now handled inline during Subscribe, which is cleaner than per-subscriber error channels.


179-179: LGTM! Naming consistency improvement.

The typo fix from "hearbeater" to "heartbeater" improves code clarity.

Also applies to: 192-192, 242-242


218-258: LGTM! Cleaner subscription flow.

The Subscribe implementation properly handles event sending, error reporting, and cleanup. The inline error handling is more straightforward than the previous channel-based approach.


93-93: LGTM! Improved logging consistency.

The logging updates provide better context by including eventContext, clusterName, and dataType, which will aid in debugging.

Also applies to: 137-137, 150-150, 227-227, 311-311, 381-382


231-236: LGTM! Clear documentation of context behavior.

The comments clearly explain the implications of context cancellation in different scenarios, improving code maintainability.

Also applies to: 250-255

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (3)

28-122: LGTM! Excellent test infrastructure.

The mock service and setup utilities provide a flexible foundation for comprehensive testing using in-memory bufconn connections.


124-238: LGTM! Thorough constructor validation.

The constructor tests comprehensively validate field initialization and subscription request construction for both agent and source options.


240-1328: LGTM! Comprehensive test coverage.

The test suite thoroughly covers:

  • All transport methods (Send, Subscribe, Receive, Close)
  • Error handling paths
  • Heartbeat filtering and health checking
  • Connection state monitoring
  • Integration scenarios
  • Edge cases (context cancellation, error channel behavior)

This comprehensive coverage provides strong confidence in the v2 transport implementation.

test/integration/cloudevents/certrotation_grpc_test.go (2)

14-14: LGTM: Clean v2 package import.

The import alias clearly distinguishes the v2 gRPC package from the v1 package.


22-22: Migration to v2 transport is correct; certificate rotation is preserved.

The function call properly migrates from v1 to v2 transport. The v2 NewAgentOptions accepts v1 GRPCOptions, and the v2 transport correctly delegates connection setup to the v1 GRPCDialer.Dial() method, which applies the TLSConfig with the certificate rotation callback. The test's cert rotation mechanism will continue to work as expected.

return err
}
} else {
log.V(4).Info("skip the event since the agent is not subscribed.", "subID", id, "eventContext", evt.Context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return err here to let external caller retry? Or add a TODO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this part as we do the check before handle the resource, and it seems the resync can ensure we don't loss the event even we ignore the event at this point?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6f1477 and a41bfeb.

📒 Files selected for processing (5)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (5 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/clients/baseclient.go
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/server/grpc/broker.go
🧬 Code graph analysis (2)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/server/grpc/broker.go (5)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: verify
  • GitHub Check: unit
🔇 Additional comments (1)
pkg/cloudevents/constants/constants.go (1)

8-10: Standardized subscription header key looks good

Exporting GRPCSubscriptionIDKey = "subscription-id" with docs is a clean way to keep client/server header usage aligned and avoid magic strings.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
pkg/cloudevents/server/grpc/broker.go (1)

220-249: Critical race condition between subscriber registration and header emission

This issue was previously flagged and remains unresolved. The sequence at lines 220 and 245 creates a race:

  1. bkr.register(...) adds the subscriber to bkr.subscribers and releases the lock (line 220).
  2. SendHeader(...) is called outside the lock (line 245).

Between these steps, handleRes (line 372) can acquire RLock, iterate bkr.subscribers (line 388), find the new subscriber, and invoke its handler (line 393). The handler enqueues to eventCh, the send goroutine calls subServer.Send(evt) (line 206), which implicitly sends response headers with the first Send. The subsequent explicit SendHeader at line 245 then fails with "header already sent", unregistering the subscriber and returning an error—leaving the client with a failed Subscribe despite events already in flight.

As per the previous review, atomicity can be achieved by holding the broker lock across both registration and header emission:

-func (bkr *GRPCBroker) register(
-	ctx context.Context, clusterName string, dataType types.CloudEventsDataType, handler resourceHandler) string {
+func (bkr *GRPCBroker) registerAndSendHeader(
+	ctx context.Context,
+	subReq *pbv1.SubscriptionRequest,
+	dataType types.CloudEventsDataType,
+	handler resourceHandler,
+	subServer pbv1.CloudEventService_SubscribeServer,
+) (string, error) {
 	logger := klog.FromContext(ctx)
 
 	bkr.mu.Lock()
 	defer bkr.mu.Unlock()
 
 	id := uuid.NewString()
 	bkr.subscribers[id] = &subscriber{
-		clusterName: clusterName,
+		clusterName: subReq.ClusterName,
 		dataType:    dataType,
 		handler:     handler,
 	}
 
-	logger.V(4).Info("register a subscriber", "id", id, "clusterName", clusterName, "dataType", dataType)
-	metrics.IncGRPCCESubscribersMetric(clusterName, dataType.String())
+	if err := subServer.SendHeader(metadata.Pairs(constants.GRPCSubscriptionIDKey, id)); err != nil {
+		logger.Error(err, "failed to send subscription header, unregister subscriber", "subID", id)
+		delete(bkr.subscribers, id)
+		return "", err
+	}
+
+	logger.V(4).Info("register a subscriber", "id", id, "clusterName", subReq.ClusterName, "dataType", dataType)
+	metrics.IncGRPCCESubscribersMetric(subReq.ClusterName, dataType.String())
 
-	return id
+	return id, nil
 }

In Subscribe:

- subID := bkr.register(ctx, subReq.ClusterName, *dataType, func(ctx context.Context, subID string, evt *cloudevents.Event) error {
+ subID, err := bkr.registerAndSendHeader(ctx, subReq, *dataType, func(ctx context.Context, subID string, evt *cloudevents.Event) error {
 	// ...
- })
-
- // Send initial metadata to signal registration is complete
- if err := subServer.SendHeader(metadata.Pairs(constants.GRPCSubscriptionIDKey, subID)); err != nil {
-	logger.Error(err, "failed to send subscription header, , unregister subscriber", "subID", subID)
-	bkr.unregister(ctx, subID)
-	return err
- }
+ }, subServer)
+ if err != nil {
+	return err
+ }

This also fixes the double-comma typo in the error message at line 246.

pkg/cloudevents/generic/options/v2/grpc/transport.go (1)

43-49: Verify that closeChan nil panic fix is present.

A past review comment (marked as addressed in commit 053ec57) recommended initializing closeChan in newTransport to prevent a panic when Close is called before Connect. However, the current code does not show this initialization. If Close is invoked before Connect succeeds, t.closeChan will be nil and close(t.closeChan) at line 153 will panic.

Verify whether the fix was applied and then reverted, or if this review is based on an earlier commit. If the fix is missing, apply:

 func newTransport(grpcOptions *grpcoptions.GRPCOptions, getter subscriptionRequestGetter) *grpcTransport {
 	return &grpcTransport{
 		opts:                   grpcOptions,
+		closeChan:              make(chan struct{}),
 		errorChan:              make(chan error),
 		getSubscriptionRequest: getter,
 	}
 }
🧹 Nitpick comments (2)
pkg/cloudevents/generic/options/v2/grpc/transport.go (2)

74-85: Add nil check for defensive coding.

If Send is called before a successful Connect, t.client will be nil and line 80 will panic. While the expected usage pattern is to call Connect first, a defensive check improves robustness.

 func (t *grpcTransport) Send(ctx context.Context, evt cloudevents.Event) error {
+	if t.client == nil {
+		return fmt.Errorf("transport not connected")
+	}
+
 	pbEvt := &pbv1.CloudEvent{}
 	if err := protocol.WritePBMessage(ctx, (*binding.EventMessage)(&evt), pbEvt); err != nil {
 		return err
 	}
 
 	if _, err := t.client.Publish(ctx, &pbv1.PublishRequest{Event: pbEvt}); err != nil {
 		return err
 	}
 
 	return nil
 }

159-206: Consider mutex protection when accessing closeChan.

Line 200 reads t.closeChan without holding t.mu, while Close (lines 142-157) may be concurrently reinitializing or closing it. Although Go allows reading from a closed channel and the race might be benign in practice, accessing shared state without synchronization can lead to subtle bugs.

Since closeChan is read-only after initialization in Connect, and the pattern is to select on it (not to inspect its state), the current implementation is likely safe. However, for strict correctness, consider:

  1. Documenting that closeChan must not be reassigned after goroutines start reading it, OR
  2. Using atomic operations or a separate read-lock if reassignment is necessary

Given the current reconnect pattern (Close → Connect → Subscribe → Receive), the existing code should work correctly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a41bfeb and 598a760.

📒 Files selected for processing (2)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/server/grpc/broker.go
🧬 Code graph analysis (2)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/server/grpc/broker.go (5)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (3)
pkg/cloudevents/generic/options/v2/grpc/transport.go (3)

138-140: LGTM!

Clean getter exposing the error channel as read-only.


142-157: Close implementation looks correct for reconnect cycles.

The double-close guard (lines 149-154) prevents panic on repeated Close calls. While a past review suggested reinitializing closeChan here, the current implementation reinitializes it in Connect (line 64), which also supports reconnect cycles and is a valid alternative pattern.


208-249: LGTM!

The connection state monitoring logic correctly detects disconnections (TransientFailure, Shutdown, Idle), reports errors via the channel with fallback logging, and appropriately handles the Shutdown state by skipping the redundant Close call. The goroutine exits after the first disconnection, allowing the error handler to manage reconnection.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/cloudevents/generic/options/v2/grpc/transport.go (2)

43-49: Initialize closeChan in newTransport to prevent nil panic.

The past review comment on this issue remains valid. Although Close now has a nil check at line 170, the select statement at line 172 (case <-t.closeChan:) will still panic if closeChan is nil and Close is called before Connect succeeds.

Apply this diff:

 func newTransport(grpcOptions *grpcoptions.GRPCOptions, getter subscriptionRequestGetter) *grpcTransport {
 	return &grpcTransport{
 		opts:                   grpcOptions,
+		closeChan:              make(chan struct{}),
 		errorChan:              make(chan error),
 		getSubscriptionRequest: getter,
 	}
 }

51-72: Prevent monitor goroutine leak on repeated Connect calls.

The past review comment on this issue remains valid. Each Connect call spawns a new monitorConnectionState goroutine (line 67) without stopping the previous one. On reconnect cycles, this accumulates stale monitor goroutines watching old connections.

Consider adding a cancel function to stop the previous monitor:

 type grpcTransport struct {
 	opts *grpcoptions.GRPCOptions
 
 	client    pbv1.CloudEventServiceClient
 	subClient pbv1.CloudEventService_SubscribeClient
 	subID     string
 
 	mu        sync.Mutex
+	monitorCancel context.CancelFunc
 	closeChan chan struct{}
 	errorChan chan error
 
 	getSubscriptionRequest subscriptionRequestGetter
 }
 func (t *grpcTransport) Connect(ctx context.Context) error {
 	t.mu.Lock()
 	defer t.mu.Unlock()
 
+	// Cancel previous monitor if any
+	if t.monitorCancel != nil {
+		t.monitorCancel()
+	}
+
 	logger := klog.FromContext(ctx)
 	conn, err := t.opts.Dialer.Dial()
 	if err != nil {
 		return err
 	}
 
 	t.client = pbv1.NewCloudEventServiceClient(conn)
 
 	// Initialize closeChan to support reconnect cycles
 	t.closeChan = make(chan struct{})
 
 	// Start a goroutine to monitor the gRPC connection state changes
-	go t.monitorConnectionState(ctx, conn)
+	monitorCtx, cancel := context.WithCancel(ctx)
+	t.monitorCancel = cancel
+	go t.monitorConnectionState(monitorCtx, conn)
 
 	logger.Info("grpc is connected", "grpcURL", t.opts.Dialer.URL)
 
 	return nil
 }
🧹 Nitpick comments (2)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (1)

61-61: Migration to v2 gRPC options completed successfully.

The builder now uses v2 gRPC option constructors (grpcv2.NewSourceOptions and grpcv2.NewAgentOptions), which align with the new Subscribe-enabled transport flow introduced in this PR. The changes are transparent to callers since the function signatures remain unchanged.

Consider adding deprecation notices to the v1 gRPC option constructors if they should no longer be used directly.

Also applies to: 74-74

pkg/cloudevents/server/grpc/broker.go (1)

254-257: Consider returning error to enable external retry logic.

A past comment by qiujian16 suggested returning the error here instead of just logging and unregistering. This would allow the caller to distinguish between normal stream closure and error-driven unsubscription, potentially enabling retry logic.

Current behavior (unregister and return err) is acceptable, but if you want explicit retry signaling to callers, consider preserving the error return as-is, which already happens on line 257.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 598a760 and 26fd512.

📒 Files selected for processing (18)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (5 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (2 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • test/integration/cloudevents/certrotation_grpc_test.go
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
🧬 Code graph analysis (8)
pkg/cloudevents/generic/options/v2/grpc/options.go (6)
pkg/cloudevents/generic/options/fake/fakeoptions.go (2)
  • NewAgentOptions (20-26)
  • NewSourceOptions (28-33)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/options.go (2)
  • CloudEventsAgentOptions (76-89)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
test/integration/cloudevents/server/grpc.go (1)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (3)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewSourceOptions (28-39)
  • NewAgentOptions (10-26)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (28-39)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
pkg/cloudevents/server/grpc/broker.go (5)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/broker_test.go (2)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (11)
pkg/cloudevents/generic/options/options.go (1)

30-33: LGTM! Well-designed non-blocking subscription API.

The new Subscribe method provides a clean separation between connection establishment and subscription setup. The non-blocking semantics and clear lifecycle placement (after Connect, before Receive) align well with modern event-driven architectures.

pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)

135-141: LGTM! Appropriate no-op for MQTT transport.

The no-op implementation is correct since MQTT subscriptions are already established in the Connect method (lines 86-110 via cloudeventsmqtt.WithSubscribe). The TODO noting potential v2 native subscription logic is a reasonable future consideration.

pkg/cloudevents/generic/options/fake/fakeoptions.go (1)

46-48: LGTM! Appropriate no-op for test transport.

The no-op Subscribe implementation is correct for the fake in-memory transport, which doesn't require actual subscription setup. This satisfies the interface requirement without unnecessary complexity in test code.

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

1-1334: Excellent comprehensive test suite for v2 gRPC transport.

This test file provides thorough coverage of the v2 gRPC transport implementation:

  • Mock server with configurable behavior for flexible testing
  • In-memory gRPC testing via bufconn (standard best practice)
  • Complete lifecycle testing (Connect → Subscribe → Send/Receive → Close)
  • Edge cases including heartbeat filtering, stream errors, and connection monitoring
  • Integration tests validating end-to-end workflows

The test structure is well-organized and follows Go testing conventions.

test/integration/cloudevents/server/grpc.go (1)

173-175: LGTM! Subscription ID header emission aligns with v2 flow.

The header emission using constants.GRPCSubscriptionIDKey correctly implements the v2 gRPC subscription signaling. The fail-fast behavior on header send errors is appropriate.

Note: The hardcoded "sub-test-id" value is acceptable for integration tests but consider making it configurable if future tests require unique subscription IDs.

test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1)

19-23: LGTM! V2 gRPC test helper correctly implemented.

The GetGRPCSourceOptionsV2 helper properly constructs v2 source options and follows the existing test pattern. The unused ctx parameter is correctly handled with the blank identifier (addressing the previous review comment).

pkg/cloudevents/constants/constants.go (1)

8-10: LGTM! Well-documented constant for gRPC subscription ID standardization.

The new GRPCSubscriptionIDKey constant provides a standardized header key for subscription identification across the v2 gRPC transport implementation. The documentation clearly explains its purpose.

pkg/cloudevents/generic/options/builder/optionsbuilder.go (1)

10-10: Clear import alias for v2 gRPC options.

The grpcv2 import alias clearly distinguishes the v2 gRPC options package from the v1 package, improving code readability.

pkg/cloudevents/server/grpc/broker_test.go (1)

13-13: LGTM! Test properly exercises v2 gRPC transport with explicit Subscribe flow.

The test correctly imports and uses the v2 gRPC options package, and the Subscribe call after Connect aligns with the new transport API design that decouples subscription from connection establishment.

Also applies to: 93-100

pkg/cloudevents/generic/options/v2/grpc/options.go (2)

10-26: LGTM! Agent options constructor properly wraps v2 transport.

The use of types.SourceAll as a placeholder is reasonable given the TODO comment acknowledging that agent client source determination needs future work.


28-39: LGTM! Source options constructor properly wraps v2 transport.

The source-specific SubscriptionRequest correctly uses the provided sourceID.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
pkg/cloudevents/generic/options/v2/grpc/transport.go (3)

43-49: Initialize closeChan in newTransport to prevent panic.

A previous review (marked "Addressed in commit 053ec57") flagged that closeChan should be initialized here, but it's still not initialized. If Close() is called before Connect() succeeds (e.g., in a deferred cleanup around a failed Connect), line 175 will attempt close(t.closeChan) on a nil channel, causing a panic.

Apply this fix:

 func newTransport(grpcOptions *grpcoptions.GRPCOptions, getter subscriptionRequestGetter) *grpcTransport {
 	return &grpcTransport{
 		opts:                   grpcOptions,
+		closeChan:              make(chan struct{}),
 		errorChan:              make(chan error),
 		getSubscriptionRequest: getter,
 	}
 }

Connect can continue reassigning t.closeChan = make(chan struct{}) on line 64 for reconnect cycles.


163-180: Reinitialize closeChan to support reconnect cycles.

A previous review (marked "Addressed in commit 053ec57") recommended reinitializing closeChan after closing it to support multiple reconnect cycles. The current code closes the channel but doesn't recreate it. On the next ConnectReceive cycle, line 149 case <-t.closeChan: will immediately trigger since closed channels are always readable.

Apply this fix to ensure Receive works correctly after reconnection:

 func (t *grpcTransport) Close(ctx context.Context) error {
 	t.mu.Lock()
 	defer t.mu.Unlock()
 
 	klog.FromContext(ctx).Info("close grpc transport")
 
 	// Guard against double-close panic and nil channel
 	if t.closeChan != nil {
 		select {
 		case <-t.closeChan:
 			// Already closed
 		default:
 			close(t.closeChan)
 		}
+		// Reinitialize for next reconnect cycle
+		t.closeChan = make(chan struct{})
 	}
 
 	return t.opts.Dialer.Close()
 }

51-72: Prevent goroutine leak on repeated Connect calls.

Each Connect call spawns a new monitorConnectionState goroutine (line 67) without stopping the previous one. During reconnect cycles, this can accumulate multiple monitor goroutines, each watching stale connections and consuming resources.

Consider tracking and cancelling the previous monitor goroutine:

 type grpcTransport struct {
 	opts *grpcoptions.GRPCOptions
 
 	client    pbv1.CloudEventServiceClient
 	subClient pbv1.CloudEventService_SubscribeClient
 	subID     string
 
 	mu        sync.Mutex
+	monitorCancel context.CancelFunc
 	closeChan chan struct{}
 	errorChan chan error
 
 	getSubscriptionRequest subscriptionRequestGetter
 }
 func (t *grpcTransport) Connect(ctx context.Context) error {
 	t.mu.Lock()
 	defer t.mu.Unlock()
 
+	// Cancel previous monitor if any
+	if t.monitorCancel != nil {
+		t.monitorCancel()
+	}
+
 	logger := klog.FromContext(ctx)
 	conn, err := t.opts.Dialer.Dial()
 	if err != nil {
 		return err
 	}
 
 	t.client = pbv1.NewCloudEventServiceClient(conn)
 
 	// Initialize closeChan to support reconnect cycles
 	t.closeChan = make(chan struct{})
 
 	// Start a goroutine to monitor the gRPC connection state changes
-	go t.monitorConnectionState(ctx, conn)
+	monitorCtx, cancel := context.WithCancel(ctx)
+	t.monitorCancel = cancel
+	go t.monitorConnectionState(monitorCtx, conn)
 
 	logger.Info("grpc is connected", "grpcURL", t.opts.Dialer.URL)
 
 	return nil
 }
pkg/cloudevents/generic/clients/baseclient.go (1)

169-173: Move Subscribe call outside the mutex.

The transport.Subscribe(ctx) call performs blocking network I/O while holding baseclient's mutex:

  • baseclient.go lines 159-160: Mutex acquired and deferred unlock
  • baseclient.go line 170: Subscribe called under lock
  • v2/grpc/transport.go lines 105-110: Subscribe performs blocking gRPC call and waits for headers

This causes network latency to block all other operations requiring the baseclient mutex. Move the Subscribe call outside the critical section:

c.Lock()
if c.receiverChan != nil {
	logger.V(2).Info("the subscription has already started")
	c.Unlock()
	return
}
c.Unlock()

if err := c.transport.Subscribe(ctx); err != nil {
	runtime.HandleErrorWithContext(ctx, err, "failed to subscribe")
	return
}

c.Lock()
c.receiverChan = make(chan int)
c.Unlock()

Then start the receiver goroutine outside the lock.

🧹 Nitpick comments (3)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (3)

105-107: Consider removing or reducing the sleep duration.

The serverReady channel on line 106 already provides synchronization to ensure the server goroutine has started. The additional 10ms sleep on line 107 may be unnecessary and could potentially be removed or reduced. If timing issues occur without it, consider using a more robust synchronization mechanism rather than a fixed sleep duration.


395-396: Consider more robust synchronization than time.Sleep.

Multiple tests use time.Sleep to wait for asynchronous operations to complete (e.g., lines 395-396, 689, 991, 1071). While common in integration tests, this pattern can lead to flaky tests if timing varies across systems. Consider using channels or other synchronization primitives where possible to signal completion rather than fixed sleep durations.

For example, the mock's subscribeFunc could signal via a channel when it has been invoked, allowing the test to wait on that channel instead of sleeping.


490-493: Consider extracting common timeout values to constants.

The value 100 * time.Millisecond for ServerHealthinessTimeout is repeated in multiple tests (lines 490, 598, 660, 717). Extracting these to named constants would improve maintainability and make it easier to adjust timeouts consistently.

const (
	testServerHealthTimeout = 100 * time.Millisecond
	testLongTimeout        = 1 * time.Second
)

Also applies to: 598-601, 660-663, 717-720

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26fd512 and f81f8d7.

📒 Files selected for processing (18)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (5 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (2 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • test/integration/cloudevents/certrotation_grpc_test.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/server/grpc/broker_test.go
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
🧬 Code graph analysis (5)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (3)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewSourceOptions (28-39)
  • NewAgentOptions (10-26)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/generic/options/v2/grpc/options.go (5)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/options.go (2)
  • CloudEventsAgentOptions (76-89)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/server/grpc/broker.go (6)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (1)
  • CloudEventService_SubscribeServer (146-149)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit
  • GitHub Check: integration
  • GitHub Check: verify
🔇 Additional comments (14)
pkg/cloudevents/generic/options/options.go (1)

30-33: LGTM! Well-designed subscription API.

The new Subscribe method provides a clean separation between connection establishment and subscription, enabling explicit success/failure feedback as outlined in the PR objectives. The documentation clearly specifies it's non-blocking and should be called after Connect but before Receive.

pkg/cloudevents/generic/options/fake/fakeoptions.go (1)

46-48: LGTM! Appropriate no-op implementation for fake transport.

The no-op Subscribe implementation correctly satisfies the CloudEventTransport interface for testing purposes without introducing unnecessary complexity.

pkg/cloudevents/generic/clients/baseclient.go (2)

248-257: LGTM! Non-blocking signal prevents goroutine hangs.

The non-blocking select with a default case ensures that sendReconnectedSignal never blocks if no receiver is listening. This is good defensive programming and prevents potential goroutine deadlocks.


99-101: LGTM! Proper channel cleanup on context cancellation.

Closing reconnectedChan on context done ensures downstream receivers are properly notified of shutdown and prevents goroutine leaks.

pkg/cloudevents/generic/options/v2/grpc/transport.go (2)

126-132: LGTM! Proper validation before starting receiver.

The nil check on subClient ensures Receive fails fast with a clear error if Subscribe wasn't called, preventing a panic in startEventsReceiver. This addresses a previous review concern.


182-229: LGTM! Robust event receiver implementation.

The startEventsReceiver properly:

  • Handles heartbeats separately from regular events
  • Distinguishes canceled context from stream errors
  • Uses non-blocking channel sends with proper context cancellation checks
  • Converts protobuf messages to CloudEvents correctly
pkg/cloudevents/constants/constants.go (1)

8-10: LGTM! Clear constant for gRPC subscription header.

The new GRPCSubscriptionIDKey constant provides a well-documented, centralized definition for the subscription ID header key used in gRPC transport.

pkg/cloudevents/generic/options/builder/optionsbuilder.go (2)

10-10: LGTM! Clean import alias for v2 gRPC options.

The grpcv2 alias clearly distinguishes the v2 implementation from the existing v1 gRPC package.


61-61: LGTM! Migration to v2 gRPC constructors.

The builder now uses the v2 gRPC options constructors, aligning with the PR's objective to introduce the v2 gRPC transport implementation.

Also applies to: 74-74

pkg/cloudevents/generic/options/v2/grpc/options.go (2)

10-26: LGTM! Well-structured agent options constructor.

The NewAgentOptions constructor cleanly wraps the transport with a subscription request builder. The TODO on lines 15-17 appropriately documents that the Source field is a placeholder using types.SourceAll until the agent client usage pattern is finalized.


28-39: LGTM! Consistent source options constructor.

NewSourceOptions follows the same pattern as NewAgentOptions, providing a clean, consistent API for building v2 gRPC transport options.

pkg/cloudevents/server/grpc/broker.go (3)

124-152: LGTM! Race condition correctly addressed with atomic registration and header emission.

The register method now performs both subscriber registration (lines 135-140) and header emission (lines 143-147) atomically under the same mutex (acquired line 132). This prevents the race where handleRes could trigger event delivery before the subscription-id header was sent, causing "header already sent" errors.

The fix correctly implements the pattern suggested in previous reviews.

Based on past review comments.


33-33: LGTM! Consistent signature update for subscription ID tracking.

The resourceHandler signature change to include subID enables per-subscriber tracing and logging throughout the event delivery pipeline, improving observability.


390-400: LGTM! Subscription ID-aware event routing.

The handleRes method now correctly iterates subscribers by subID and passes it to handlers, enabling targeted delivery and better tracing of event flow.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

421-454: Test doesn't effectively validate subscription errors.

This test still has the issue identified in the previous review. The context is cancelled before Subscribe is called (lines 446-447), so the error originates from context cancellation rather than from the mock's subscribeFunc returning errors.New("subscribe failed") (line 428). The mock's error path is never exercised.

To properly test subscription errors, consider:

Option 1: Test immediate RPC setup errors

 func TestGrpcTransport_Subscribe_Error(t *testing.T) {
-	// Note: Subscribe errors from the stream happen asynchronously in the server's goroutine
-	// The Subscribe method itself may succeed, but the stream may fail later
-	// To test subscription failure, we need to use a context that's already cancelled
-
 	service := &mockCloudEventService{
 		subscribeFunc: func(req *pbv1.SubscriptionRequest, stream pbv1.CloudEventService_SubscribeServer) error {
 			return errors.New("subscribe failed")
 		},
 	}
 
 	conn, cleanup := setupMockServer(t, service)
 	defer cleanup()
 
 	client := pbv1.NewCloudEventServiceClient(conn)
 	transport := &grpcTransport{
 		client:    client,
 		errorChan: make(chan error, 1),
 		closeChan: make(chan struct{}),
 		getSubscriptionRequest: func() *pbv1.SubscriptionRequest {
 			return &pbv1.SubscriptionRequest{}
 		},
 	}
 
-	// Use a cancelled context to force immediate error
-	ctx, cancel := context.WithCancel(context.Background())
-	cancel()
-
-	err := transport.Subscribe(ctx)
-	if err == nil {
-		t.Log("Subscribe succeeded with cancelled context (server-side error will be reported later)")
-		// This is acceptable - the error will be caught when trying to receive
-	}
+	ctx := context.Background()
+	err := transport.Subscribe(ctx)
+	if err != nil {
+		t.Errorf("Subscribe failed immediately: %v", err)
+	}
+	
+	// The error should be reported via errorChan when the stream fails
+	select {
+	case err := <-transport.errorChan:
+		if err == nil {
+			t.Fatal("expected error from failed subscription, got nil")
+		}
+		t.Logf("Received expected subscription error: %v", err)
+	case <-time.After(1 * time.Second):
+		t.Fatal("timeout waiting for subscription error in errorChan")
+	}
 }

Option 2: Separate tests for sync vs async errors

Create distinct tests:

  • TestGrpcTransport_Subscribe_ImmediateError for RPC setup failures
  • TestGrpcTransport_Subscribe_StreamError (already exists as TestGrpcTransport_Receive_StreamError at line 705) for async stream failures
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f81f8d7 and 81df60f.

📒 Files selected for processing (18)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (5 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (2 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • test/integration/cloudevents/server/grpc.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/constants/constants.go
  • test/integration/cloudevents/certrotation_grpc_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/server/grpc/broker_test.go
🧬 Code graph analysis (3)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
pkg/cloudevents/server/grpc/broker.go (6)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (1)
  • CloudEventService_SubscribeServer (146-149)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/server/grpc/broker_test.go (2)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: integration
  • GitHub Check: verify
  • GitHub Check: unit
  • GitHub Check: build
🔇 Additional comments (8)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)

86-90: LGTM!

The no-op Subscribe implementation is appropriate for the v1 transport since subscription is handled automatically during Connect (via protocol options at lines 41-48). This method satisfies the updated transport interface while maintaining backward compatibility.

pkg/cloudevents/server/grpc/broker.go (2)

124-152: Race condition fix verified - registration and header emission are now atomic.

The previous race condition has been correctly addressed. The register method now calls SendHeader while holding bkr.mu (line 143), ensuring the subscriber is not visible to handleRes until after the subscription-id header has been successfully sent. The error handling (line 145) properly cleans up by deleting the subscriber if SendHeader fails.

This guarantees that:

  • Subscribers become visible to handleRes only after their header is sent
  • No "header already sent" errors occur from racing Send calls
  • Failed registrations are cleaned up atomically

229-254: Subscribe flow correctly integrates with atomic registration.

The Subscribe method properly uses the updated register signature, passing subReq and subServer so that registration and header emission occur atomically. The returned subID is used for subsequent logging and unregistration, providing clear traceability throughout the subscription lifecycle.

pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (1)

5-5: Type validation using reflection is appropriate for this test.

The addition of runtime type checking ensures that BuildCloudEventsSourceOptions constructs the correct transport implementation for each configuration type. Using reflect.TypeOf(...).String() to obtain the concrete type name is a straightforward approach for validating the builder's behavior.

Also applies to: 33-37, 59-59, 76-76, 112-116

pkg/cloudevents/server/grpc/broker_test.go (1)

13-13: V2 transport API usage is correct.

The test properly uses grpcv2.NewAgentOptions and calls Subscribe(ctx) after Connect(ctx), aligning with the new v2 transport contract introduced in this PR. This explicit subscription step provides clear success/failure feedback as described in the PR objectives.

Also applies to: 93-100

pkg/cloudevents/generic/options/mqtt/agentoptions.go (1)

141-147: LGTM! No-op Subscribe implementation is appropriate.

The no-op Subscribe method correctly satisfies the CloudEventsTransport interface requirement. The comments clearly explain that subscription is managed by the CloudEvents client during receiver startup, and the TODO appropriately flags future v2 enhancement opportunities.

pkg/cloudevents/generic/clients/baseclient.go (2)

99-101: LGTM! Proper cleanup of reconnectedChan.

The reconnectedChan is now properly closed when the context is done, preventing channel leaks and ensuring clean shutdown.


248-258: LGTM! Non-blocking signal send prevents deadlocks.

The refactored sendReconnectedSignal now accepts a context and uses a non-blocking select with default case. This prevents the sender from blocking if no receiver is listening on reconnectedChan, improving robustness.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
pkg/cloudevents/generic/clients/baseclient.go (1)

158-176: Avoid calling transport.Subscribe while holding baseClient mutex

Lines 159–173 hold c.Lock() while invoking c.transport.Subscribe(ctx), which may perform blocking network I/O. This can delay other operations that need the same lock (e.g., readiness updates) if the server is slow or unresponsive, and it reintroduces the earlier “Subscribe called while holding lock” concern.

You can reduce contention by:

  • Doing the “already subscribed?” check and any in-memory state changes under the lock,
  • Releasing the lock before calling Subscribe(ctx), and
  • Re-acquiring the lock afterward to set receiverChan and start the receiver goroutine.

This preserves safety while keeping I/O out of the critical section.

pkg/cloudevents/server/grpc/broker.go (1)

124-152: Atomic subscriber registration and header emission resolve prior race

The new register implementation (Lines 125–151) now:

  • Holds bkr.mu while inserting the subscriber into bkr.subscribers,
  • Sends the subscription-id header via subServer.SendHeader(...) inside the same critical section, and
  • Rolls back the subscriber and returns an error if SendHeader fails.

Subscribe then uses the returned subID and ensures unregister is called both on send errors and on normal stream teardown (Lines 229–271).

This addresses the earlier race where events could be sent before SendHeader, causing “header already sent” errors and bogus Subscribe failures; the registration and header emission are now atomic from the broker’s perspective.

Also applies to: 167-273

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

421-454: TestGrpcTransport_Subscribe_Error doesn’t exercise server-side subscribe failure

In this test, the context is cancelled before calling transport.Subscribe (Lines 446–447). As a result, any failure you see will be due to client-side context cancellation, not the mock server’s subscribeFunc returning errors.New("subscribe failed"). That means the test never truly validates how the transport behaves when the server returns an error from the Subscribe RPC.

To actually test subscription errors, consider:

  • Using a non-cancelled context.Background() for the call, and
  • Either asserting that Subscribe(ctx) returns the mock error immediately, or
  • Keeping Subscribe successful and then asserting that the stream failure is surfaced on transport.errorChan in a Receive-focused test.

That will ensure the subscription error path (from the server side) is covered rather than just context cancellation.

🧹 Nitpick comments (1)
pkg/cloudevents/generic/options/v2/grpc/transport.go (1)

51-72: Narrow mutex scope to keep network I/O outside critical sections

Connect, Send, Subscribe, and Close (Lines 51–72, 74–92, 94–124, 163–180) all hold t.mu while doing potentially blocking operations:

  • Dialer.Dial() in Connect
  • Publish() in Send
  • client.Subscribe() and Header() in Subscribe
  • Dialer.Close() in Close

This is functionally safe but can serialize unrelated callers (e.g., concurrent Sends) and increase tail latency if the server is slow or the network stalls.

You can keep the same safety with less contention by:

  • Locking only long enough to read/validate shared state (client, subClient, closeChan) and copy what you need into locals,
  • Unlocking before issuing the RPC, and
  • Re-locking only around state updates (e.g., assigning subClient, subID, or swapping closeChan).

This keeps the lock focused on protecting in-memory state rather than I/O duration.

Also applies to: 74-92, 94-124, 163-180

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81df60f and de876e6.

📒 Files selected for processing (18)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (5 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (2 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/options.go
  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/constants/constants.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/baseclient.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-08-27T03:21:36.789Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/server/grpc/metrics/metrics.go:230-236
Timestamp: 2025-08-27T03:21:36.789Z
Learning: In the gRPC metrics implementation for open-cluster-management.io/sdk-go, the grpc_server_ce_called_total metric intentionally has different semantics for unary vs stream calls: unary calls increment it once per RPC invocation, while stream calls increment it once per received message. This is deliberate design to provide granular visibility into stream message activity.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
🧬 Code graph analysis (7)
pkg/cloudevents/generic/options/v2/grpc/options.go (5)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/options.go (2)
  • CloudEventsAgentOptions (76-89)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
test/integration/cloudevents/certrotation_grpc_test.go (4)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsAgentOptions (76-89)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-26)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (3)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewSourceOptions (28-39)
  • NewAgentOptions (10-26)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (28-39)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
pkg/cloudevents/server/grpc/broker.go (6)
pkg/cloudevents/generic/types/types.go (1)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (1)
  • CloudEventService_SubscribeServer (146-149)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (13)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)

10-26: LGTM! Constructor correctly structured with clear TODO.

The NewAgentOptions constructor properly initializes the agent options with a placeholder subscription source (types.SourceAll). The TODO comment clearly explains that agent-specific subscription logic is deferred because the gRPC agent client is not yet utilized.


28-39: LGTM! Source options constructor is correctly implemented.

The NewSourceOptions constructor properly initializes source-specific subscription options with the correct fields (Source, DataType) and returns a well-formed CloudEventsSourceOptions struct.

pkg/cloudevents/generic/options/builder/optionsbuilder.go (1)

10-10: LGTM! Successfully migrated to v2 gRPC constructors.

The builder correctly switches from v1 (grpc.New*Options) to v2 (grpcv2.New*Options) constructors with proper aliasing and matching signatures. This aligns with the broader v2 transport migration.

Also applies to: 61-61, 74-74

pkg/cloudevents/generic/options/grpc/agentoptions.go (1)

23-23: LGTM! Deprecation and no-op Subscribe method are correctly implemented.

The deprecation notice properly directs users to the v2 implementation, and the no-op Subscribe method is appropriate for the v1 transport where subscription is handled during receiver startup.

Also applies to: 86-90

pkg/cloudevents/generic/options/mqtt/agentoptions.go (1)

141-147: LGTM! No-op Subscribe method with forward-looking TODO.

The no-op Subscribe implementation is appropriate for the MQTT v1 transport where subscription is managed by the CloudEvents client. The TODO appropriately flags potential future decoupling work.

pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)

135-141: LGTM! Consistent no-op Subscribe method.

The no-op Subscribe implementation matches the pattern in mqtt/agentoptions.go and is appropriate for the MQTT source transport.

test/integration/cloudevents/certrotation_grpc_test.go (1)

14-14: LGTM! Integration test correctly migrated to v2.

The certificate rotation test now uses the v2 gRPC constructor, maintaining test coverage for the new transport implementation.

Also applies to: 22-22

test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1)

19-23: LGTM! Test helper correctly implements v2 source options.

The GetGRPCSourceOptionsV2 helper function properly constructs v2 gRPC source options for integration testing. The blank identifier for the unused ctx parameter is appropriate.

pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (1)

5-5: Based on the verification from the script output and web search results, I can now analyze the review comment.

The script confirmed that:

  1. The reflect package uses package name, whereas link names use package path
  2. The grpcTransport struct is defined in pkg/cloudevents/generic/options/v2/grpc/transport.go at line 29 with package grpc

The test expectation "*grpc.grpcTransport" on line 76 is correct because:

  • The v2 package declares package grpc (not package v2grpc)
  • When reflect.TypeOf().String() is called on a pointer to grpcTransport, it formats as *<package_name>.<type_name>
  • Therefore, *grpc.grpcTransport is the correct representation for the v2 gRPC transport type

The reflection-based type checking is a valid approach and cleaner than the previous JSON-based method. The expected type strings for both MQTT and gRPC are accurate reflections of the actual type names returned by the v2 constructors.

Verify the expected transport type string for gRPC v2.

The reflection-based type verification is correct. The expected type string "*grpc.grpcTransport" on line 76 accurately reflects the actual type returned by grpcv2.NewSourceOptions() from the v2/grpc package, which declares package grpc. The test changes are valid.

pkg/cloudevents/generic/clients/baseclient.go (1)

179-194: Resubscribe/backoff and non-blocking reconnection signaling look solid

The resubscribe loop (Lines 184–194) now:

  • Retries transport.Subscribe(ctx) only when subscribed == false,
  • Uses the shared DelayFn backoff to avoid a tight retry loop on failure, and
  • Calls sendReconnectedSignal(ctx) only after a successful resubscribe.

Combined with rebuilding receiverCtx on restart (Line 226) and making sendReconnectedSignal non-blocking (Lines 249–258), this design avoids CPU spin, prevents goroutine blocking when no one is listening, and provides a clean hook for resync logic.

No issues from my side here.

Also applies to: 226-229, 249-258

pkg/cloudevents/generic/options/v2/grpc/transport.go (1)

126-157: Receive/heartbeat/connection-state handling is coherent and non-blocking

The combination of:

  • Receive’s precondition check on subClient (Lines 127–132),
  • The separate closeChan used to unblock the outer Receive select (Lines 146–150),
  • startEventsReceiver’s handling of heartbeat vs non-heartbeat events and non-blocking error reporting on errorChan (Lines 182–229), and
  • monitorConnectionState’s state-based error reporting and connection close for TransientFailure/Idle (Lines 233–268)

yields a clean lifecycle:

  • Subscribe must succeed before Receive, otherwise Receive fails fast.
  • Heartbeats are filtered out from the handler but still drive health checks.
  • Stream/connection errors are surfaced to callers via ErrorChan() without risking goroutine blockage if the channel is full.
  • Connection state monitoring exits on context cancellation or terminal states.

The structure looks correct and consistent with the intended reconnect/error-signaling behavior.

Also applies to: 182-229, 231-272

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

456-767: Receive/heartbeat and connection-monitoring tests provide good coverage

The Receive-related tests (TestGrpcTransport_Receive_WithEvents, _HeartbeatFiltering, _CloseChan, _StreamError, ..._MixedEventsWithHeartbeats) together with the heartbeat/health-check and connection-state monitoring tests (TestGrpcTransport_HeartbeatIntegration, ...ConnectionStateMonitoring_*) exercise:

  • Successful event delivery and handler invocation,
  • Heartbeat-only streams being filtered from the handler while still driving health checks,
  • Proper shutdown when closeChan is closed or the context is cancelled,
  • Stream errors being reported on errorChan, and
  • Connection state transitions (server shutdown, context cancellation, full error channels).

These scenarios align well with the intended behavior of the transport and should help catch regressions in future changes.

Also applies to: 895-1334

pkg/cloudevents/server/grpc/broker.go (1)

283-405: SubID-aware dispatch and enhanced logging look correct

The changes in respondResyncSpecRequest and handleRes (Lines 283–405):

  • Use eventContext/obj.Context consistently in logs to aid debugging,
  • Iterate over subscribers keyed by subID and pass subID into the resourceHandler, and
  • Filter by clusterName and dataType before invoking the handler.

This keeps routing semantics intact (only matching subscribers receive events) while enriching logs with subID and context, which should help diagnose resync and fan-out issues without altering behavior.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
pkg/cloudevents/generic/options/v2/grpc/transport.go (1)

182-229: Consider protecting subClient reads in startEventsReceiver.

Line 187 reads t.subClient.Recv() without mutex protection. While Subscribe sets subClient under mutex and Receive validates it under mutex before starting this goroutine, the lack of synchronization could theoretically race if the API allows concurrent calls. The expected usage pattern (Connect→Subscribe→Receive) makes this safe in practice, but explicit synchronization would be more robust.

Consider one of these approaches:

  1. Add a comment documenting that concurrent Subscribe/Receive calls are not supported and callers must serialize them.
  2. Capture subClient into a local variable in Receive while holding the mutex, then pass it to startEventsReceiver to avoid accessing the shared field.

The current implementation is acceptable given the expected usage pattern shown in tests and baseclient.go.

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

421-454: Subscribe error test validates context cancellation, not RPC failure.

The test uses a pre-cancelled context (lines 446-447), so it validates that Subscribe handles context cancellation gracefully rather than exercising the mock's subscribeFunc error on line 428. The comment (lines 422-424) acknowledges this behavior.

To test actual Subscribe RPC failures, consider a separate test that:

  1. Uses a valid (non-cancelled) context
  2. Has the mock's subscribeFunc return an error immediately (e.g., errors.New("subscribe RPC failed"))
  3. Asserts that Subscribe returns this error

The current test validates a valid scenario (context cancellation) and is acceptable, but renaming it to TestGrpcTransport_Subscribe_ContextCancellation would better reflect what it tests.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de876e6 and c33e94f.

📒 Files selected for processing (18)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (5 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (2 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • test/integration/cloudevents/certrotation_grpc_test.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/options.go
  • test/integration/cloudevents/server/grpc.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker_test.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-08-27T03:21:36.789Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/server/grpc/metrics/metrics.go:230-236
Timestamp: 2025-08-27T03:21:36.789Z
Learning: In the gRPC metrics implementation for open-cluster-management.io/sdk-go, the grpc_server_ce_called_total metric intentionally has different semantics for unary vs stream calls: unary calls increment it once per RPC invocation, while stream calls increment it once per received message. This is deliberate design to provide granular visibility into stream message activity.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
🧬 Code graph analysis (7)
test/integration/cloudevents/server/grpc.go (1)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (28-39)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
pkg/cloudevents/generic/options/v2/grpc/options.go (5)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/options.go (2)
  • CloudEventsAgentOptions (76-89)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/server/grpc/broker.go (6)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (1)
  • CloudEventService_SubscribeServer (146-149)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/server/grpc/broker_test.go (2)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-26)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (20)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)

86-90: LGTM! Consistent no-op Subscribe implementation for v1 transport.

The no-op Subscribe implementation is appropriate for this v1 gRPC agent transport, where subscription is managed by the CloudEvents SDK during receiver startup. This aligns with the broader pattern across v1 transports (MQTT, gRPC source/agent) while satisfying the updated CloudEventTransport interface.

pkg/cloudevents/generic/options/mqtt/agentoptions.go (1)

141-147: LGTM! Consistent no-op with helpful TODO for future enhancement.

The no-op Subscribe implementation correctly satisfies the updated CloudEventTransport interface. The TODO comment appropriately flags a potential future enhancement to implement native subscription logic in v2, decoupled from the CloudEvents SDK.

pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)

135-141: LGTM! Consistent with other v1 MQTT transport implementations.

The no-op Subscribe method maintains consistency with the MQTT agent transport approach, with an appropriate TODO comment for potential native subscription logic in v2.

pkg/cloudevents/generic/options/options.go (1)

30-33: LGTM! Well-documented interface expansion enabling v2 transport.

The Subscribe method addition to CloudEventTransport is a well-documented public API expansion. The documentation clearly specifies the call order (after Connect, before Receive), the non-blocking nature, and error semantics. All transport implementations have been updated to satisfy this interface (v1 with no-ops, v2 with actual subscription logic).

pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)

81-85: LGTM! Consistent no-op implementation for v1 gRPC source transport.

The Subscribe implementation maintains consistency with the gRPC agent transport approach, appropriately delegating subscription management to the CloudEvents SDK during receiver startup.

pkg/cloudevents/constants/constants.go (1)

7-10: LGTM! Well-documented constant for gRPC subscription header.

The GRPCSubscriptionIDKey constant is well-documented and follows Go naming conventions. It's appropriately used in the gRPC server integration for sending subscription ID headers.

test/integration/cloudevents/server/grpc.go (1)

173-175: LGTM! Proper subscription header emission with fail-fast error handling.

The SendHeader call correctly uses the new GRPCSubscriptionIDKey constant to emit the subscription ID header. The fail-fast error handling (immediate return on error) is appropriate, ensuring clients receive subscription confirmation before proceeding.

test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1)

19-23: LGTM! Well-structured v2 gRPC test integration.

The GetGRPCSourceOptionsV2 function correctly constructs CloudEventsSourceOptions using the v2 gRPC constructor (grpcv2.NewSourceOptions). The blank identifier for the unused context parameter is appropriate, and the return tuple matches the expected test signature.

pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (1)

5-5: LGTM! Improved test assertion using reflection.

The switch from JSON-based containment checks to direct type reflection is more robust and clearer for verifying transport types.

Also applies to: 112-116

pkg/cloudevents/server/grpc/broker_test.go (1)

93-100: LGTM! Test correctly exercises v2 Subscribe flow.

The test now properly calls Subscribe(ctx) after Connect and before Send/Receive, aligning with the new v2 gRPC transport semantics.

pkg/cloudevents/generic/options/v2/grpc/options.go (1)

10-39: LGTM! Clean v2 option constructors.

Both NewAgentOptions and NewSourceOptions correctly construct transport options with appropriate subscription requests. The agent uses SourceAll as a placeholder (with a TODO noting future refinement), while the source uses the provided sourceID.

pkg/cloudevents/generic/clients/baseclient.go (2)

164-189: LGTM! Subscribe lifecycle correctly implemented.

The explicit Subscribe(ctx) call before starting the receiver (lines 164-168) and the resubscribe flow with backoff (lines 179-189) properly implement the v2 transport semantics. The subscribed flag ensures resubscription happens after reconnection, and the backoff delay prevents tight retry loops.


244-270: LGTM! Safe channel cleanup and non-blocking signaling.

The closeChannels() method (lines 244-256) provides safe, centralized cleanup with nil checks. The updated sendReconnectedSignal (lines 258-270) is now non-blocking via select/default and logs when no receiver is listening, preventing potential deadlocks.

pkg/cloudevents/generic/options/v2/grpc/transport.go (4)

43-49: LGTM! Safe initialization and close handling.

The nil check in Close (line 170) prevents panics when Close is called before Connect succeeds. The select-based close guard (lines 171-176) prevents double-close panics. Both address concerns from past reviews.

Also applies to: 163-180


94-124: LGTM! Subscribe correctly protected with mutex.

The mutex protection (lines 95-96) when setting subClient and subID (lines 119-120) addresses the race condition concern from past reviews.


126-157: LGTM! Receive validates subscription state.

The nil check for subClient (lines 127-132) ensures Subscribe was called before Receive, addressing past review feedback.


233-272: LGTM! Connection state monitoring handles failures correctly.

The monitorConnectionState goroutine properly detects TransientFailure, Idle, and Shutdown states, reports errors via the error channel with non-blocking send (lines 254-258), and exits cleanly on context cancellation (lines 237-241).

pkg/cloudevents/server/grpc/broker.go (2)

124-152: LGTM! Race condition fixed with atomic registration.

The register method now calls SendHeader (lines 143-147) while holding the mutex, ensuring the subscription ID is sent before the subscriber becomes visible to handleRes. This addresses the critical race condition from past reviews where events could be delivered before the header was sent, causing "header already sent" errors.


229-270: LGTM! Per-subscription context and lifecycle management.

The handler now receives subID (line 229), enabling targeted dispatch. The context cancellation handling (lines 242-247) correctly avoids treating expected stream closure as an error. The subID-based error logging (lines 260-261, 270) improves observability.

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

27-128: LGTM! Comprehensive test coverage.

The test suite thoroughly exercises:

  • Constructor validation and wiring (lines 130-244)
  • Send/Subscribe/Receive lifecycle
  • Heartbeat filtering and health checks (lines 456-767, 895-1079)
  • Connection state monitoring under various conditions (lines 1081-1214)
  • Error handling and edge cases

Also applies to: 456-767, 848-1214

@skeeey
Copy link
Member Author

skeeey commented Nov 14, 2025

/unhold

@skeeey
Copy link
Member Author

skeeey commented Nov 14, 2025

/assign @qiujian16

@skeeey
Copy link
Member Author

skeeey commented Nov 14, 2025

/cc @morvencao

@openshift-ci openshift-ci bot requested a review from morvencao November 14, 2025 08:47
return nil
}

func (t *grpcTransport) Subscribe(ctx context.Context) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we support multiple subscribes calls? If not, you might want a subscribed flag

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a flag to avoid multiple subscribes calls

return nil
}

func (t *grpcTransport) Receive(ctx context.Context, handleFn options.ReceiveHandlerFn) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment on this is a blocking call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

healthChecker := heartbeat.NewHealthChecker(t.opts.ServerHealthinessTimeout, t.errorChan)

// start to receive the events from stream
logger.Info("receiving events", "subID", t.subID, "subOption", subOption)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we pass the subID in the logger's context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass the subID and a context in the receivehanlder, so that we have the context in it

// Subscribe sends a subscription request to the transport to subscribe topics/services.
// This is a non-blocking method that should be called after Connect and before Receive.
// Returns an error if the subscription request cannot be sent.
Subscribe(ctx context.Context) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this step(Subecribe() call) rely on an active connection? Will the CloudEvents client retry if this step fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it rely on an active connection, the CloudEvents client does not retry if this step fails, the CloudEvents still depends on the errorChan to retry

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
pkg/cloudevents/generic/clients/baseclient.go (1)

153-170: Avoid holding baseClient mutex while calling transport.Subscribe.

subscribe takes c.Lock() with defer c.Unlock() and then calls c.transport.Subscribe(ctx) (Lines 153–168). If Subscribe blocks on network I/O or server-side backoff, this holds the client mutex for the entire duration, delaying other operations that need the lock (e.g., transitions on receiverChan or clientReady).

Consider the pattern previously suggested:

  • Under the lock, check/mark that a subscription goroutine is being started and read any needed fields.
  • Release the lock before calling transport.Subscribe(ctx).
  • After Subscribe returns, reacquire the lock to set receiverChan and other client state (or abort if Subscribe failed).

This keeps subscription state safe while avoiding long critical sections around network calls.

pkg/cloudevents/server/grpc/broker.go (1)

124-152: Registration + subscription header emission now atomic and race-free.

register now:

  • Holds bkr.mu while inserting into subscribers.
  • Sends the subscription-id header via SendHeader under the same lock.
  • Rolls back the map entry if SendHeader fails.

This prevents the previous race where handleRes could deliver events and trigger Send before headers were sent, causing “header already sent” errors. The change also ensures the client either sees a valid header or a clean failure.

One small nit: the comment “return client id and error channel” is now stale; consider updating it to reflect the new (string, error) return.

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

421-454: TestGrpcTransport_Subscribe_Error still doesn’t exercise server-side subscribe failures.

This test cancels the context before calling transport.Subscribe (Lines 446–449). That causes the RPC to fail due to context cancellation, so the mock subscribeFunc’s errors.New("subscribe failed") is never meaningfully observed by the client. The assertion then accepts both success and failure, so the test isn’t validating a specific behavior.

To actually test subscription failures:

  • Use a non-cancelled context, e.g. ctx := context.Background().
  • Let subscribeFunc return an error immediately, and assert that transport.Subscribe(ctx) returns a non-nil error (or, alternatively, assert that the error is delivered on errorChan after Receive starts).

Example direction:

ctx := context.Background()
err := transport.Subscribe(ctx)
if err == nil {
    t.Fatalf("expected Subscribe to fail due to server error, got nil")
}

This will ensure the test fails if Subscribe stops propagating server-side errors.

🧹 Nitpick comments (3)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)

81-85: LGTM! No-op Subscribe implementation is appropriate for gRPC transport.

The no-op implementation is correct because subscription is configured during Connect (lines 39-42 via protocol.WithSubscribeOption) and activated when Receive calls StartReceiver (line 88). This aligns with the v2 transport pattern where the gRPC protocol layer handles subscription lifecycle internally.


Optional: Consider enhancing the comment to be more explicit about the gRPC-specific behavior:

-	// Subscription is handled by the cloudevents client during receiver startup.
-	// No action needed here.
+	// Subscription is configured during Connect (via SubscribeOption) and activated
+	// by the cloudevents client when StartReceiver is called. No action needed here.
pkg/cloudevents/generic/clients/baseclient.go (1)

178-189: Resubscribe loop ignores context cancellation during backoff.

In the resubscribe path, a failed Subscribe sleeps via RealTimer(DelayFn()).C() without checking ctx.Done() (Lines 181–184). If the outer context is canceled during this sleep, the goroutine will continue to retry until the timer fires.

Not critical, but you could make the backoff responsive to cancellation:

select {
case <-wait.RealTimer(DelayFn()).C():
    // retry
case <-ctx.Done():
    return
}

This keeps reconnect behavior the same while avoiding unnecessary retries after shutdown.

pkg/cloudevents/generic/options/v2/grpc/transport.go (1)

76-94: Avoid holding mu across Publish network calls (minor).

Send holds t.mu while converting the event and calling t.client.Publish (Lines 76–94). Since client is concurrency-safe, you can reduce lock scope:

  • Lock only to check t.client != nil and copy the client reference to a local.
  • Unlock before WritePBMessage and Publish.

That preserves correctness while avoiding unnecessary serialization of concurrent sends.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 806b100 and 51f6b23.

📒 Files selected for processing (20)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/agentclient_test.go (3 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (7 hunks)
  • pkg/cloudevents/generic/clients/sourceclient_test.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (2 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (3 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • pkg/cloudevents/constants/constants.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/options.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/clients/sourceclient_test.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/clients/agentclient_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/server/grpc/broker.go
  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-08-27T03:21:36.789Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/server/grpc/metrics/metrics.go:230-236
Timestamp: 2025-08-27T03:21:36.789Z
Learning: In the gRPC metrics implementation for open-cluster-management.io/sdk-go, the grpc_server_ce_called_total metric intentionally has different semantics for unary vs stream calls: unary calls increment it once per RPC invocation, while stream calls increment it once per received message. This is deliberate design to provide granular visibility into stream message activity.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
🧬 Code graph analysis (8)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (28-39)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (3)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewSourceOptions (28-39)
  • NewAgentOptions (10-26)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
test/integration/cloudevents/certrotation_grpc_test.go (3)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsAgentOptions (76-89)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-26)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
pkg/cloudevents/server/grpc/broker.go (6)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (1)
  • CloudEventService_SubscribeServer (146-149)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
test/integration/cloudevents/server/grpc.go (1)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
pkg/cloudevents/generic/options/v2/grpc/options.go (5)
pkg/cloudevents/generic/options/grpc/agentoptions.go (1)
  • NewAgentOptions (24-36)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/options.go (2)
  • CloudEventsAgentOptions (76-89)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/grpc/sourceoptions.go (1)
  • NewSourceOptions (23-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (25)
pkg/cloudevents/generic/options/builder/optionsbuilder.go (3)

10-10: LGTM: Clean v2 package import.

The import alias follows Go conventions for versioned packages.


55-65: LGTM: Source options migration to v2.

The migration to v2 gRPC options is correct, with matching signatures and parameters.


67-78: LGTM: Agent options migration to v2.

The migration to v2 gRPC options is correct, with matching signatures and parameters.

pkg/cloudevents/generic/options/v2/grpc/options.go (2)

10-26: LGTM: Agent options constructor with documented placeholder.

The implementation is correct. The TODO comment appropriately documents that the Source field is a placeholder for all sources since the gRPC agent client is not currently utilized.


28-39: LGTM: Clean source options constructor.

The implementation is correct and straightforward.

test/integration/cloudevents/server/grpc.go (2)

17-17: LGTM: Required imports for subscription header.

The imports are necessary for emitting the subscription ID header.

Also applies to: 25-25


173-175: LGTM: Proper subscription header emission in test server.

The SendHeader call is correctly positioned after setup and before blocking on context. Error handling is fail-fast, which is appropriate. The hardcoded "sub-test-id" is acceptable for test purposes.

pkg/cloudevents/generic/clients/agentclient_test.go (3)

75-81: LGTM: Context-aware callback with proper cancellation.

The callback signature is updated correctly and properly uses the context for cancellation detection.


158-164: LGTM: Context-aware callback with proper cancellation.

The callback signature is updated correctly and properly uses the context for cancellation detection.


328-332: LGTM: Updated callback signature for API consistency.

The callback signature is updated correctly. Not using the context parameter is acceptable for this test case.

pkg/cloudevents/generic/clients/sourceclient_test.go (3)

64-70: LGTM: Context-aware callback with proper cancellation.

The callback signature is updated correctly and properly uses the context for cancellation detection.


137-143: LGTM: Context-aware callback with proper cancellation.

The callback signature is updated correctly and properly uses the context for cancellation detection.


335-339: LGTM: Updated callback signature for API consistency.

The callback signature is updated correctly. Not using the context parameter is acceptable for this test case.

test/integration/cloudevents/cloudevetns_grpc_v2_test.go (2)

19-23: LGTM: Correctly implemented v2 test helper.

The function properly uses the v2 gRPC options constructor with correct parameters. The unused context parameter is appropriately handled with the blank identifier.


15-15: LGTM: Proper v2 test registration.

The test is correctly registered with Ginkgo using the v2 options helper.

pkg/cloudevents/generic/options/options.go (2)

10-13: LGTM: Context-aware handler enables cancellation.

The addition of the context parameter to ReceiveHandlerFn is a breaking change but enables proper cancellation and context-aware event handling. All test usages in the PR have been updated to match the new signature.


30-33: LGTM: Subscribe method adds explicit subscription lifecycle.

The new Subscribe method provides explicit control over subscription setup. The documentation clearly specifies it's non-blocking and should be called after Connect but before Receive. Based on learnings, this method relies on an active connection and does not automatically retry on failure; clients should monitor the ErrorChan for subscription errors.

test/integration/cloudevents/certrotation_grpc_test.go (2)

14-14: LGTM: Clean v2 package import.

The import alias follows Go conventions for versioned packages.


20-23: LGTM: Agent options migration to v2.

The migration to v2 gRPC options is correct, with matching signatures and parameters.

pkg/cloudevents/generic/clients/baseclient.go (1)

245-289: Channel closing semantics for receiver/reconnected signals look correct.

closeChannels closes and nils both receiverChan and reconnectedChan under Lock, while sendReceiverSignal, sendReconnectedSignal, and getReceiverChan access them under RLock (Lines 245–289). This prevents sends on closed channels and avoids data races between close and send/receive.

The getReceiverChan indirection also ensures the select in the receiver loop handles closed/nil channels safely. No issues here.

pkg/cloudevents/generic/options/v2/grpc/transport.go (2)

96-131: Subscribe guard and header validation look solid.

Subscribe now:

  • Verifies t.client is non-nil.
  • Enforces a single active subscription via t.subscribed.
  • Requires exactly one subscription-id header and stores subID/subClient under mu.

This matches the intended “one subscription per transport” model and provides clear failure modes. Looks good.


193-240: Event/heartbeat handling in startEventsReceiver matches expectations.

The receiver loop correctly:

  • Separates heartbeat events by Type == HeartbeatCloudEventsType and forwards them to the heartbeat channel.
  • Converts other messages to CloudEvents and invokes the handler with context.
  • Reports stream errors via errorChan without blocking (non-blocking send with log fallback).

Behavior aligns with the existing heartbeat/health-check design.

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

29-128: Mock service and bufconn setup look appropriate.

mockCloudEventService and setupMockServer give a flexible in-memory gRPC server using bufconn and grpc.NewClient, which is a good basis for transport testing. No issues here.

pkg/cloudevents/server/grpc/broker.go (2)

229-271: Per-subscriber handler now correctly keyed by subID.

Subscribe registers a handler closure that logs with subID and sends events into eventCh, and handleRes iterates for subID, subscriber := range bkr.subscribers and passes subID to the handler (Lines 229–271, 370–405). This ties broker-side logging and unregistration cleanly to the subscription ID header used by the transport and tests.

The added context comments around expected stream closure conditions are also helpful. No functional issues here.

Also applies to: 370-405


95-121: Logging changes improve debuggability without over-logging payloads.

Switching log keys from "event" to "eventContext" in Publish and resync-related paths (Lines 95–121, 307–361) avoids dumping full event data while still surfacing enough context (type, IDs, extensions) for debugging, which is desirable in controllers.

Looks good as-is.

Also applies to: 307-361

@qiujian16
Copy link
Member

/approve
leave to @morvencao for final review.

@openshift-ci
Copy link

openshift-ci bot commented Nov 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiujian16, skeeey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/cloudevents/generic/clients/baseclient.go (1)

153-170: Consider moving initial Subscribe call outside the mutex

subscribe holds c.Lock() for the whole function, including the initial c.transport.Subscribe(ctx) call. If Subscribe performs network I/O or blocks on the server, this can hold the client mutex longer than necessary and delay other operations that need the lock (e.g., setClientReady, closeChannels, or other future state changes).

You can reduce contention by limiting the critical section to state checks/updates around receiverChan and doing the blocking Subscribe call without the lock, similar to the resubscribe path:

func (c *baseClient) subscribe(ctx context.Context, receive receiveFn) {
-	c.Lock()
-	defer c.Unlock()
-
-	logger := klog.FromContext(ctx)
-	// make sure there is only one subscription go routine starting for one client.
-	if c.receiverChan != nil {
-		logger.V(2).Info("the subscription has already started")
-		return
-	}
-
-	// send subscription request before starting to receive events
-	if err := c.transport.Subscribe(ctx); err != nil {
-		runtime.HandleErrorWithContext(ctx, err, "failed to subscribe")
-		return
-	}
-
-	c.receiverChan = make(chan int)
+	logger := klog.FromContext(ctx)
+	c.Lock()
+	if c.receiverChan != nil {
+		c.Unlock()
+		logger.V(2).Info("the subscription has already started")
+		return
+	}
+	c.receiverChan = make(chan int)
+	c.Unlock()
+
+	// send subscription request before starting to receive events
+	if err := c.transport.Subscribe(ctx); err != nil {
+		runtime.HandleErrorWithContext(ctx, err, "failed to subscribe")
+		return
+	}

This keeps state changes serialized while avoiding a lock around the potentially blocking network call.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51f6b23 and 1e39a04.

📒 Files selected for processing (20)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/agentclient_test.go (3 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (7 hunks)
  • pkg/cloudevents/generic/clients/sourceclient_test.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (2 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (14 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (3 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • pkg/cloudevents/constants/constants.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/sourceclient_test.go
  • test/integration/cloudevents/server/grpc.go
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/options.go
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/clients/agentclient_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/options.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/options.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • pkg/cloudevents/server/grpc/broker.go
🧬 Code graph analysis (3)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
pkg/cloudevents/server/grpc/broker.go (6)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (1)
  • CloudEventService_SubscribeServer (146-149)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
test/integration/cloudevents/certrotation_grpc_test.go (3)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsAgentOptions (76-89)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewAgentOptions (10-26)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: verify
  • GitHub Check: unit
🔇 Additional comments (27)
pkg/cloudevents/generic/options/mqtt/agentoptions.go (1)

141-147: LGTM! No-op implementation aligns with current architecture.

The Subscribe method correctly implements a no-op pattern since MQTT subscriptions are established during the CloudEvents receiver startup (via cloudeventsmqtt.WithSubscribe in Connect). The TODO for future decoupling is appropriate.

pkg/cloudevents/generic/options/options.go (2)

13-13: Context-aware handler signature improves cancellation support.

Adding context.Context as the first parameter to ReceiveHandlerFn follows Go best practices and enables proper cancellation, deadline, and value propagation through the event handling chain.


30-33: Explicit subscription API improves transport lifecycle control.

The new Subscribe method provides explicit control over subscription requests, decoupling subscription from the Receive loop. The non-blocking design and clear documentation about the call sequence (Connect → Subscribe → Receive) improve clarity. Based on past review comments, implementations should note that Subscribe relies on an active connection and may not retry on failure.

pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)

135-141: LGTM! Consistent no-op implementation.

The Subscribe method follows the same pattern as the agent transport, correctly delegating subscription management to the CloudEvents client during receiver startup. This consistency across source and agent transports is appropriate.

pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (1)

5-5: Improved test approach using reflection for type verification.

The refactored test uses reflect.TypeOf() to directly verify transport types instead of JSON marshaling. This is more precise, cleaner, and less brittle than string matching in JSON output. The new expectedTransportType field makes test expectations explicit.

Also applies to: 33-37, 59-59, 76-76, 112-116

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)

1-128: Excellent test infrastructure with in-memory gRPC server.

The test setup using bufconn for in-memory gRPC connections is well-designed. The configurable mockCloudEventService allows flexible testing of various scenarios. The infrastructure is clean and reusable across test cases.


130-244: Thorough constructor validation.

Both TestNewAgentOptions and TestNewSourceOptions comprehensively validate option construction, including transport wiring, field initialization, and subscription request generation. The tests verify that subscription requests contain the correct Source, ClusterName, and DataType values.


246-354: Send path and error channel well-tested.

The tests cover both successful sends and error propagation. The use of sync.Mutex to protect shared state in concurrent test scenarios is appropriate and prevents race conditions.


356-454: Subscribe tests cover success and error paths.

The subscription tests validate successful subscription setup and error handling. The use of a cancelled context in the error test was noted in previous reviews and is acceptable given the asynchronous nature of stream errors. The test provides adequate coverage.


456-1334: Comprehensive coverage of receive paths and edge cases.

The test suite thoroughly validates:

  • Event reception and handler invocation
  • Heartbeat filtering (ensuring heartbeats don't reach handlers)
  • Stream error handling and error channel propagation
  • Connection state monitoring and graceful shutdown
  • Integration scenarios with full lifecycle

The use of atomic.Int32 for concurrent counters is appropriate. The tests cover critical edge cases like server shutdown, context cancellation, and full error channels.

test/integration/cloudevents/certrotation_grpc_test.go (1)

14-14: Migration to v2 gRPC options aligns with deprecation.

The test correctly migrates from grpc.NewAgentOptions to grpcv2.NewAgentOptions, consistent with the deprecation notice in pkg/cloudevents/generic/options/grpc/agentoptions.go (line 23). The signature remains identical, ensuring no functional change.

Also applies to: 22-22

pkg/cloudevents/generic/options/grpc/agentoptions.go (2)

23-23: Deprecation properly documents migration path.

The deprecation comment correctly directs users to v2.grpc.NewAgentOptions, providing a clear upgrade path. The grammar was corrected from "using" to "use" as noted in previous reviews.


86-90: No-op Subscribe aligns with current architecture.

The Subscribe method correctly implements a no-op since the v1 gRPC transport manages subscriptions through the protocol layer during receiver startup. This is consistent with the MQTT transport implementations.

pkg/cloudevents/generic/clients/agentclient_test.go (1)

75-75: Test handlers updated for context-aware signature.

The Receive callback signatures are correctly updated to include context.Context as the first parameter, aligning with the ReceiveHandlerFn signature change in pkg/cloudevents/generic/options/options.go. The handlers retain their original logic while gaining access to the context for cancellation handling.

Also applies to: 158-158, 328-328

pkg/cloudevents/generic/clients/baseclient.go (5)

82-96: Reconnection loop + channel closing look safe and consistent

Using c.closeChannels() on ctx.Done() together with sendReceiverSignal/sendReconnectedSignal guarded by the same RWMutex cleanly avoids send-on-closed-channel panics and ensures the receiver goroutine shuts down once the client context is canceled. This reconnection/teardown path looks correct.


174-189: Subscribe lifecycle and resubscribe flow look coherent

The startReceiving / subscribed flags, restart/stop signals, and resubscribe loop are wired together coherently:

  • On restart, you rebuild receiverCtx, set startReceiving = true and subscribed = false so the next loop iteration resubscribes before starting Receive.
  • Resubscribe uses the same DelayFn backoff as connect, avoiding a tight retry loop.
  • The Receive callback now takes a context.Context, uses a per-event logger, and calls the provided receive(ctx, evt).

This matches the connection management in connect() and should behave correctly under disconnect/reconnect cycles.

Also applies to: 193-199, 211-225


245-257: Channel closing helper correctly synchronized with senders

closeChannels acquires the write lock, closes receiverChan / reconnectedChan, and nils them. Combined with:

  • sendReceiverSignal sending under RLock, and
  • sendReconnectedSignal sending under RLock,

this prevents concurrent send vs. close races and ensures readers using getReceiverChan() see either an open channel or nil. This is a good cleanup/concurrency improvement.


259-271: Non-blocking reconnection signaling is a good improvement

Making sendReconnectedSignal(ctx):

  • Guard the channel read under RLock, and
  • Use a non-blocking select with a log message when there is no active listener,

ensures reconnection notification can’t wedge the client if no goroutine is waiting on reconnectedChan. This is a solid safety/observability improvement.


285-289: getReceiverChan accessor is safe for use in select

Returning receiverChan under RLock and then using it in select (case signal, ok := <-c.getReceiverChan()) is safe:

  • If the channel is closed, the receive unblocks with ok == false.
  • If closeChannels has already set it to nil, the select case is effectively disabled.

This accessor keeps the field protected while avoiding send-on-closed-channel issues.

pkg/cloudevents/server/grpc/broker.go (8)

10-33: Imports and resourceHandler signature change look appropriate

Adding the constants and metadata imports and extending resourceHandler to func(ctx context.Context, subID string, res *cloudevents.Event) error make sense for emitting a subscription-id header and propagating that ID through handlers/logs. No issues here.


95-95: Logging key switch to eventContext is fine

Using "eventContext", evt.Context (instead of a generic "event") is clearer and matches the other logging changes in this file. Looks good.


125-152: Atomic registration + header send correctly fixes prior race

register now:

  • Acquires bkr.mu once,
  • Inserts the subscriber into bkr.subscribers,
  • Sends the subscription-id header while still holding the lock, and
  • Rolls back the map entry if SendHeader fails,

before finally incrementing metrics and returning the ID.

This makes subscriber visibility to handleRes atomic with header emission, eliminating the previous race where events could be sent before the header and SendHeader would fail with “headers already sent”.


155-164: Unregister logging/metrics update is consistent

unregister now logs id, clusterName, and dataType, and decrements GRPCCESubscribers with the same labels used on registration. This keeps metrics and logs aligned with the new per-subscription ID.


190-227: Heartbeat + send loop structure is reasonable

The dedicated goroutine that:

  • Selects on ctx.Done(), heartbeater.Heartbeat(), and eventCh,
  • Cancels the stream context and reports a single error via sendErrCh on send failure,

is a good pattern for serializing all gRPC Send calls on the stream and propagating a clear terminal error back to Subscribe. This matches gRPC’s non-concurrency-safe stream semantics.


256-272: Heartbeater start + final unsubscribe logic look correct

Starting the heartbeater with the same ctx used by the send goroutine, and then:

  • On sendErrCh, logging the error and unregistering subID, or
  • On ctx.Done(), unregistering subID and returning nil,

ensures each subscription cleans up its broker state and heartbeat loop when the stream terminates, without treating normal stream closure as an error. This matches the intended semantics.


309-361: Resync logging updates improve observability without changing behavior

The changes in respondResyncSpecRequest to consistently log "eventContext" at key decision points (no objs, invalid resourceVersion, update/delete paths) make troubleshooting resync flows easier and don’t alter control flow. Looks good.


393-400: handleRes dispatch with subID is consistent with new handler type

Iterating for subID, subscriber := range bkr.subscribers and invoking subscriber.handler(ctx, subID, evt) only when clusterName and dataType match correctly:

  • Associates each delivery with its subID for logging/metrics, and
  • Reuses the existing filter semantics.

Given the handler is now responsible for pushing onto eventCh, this dispatch layer looks coherent.

Copy link
Member

@morvencao morvencao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/cloudevents/generic/clients/baseclient.go (1)

153-169: Avoid holding the client lock while calling Subscribe.

We still hold baseClient.Lock() when invoking transport.Subscribe(ctx). If Subscribe performs network I/O or blocks, everything else needing the mutex (including reconnect/error handling) stalls and we can deadlock on shutdown. Please release the lock before calling Subscribe, then reacquire it to publish the new receiverChan and guard against a racing second subscriber.

 func (c *baseClient) subscribe(ctx context.Context, receive receiveFn) {
-	c.Lock()
-	defer c.Unlock()
-
 	logger := klog.FromContext(ctx)
-	// make sure there is only one subscription go routine starting for one client.
-	if c.receiverChan != nil {
-		logger.V(2).Info("the subscription has already started")
-		return
-	}
-
-	// send subscription request before starting to receive events
-	if err := c.transport.Subscribe(ctx); err != nil {
+	c.Lock()
+	if c.receiverChan != nil {
+		c.Unlock()
+		logger.V(2).Info("the subscription has already started")
+		return
+	}
+	c.Unlock()
+
+	if err := c.transport.Subscribe(ctx); err != nil {
 		runtime.HandleErrorWithContext(ctx, err, "failed to subscribe")
 		return
 	}
 
-	c.receiverChan = make(chan int)
+	c.Lock()
+	if c.receiverChan != nil {
+		c.Unlock()
+		logger.V(2).Info("the subscription has already started")
+		return
+	}
+	c.receiverChan = make(chan int)
+	c.Unlock()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e39a04 and fd704c3.

📒 Files selected for processing (20)
  • pkg/cloudevents/constants/constants.go (1 hunks)
  • pkg/cloudevents/generic/clients/agentclient_test.go (3 hunks)
  • pkg/cloudevents/generic/clients/baseclient.go (7 hunks)
  • pkg/cloudevents/generic/clients/sourceclient_test.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go (3 hunks)
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go (5 hunks)
  • pkg/cloudevents/generic/options/fake/fakeoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/agentoptions.go (2 hunks)
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1 hunks)
  • pkg/cloudevents/generic/options/options.go (2 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/options.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport.go (1 hunks)
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1 hunks)
  • pkg/cloudevents/server/grpc/broker.go (13 hunks)
  • pkg/cloudevents/server/grpc/broker_test.go (3 hunks)
  • test/integration/cloudevents/certrotation_grpc_test.go (1 hunks)
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1 hunks)
  • test/integration/cloudevents/server/grpc.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • pkg/cloudevents/generic/options/mqtt/agentoptions.go
  • pkg/cloudevents/constants/constants.go
  • pkg/cloudevents/generic/options/grpc/sourceoptions.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder.go
  • pkg/cloudevents/generic/options/builder/optionsbuilder_test.go
  • pkg/cloudevents/generic/options/v2/grpc/options.go
  • pkg/cloudevents/generic/options/fake/fakeoptions.go
  • pkg/cloudevents/server/grpc/broker_test.go
  • pkg/cloudevents/generic/clients/agentclient_test.go
  • test/integration/cloudevents/certrotation_grpc_test.go
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.
📚 Learning: 2025-11-11T13:27:36.331Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 162
File: pkg/cloudevents/generic/options/pubsub/options.go:157-174
Timestamp: 2025-11-11T13:27:36.331Z
Learning: For open-cluster-management PubSub transport in pkg/cloudevents/generic/options/pubsub: broadcast topics (SourceBroadcast, AgentBroadcast) and their corresponding subscriptions are always required, not optional. The omitempty tags on types.Topics broadcast fields exist because the struct is shared with MQTT (where broadcasts are optional), but PubSub requires all broadcast channels for resync functionality.

Applied to files:

  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • test/integration/cloudevents/cloudevetns_grpc_v2_test.go
  • pkg/cloudevents/generic/options/mqtt/sourceoptions.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/options.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.

Applied to files:

  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-17T13:29:00.675Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 147
File: pkg/cloudevents/generic/options/grpc/protocol/heartbeat_integration_test.go:66-68
Timestamp: 2025-09-17T13:29:00.675Z
Learning: In the pkg/cloudevents/generic/options/grpc/protocol package, bufSize is already defined as a constant in protocal_test.go and is accessible to other test files in the same package, including heartbeat_integration_test.go.

Applied to files:

  • test/integration/cloudevents/server/grpc.go
  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/server/grpc/broker.go
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/sourceclient_test.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/server/grpc/broker.go
  • pkg/cloudevents/generic/options/grpc/agentoptions.go
  • pkg/cloudevents/generic/options/options.go
📚 Learning: 2025-09-12T02:55:39.701Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:36-39
Timestamp: 2025-09-12T02:55:39.701Z
Learning: In the GRPC protocol health check implementation, WithReconnectErrorOption already validates the interval parameter and sets a default of 20 seconds if interval <= 0, preventing time.NewTicker panics.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-12T06:49:25.727Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:186-197
Timestamp: 2025-09-12T06:49:25.727Z
Learning: In the gRPC health check implementation, Canceled/DeadlineExceeded errors are intentionally handled externally by connection management components. The health check component has a focused responsibility of only monitoring server reachability, not handling connection-level issues.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-09-16T06:03:45.232Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/server/grpc/broker.go:169-194
Timestamp: 2025-09-16T06:03:45.232Z
Learning: In the Open Cluster Management gRPC CloudEvents implementation, client reconnections due to heartbeat timeout are considered acceptable behavior by the maintainer skeeey, rather than something that needs optimization to prevent.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport_test.go
  • pkg/cloudevents/generic/options/v2/grpc/transport.go
  • pkg/cloudevents/generic/clients/baseclient.go
  • pkg/cloudevents/generic/options/options.go
📚 Learning: 2025-09-12T06:43:05.679Z
Learnt from: qiujian16
Repo: open-cluster-management-io/sdk-go PR: 140
File: pkg/server/grpc/server_test.go:260-265
Timestamp: 2025-09-12T06:43:05.679Z
Learning: In the gRPC Go library, grpc.NewClient was introduced in v1.63 and is the recommended way to create gRPC client connections. grpc.Dial and grpc.DialContext are deprecated as of v1.63, though they will be supported throughout the 1.x series. NewClient uses "dns" as the default name resolver while the deprecated Dial functions use "passthrough" for backward compatibility.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
📚 Learning: 2025-08-27T03:21:36.789Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/server/grpc/metrics/metrics.go:230-236
Timestamp: 2025-08-27T03:21:36.789Z
Learning: In the gRPC metrics implementation for open-cluster-management.io/sdk-go, the grpc_server_ce_called_total metric intentionally has different semantics for unary vs stream calls: unary calls increment it once per RPC invocation, while stream calls increment it once per received message. This is deliberate design to provide granular visibility into stream message activity.

Applied to files:

  • pkg/cloudevents/generic/options/v2/grpc/transport.go
🧬 Code graph analysis (5)
test/integration/cloudevents/server/grpc.go (1)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/v2/grpc/transport_test.go (5)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventService_SubscribeServer (146-149)
  • RegisterCloudEventServiceServer (116-118)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/types/types.go (3)
  • HeartbeatCloudEventsType (13-13)
  • CloudEventsDataType (165-169)
  • SourceAll (22-22)
pkg/cloudevents/generic/options/grpc/options.go (2)
  • GRPCOptions (114-118)
  • GRPCDialer (30-37)
pkg/cloudevents/generic/options/v2/grpc/options.go (2)
  • NewAgentOptions (10-26)
  • NewSourceOptions (28-39)
pkg/cloudevents/generic/options/v2/grpc/transport.go (8)
pkg/cloudevents/generic/options/grpc/options.go (1)
  • GRPCOptions (114-118)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (3)
  • CloudEventServiceClient (34-37)
  • CloudEventService_SubscribeClient (71-74)
  • NewCloudEventServiceClient (43-45)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/generic/options/options.go (1)
  • ReceiveHandlerFn (13-13)
pkg/cloudevents/server/grpc/heartbeat/healthcheck.go (1)
  • NewHealthChecker (20-32)
pkg/cloudevents/generic/types/types.go (1)
  • HeartbeatCloudEventsType (13-13)
pkg/cloudevents/generic/options/grpc/protocol/message.go (1)
  • NewMessage (42-57)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (5)
pkg/cloudevents/generic/options/options.go (1)
  • CloudEventsSourceOptions (62-73)
pkg/cloudevents/generic/options/v2/grpc/options.go (1)
  • NewSourceOptions (28-39)
test/integration/cloudevents/util/grpc.go (1)
  • NewGRPCSourceOptions (16-18)
pkg/cloudevents/clients/work/payload/manifestbundle.go (1)
  • ManifestBundleEventDataType (11-15)
pkg/cloudevents/constants/constants.go (1)
  • ConfigTypeGRPC (5-5)
pkg/cloudevents/server/grpc/broker.go (6)
pkg/cloudevents/generic/types/types.go (2)
  • CloudEventsDataType (165-169)
  • ExtensionDeletionTimestamp (71-71)
pkg/cloudevents/generic/options/grpc/protobuf/v1/cloudevent_grpc.pb.go (1)
  • CloudEventService_SubscribeServer (146-149)
pkg/cloudevents/constants/constants.go (1)
  • GRPCSubscriptionIDKey (10-10)
pkg/cloudevents/server/grpc/metrics/metrics.go (1)
  • IncGRPCCESubscribersMetric (305-307)
pkg/cloudevents/server/grpc/heartbeat/heartbeat.go (1)
  • NewHeartbeater (18-23)
pkg/cloudevents/generic/options/grpc/protocol/write_message.go (1)
  • WritePBMessage (20-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (12)
test/integration/cloudevents/cloudevetns_grpc_v2_test.go (1)

19-23: LGTM! Unused parameter correctly handled.

The blank identifier _ for the unused ctx parameter is the correct Go idiom. The function cleanly wraps the v2 gRPC options and integrates well with the test infrastructure.

pkg/cloudevents/generic/options/v2/grpc/transport.go (4)

53-74: Connection lifecycle correctly implemented.

The Connect method properly initializes closeChan and starts the connection state monitor. The past review concerns about goroutine lifecycle and nil channel handling have been addressed through:

  • Nil check in Close (lines 180-187) prevents panic if Close is called before Connect
  • monitorConnectionState exits cleanly on context cancellation or connection failure (lines 251-290)

96-131: Subscribe implementation correctly prevents races and duplicate subscriptions.

The method properly:

  • Guards against multiple subscription attempts with the subscribed flag (lines 105-107)
  • Holds the mutex throughout registration and header emission (lines 98-99, defer at 99)
  • Provides clear error messages for invalid states

The past review concern about race conditions between registration and SendHeader has been addressed by keeping all critical operations under the same lock.


135-167: Data race on closeChan properly fixed with synchronized access.

The introduction of getCloseChan() (lines 242-247) successfully eliminates the data race identified in past reviews:

  • All reads of closeChan in Receive (line 159) and startEventsReceiver (line 234) go through the synchronized helper
  • The helper uses RLock to safely read the channel reference
  • Close and Connect continue to modify closeChan under write lock

This design allows reconnect cycles to work correctly while preventing concurrent read/write races.

Also applies to: 242-247


173-191: Close method is defensive and handles edge cases well.

The implementation correctly:

  • Guards against nil channel panic (line 180)
  • Prevents double-close panic using select (lines 181-186)
  • Resets the subscribed flag (line 189) to allow re-subscription after reconnect
  • Closes the underlying dialer connection

This defensive approach ensures Close can be safely called at any point in the transport lifecycle, including before Connect succeeds or after multiple calls.

pkg/cloudevents/generic/options/v2/grpc/transport_test.go (1)

1-1334: Excellent test coverage for the v2 gRPC transport.

The test suite is comprehensive and well-structured:

  • Clean mock server setup using bufconn for in-memory testing (lines 88-128)
  • Thorough validation of constructor behavior (lines 130-244)
  • Extensive coverage of transport lifecycle: Send, Subscribe, Receive, Close, error handling
  • Edge cases well-tested: heartbeat filtering, connection state changes, stream errors, context cancellation
  • Integration tests verify end-to-end flows (lines 848-1014)

The tests provide strong confidence in the transport implementation's correctness and robustness.

pkg/cloudevents/server/grpc/broker.go (2)

124-152: Race condition between registration and header emission successfully eliminated.

The refactored register method now correctly makes subscriber registration and header emission atomic by holding bkr.mu throughout the entire operation (lines 132-133, defer at 133):

  1. Subscriber added to map (lines 136-140)
  2. Header sent while still holding lock (lines 143-147)
  3. On SendHeader failure, subscriber cleaned up atomically (line 145)

This prevents the race where handleRes could deliver events before the subscription-id header was sent, which would cause SendHeader to fail with "header already sent" and incorrectly unregister an active subscriber.


228-250: Subscription context correctly prevents handler blocking on stream closure.

The handler registration properly uses two contexts:

  • handlerCtx (line 228 parameter): Per-invocation context from handleRes for marshaling and logging
  • subCtx (line 240 in select): Subscription-scoped context that cancels when the stream closes

This ensures that when the subscription stream ends, all handlers for that subscription immediately unblock from eventCh <- pbEvt (line 239) via the subCtx.Done() case (line 240), preventing goroutine leaks and eliminating the potential deadlock identified in past reviews.

pkg/cloudevents/generic/options/grpc/agentoptions.go (1)

23-23: Deprecation and Subscribe implementation look good.

  • The deprecation comment (line 23) correctly uses the imperative form "use" (grammatically correct)
  • The no-op Subscribe method (lines 86-90) is appropriate since subscription is handled by the CloudEvents client during receiver startup in the v1 protocol

This maintains backward compatibility while conforming to the new transport interface.

Also applies to: 86-90

test/integration/cloudevents/server/grpc.go (1)

173-175: Subscription-id header emission correctly implemented.

The SendHeader call properly signals subscription registration to the client by emitting the subscription-id header. The fail-fast error handling (returning immediately on error) is appropriate since a failed header emission indicates the subscription cannot proceed.

pkg/cloudevents/generic/options/mqtt/sourceoptions.go (1)

135-141: No-op Subscribe implementation is correct for MQTT v1 transport.

The implementation correctly delegates subscription to the CloudEvents client during receiver startup. The TODO comment (lines 138-139) appropriately notes that v2 could implement native subscription logic to fully decouple from the CloudEvents SDK, which aligns with the broader v2 transport goals.

pkg/cloudevents/generic/clients/sourceclient_test.go (1)

64-70: Test updates correctly reflect context-aware Receive handler API.

The test callbacks now properly accept context.Context as the first parameter (lines 64, 137, 335), aligning with the updated ReceiveHandlerFn signature. The tests continue to correctly exercise context cancellation paths through ctx.Done() (lines 67, 140, demonstrating proper integration with the new API.

Also applies to: 137-143, 335-339

Signed-off-by: Wei Liu <[email protected]>
@qiujian16
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 17, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit a979478 into open-cluster-management-io:main Nov 17, 2025
12 checks passed
@skeeey skeeey deleted the sub branch November 20, 2025 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants