Skip to content

Fix race condition during CSM message exchange#611

Merged
jkralik merged 8 commits intomasterfrom
feature/fix-race-condition-for-csm-messages
Nov 19, 2025
Merged

Fix race condition during CSM message exchange#611
jkralik merged 8 commits intomasterfrom
feature/fix-race-condition-for-csm-messages

Conversation

@lubo-svk
Copy link
Copy Markdown
Collaborator

@lubo-svk lubo-svk commented Jul 28, 2025

Summary by CodeRabbit

  • New Features

    • Configurable timeout for TCP CSM exchange.
    • Callback hook to notify when TCP signal messages are received.
  • Improvements

    • Client waits for CSM exchange completion or timeout before proceeding.
    • Dialing updated to use context-aware TLS and TCP dialing.
  • Breaking Changes

    • Client constructor now returns an error alongside the connection.
  • Tests

    • Added tests covering CSM exchange success and timeout scenarios.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jul 28, 2025

Walkthrough

Adds a CSM exchange timeout option and config, updates the TCP Client to wait for a peer CSM (or timeout) during startup, introduces a per-Conn TCP signal-received handler, switches several tests to context-aware dialing, and adds tests for CSM exchange timeout scenarios.

Changes

Cohort / File(s) Change Summary
TCP CSM Exchange Option
options/tcpOptions.go
Added CSMExchangeTimeoutOpt (stores timeout), TCPClientApply(cfg *tcpClient.Config) to apply the option, and constructor WithCSMExchangeTimeout(time.Duration); imports time.
TCP Client: dialing & CSM exchange
tcp/client.go
Client signature changed to (*client.Conn, error); TLS dialing now uses tls.Dialer + DialContext; added CSM exchange handling (registers TCPSignalReceivedHandler, waits on csmExchangeDone or timeout, reports error and closes conn on timeout, clears handler on success); added codes import.
TCP Client Config
tcp/client/config.go
Added CSMExchangeTimeout time.Duration field to Config.
Conn: TCP signal handler
tcp/client/conn.go
Added type TCPSignalReceivedHandler func(codes.Code), added tcpSignalReceivedHandler and handlerMutex sync.RWMutex fields, SetTCPSignalReceivedHandler method, and handleTCPSignalReceived invocation points in signal processing.
Tests: context-aware dialing
net/conn_test.go, net/tlslistener_test.go, tcp/server_test.go
Replaced net.Dial / tls.DialWithDialer with net.Dialer{}.DialContext(...) and tls.Dialer.DialContext(...) to use context-aware dialing.
CSM exchange tests
tcp/client_test.go
Added TestConnWithCSMExchangeTimeout covering: no CSM, successful CSM exchange, and timeout when peer disables CSM; added import tcp/server.

Sequence Diagram(s)

sequenceDiagram
  participant App as Application
  participant TCPClient as TCP Client
  participant Peer as Server/Peer

  App->>TCPClient: call Client(conn, opts... with CSMExchangeTimeout)
  TCPClient->>Peer: establish TCP/TLS (DialContext)
  alt CSMExchangeTimeout>0 and peer sends CSM
    Note right of TCPClient `#E6F4EA`: register TCPSignalReceivedHandler\nwait on csmExchangeDone or timeout
    Peer-->>TCPClient: send CSM (codes.CSM)
    TCPClient->>TCPClient: handler closes csmExchangeDone
    TCPClient->>App: return (*client.Conn, nil)
  else timeout before CSM
    TCPClient->>App: report timeout error, close conn
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus:
    • tcp/client.go for goroutine + CSM wait semantics and correct cleanup on timeout.
    • tcp/client/conn.go for concurrency (RWMutex) and correct invocation of the new handler.
    • tcp/client_test.go for deterministic CSM exchange simulations and proper resource cleanup.

Poem

I’m a rabbit in code, listening for a chime,
I wait for a CSM, patient in time.
If none comes by, I close with a wink,
Else I hop along—no need to blink.
🐇⏳

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix race condition during CSM message exchange' directly describes the primary change: adding synchronization and timeout handling for CSM (Capabilities and Settings Message) exchanges to prevent race conditions.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/fix-race-condition-for-csm-messages

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 081b33b and a0baf04.

📒 Files selected for processing (1)
  • tcp/client_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tcp/client_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Fuzzing
  • GitHub Check: test (windows-latest)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 28, 2025

Codecov Report

❌ Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.06%. Comparing base (f73dcbd) to head (a0baf04).

Files with missing lines Patch % Lines
tcp/client/conn.go 88.88% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #611      +/-   ##
==========================================
+ Coverage   73.17%   76.06%   +2.88%     
==========================================
  Files          73       73              
  Lines        6986     5965    -1021     
==========================================
- Hits         5112     4537     -575     
+ Misses       1489     1044     -445     
+ Partials      385      384       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
tcp/client/conn.go (1)

380-439: Address high cyclomatic complexity.

The pipeline failure indicates that the handleSignals method has a cyclomatic complexity of 17, which exceeds the limit of 15. Consider refactoring to reduce complexity.

Extract individual signal handlers into separate methods:

 func (cc *Conn) handleSignals(r *pool.Message) bool {
 	switch r.Code() {
 	case codes.CSM:
-		if cc.disablePeerTCPSignalMessageCSMs {
-			return true
-		}
-		if size, err := r.GetOptionUint32(message.TCPMaxMessageSize); err == nil {
-			cc.peerMaxMessageSize.Store(size)
-		}
-		if r.HasOption(message.TCPBlockWiseTransfer) {
-			cc.peerBlockWiseTranferEnabled.Store(true)
-		}
-
-		// signal CSM message is received.
-		if cc.tcpSignalReceivedHandler != nil {
-			cc.tcpSignalReceivedHandler(codes.CSM)
-		}
-		return true
+		return cc.handleCSM(r)
 	case codes.Ping:
-		// if r.HasOption(message.TCPCustody) {
-		// TODO
-		// }
-		if err := cc.sendPong(r.Token()); err != nil && !coapNet.IsConnectionBrokenError(err) {
-			cc.Session().errors(fmt.Errorf("cannot handle ping signal: %w", err))
-		}
-
-		if cc.tcpSignalReceivedHandler != nil {
-			cc.tcpSignalReceivedHandler(codes.Ping)
-		}
-		return true
+		return cc.handlePing(r)
 	case codes.Release:
-		// if r.HasOption(message.TCPAlternativeAddress) {
-		// TODO
-		// }
-
-		if cc.disablePeerTCPSignalMessageCSMs {
-			cc.tcpSignalReceivedHandler(codes.Release)
-		}
-		return true
+		return cc.handleRelease(r)
 	case codes.Abort:
-		// if r.HasOption(message.TCPBadCSMOption) {
-		// TODO
-		// }
-
-		if cc.disablePeerTCPSignalMessageCSMs {
-			cc.tcpSignalReceivedHandler(codes.Abort)
-		}
-		return true
+		return cc.handleAbort(r)
 	case codes.Pong:
-		if h, ok := cc.tokenHandlerContainer.LoadAndDelete(r.Token().Hash()); ok {
-			cc.processReceivedMessage(r, cc, h)
-		}
-
-		if cc.tcpSignalReceivedHandler != nil {
-			cc.tcpSignalReceivedHandler(codes.Pong)
-		}
-		return true
+		return cc.handlePong(r)
 	}
 	return false
 }

+func (cc *Conn) handleCSM(r *pool.Message) bool {
+	if cc.disablePeerTCPSignalMessageCSMs {
+		return true
+	}
+	if size, err := r.GetOptionUint32(message.TCPMaxMessageSize); err == nil {
+		cc.peerMaxMessageSize.Store(size)
+	}
+	if r.HasOption(message.TCPBlockWiseTransfer) {
+		cc.peerBlockWiseTranferEnabled.Store(true)
+	}
+	if cc.tcpSignalReceivedHandler != nil {
+		cc.tcpSignalReceivedHandler(codes.CSM)
+	}
+	return true
+}

+// Add similar methods for handlePing, handleRelease, handleAbort, handlePong
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f431046 and 41c230f.

📒 Files selected for processing (4)
  • options/tcpOptions.go (2 hunks)
  • tcp/client.go (2 hunks)
  • tcp/client/config.go (1 hunks)
  • tcp/client/conn.go (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tcp/client.go (1)
message/codes/codes.go (2)
  • Code (12-12)
  • CSM (53-53)
options/tcpOptions.go (1)
tcp/client/config.go (1)
  • Config (41-54)
🪛 GitHub Actions: Golangci-lint
tcp/client/conn.go

[error] 380-380: golangci-lint: cyclomatic complexity 17 of func (*Conn).handleSignals is high (> 15) (gocyclo)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Fuzzing
  • GitHub Check: test (windows-latest)
🔇 Additional comments (9)
tcp/client/config.go (1)

53-53: LGTM!

The addition of CSMExchangeTimeout field to the Config struct is clean and appropriately typed for timeout configuration.

options/tcpOptions.go (1)

5-5: LGTM!

The addition of the time import is necessary for the new timeout functionality.

tcp/client.go (2)

10-10: LGTM!

The import of message/codes is necessary for the CSM code constant.


104-113: Potential race condition in CSM handler setup.

There's a potential race condition where the CSM message could arrive and be processed before the select statement in lines 124-129 is reached, causing the channel close to be missed.

The current implementation creates the channel and sets up the handler before starting the connection goroutine, which is good. However, consider adding a safety mechanism to ensure the handler is properly synchronized with the waiting logic.

 	var csmExchangeDone chan struct{}
 	if cfg.CSMExchangeTimeout != 0 && !cfg.DisablePeerTCPSignalMessageCSMs {
 		csmExchangeDone = make(chan struct{})
+		// Use sync.Once to ensure channel is closed only once
+		var closeOnce sync.Once
 
 		cc.OnTCPSignalReceivedHandler(func(code codes.Code) {
 			if code == codes.CSM {
-				close(csmExchangeDone)
+				closeOnce.Do(func() { close(csmExchangeDone) })
 			}
 		})
 	}
tcp/client/conn.go (5)

30-30: LGTM!

The type alias for TCPSignalReceivedHandler is well-defined and follows Go conventions.


56-56: LGTM!

The addition of the tcpSignalReceivedHandler field to the Conn struct is appropriate.


393-396: LGTM!

The CSM signal handler invocation is correctly implemented with proper null checking.


406-408: LGTM!

The Ping signal handler invocation is correctly implemented with proper null checking.


433-435: LGTM!

The Pong signal handler invocation is correctly implemented with proper null checking.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tcp/client/conn.go (2)

275-279: LGTM with naming suggestion!

The method correctly implements thread-safe handler registration using proper locking.

However, consider fixing the naming convention:

-func (cc *Conn) SetTCPSignalReceivedHandler(handler TCPSignalReceivedHandler) {
+func (cc *Conn) SetTCPSignalReceivedHandler(handler TCPSignalReceivedHandler) {

Actually the method name is fine - the static analysis warning is about the other method name.


384-390: Fix naming convention issue.

The method logic is correct with proper synchronization and nil checking.

However, fix the naming convention as flagged by static analysis:

-func (cc *Conn) handleTcpSignalReceived(code codes.Code) {
+func (cc *Conn) handleTCPSignalReceived(code codes.Code) {

And update all calls to this method accordingly.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41c230f and 8111d36.

📒 Files selected for processing (3)
  • options/tcpOptions.go (2 hunks)
  • tcp/client.go (2 hunks)
  • tcp/client/conn.go (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tcp/client.go
  • options/tcpOptions.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
tcp/client/conn.go (2)
message/codes/codes.go (6)
  • Code (12-12)
  • CSM (53-53)
  • Ping (54-54)
  • Pong (55-55)
  • Release (56-56)
  • Abort (57-57)
mux/client.go (1)
  • Conn (17-51)
🪛 GitHub Check: lint
tcp/client/conn.go

[failure] 384-384:
var-naming: method handleTcpSignalReceived should be handleTCPSignalReceived (revive)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (windows-latest)
  • GitHub Check: Fuzzing
🔇 Additional comments (8)
tcp/client/conn.go (8)

8-8: LGTM!

The sync import is correctly added to support the new synchronization mechanisms.


31-31: LGTM!

The handler type definition is clean and follows Go conventions for callback functions.


57-58: LGTM!

The new fields are appropriately typed and positioned. The mutex will provide thread-safe access to the handler field.


404-406: LGTM!

The CSM signal handler notification is correctly placed after processing the CSM message.


416-416: LGTM!

The Ping signal handler notification is appropriately added.


418-424: LGTM!

The Pong handling correctly preserves existing token handler logic while adding signal notification.


430-430: LGTM!

The Release signal handler notification is correctly implemented.


437-437: LGTM!

The Abort signal handler notification is correctly implemented.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
net/tlslistener_test.go (1)

107-109: Consider adding TLS MinVersion for better security.

The TLS configuration lacks a minimum version specification, defaulting to potentially insecure older versions. While this may be acceptable for testing environments, consider adding explicit version control for better security practices.

 Config: &tls.Config{
+    MinVersion:         tls.VersionTLS12,
     InsecureSkipVerify: true,
     Certificates:       []tls.Certificate{cert},
 },
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c58872d and b64cd82.

📒 Files selected for processing (4)
  • net/conn_test.go (1 hunks)
  • net/tlslistener_test.go (2 hunks)
  • tcp/client.go (3 hunks)
  • tcp/server_test.go (1 hunks)
🧰 Additional context used
🪛 ast-grep (0.38.6)
net/tlslistener_test.go

[warning] 106-109: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
InsecureSkipVerify: true,
Certificates: []tls.Certificate{cert},
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

🔇 Additional comments (7)
tcp/client.go (3)

35-39: LGTM: Modernized TLS dialing approach.

The switch from tls.DialWithDialer to using tls.Dialer with DialContext provides better context support and consistency with the rest of the codebase.


109-118: LGTM: Well-structured CSM exchange setup.

The conditional CSM exchange setup is properly implemented:

  • Only creates the channel and handler when CSM exchange is enabled
  • Uses a buffered channel to prevent blocking
  • Handler correctly identifies CSM messages and signals completion

127-141: LGTM: Proper timeout handling and cleanup.

The CSM exchange timeout logic is well-implemented:

  • Properly handles both timeout and successful completion cases
  • Correctly closes connection and returns nil on timeout (addressing race condition)
  • Cleans up the handler after exchange completion or timeout
  • Error messaging is clear and informative

This addresses the race condition mentioned in the PR objectives by ensuring the client waits for CSM exchange before proceeding.

net/conn_test.go (1)

68-69: LGTM: Consistent dialing modernization.

The switch to context-aware dialing using net.Dialer with DialContext is consistent with the modernization pattern applied across the codebase.

tcp/server_test.go (1)

304-305: LGTM: Consistent dialing modernization.

The update to use net.Dialer with DialContext maintains consistency with the modernization pattern applied throughout the codebase.

net/tlslistener_test.go (2)

103-112: LGTM: Modernized TLS dialing approach.

The switch to tls.Dialer with DialContext provides better context support and is consistent with similar changes in tcp/client.go.


193-194: LGTM: Consistent dialing modernization.

The update to context-aware dialing using net.Dialer maintains consistency with the modernization pattern applied across test files.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
tcp/client_test.go (2)

844-844: Fix gofumpt whitespace warnings

There are unnecessary leading blank lines at Line 844 and Line 887, and the file fails gofumpt formatting. Please remove the extra blank lines and run gofumpt.

Apply minimal fixes:

-
 func TestConnWithCSMExchangeTimeout(t *testing.T) {

And later around Line 887:

-
 			l, err := coapNet.NewTCPListener("tcp", "")

Also, run gofumpt to normalize formatting across the file.

Also applies to: 887-887


867-881: Speed up the timeout scenario to keep CI fast

A 3s timeout slows the suite. Given local networking and the determinism of disabling CSM on the server, 1s is typically sufficient and reduces flakiness.

-					options.WithCSMExchangeTimeout(time.Second * 3),
+					options.WithCSMExchangeTimeout(time.Second * 1),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b64cd82 and d24f3de.

📒 Files selected for processing (2)
  • tcp/client.go (3 hunks)
  • tcp/client_test.go (2 hunks)
🧰 Additional context used
🪛 GitHub Check: lint
tcp/client_test.go

[failure] 887-887:
unnecessary leading newline (whitespace)


[failure] 844-844:
unnecessary leading newline (whitespace)


[failure] 845-845:
File is not properly formatted (gofumpt)

🪛 GitHub Actions: Golangci-lint
tcp/client_test.go

[error] 845-845: File is not properly formatted (gofumpt). Step: golangci-lint run --timeout=5m

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (windows-latest)
  • GitHub Check: Fuzzing
🔇 Additional comments (2)
tcp/client_test.go (1)

21-21: Import of tcp/server is appropriate

Needed for the server.Option type used in the new test. No issues.

tcp/client.go (1)

34-41: TLS dialing path update looks good

Using tls.Dialer with DialContext respects context and NetDialer settings and is the preferred approach over tls.DialWithDialer. Nice improvement.

@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tcp/client_test.go (1)

905-914: Fix resource leak and avoid shadowing the imported client package; also assert timeout error type

  • The local variable name client shadows the imported client package, which harms readability.
  • On the success path the connection is never closed, leaking a socket and goroutines.
  • For the timeout scenario, assert the expected error kind to lock in behavior.

Apply this diff:

-            client, err := Dial(l.Addr().String(),
-                tt.args.clientOptions...)
-            if tt.wantErr {
-                require.Nil(t, client)
-                require.Error(t, err)
-            } else {
-                require.NotNil(t, client)
-                require.NoError(t, err)
-            }
+            cc, err := Dial(l.Addr().String(), tt.args.clientOptions...)
+            if tt.wantErr {
+                require.Nil(t, cc)
+                require.Error(t, err)
+                // CSM-exchange timeout should surface as a deadline exceeded.
+                require.ErrorIs(t, err, context.DeadlineExceeded)
+                return
+            }
+            require.NoError(t, err)
+            require.NotNil(t, cc)
+            defer func() {
+                _ = cc.Close()
+                <-cc.Done()
+            }()
🧹 Nitpick comments (1)
tcp/client_test.go (1)

844-883: Keep tests snappy and precise: shorten timeouts and parallelize subtests

  • A 3s CSM timeout will slow CI; consider tightening to a few hundred milliseconds for local loopback tests.
  • Optionally run subtests in parallel; each subtest uses its own listener/server, so it’s safe and speeds up the suite.

Apply this diff to reduce test duration and parallelize:

@@
 func TestConnWithCSMExchangeTimeout(t *testing.T) {
@@
-    {
+    {
         name: "client-server-csm-success",
         args: args{
             clientOptions: []Option{
-                options.WithCSMExchangeTimeout(time.Second * 3),
+                options.WithCSMExchangeTimeout(300 * time.Millisecond),
             },
         },
         wantErr: false,
     },
     {
         name: "client-server-csm-timeout",
         args: args{
             clientOptions: []Option{
-                options.WithCSMExchangeTimeout(time.Second * 3),
+                options.WithCSMExchangeTimeout(300 * time.Millisecond),
             },
             serverOptions: []server.Option{
                 options.WithDisableTCPSignalMessageCSM(),
             },
         },
         wantErr: true,
     },
@@
     for _, tt := range tests {
         t.Run(tt.name, func(t *testing.T) {
+            t.Parallel()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d24f3de and 158fa8c.

📒 Files selected for processing (1)
  • tcp/client_test.go (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tcp/client_test.go (4)
tcp/client.go (2)
  • Option (20-22)
  • Dial (25-47)
options/tcpOptions.go (2)
  • WithCSMExchangeTimeout (50-54)
  • WithDisableTCPSignalMessageCSM (38-40)
net/tcplistener.go (1)
  • NewTCPListener (31-37)
tcp/server.go (1)
  • NewServer (7-9)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (windows-latest)
  • GitHub Check: Fuzzing
🔇 Additional comments (1)
tcp/client_test.go (1)

21-21: LGTM: server import is necessary for typed server options

Importing tcp/server enables the use of server.Option in the new CSM test. Looks good.

@lubo-svk lubo-svk changed the title Draft: Fix race condition during CSM message exchange Fix race condition during CSM message exchange Sep 24, 2025
@jkralik jkralik force-pushed the feature/fix-race-condition-for-csm-messages branch from 158fa8c to 081b33b Compare November 13, 2025 09:43
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
tcp/client_test.go (1)

905-913: Resource leak on success path.

The connection is not closed when the test succeeds, which can leak goroutines and sockets. Additionally, the variable name client shadows the imported client package. This issue was already flagged in a previous review.

Rename the variable to cc for consistency and add proper cleanup:

-			client, err := Dial(l.Addr().String(),
-				tt.args.clientOptions...)
-			if tt.wantErr {
-				require.Nil(t, client)
-				require.Error(t, err)
-			} else {
-				require.NotNil(t, client)
-				require.NoError(t, err)
-			}
+			cc, err := Dial(l.Addr().String(), tt.args.clientOptions...)
+			if tt.wantErr {
+				require.Nil(t, cc)
+				require.Error(t, err)
+				return
+			}
+			require.NoError(t, err)
+			require.NotNil(t, cc)
+			defer func() {
+				_ = cc.Close()
+				<-cc.Done()
+			}()
tcp/client.go (2)

126-139: Fix timer leak, add connection close check, and ensure handler cleanup.

Three issues need attention:

  1. Timer leak: time.After creates a timer that isn't stopped when the CSM arrives quickly, leaking resources.
  2. Missing connection close check: If the connection closes during the wait, the code hangs until timeout instead of exiting immediately.
  3. Handler not cleared on timeout: The handler cleanup at line 138 is unreachable on the timeout path (which returns at line 133).

Apply this fix:

+import (
+	"sync"
+	...
+)
+
 	var csmExchangeDone chan struct{}
+	var csmOnce sync.Once
 	if cfg.CSMExchangeTimeout != 0 && !cfg.DisablePeerTCPSignalMessageCSMs {
 		csmExchangeDone = make(chan struct{})
 		cc.SetTCPSignalReceivedHandler(func(code codes.Code) {
 			if code == codes.CSM {
-				close(csmExchangeDone)
+				csmOnce.Do(func() { close(csmExchangeDone) })
 			}
 		})
+		// Ensure handler is always cleared, regardless of outcome
+		defer cc.SetTCPSignalReceivedHandler(nil)
 	}
 
 	go func() {
 		err := cc.Run()
 		if err != nil {
 			cfg.Errors(fmt.Errorf("%v: %w", cc.RemoteAddr(), err))
 		}
 	}()
 
-	// if CSM messages are enabled, wait for the CSM messages to be exchanged
+	// If CSM messages are enabled, wait for exchange to complete or time out
 	if cfg.CSMExchangeTimeout != 0 && !cfg.DisablePeerTCPSignalMessageCSMs {
+		timer := time.NewTimer(cfg.CSMExchangeTimeout)
+		defer timer.Stop()
 		select {
-		case <-time.After(cfg.CSMExchangeTimeout):
+		case <-timer.C:
 			err := fmt.Errorf("%v: timeout waiting for CSM exchange with peer", cc.RemoteAddr())
 			cfg.Errors(err)
-			cc.Close()      // Close connection on timeout
-			return nil, err // or return cc with an error state
+			_ = cc.Close()
+			return nil, err
+		case <-cc.Done():
+			return nil, fmt.Errorf("%v: connection closed before CSM exchange: %w", cc.RemoteAddr(), cc.Context().Err())
 		case <-csmExchangeDone:
 			// CSM exchange completed successfully
 		}
-		// Clear the handler after exchange is complete or timed out
-		cc.SetTCPSignalReceivedHandler(nil)
 	}
 
 	return cc, nil

108-117: Critical: Prevent panic from multiple CSM signals.

If multiple CSM signals arrive, close(csmExchangeDone) will be called multiple times, causing a panic. Use sync.Once to guard the channel close:

+import (
+	"sync"
+	...
+)
+
 	var csmExchangeDone chan struct{}
+	var csmOnce sync.Once
 	if cfg.CSMExchangeTimeout != 0 && !cfg.DisablePeerTCPSignalMessageCSMs {
 		csmExchangeDone = make(chan struct{})
 
 		cc.SetTCPSignalReceivedHandler(func(code codes.Code) {
 			if code == codes.CSM {
-				close(csmExchangeDone)
+				csmOnce.Do(func() { close(csmExchangeDone) })
 			}
 		})
 	}
🧹 Nitpick comments (1)
net/tlslistener_test.go (1)

103-112: Good modernization to context-aware TLS dialing.

The switch to tls.Dialer with DialContext is correct and improves context handling.

Note: Static analysis suggests adding MinVersion: tls.VersionTLS13 to harden the TLS configuration, but since this is test code with InsecureSkipVerify already set, it's a nice-to-have rather than essential.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 158fa8c and 081b33b.

📒 Files selected for processing (8)
  • net/conn_test.go (1 hunks)
  • net/tlslistener_test.go (2 hunks)
  • options/tcpOptions.go (2 hunks)
  • tcp/client.go (3 hunks)
  • tcp/client/config.go (1 hunks)
  • tcp/client/conn.go (7 hunks)
  • tcp/client_test.go (2 hunks)
  • tcp/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tcp/client/conn.go
  • options/tcpOptions.go
  • net/conn_test.go
🧰 Additional context used
🧬 Code graph analysis (3)
net/tlslistener_test.go (1)
net/tlslistener.go (3)
  • l (47-49)
  • listener (13-17)
  • l (34-44)
tcp/client.go (4)
tcp/client/config.go (1)
  • Config (41-54)
dtls/client.go (2)
  • Client (56-128)
  • Dial (36-53)
tcp/client/conn.go (2)
  • Conn (47-62)
  • Option (70-70)
message/codes/codes.go (2)
  • Code (12-12)
  • CSM (53-53)
tcp/client_test.go (4)
tcp/client.go (2)
  • Option (20-22)
  • Dial (25-47)
options/tcpOptions.go (2)
  • WithCSMExchangeTimeout (50-54)
  • WithDisableTCPSignalMessageCSM (38-40)
net/tcplistener.go (1)
  • NewTCPListener (31-37)
tcp/server.go (1)
  • NewServer (7-9)
🪛 ast-grep (0.39.9)
net/tlslistener_test.go

[warning] 106-109: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
InsecureSkipVerify: true,
Certificates: []tls.Certificate{cert},
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Fuzzing
  • GitHub Check: test (windows-latest)
  • GitHub Check: test (macOS-latest)
🔇 Additional comments (7)
tcp/client/config.go (1)

53-53: LGTM!

The CSMExchangeTimeout field addition is clean and follows standard Go patterns. The zero value default (no timeout) is appropriate.

tcp/server_test.go (1)

304-305: LGTM!

Good modernization to use context-aware dialing. This aligns with the broader pattern of context-aware APIs used throughout the PR.

net/tlslistener_test.go (1)

193-194: LGTM!

Context-aware dialing pattern is correctly applied.

tcp/client_test.go (1)

21-21: LGTM!

Necessary import for the new CSM exchange timeout test.

tcp/client.go (3)

10-10: LGTM!

Necessary import for CSM code handling.


34-38: LGTM!

Good modernization to use tls.Dialer with context-aware DialContext.


46-50: LGTM!

Proper signature change to support error returns from CSM exchange validation.

@sonarqubecloud
Copy link
Copy Markdown

@jkralik jkralik merged commit 97a7523 into master Nov 19, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants