@lidel lidel commented Jan 7, 2026

Problem

SendMessage() in bitswap can block indefinitely when the remote peer is unresponsive, causing goroutine leaks. Over time, this exhausts worker goroutines and bitswap stops serving blocks entirely.

Symptoms observed on production nodes:

Root cause(?): When lazy multistream-select is used, stream.Close() must first complete the read side of the protocol handshake. If the peer never responds (network issues, overload, unclean disconnect), ReadNextToken() blocks forever because no read deadline is set.

Fix

Set a read deadline before calling stream.Close() in SendMessage():

_ = s.SetReadDeadline(time.Now().Add(timeout))
return s.Close()

This ensures Close() times out instead of blocking indefinitely.

Related

Note

This PR is a localized fix for bitswap that can be shipped independently from the go-libp2p fix (defense in depth).

@lidel lidel changed the title from "fix(bitswap/network): set read deadline before stream Close to preven…" to "fix(bitswap/network): stream.Close() blocks indefinitely on unresponsive peers" Jan 7, 2026
@lidel lidel mentioned this pull request Jan 7, 2026

codecov bot commented Jan 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.11%. Comparing base (de0b141) to head (9937acb).
⚠️ Report is 2 commits behind head on main.

@@            Coverage Diff             @@
##             main    #1083      +/-   ##
==========================================
+ Coverage   61.08%   61.11%   +0.02%     
==========================================
  Files         264      264              
  Lines       26225    26227       +2     
==========================================
+ Hits        16020    16028       +8     
+ Misses       8525     8520       -5     
+ Partials     1680     1679       -1     
Files with missing lines | Coverage Δ
bitswap/network/bsnet/ipfs_impl.go | 74.45% <100.00%> (-0.55%) ⬇️

... and 9 files with indirect coverage changes

…t blocking

SendMessage() can block indefinitely when the remote peer is slow or
unresponsive during the multistream-select handshake completion.

The fix sets a read deadline (using the calculated send timeout) before
calling stream.Close(), ensuring the operation will time out rather than
block indefinitely.

See: multiformats/go-multistream#47
See: ipshipyard/waterworks-infra#860
@lidel lidel force-pushed the fix/stream-close-deadline branch from 006ef85 to 9937acb Compare January 7, 2026 01:59
@lidel lidel marked this pull request as ready for review January 7, 2026 02:10
@lidel lidel requested a review from a team as a code owner January 7, 2026 02:10
@lidel lidel merged commit 12e077c into main Jan 8, 2026
17 checks passed
@lidel lidel deleted the fix/stream-close-deadline branch January 8, 2026 02:33