WebSocket Transport Interoperability Failures with Other libp2p Implementations
Summary
Python libp2p's WebSocket transport fails interoperability tests with multiple libp2p implementations (Nim v1.14, JVM v1.2, rust-v0.53, Chromium-Rust v0.53) when handling yamux operations over WebSocket. The connection closes prematurely during active stream data reading, causing error cascades. This works correctly with rust-v0.54+, which uses proactive closure detection.
Affected Tests: 11 WebSocket tests fail across multiple implementations
Status: Works with rust-v0.54, rust-v0.55, and rust-v0.56 (these versions have proactive closure detection)
Problem Statement
When Python's WebSocket transport is used with yamux or mplex stream multiplexers, connections fail during active stream operations. The failure occurs when:
- A ping stream is created successfully
- Data is sent over the stream
- Python attempts to read the response
- During the read operation, yamux tries to send a window update
- The WebSocket connection has already been closed by the peer
- The write operation fails, followed by read failures
- An `IncompleteReadError` is raised with a generic error message
Error Message:
Connection closed: expected 2 bytes but received 0 (connection may have been reset by peer)
Location: libp2p/io/utils.py:read_exactly() raises IncompleteReadError when WebSocket connection closes during yamux stream operations.
Affected Implementations
| Implementation | Failed Tests | Transports | Muxers | Security |
|---|---|---|---|---|
| Rust v0.53 | 4 | ws | yamux, mplex | noise |
| JVM v1.2 | 4 | ws | yamux, mplex | noise |
| Nim v1.14 | 4 | ws | yamux, mplex | noise |
| Chromium-Rust v0.53 | 1 | ws | mplex | noise |
Total: 11 WebSocket test failures
Note: All tests pass with rust-v0.54, rust-v0.55, and rust-v0.56, which implement proactive closure detection (see Rust PR #4568).
Root Cause Analysis
The Core Problem: Reactive vs Proactive Closure Detection
Python's Approach (Reactive):
- WebSocket close frame arrives → Connection closed by peer
- Python unaware → No proactive detection
- Yamux tries to read → `read_exactly(12)` called
- WebSocket read fails → `ConnectionClosed` exception
- Exception caught → Returns `b""` (empty bytes)
- Yamux continues → Tries to send window update
- WebSocket write fails → "Connection closed" exception
- Error propagated → `IncompleteReadError` raised
Rust's Approach (Proactive):
- WebSocket close frame arrives → `Incoming::Closed` emitted
- Stream ends → `BytesConnection::poll_next` returns `None`
- EOF detected → `RwStreamSink::poll_read` returns `Ok(0)`
- Yamux detects closure → `poll_next_inbound` returns `None` → `ConnectionError::Closed`
- Connection state updated → No further writes attempted
- Window updates skipped → Connection already known to be closed
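The difference between the two approaches can be illustrated with a minimal sketch. `FakeWebSocket` and both helper functions are hypothetical names invented for this illustration; they are not part of py-libp2p or rust-libp2p, and the real code paths are asynchronous.

```python
class FakeWebSocket:
    """Hypothetical stand-in for a WebSocket connection (not a py-libp2p class)."""

    def __init__(self) -> None:
        self.peer_closed = False
        self.writes_attempted = 0

    def receive_close_frame(self) -> None:
        # The peer's close frame has arrived; the connection is now closed.
        self.peer_closed = True

    def write(self, data: bytes) -> None:
        self.writes_attempted += 1
        if self.peer_closed:
            raise ConnectionError("connection closed")


def reactive_window_update(ws: FakeWebSocket) -> str:
    """Reactive: closure is discovered only when the write fails."""
    try:
        ws.write(b"window-update")
        return "sent"
    except ConnectionError:
        return "write failed"


def proactive_window_update(ws: FakeWebSocket) -> str:
    """Proactive: consult the known connection state before writing."""
    if ws.peer_closed:
        return "skipped"
    ws.write(b"window-update")
    return "sent"
```

In the reactive variant the failed write is the first sign of trouble, which is where the error cascade described above begins. The proactive variant never touches the closed socket.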
Why TCP Works But WebSocket Doesn't
TCP:
- Connection stays open until explicitly closed
- Both sides can read/write independently
- FIN on a stream doesn't close the TCP connection
- Yamux can continue operating normally
WebSocket:
- Message-based protocol with explicit close frames
- When peer sends close frame, connection is closed
- Python doesn't detect closure until attempting read/write
- No way to "keep connection open for new streams" if it's already closed
Technical Details
Failure Sequence (from test logs):
1. Python creates ping stream successfully
2. Python sends ping data
3. Python tries to read ping response (stream.read(PING_LENGTH))
4. During the read, yamux tries to send window update (after reading data)
5. WebSocket write fails: "connection closed" - Peer has already closed the WebSocket
6. WebSocket read fails: "expected 2 bytes but received 0"
Key Code Locations:
- `libp2p/transport/websocket/connection.py` (lines 341-353):
  - Returns `b""` when connection closes instead of raising an exception
  - This causes `read_exactly()` to retry indefinitely
  - Connection closure is detected reactively during operations
- `libp2p/io/utils.py` (lines 27-30):
  - Generic error message doesn't provide context about transport type
  - No indication that this is a WebSocket-specific issue
- `libp2p/stream_muxer/yamux/yamux.py` (lines 648-656):
  - Generic error logging doesn't include transport context
  - Makes debugging difficult
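A rough sketch of why returning `b""` on closure is a problem for a retrying reader. Everything here is an assumption made for illustration: `ClosedReader`, `read_exactly_sketch`, `count_reads`, and the `max_retries` parameter are hypothetical, the two exception classes are local stand-ins for the library's types, and `asyncio` is used only to keep the example self-contained. The real `read_exactly()` may behave differently in its details.

```python
import asyncio
from typing import Tuple


class IncompleteReadError(Exception):
    """Local stand-in for the library exception of the same name."""


class IOException(Exception):
    """Local stand-in for the library exception of the same name."""


class ClosedReader:
    """Hypothetical reader whose peer has already closed the connection."""

    def __init__(self, raise_on_close: bool) -> None:
        self.raise_on_close = raise_on_close
        self.read_calls = 0

    async def read(self, n: int) -> bytes:
        self.read_calls += 1
        if self.raise_on_close:
            raise IOException("WebSocket connection closed by peer during read operation")
        return b""  # closure is indistinguishable from "no data yet"


async def read_exactly_sketch(reader: ClosedReader, n: int, max_retries: int = 5) -> bytes:
    """Simplified retry loop; max_retries is an assumed, illustrative bound."""
    buffer = bytearray()
    for _ in range(max_retries):
        buffer.extend(await reader.read(n - len(buffer)))
        if len(buffer) == n:
            return bytes(buffer)
    raise IncompleteReadError(f"expected {n} bytes but received {len(buffer)}")


async def count_reads(raise_on_close: bool) -> Tuple[int, str]:
    """Return how many reads were attempted and which error ended the call."""
    reader = ClosedReader(raise_on_close)
    try:
        await read_exactly_sketch(reader, 2)
    except Exception as exc:
        return reader.read_calls, type(exc).__name__
    return reader.read_calls, "no error"
```

With empty bytes, every retry is spent on a connection that is already gone and the caller finally sees only the generic error. With an exception, the first read reports the closure directly.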
Proposed Solution
1. Implement Proactive Closure Detection
Goal: Detect WebSocket connection closure through stream polling before attempting operations, not during operations.
Changes Needed:
- Monitor WebSocket connection state during stream polling
- Treat EOF as `Ok(0)` (normal condition), not an exception
- Track connection state and skip operations on closed connections
- Detect closure during stream polling, not during read/write operations
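One possible shape for these changes, sketched under stated assumptions: `PolledConnection`, the `CLOSE` sentinel, and `demo` are hypothetical names, an `asyncio.Queue` stands in for the WebSocket message stream, and none of this reflects the actual py-libp2p class layout.

```python
import asyncio
from typing import Optional

CLOSE = None  # sentinel standing in for a WebSocket close frame (assumption)


class PolledConnection:
    """Hypothetical wrapper whose receive loop records closure as soon as it is seen."""

    def __init__(self, incoming: "asyncio.Queue[Optional[bytes]]") -> None:
        self._incoming = incoming
        self._buffer = bytearray()
        self.closed = False

    async def poll(self) -> None:
        """Drain incoming messages; mark the connection closed at the close sentinel."""
        while not self.closed:
            message = await self._incoming.get()
            if message is CLOSE:
                self.closed = True
            else:
                self._buffer.extend(message)

    def read(self, n: int) -> bytes:
        """Return up to n buffered bytes; b"" once closed and drained means EOF."""
        data = bytes(self._buffer[:n])
        del self._buffer[:n]
        return data

    def can_write(self) -> bool:
        """Writers check this first and skip work on a closed connection."""
        return not self.closed


async def demo() -> tuple:
    incoming: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    incoming.put_nowait(b"pong")
    incoming.put_nowait(CLOSE)
    conn = PolledConnection(incoming)
    await conn.poll()
    return conn.read(4), conn.read(4), conn.can_write()
```

The point of the design is that closure is learned from the incoming stream itself, so a later window update can be skipped instead of failing on write.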
2. Graceful EOF Handling
Current Issue: When WebSocket connection closes, connection.py:read() returns b"", causing read_exactly() to retry indefinitely.
Fix: Raise IOException with clear message when connection closes, instead of returning empty bytes.
File: libp2p/transport/websocket/connection.py
Before (lines 341-353):

    except Exception as e:
        if ("CloseReason" in error_str or "ConnectionClosed" in error_type):
            self._closed = True
            return b""  # Causes read_exactly() to retry indefinitely

After:

    except Exception as e:
        if ("CloseReason" in error_str or "ConnectionClosed" in error_type):
            self._closed = True
            close_code = getattr(e, 'code', None)
            close_reason = getattr(e, 'reason', None) or "Connection closed by peer"
            raise IOException(
                f"WebSocket connection closed by peer during read operation: "
                f"code={close_code}, reason={close_reason}. "
                f"This may indicate the peer closed the connection, a network issue, "
                f"or a protocol error during yamux stream operation."
            )

3. Enhanced Error Messages
File: libp2p/io/utils.py
Add transport context to IncompleteReadError messages:
    raise IncompleteReadError(
        f"Connection closed during read operation: expected {n} bytes but received {len(buffer)} bytes. "
        f"This may indicate the peer closed the connection prematurely, a network issue, "
        f"or a transport-specific problem (e.g., WebSocket message boundary handling)."
        f"{context_info}"
    )

4. Connection State Tracking
Add connection state monitoring to detect closure earlier:
- Check connection state before operations
- Handle WebSocket close frames proactively
- Detect connection closure from read operations immediately
Code Changes Summary
| File | Lines | Change Type | Description |
|---|---|---|---|
| `libp2p/io/utils.py` | 27-30 | Error message enhancement | Add transport context to IncompleteReadError |
| `libp2p/transport/websocket/connection.py` | 341-353 | Bug fix | Raise IOException instead of returning b"" on close |
| `libp2p/transport/websocket/connection.py` | 230-245 | Bug fix | Better close detection in read(n=None) path |
| `libp2p/stream_muxer/yamux/yamux.py` | 648-656 | Error logging enhancement | Add transport context to error logs |
Expected Improvements
Before Fix:
Connection closed: expected 2 bytes but received 0 (connection may have been reset by peer)
After Fix:
WebSocket connection closed by peer during read operation: code=1000, reason=Connection closed by peer. This may indicate the peer closed the connection, a network issue, or a protocol error during yamux stream operation. (transport: websocket, duration: 0.12s)
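A message with this kind of context could be assembled by a small helper. The sketch below is illustrative only: `format_close_error` and its parameters are hypothetical, and how the transport name and duration would actually be obtained is left open.

```python
from typing import Optional


def format_close_error(
    transport: str,
    code: Optional[int],
    reason: Optional[str],
    duration_s: float,
) -> str:
    """Build a close-error message that carries transport context (hypothetical helper)."""
    reason_text = reason or "Connection closed by peer"
    return (
        f"WebSocket connection closed by peer during read operation: "
        f"code={code}, reason={reason_text}. "
        f"(transport: {transport}, duration: {duration_s:.2f}s)"
    )
```

Falling back to a default reason keeps the message readable when the peer sends a close frame without one.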
Benefits:
- Faster failure detection: Connection closure detected immediately (raises exception)
- No infinite retries: `read_exactly()` won't retry indefinitely
- Better debugging: Error messages include transport type and connection state
- Clear indication: WebSocket-specific issues are clearly identified
Testing
Unit Tests
Create test file: tests/core/transport/websocket/test_websocket_yamux_nim_compat.py
Test scenarios:
- Connection closes during yamux header read
- Connection closes during yamux data read
- WebSocket message boundaries during yamux operations
- Error message clarity and context
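The first scenario might look roughly like the following. This is a sketch, not the proposed test file: `ClosingReader` and `read_header` are hypothetical helpers, the built-in `ConnectionError` stands in for whatever exception the fixed transport raises, and `asyncio` replaces the project's real async test setup to keep the example self-contained. The 12-byte header size matches the `read_exactly(12)` call described earlier.

```python
import asyncio
from typing import Iterable, List

YAMUX_HEADER_SIZE = 12  # size of a yamux frame header in bytes


class ClosingReader:
    """Hypothetical reader that delivers the given chunks, then reports closure."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks: List[bytes] = list(chunks)

    async def read(self, n: int) -> bytes:
        if not self._chunks:
            raise ConnectionError(
                "WebSocket connection closed by peer during read operation"
            )
        # Chunks in these tests never exceed n, so nothing is dropped by the slice.
        return self._chunks.pop(0)[:n]


async def read_header(reader: ClosingReader) -> bytes:
    """Accumulate reads until a full yamux header has arrived."""
    header = bytearray()
    while len(header) < YAMUX_HEADER_SIZE:
        header.extend(await reader.read(YAMUX_HEADER_SIZE - len(header)))
    return bytes(header)


def test_close_during_header_read() -> None:
    reader = ClosingReader([b"\x00" * 5])  # only 5 of 12 header bytes arrive
    try:
        asyncio.run(read_header(reader))
    except ConnectionError as exc:
        assert "closed by peer" in str(exc)
    else:
        raise AssertionError("expected the header read to fail")


def test_complete_header_read() -> None:
    reader = ClosingReader([b"\x00" * 5, b"\x01" * 7])
    assert asyncio.run(read_header(reader)) == b"\x00" * 5 + b"\x01" * 7
```

The second test guards against the fix breaking the normal case, where a header simply arrives split across two WebSocket messages.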
Interoperability Tests
Run the failing tests:
    cd transport-interop
    SAVE_LOGS=all WORKER_COUNT=1 npm test -- --name-filter="python-v0.4 x nim-v1.14 (ws, noise, yamux)"

References
Detailed Analysis Documents
All detailed analysis documents are available in downloads/WebSocketIssue/:
- `PYTHON_WEBSOCKET_ROOT_CAUSE_ANALYSIS.md` - Comprehensive root cause analysis
- `RUST_WEBSOCKET_LIFECYCLE_ANALYSIS.md` - Deep dive into how rust-libp2p handles WebSocket lifecycle (critical for understanding the solution)
- `PYTHON_WEBSOCKET_FIX_PROPOSAL.md` - Detailed fix proposal with code changes
- `CODE_INTEGRATION_SUMMARY.md` - Quick reference for code changes
- `MAINTAINER_REPORT.md` - Overview of all test failures
- `README.md` - Guide to the documentation package
Related Issues and PRs
- Python libp2p PR #964: "feat: Enhance WebSocket transport with advanced features"
- Rust libp2p PR #4568: "Implement refactored Transport" (fixes similar issues in Rust)
- Test plans repository: https://github.com/libp2p/test-plans
Python Implementation Details
- Commit SHA: `926fae4476965aa5f194ce46c128e3b3b7d5a55b`
- Test Implementation: `transport-interop/impl/python/v0.4/ping_test.py`
- Version: v0.4
Implementation Priority
- High Priority: Implement proactive closure detection in `libp2p/transport/websocket/connection.py`
- High Priority: Update `libp2p/io/utils.py` to handle EOF gracefully
- Medium Priority: Improve error messages with transport context
- Medium Priority: Add connection state tracking
- Low Priority: Add test cases for WebSocket closure scenarios
Conclusion
The root cause is Python's reactive closure detection vs Rust's proactive closure detection. Python only discovers connection closure when attempting read/write operations, while Rust detects closure through stream polling before attempting operations. This causes Python to attempt window updates on already-closed connections, triggering error cascades.
The fix requires implementing proactive connection state tracking and graceful EOF handling, similar to how rust-v0.54+ handles WebSocket connections. This will allow Python to detect connection closure early and avoid problematic operations on closed connections.