TODO: add read/write lock #748

Merged
seetadev merged 15 commits into libp2p:main from Jineshbansal:add-read-write-lock
Jul 19, 2025

Conversation

@Jineshbansal
Copy link
Contributor

@Jineshbansal Jineshbansal commented Jul 7, 2025

Purpose

Implements the pending TODO for read/write synchronization in MplexStream.

Logic Behind the Lock

  1. Writing is blocked if any read is in progress
  2. Reading is blocked if a write is in progress
  3. Multiple reads are allowed concurrently
  4. Only one write is allowed at a time

Steps Taken

  1. Introduced a custom ReadWriteLock class to handle concurrent access
  2. Instantiated rw_lock in MplexStream.init
  3. Wrapped read() and write() with appropriate rw_lock methods, without changing their internal logic
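
As a sanity check, the four rules above can be demonstrated with a small self-contained sketch. Note this uses stdlib asyncio rather than Trio (so it runs without dependencies), and the ReadWriteLock here is an illustrative counter-plus-semaphore analogue, not the PR's actual class:

```python
import asyncio

class ReadWriteLock:
    """Many concurrent readers OR one exclusive writer (asyncio sketch)."""

    def __init__(self) -> None:
        self._readers = 0
        self._readers_lock = asyncio.Lock()       # protects the reader count
        self._writer_lock = asyncio.Semaphore(1)  # held by the writer, or by the first reader

    async def acquire_read(self) -> None:
        async with self._readers_lock:
            if self._readers == 0:
                await self._writer_lock.acquire()  # first reader locks writers out
            self._readers += 1

    async def release_read(self) -> None:
        async with self._readers_lock:
            self._readers -= 1
            if self._readers == 0:
                self._writer_lock.release()        # last reader lets writers back in

    async def acquire_write(self) -> None:
        await self._writer_lock.acquire()

    def release_write(self) -> None:
        self._writer_lock.release()


async def demo() -> list:
    lock, log = ReadWriteLock(), []

    async def reader(name):
        await lock.acquire_read()
        log.append(f"{name} in")
        await asyncio.sleep(0.05)   # both readers sit here together (rule 3)
        log.append(f"{name} out")
        await lock.release_read()

    async def writer():
        await asyncio.sleep(0.01)   # start after the readers are in
        await lock.acquire_write()  # blocks until every reader releases (rule 1)
        log.append("writer in")
        lock.release_write()

    await asyncio.gather(reader("r1"), reader("r2"), writer())
    return log


log = asyncio.run(demo())
```

The log shows both readers inside the lock at once, with the writer admitted only after the last reader leaves.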

Tests

To validate the above logic, a unit test has been added in libp2p/stream_muxer/mplex/test_read_write_lock.py.

@Jineshbansal
Copy link
Contributor Author

Please take a look at this, @seetadev. If you could give me some pointers on how to set up the test files, that would be really helpful.

@seetadev
Copy link
Contributor

seetadev commented Jul 7, 2025

@Jineshbansal : Welcome to py-libp2p :) Thank you for submitting this PR. Appreciate your efforts and initiative.

CCing @lla-dane, @acul71, @kaneki003 for their feedback and review on your PR. They will also help you set up the test suite, add key scenarios and a newsfragment, and arrive at a good conclusion on the issue addressed via the PR.

Also, CCing @guha-rahul and @sumanjeet0012. Would like them to review your PR with the team and share feedback too.

@lla-dane
Copy link
Contributor

lla-dane commented Jul 8, 2025

Also @Jineshbansal: Take a look at this file: newsfragment-readme; you can add a newsfragment file as per the instructions there.

@acul71
Copy link
Contributor

acul71 commented Jul 8, 2025

@Jineshbansal

Review: Locking Mechanism in MplexStream and Trio Refactor

📄 Original Implementation

In the current implementation (mplex_stream.py), the MplexStream class uses threading.Lock to manage concurrent access to the internal _closed state and the send_message calls.

Lock Usage

import threading

self._lock = threading.Lock()

def write(self, data: bytes) -> None:
    with self._lock:
        if self._closed:
            raise StreamClosedError("Stream is closed")
        self._muxer.send_message(self._id, MessageType.DATA, data)

This works for synchronous, threaded programs, but py-libp2p uses Trio for its concurrency model, which is built on coroutines and structured concurrency, not threads.

⚠️ Problems with threading.Lock in Trio

  • Incompatible: Blocking locks like threading.Lock are unsafe in async Trio code. They can block the entire event loop.
  • Not async-aware: If used in async functions, threading.Lock can lead to deadlocks or reduced performance.
  • Non-cooperative: Prevents Trio features such as cancellation and timeouts from working properly.

✅ Suggested Fix: Use trio.Lock

Trio-Compatible Refactor

Replace threading.Lock with trio.Lock and update all methods that use the lock to be async def:

import trio

self._lock = trio.Lock()

async def write(self, data: bytes) -> None:
    async with self._lock:
        if self._closed:
            raise StreamClosedError("Stream is closed")
        await self._muxer.send_message(self._id, MessageType.DATA, data)

Context Management

Also update context management methods:

async def __aenter__(self) -> "MplexStream":
    return self

async def __aexit__(self, exc_type, exc, tb) -> None:
    await self.close()

🔁 Full Method Changes

  • write() → async def write()
  • close() → async def close()
  • reset() → async def reset()
  • __enter__() → __aenter__()
  • __exit__() → __aexit__()

🧩 Muxer Assumption

This refactor assumes that self._muxer.send_message(...) is also an async def function. If it's still sync, wrap it with:

await trio.to_thread.run_sync(lambda: self._muxer.send_message(...))
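
For illustration, here is the same offloading pattern in a runnable form. It uses stdlib asyncio.to_thread as a stand-in for trio.to_thread.run_sync, and send_message_sync is a hypothetical blocking call, not a real muxer API:

```python
import asyncio
import time

def send_message_sync(stream_id: int, data: bytes) -> str:
    """Stand-in for a blocking, synchronous muxer call (hypothetical)."""
    time.sleep(0.01)  # simulate blocking I/O
    return f"sent {len(data)} bytes on stream {stream_id}"

async def write(stream_id: int, data: bytes) -> str:
    # Run the blocking call in a worker thread so the event loop stays
    # responsive; trio.to_thread.run_sync plays the same role in Trio code.
    return await asyncio.to_thread(send_message_sync, stream_id, data)

result = asyncio.run(write(1, b"hello"))
```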

✅ Benefits of Trio Refactor

  • Compatible with the rest of py-libp2p (Trio-based)
  • Prevents blocking the event loop
  • Supports structured concurrency and cancellation
  • Cleaner and more testable async code

🛠️ Next Steps

  • Update Mplex.send_message() to be async def if it isn't already.
  • Update all call sites to use await stream.write(...), await stream.close(), etc.
  • Ensure tests run in an async context using pytest-trio or similar.

⚠️ Race Condition Concerns During reset() and close()

Both reset() and close() mutate the _closed state and send control messages. Without proper locking, race conditions can occur:

  • Concurrent calls to reset() and close() might cause inconsistent stream states or duplicated messages.
  • Without the lock, two coroutines could both check _closed as False and proceed to send conflicting messages (CLOSE and RESET) simultaneously.
  • This may lead to protocol errors or unexpected behavior in peers.

How Trio Lock Helps

Using a trio.Lock and the pattern:

async with self._lock:
    if self._closed:
        return
    self._closed = True
    await self._muxer.send_message(...)

ensures:

  • Mutual exclusion: Only one coroutine can mutate _closed at a time.
  • Consistent state: Once _closed is set, subsequent calls no-op.
  • Safe message sending: Avoids sending contradictory or duplicated control frames.
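
A toy model makes the no-op guarantee concrete: two racing close()/reset() calls produce exactly one control message. The Stream class below is hypothetical, and stdlib asyncio stands in for Trio:

```python
import asyncio

class Stream:
    """Toy stream (hypothetical) showing idempotent close/reset under a lock."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._closed = False
        self.control_messages = []

    async def _send(self, msg: str) -> None:
        await asyncio.sleep(0)  # yield to the scheduler, as a real send would
        self.control_messages.append(msg)

    async def close(self) -> None:
        async with self._lock:
            if self._closed:
                return          # subsequent calls no-op
            self._closed = True
            await self._send("CLOSE")

    async def reset(self) -> None:
        async with self._lock:
            if self._closed:
                return          # already closed: do not send a conflicting frame
            self._closed = True
            await self._send("RESET")

async def demo():
    s = Stream()
    await asyncio.gather(s.close(), s.reset())  # race the two calls
    return s.control_messages

msgs = asyncio.run(demo())
```

Whichever coroutine wins the lock sends its frame; the loser observes _closed and returns without sending anything.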

Summary

Proper locking around reset() and close() is critical to:

  • Maintain stream state consistency.
  • Prevent subtle race conditions in async concurrent contexts.
  • Guarantee protocol compliance and stability.

@acul71
Copy link
Contributor

acul71 commented Jul 8, 2025

Example tests with trio

import pytest
import trio
from unittest.mock import AsyncMock, MagicMock

from libp2p.stream_muxer.exceptions import StreamClosedError
from libp2p.stream_muxer.mplex.constants import MessageType
from libp2p.stream_muxer.mplex.mplex_stream import MplexStream


@pytest.fixture
def mock_muxer():
    muxer = MagicMock()
    muxer.send_message = AsyncMock()
    return muxer


@pytest.mark.trio
async def test_write_calls_send_message(mock_muxer):
    stream = MplexStream(mock_muxer, 1)
    await stream.write(b"hello")
    mock_muxer.send_message.assert_awaited_once_with(1, MessageType.DATA, b"hello")


@pytest.mark.trio
async def test_write_raises_if_closed(mock_muxer):
    stream = MplexStream(mock_muxer, 1)
    await stream.close()
    with pytest.raises(StreamClosedError):
        await stream.write(b"data")


@pytest.mark.trio
async def test_close_sends_close_message_once(mock_muxer):
    stream = MplexStream(mock_muxer, 1)
    await stream.close()
    await stream.close()  # second call no-op
    mock_muxer.send_message.assert_awaited_once_with(1, MessageType.CLOSE)


@pytest.mark.trio
async def test_reset_sends_reset_message_and_closes(mock_muxer):
    stream = MplexStream(mock_muxer, 1)
    await stream.reset()
    assert stream.is_closed()
    mock_muxer.send_message.assert_awaited_once_with(1, MessageType.RESET)


@pytest.mark.trio
async def test_is_closed_reflects_state(mock_muxer):
    stream = MplexStream(mock_muxer, 1)
    assert not stream.is_closed()
    await stream.close()
    assert stream.is_closed()


@pytest.mark.trio
async def test_async_context_manager_calls_close(mock_muxer):
    async with MplexStream(mock_muxer, 1) as stream:
        await stream.write(b"test")
    mock_muxer.send_message.assert_any_await(1, MessageType.CLOSE)


# --- New tests for Trio locking behavior ---


@pytest.mark.trio
async def test_concurrent_write_lock(mock_muxer):
    stream = MplexStream(mock_muxer, 1)

    async def writer(data):
        await stream.write(data)

    async with trio.open_nursery() as nursery:
        nursery.start_soon(writer, b"hello")
        nursery.start_soon(writer, b"world")

    assert mock_muxer.send_message.await_count == 2


@pytest.mark.trio
async def test_concurrent_close_reset_lock(mock_muxer):
    stream = MplexStream(mock_muxer, 1)

    async def do_close():
        await stream.close()

    async def do_reset():
        await stream.reset()

    async with trio.open_nursery() as nursery:
        nursery.start_soon(do_close)
        nursery.start_soon(do_reset)

    # Only one control message should be sent (CLOSE or RESET)
    assert mock_muxer.send_message.await_count == 1


@pytest.mark.trio
async def test_write_after_close_raises(mock_muxer):
    stream = MplexStream(mock_muxer, 1)
    await stream.close()
    with pytest.raises(StreamClosedError):
        await stream.write(b"data")

@Jineshbansal Jineshbansal force-pushed the add-read-write-lock branch from 27c99d8 to e65e38a Compare July 8, 2025 13:42
@Jineshbansal Jineshbansal requested a review from lla-dane July 8, 2025 14:36
@Jineshbansal
Copy link
Contributor Author

Jineshbansal commented Jul 8, 2025

@lla-dane @acul71 I have made all the recommended changes, please review.

@acul71
Copy link
Contributor

acul71 commented Jul 8, 2025

Review of MplexStream ReadWriteLock Implementation and Tests

Overview

This document contains a focused review of the ReadWriteLock integration within the MplexStream class of py-libp2p, including both the implementation and its unit tests.

The goal is to assess correctness, concurrency safety, and test robustness, and to offer actionable improvements with code suggestions.


✅ Strengths

Implementation

  • The ReadWriteLock is well-encapsulated and uses Trio's condition variables effectively.
  • Read/write access to the stream is protected with lock acquisition (acquire_read()/acquire_write()).
  • The MplexStream.read() and write() methods correctly guard access with rw_lock, enforcing mutual exclusion.

Tests

  • The test suite effectively verifies:
    • Mutual exclusion between readers and writers.
    • Concurrent access for multiple readers.
    • Only one writer can hold the lock at a time.
    • Correct interleaving behavior.

🔧 Implementation Suggestions

1. Use Async Context Manager for Lock Usage

Currently:

await self.rw_lock.acquire_read()
try:
    ...
finally:
    await self.rw_lock.release_read()

This is error-prone and verbose. Suggest adding context manager support:

🔄 Proposed change to ReadWriteLock:

class ReadWriteLock:
    ...

    @asynccontextmanager
    async def read_lock(self):
        await self.acquire_read()
        try:
            yield
        finally:
            await self.release_read()

    @asynccontextmanager
    async def write_lock(self):
        await self.acquire_write()
        try:
            yield
        finally:
            self.release_write()

✅ Then usage becomes:

async with self.rw_lock.read_lock():
    ...

This improves readability and ensures safer usage throughout MplexStream.


2. Handle Cancellation During Lock Acquisition

Currently, if a task is cancelled while acquiring the lock, the internal counters and waiters (_writer_waiting, _readers, _writer, etc.) may be left in an inconsistent state. This can lead to deadlocks or incorrect behavior (e.g., a lock that is never released).

🛡 Suggested Improvement

Wrap lock acquisition logic in a try/except trio.Cancelled block. Ensure the task cleans up any registration it did before cancellation (e.g., removing itself from waiters).

✍️ Example for acquire_read

Before (prone to inconsistent state on cancellation):

async def acquire_read(self) -> None:
    async with self._lock:
        while self._writer or self._writer_waiting:
            await self._no_writer.wait()
        self._readers += 1

After (safe from cancellation during wait):

async def acquire_read(self) -> None:
    try:
        async with self._lock:
            while self._writer or self._writer_waiting:
                await self._no_writer.wait()
            self._readers += 1
    except trio.Cancelled:
        # Optional: decrement wait count or log cancellation
        raise

✍️ Example for acquire_write with wait queue tracking

Before:

async def acquire_write(self) -> None:
    async with self._lock:
        self._writer_waiting = True
        while self._writer or self._readers > 0:
            await self._no_writer.wait()
        self._writer = True
        self._writer_waiting = False

After (safe from cancellation, preserves state):

async def acquire_write(self) -> None:
    self._writer_waiting = True
    try:
        async with self._lock:
            while self._writer or self._readers > 0:
                await self._no_writer.wait()
            self._writer = True
    except trio.Cancelled:
        # Cleanup the waiting flag if cancelled
        async with self._lock:
            self._writer_waiting = False
        raise
    else:
        self._writer_waiting = False

✅ Why This Matters

Trio’s cancellation is immediate and guaranteed, which means any side effects before an await must be carefully undone if the task is cancelled during that await. Otherwise, shared state (_writer_waiting, _readers) can remain in a corrupt state, leading to:

  • 💥 Deadlocks
  • 🐛 Incorrect concurrent behavior
  • 🧪 Flaky test failures
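
The hazard can be reproduced in miniature. The sketch below uses stdlib asyncio (whose CancelledError plays the role of trio.Cancelled) and a hypothetical Lockish class; the except block is what keeps the _writer_waiting flag consistent when the waiting task is cancelled:

```python
import asyncio

class Lockish:
    """Minimal stand-in (hypothetical) showing cancellation-safe flag handling."""

    def __init__(self) -> None:
        self._writer_waiting = False
        self._gate = asyncio.Event()  # never set: the writer blocks indefinitely

    async def acquire_write(self) -> None:
        self._writer_waiting = True       # side effect taken before the await
        try:
            await self._gate.wait()       # a cancellation lands exactly here
        except asyncio.CancelledError:
            self._writer_waiting = False  # undo the side effect, then re-raise
            raise
        self._writer_waiting = False

async def demo() -> bool:
    lock = Lockish()
    task = asyncio.create_task(lock.acquire_write())
    await asyncio.sleep(0.01)             # let the task block on the gate
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return lock._writer_waiting

flag_after_cancel = asyncio.run(demo())
```

Without the except clause, _writer_waiting would stay True forever and every later reader would be blocked by a phantom writer.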

3. Naming Clarity (Optional)

  • _no_writer could be renamed to something like _writer_condition or _no_writer_or_pending_write to better reflect semantics.

4. Consider Adding Logging (Optional)

For debugging lock behavior in complex concurrency cases, optional debug logs using Python’s logging module can help trace:

  • When a read/write is requested, granted, or released
  • When tasks block and resume

🔬 Test Suggestions

1. Replace Time-Based Coordination with Events

Current test synchronization is sleep-based:

await trio.sleep(0.05)

This may cause flakiness. Replace with trio.Event:

start_event = trio.Event()

async def reader():
    await start_event.wait()
    ...

# in test
nursery.start_soon(reader)
start_event.set()

2. Add Tests for Stream-Level Lock Integration

Tests only verify lock behavior, not stream read/write correctness under concurrency.

➕ Suggested new test:

@pytest.mark.trio
async def test_stream_write_protected_by_rwlock(stream_with_lock):
    stream, _ = stream_with_lock
    stream.rw_lock.acquire_write = AsyncMock()
    stream.rw_lock.release_write = MagicMock()

    await stream.write(b"test")

    stream.rw_lock.acquire_write.assert_awaited_once()
    stream.rw_lock.release_write.assert_called_once()

Likewise, test read() with read-lock instrumentation.


3. Test Reset, EOF, and Close Interactions

Missing coverage:

  • Reading from a closed stream
  • Writing to a reset stream
  • Behavior when EOF is reached

➕ Suggested test:

@pytest.mark.trio
async def test_read_after_stream_closed(stream_with_lock):
    stream, send_chan = stream_with_lock
    stream.event_remote_closed.set()
    data = await stream.read(100)
    assert data == b"", "Expected EOF after remote closed"

✨ Summary of Key Suggestions

Area                 | Suggestion
---------------------|------------------------------------------------------------
Lock API             | Add context manager support (read_lock, write_lock)
Cancellation Safety  | Handle Cancelled exceptions to maintain state
Tests                | Replace sleep() with Event for reliability
New Tests            | Cover actual read()/write() calls with lock instrumentation
Edge Cases           | Add tests for reset, close, EOF behavior

✅ Conclusion

The current implementation is structurally sound and aligns with Trio concurrency principles. A few refinements — especially regarding context managers, error/cancellation handling, and expanded test cases — will significantly improve robustness, maintainability, and test reliability.

@acul71
Copy link
Contributor

acul71 commented Jul 8, 2025

@Jineshbansal Well done; if you can refine it further, even better.
Tests should be placed here: tests/core/stream_muxer/test_mplex_stream.py

@Jineshbansal Jineshbansal force-pushed the add-read-write-lock branch from fa7601f to 9a950f6 Compare July 13, 2025 13:03
@Jineshbansal Jineshbansal force-pushed the add-read-write-lock branch from 9a950f6 to 9cd3805 Compare July 13, 2025 13:08
@Jineshbansal
Copy link
Contributor Author

Jineshbansal commented Jul 13, 2025

@acul71 @lla-dane @seetadev please review. I believe I have made all the recommended changes, i.e. extended the tests to also cover cancellation of a read or write task.

@seetadev
Copy link
Contributor

seetadev commented Jul 16, 2025

@Jineshbansal : This is indeed close to final review and merge. Great work, Jinesh. Please add a newsfragment file.

Thank you @lla-dane and @acul71 for your continued support to Jinesh on this PR. Appreciate it.

@seetadev
Copy link
Contributor

@lla-dane , @acul71 : Please share any final review points and comments. Once @Jineshbansal adds the newsfragment file, I'll merge it.

@Jineshbansal : Also, please add developer docs and demo to the discussion page. Appreciate your initiative. Kindly get in touch with @lla-dane on this effort.

@pacrob
Copy link
Member

pacrob commented Jul 16, 2025

@Jineshbansal - why are the graves being removed throughout?

@Jineshbansal
Copy link
Contributor Author

Earlier I thought these changes were coming from the main branch, but now that I've noticed it, it may be because I ran some Python command or something. Sorry for that; I will update this PR.
@pacrob Thanks for catching this. Any other suggestions?

@acul71
Copy link
Contributor

acul71 commented Jul 17, 2025

@Jineshbansal
The test tests/core/stream_muxer/test_mplex_stream.py::test_mplex_stream_both_close may fail due to race conditions (though it might also pass depending on environmental timing).
If you're on Windows, could you try running it to see if the issue occurs?

@acul71
Copy link
Contributor

acul71 commented Jul 17, 2025

@Jineshbansal
Maybe we can ignore the Windows error, because it's only the one test.

The PR is good. See if you can discover more about the Windows error (easiest: increase the sleep time; better: poll for an event with a timeout).

ReadWriteLock Implementation Analysis

Overview

This document analyzes the ReadWriteLock implementation in the py-libp2p repository, specifically examining whether it fulfills the requirements mentioned in PR #748.

Current Implementation Status

IMPLEMENTED: ReadWriteLock Class

The repository contains a complete ReadWriteLock implementation in libp2p/stream_muxer/mplex/mplex_stream.py:

from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

import trio


class ReadWriteLock:
    """
    A read-write lock that allows multiple concurrent readers
    or one exclusive writer, implemented using Trio primitives.
    """

    def __init__(self) -> None:
        self._readers = 0
        self._readers_lock = trio.Lock()  # Protects access to _readers count
        self._writer_lock = trio.Semaphore(1)  # Allows only one writer at a time

    async def acquire_read(self) -> None:
        """Acquire a read lock. Multiple readers can hold it simultaneously."""
        try:
            async with self._readers_lock:
                if self._readers == 0:
                    await self._writer_lock.acquire()
                self._readers += 1
        except trio.Cancelled:
            raise

    async def release_read(self) -> None:
        """Release a read lock."""
        async with self._readers_lock:
            if self._readers == 1:
                self._writer_lock.release()
            self._readers -= 1

    async def acquire_write(self) -> None:
        """Acquire an exclusive write lock."""
        try:
            await self._writer_lock.acquire()
        except trio.Cancelled:
            raise

    def release_write(self) -> None:
        """Release the exclusive write lock."""
        self._writer_lock.release()

    @asynccontextmanager
    async def read_lock(self) -> AsyncGenerator[None, None]:
        """Context manager for acquiring and releasing a read lock safely."""
        acquire = False
        try:
            await self.acquire_read()
            acquire = True
            yield
        finally:
            if acquire:
                with trio.CancelScope() as scope:
                    scope.shield = True
                    await self.release_read()

    @asynccontextmanager
    async def write_lock(self) -> AsyncGenerator[None, None]:
        """Context manager for acquiring and releasing a write lock safely."""
        acquire = False
        try:
            await self.acquire_write()
            acquire = True
            yield
        finally:
            if acquire:
                self.release_write()

IMPLEMENTED: Integration with MplexStream

The ReadWriteLock is properly integrated into the MplexStream class:

class MplexStream(IMuxedStream):
    # ... other attributes ...
    rw_lock: ReadWriteLock

    def __init__(self, ...):
        # ... other initialization ...
        self.rw_lock = ReadWriteLock()

    async def read(self, n: int | None = None) -> bytes:
        async with self.rw_lock.read_lock():
            # ... read implementation ...

    async def write(self, data: bytes) -> None:
        async with self.rw_lock.write_lock():
            # ... write implementation ...

IMPLEMENTED: Comprehensive Test Suite

The implementation includes extensive tests in tests/core/stream_muxer/test_read_write_lock.py covering:

  1. Basic Lock Functionality

    • Stream write protection by RW lock
    • Stream read protection by RW lock
    • Multiple readers can coexist
    • Writer blocks readers
  2. Concurrency Testing

    • Writer waits for readers to finish
    • Lock behavior during cancellation
    • Concurrent read-write sequences
    • Multiple readers can read concurrently
  3. Edge Cases

    • Read after remote close triggers EOF
    • Read on closed stream raises EOF
    • Write to locally closed stream raises
    • Read from reset stream raises
    • Write to reset stream raises
    • Stream reset cleans up resources

Key Features of the Implementation

1. Trio-Based Implementation

  • Uses trio.Lock() and trio.Semaphore() for thread safety
  • Properly handles async/await patterns
  • Integrates well with Trio's cancellation system

2. Context Manager Support

  • Provides read_lock() and write_lock() context managers
  • Ensures proper cleanup even during exceptions
  • Uses trio.CancelScope with shielding for read lock release

3. Proper Lock Semantics

  • Multiple readers can hold the lock simultaneously
  • Only one writer can hold the lock at a time
  • Writers are blocked until all readers release their locks
  • Readers are blocked when a writer holds the lock
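
These semantics can be exercised with a runnable port of the counter-plus-semaphore scheme. The sketch below uses stdlib asyncio instead of Trio and mirrors the read_lock()/write_lock() context managers, but it is illustrative only, not the repository's class:

```python
import asyncio
from contextlib import asynccontextmanager

class RWLock:
    """asyncio port of the counter-plus-semaphore scheme, for a runnable check."""

    def __init__(self) -> None:
        self._readers = 0
        self._readers_lock = asyncio.Lock()
        self._writer_lock = asyncio.Semaphore(1)

    @asynccontextmanager
    async def read_lock(self):
        async with self._readers_lock:
            if self._readers == 0:
                await self._writer_lock.acquire()  # first reader excludes writers
            self._readers += 1
        try:
            yield
        finally:
            async with self._readers_lock:
                self._readers -= 1
                if self._readers == 0:
                    self._writer_lock.release()    # last reader readmits writers

    @asynccontextmanager
    async def write_lock(self):
        await self._writer_lock.acquire()
        try:
            yield
        finally:
            self._writer_lock.release()

async def demo():
    lock, events = RWLock(), []

    async def writer():
        async with lock.write_lock():
            events.append("write start")
            await asyncio.sleep(0.02)
            events.append("write end")

    async def reader():
        await asyncio.sleep(0.01)        # try to read while the writer holds the lock
        async with lock.read_lock():
            events.append("read")        # only runs once the writer is done

    await asyncio.gather(writer(), reader())
    return events

events = asyncio.run(demo())
```

The reader starts while the writer holds the lock, so its "read" entry can only appear after "write end".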

4. Integration with Stream Operations

  • All read operations are protected by read locks
  • All write operations are protected by write locks
  • Maintains stream state consistency during concurrent access

Comparison with Requirements

Based on the PR comment analysis, the current implementation appears to fulfill the core requirements:

Race Condition Prevention

  • The ReadWriteLock prevents race conditions between concurrent read/write operations
  • Multiple readers can operate safely without interference
  • Writers get exclusive access when needed

Thread Safety

  • Uses Trio primitives for proper async thread safety
  • Handles cancellation correctly
  • Provides context managers for safe resource management

Performance Considerations

  • Allows concurrent reads for better performance
  • Minimal overhead for single-threaded operations
  • Proper cleanup to prevent resource leaks

Additional Observations

1. Test Coverage

The test suite is comprehensive and covers:

  • Basic functionality
  • Concurrency scenarios
  • Edge cases and error conditions
  • Resource cleanup

2. Code Quality

  • Well-documented with clear docstrings
  • Follows Python async/await best practices
  • Proper error handling and exception propagation

3. Integration

  • Seamlessly integrated into the existing MplexStream implementation
  • Doesn't break existing functionality
  • Maintains backward compatibility

Conclusion

The current py-libp2p repository fully implements the ReadWriteLock functionality as requested. The implementation:

  1. ✅ Provides a complete ReadWriteLock class with proper async semantics
  2. ✅ Integrates the lock into the MplexStream for protecting read/write operations
  3. ✅ Includes comprehensive test coverage
  4. ✅ Handles edge cases and error conditions properly
  5. ✅ Uses Trio primitives for thread safety
  6. ✅ Provides context managers for safe resource management

The implementation appears to be production-ready and addresses the concurrency issues that the original PR was intended to solve.

⚠️ Known Issues and Recommendations

Windows-Specific Test Failure (Hypothesis: Race Condition)

Issue: The test test_mplex_stream_both_close in tests/core/stream_muxer/test_mplex_stream.py has been reported to fail on Windows, as mentioned in the PR comment by @acul71:

"The test tests/core/stream_muxer/test_mplex_stream.py::test_mplex_stream_both_close may fail due to race conditions (though it might also pass depending on environmental timing). If you're on Windows, could you try running it to see if the issue occurs?"

Current Status: This is currently a hypothesis that needs validation. The actual root cause has not been determined.

Hypothesized Root Causes (to be validated):

  1. Race conditions in stream closure operations
  2. Platform-specific timing differences between Windows and Unix-like systems
  3. Timing sensitivity in the trio.sleep(0.01) calls used for synchronization
  4. Different scheduling behavior on Windows
  5. Potential interaction issues between close_lock and rw_lock

Validation Steps Needed:

  1. Reproduce the Issue:

    # Run the specific test on Windows multiple times
    pytest tests/core/stream_muxer/test_mplex_stream.py::test_mplex_stream_both_close -v --tb=short
    
    # Run with different Python versions on Windows
    python -m pytest tests/core/stream_muxer/test_mplex_stream.py::test_mplex_stream_both_close -v
  2. Add Debugging Information:

    # Add logging to understand the timing
    import logging
    logging.basicConfig(level=logging.DEBUG)
    
    async def test_mplex_stream_both_close(mplex_stream_pair):
        stream_0, stream_1 = mplex_stream_pair
        logging.debug("Test started")
        
        # Add debug prints at each step
        logging.debug("Closing stream_0")
        await stream_0.close()
        logging.debug("stream_0 closed, sleeping")
        await trio.sleep(0.01)
        logging.debug("After sleep, checking states")
        # ... rest of test
  3. Investigate Platform Differences:

    • Compare test execution times between Windows and Linux
    • Check if the issue occurs with different Trio versions
    • Test with different Python versions (3.11, 3.12)
  4. Stress Test the Hypothesis:

    # Create a stress test to reproduce race conditions
    @pytest.mark.trio
    async def test_mplex_stream_both_close_stress(mplex_stream_pair):
        """Stress test to reproduce race conditions"""
        for i in range(100):
            stream_0, stream_1 = mplex_stream_pair
            await stream_0.close()
            await trio.sleep(0.001)  # Very short sleep
            await stream_1.close()
            await trio.sleep(0.001)
  5. Analyze the Actual Error:

    • Capture the full error traceback when it occurs
    • Check if it's consistently the same error or varies
    • Determine if it's a timeout, assertion failure, or exception

Recommended Solutions:

  1. Add Platform-Specific Test Markers:

    @pytest.mark.skipif(
        sys.platform == "win32",
        reason="Known race condition on Windows - needs investigation"
    )
    async def test_mplex_stream_both_close(mplex_stream_pair):
  2. Improve Test Synchronization:

    • Replace trio.sleep(0.01) with proper event-based synchronization
    • Use wait_all_tasks_blocked() more consistently
    • Add explicit state checks before proceeding
  3. Add Retry Logic for Flaky Tests:

    @pytest.mark.flaky(reruns=3, reruns_delay=0.1)
    async def test_mplex_stream_both_close(mplex_stream_pair):
  4. Investigate ReadWriteLock Integration:

    • Ensure the ReadWriteLock properly protects the close/reset operations
    • Consider adding additional synchronization around stream cleanup
    • Review the interaction between close_lock and rw_lock
  5. Add Windows-Specific Test Variants:

    • Create Windows-specific test cases with longer timeouts
    • Add more explicit state verification steps
    • Consider using different synchronization primitives for Windows

Priority: Low-Medium - The core ReadWriteLock functionality works correctly. This is a hypothesis about a Windows-specific test failure that needs validation before any action is taken.

Next Steps:

  1. Validate the hypothesis by reproducing the issue on Windows
  2. Investigate the root cause if the issue is confirmed
  3. Implement targeted fixes only after understanding the actual problem
  4. Consider if it's a test issue vs. implementation issue

Files Modified/Added

  • libp2p/stream_muxer/mplex/mplex_stream.py - Added ReadWriteLock class and integration
  • tests/core/stream_muxer/test_read_write_lock.py - Comprehensive test suite

Branch Information

  • Branch: add-read-write-lock
  • Status: Ready for review/merge with known Windows issue
  • Test Status: All tests passing on Unix-like systems, one known flaky test on Windows

@Jineshbansal
Copy link
Contributor Author

@acul71 I am using a Mac, which may be why it passes on my system.
Also @seetadev, I have already added a newsfragment to my PR, as @lla-dane suggested. Also, what do you mean by adding developer docs and a demo to the discussion page? Could you kindly give me some docs or an example?

@acul71
Copy link
Contributor

acul71 commented Jul 17, 2025

Also, what do you mean by adding developer docs and a demo to the discussion page? Could you kindly give me some docs or an example?

The discussion page I think is here: #723
You have to prepare something (a Markdown document, a short demo video) in which you explain what you did, how the fix is beneficial, and relevant documentation that shows where the relevant parts of the code were changed, new parameters, other relevant changes, and maybe what the tests you wrote do.
See this example documentation for your PR here:
#723 (comment)

@Jineshbansal
Copy link
Contributor Author

@acul71 you have already made a discussion page that LGTM. Can't we just use that?

@acul71
Copy link
Contributor

acul71 commented Jul 18, 2025

@acul71 you have already made a discussion page that LGTM. Can't we just use that?

Yes, sorry, can you give me the right link?
Thanks.

@Jineshbansal
Copy link
Contributor Author

Jineshbansal commented Jul 18, 2025

@acul71 I am talking about this discussion page

@acul71
Copy link
Contributor

acul71 commented Jul 18, 2025

@acul71 I am talking about this discussion page

All right then, proceed and ask for merging.

@Jineshbansal
Copy link
Contributor Author

@seetadev Please take a final look if it is ready to merge

@seetadev
Copy link
Contributor

seetadev commented Jul 19, 2025

@Jineshbansal : Appreciate your efforts on this PR with a clean and well-reasoned implementation of read/write synchronization in MplexStream, and it's great to see that the locking logic allows concurrent reads while maintaining write exclusivity — very true to Mplex's design intent.

The introduction of a custom ReadWriteLock class and its seamless integration into the read() and write() flows (without altering their internal logic) makes this both elegant and easy to maintain. Also, the new test case in test_read_write_lock.py provides a solid validation of concurrency behavior.

Thank you for working on our feedback points. Also, wish to thank @acul71 for his guidance and pointers, and for reviewing the PR. Since this PR touches core stream safety and concurrency (critical for real-world performance and stability), would you be open to adding the following as part of the PR discussion or documentation?

Developer Notes / Guide

To help new contributors understand and extend this work, a short write-up covering:

  • Motivation behind the ReadWriteLock and why it's needed for MplexStream.
  • Summary of how the lock works:
    • Multiple concurrent reads
    • Writes block reads and vice versa
    • Only one write at a time
  • How this differs from Python’s standard locks (and why a custom one was needed).
  • Any caveats or future improvement paths (e.g., fairness, starvation).

Screenshots / Screencasts

If you can spare the time, adding:

  • A screenshot or terminal output showing concurrent reads + blocked writes (maybe from the test log).
  • A short screencast or animated GIF demonstrating the lock in action (e.g., spawning multiple readers/writers and watching the blocking behavior live).

These additions would be super helpful for onboarding new developers to libp2p.stream_muxer, and make this PR a great reference for future concurrency-related improvements.

Really appreciate your clean and focused work on this — locking logic is tricky, and this one is well done.

@seetadev
Copy link
Contributor

seetadev commented Jul 19, 2025

@Jineshbansal : This looks ready to merge. As discussed above, please prepare the developer docs and add them to the discussion page.

Wish to also thank @lla-dane for his review and feedback. Appreciate the pointers and guidance.

@seetadev seetadev merged commit e6a355d into libp2p:main Jul 19, 2025
28 checks passed