@ysbaddaden ysbaddaden commented Oct 14, 2025

This patch extends the fdlock to serialize reads and writes: the reference-counted lock gains a read lock and a write lock, so taking a reference and locking becomes a single operation instead of two (1. acquire/release the lock; 2. take/return a reference). This avoids a race condition in the polling event loops:

  • Fiber 1 then Fiber 2 try to read from fd;
  • Since fd isn't ready, both fibers start waiting;
  • When fd becomes ready, Fiber 1 is resumed;
  • Fiber 1 doesn't read everything and returns;
  • Since events are edge-triggered, Fiber 2 won't be resumed!

With the read lock, Fiber 2 waits on the lock and is resumed by Fiber 1 when it returns. A concrete example of the bug is multiple fibers accepting on the same socket: Fiber 1 keeps handling connections while Fiber 2 sits idle forever.
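The handoff described above can be modeled outside Crystal. Here is a minimal, hypothetical Python sketch (OS threads standing in for fibers, a condition variable standing in for the fiber queue) of a read lock that serializes readers, so a returning reader wakes the next waiter instead of relying on another edge-triggered event:

```python
import threading

class ReadLock:
    """Hypothetical model: serializes readers of a single fd."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False

    def acquire(self):
        with self._cond:
            while self._held:
                self._cond.wait()  # queue up behind the current reader
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify()  # hand off to the next waiting reader

lock = ReadLock()
order = []

def reader(name):
    lock.acquire()
    order.append(name)  # at most one "fiber" owns the read side at a time
    lock.release()

threads = [threading.Thread(target=reader, args=(n,)) for n in ("fiber 1", "fiber 2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This is only a model of the serialization property, not of Crystal's lock-free implementation, which uses atomics and an intrusive waiter list rather than a condition variable.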

The other benefit is that it simplifies the event loops, which now only have to deal with a single reader and a single writer per IO. It is also required by the io_uring event loop (at least its MT version).

NOTE: While this patch only serializes reads/writes on UNIX at the Crystal::System level, which is where the bugs are, we may want to move it into the stdlib for all targets at some point, for example to serialize reads and writes around IO::Buffered.

Depends on #16288 and #16289.
Required by #16264.

@ysbaddaden

I split the fdlock in two different commits (refcount then serial R/W) that outline the different steps for merging as individual PRs.

# acquire spinlock + forward declare pending waiter
m, success = @m.compare_and_set(m, (m | xspin | xwait) & ~clear, :acquire, :relaxed)
if success
  waiters.value.push(pointerof(waiter))
@ysbaddaden ysbaddaden Oct 14, 2025

Another missing trick from nsync: only new waiters shall be pushed to the end of the queue; awoken fibers that still failed to lock shall be pushed to the start of the queue (another mechanism against starvation).

It's not implemented because Crystal::PointerLinkedList doesn't have an #unshift method (yet).
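For illustration, a hypothetical Python model of that queueing policy (a `deque` stands in for Crystal's intrusive `Crystal::PointerLinkedList`): new waiters are appended, while an awoken waiter that failed to re-acquire the lock is put back at the front so it cannot be overtaken indefinitely:

```python
from collections import deque

waiters = deque()

def enqueue(waiter, first_wait):
    if first_wait:
        waiters.append(waiter)      # push: a brand-new waiter goes to the back
    else:
        waiters.appendleft(waiter)  # unshift: a re-queued waiter keeps its place

enqueue("fiber A", first_wait=True)
enqueue("fiber B", first_wait=True)
woken = waiters.popleft()           # fiber A is woken by an unlock...
enqueue(woken, first_wait=False)    # ...loses the race to re-acquire, re-queues at the front
```

After this sequence fiber A is still ahead of fiber B, whereas always pushing to the back would let newer waiters starve it.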


Method added in #16287.

ysbaddaden added a commit to ysbaddaden/crystal that referenced this pull request Oct 18, 2025
ysbaddaden added a commit to ysbaddaden/crystal that referenced this pull request Oct 24, 2025
@ysbaddaden ysbaddaden force-pushed the feature/add-crystal-fd-lock-with-refcount-and-serial-rw branch from 904dd95 to 6be2dd7 Compare October 28, 2025 16:16
@ysbaddaden ysbaddaden changed the title from "Fix: closing fd is thread unsafe on UNIX targets" to "UNIX: ensure single reader and writer to system fd" Oct 28, 2025
Serializes reads and writes so we can assume any IO object will only
have at most one read op and one write op. The benefits are:

1. it avoids a race condition in the polling event loops:

   - Fiber 1 then Fiber 2 try to read from fd;
   - Since fd isn't ready, both are waiting;
   - When fd becomes ready then Fiber 1 is resumed;
   - Fiber 1 doesn't read everything and returns;
   - Fiber 2 won't be resumed because events are edge-triggered;

2. we can simplify the UNIX event loops (epoll, kqueue, io_uring) that
   are guaranteed to only have at most one reader and one writer at any
   time.
@ysbaddaden ysbaddaden force-pushed the feature/add-crystal-fd-lock-with-refcount-and-serial-rw branch from 6be2dd7 to b92814a Compare October 30, 2025 17:39