UDPSocket#close hangs #368

@jscheid

Hi, I'm trying to debug an issue where a call to statsd.timing (see https://github.com/reinh/statsd) sometimes hangs when invoked from within an Async task, using the latest versions of everything (see environment below).

I've tracked it down to the following Fiber backtrace:

/usr/local/bundle/gems/async-2.21.1/lib/async/scheduler.rb:157:in `transfer'
/usr/local/bundle/gems/async-2.21.1/lib/async/scheduler.rb:157:in `transfer'
/usr/local/bundle/gems/async-2.21.1/lib/async/scheduler.rb:245:in `kernel_sleep'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:453:in `close'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:453:in `block in connect'
(SNIP)
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:451:in `connect'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:485:in `send_to_socket'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:502:in `send_stats'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:413:in `timing'

(This is the last line executed in "user" code.)

It just sits there indefinitely. I'm not sure that this happens every time that close is called in an Async block, but I can reproduce it reliably. Outside of Async it never hangs.

The UDPSocket connects to a port without a listener. Looking at the statsd source, the sequence of events should be something like:

require 'socket'
require 'async'

# The following three lines are executed outside the Async block (possibly on a different thread)
socket = UDPSocket.new(Socket::AF_INET)
socket.connect('127.0.0.1', 45678) # any port that's not listening
socket.write('...') # first message goes OK (socket doesn't know yet nobody is listening)

# possibly more writes/reconnects outside Async block here (from multiple threads)

Sync do
  Async do
    socket.write('...') rescue nil # second message raises Errno::ECONNREFUSED
    # the error causes statsd to try and reconnect
    socket.flush
    socket.close # hangs here
  end
end

I've tried creating a test case based on the above but so far I can't reproduce it in isolation.
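
As a baseline for such a test case, here is a minimal sketch that replays the same connect/write/flush/close sequence on a plain thread, with no fiber scheduler involved. The helper name `probe_udp_close` and its return value are hypothetical and are not part of statsd-ruby or async. It assumes that nothing is listening on the chosen port. Whether the second write actually raises Errno::ECONNREFUSED depends on the platform and on timing, so the sketch records any error rather than expecting one.

```ruby
require 'socket'

# Hypothetical helper (not part of statsd-ruby or async): replays the
# connect/write/flush/close sequence without any fiber scheduler.
def probe_udp_close(host = '127.0.0.1', port = 45678, writes: 2)
  errors = []
  socket = UDPSocket.new(Socket::AF_INET)
  begin
    socket.connect(host, port) # assumption: nothing is listening on this port
    writes.times do
      begin
        socket.write('probe')
      rescue SystemCallError => e
        # On Linux a later write typically fails with Errno::ECONNREFUSED
        # once the ICMP "port unreachable" reply has been received.
        errors << e.class
      end
      sleep 0.05 # give the kernel a moment to deliver the ICMP error
    end
    socket.flush
  rescue SystemCallError => e
    errors << e.class
  ensure
    socket.close # expected to return promptly outside Async
  end
  { closed: socket.closed?, errors: errors }
end

result = probe_udp_close
puts result.inspect
```

If this baseline returns promptly, running the same helper inside `Sync { Async { ... } }` gives a direct comparison of where the behavior diverges.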

I'm a bit lost now - I might keep trying to reduce it to an isolated repro. In the meantime, do you have any ideas that could point me in the right direction?

My environment:
async (2.21.1)
io-event (1.7.5) (also tested with 1.7.4)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [aarch64-linux] (also tested with ruby 3.3.1) - via ruby:3.4.1-bullseye Docker image
Docker version 27.3.1, build ce12230
Linux df0dd8712c2d 6.12.5-orbstack-00287-gf8da5d508983 #19 SMP Tue Dec 17 08:07:20 UTC 2024 aarch64 GNU/Linux
