Description
Hi, I'm trying to debug an issue where a call to statsd.timing (see https://github.com/reinh/statsd) sometimes hangs when invoked from within an Async task, with the latest versions of everything (see environment below).
I've tracked it down to the following Fiber backtrace:
/usr/local/bundle/gems/async-2.21.1/lib/async/scheduler.rb:157:in `transfer'
/usr/local/bundle/gems/async-2.21.1/lib/async/scheduler.rb:157:in `transfer'
/usr/local/bundle/gems/async-2.21.1/lib/async/scheduler.rb:245:in `kernel_sleep'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:453:in `close'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:453:in `block in connect'
(SNIP)
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:451:in `connect'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:485:in `send_to_socket'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:502:in `send_stats'
/usr/local/bundle/gems/statsd-ruby-1.5.0/lib/statsd.rb:413:in `timing'
(This is the last line executed in "user" code.)
It just sits there indefinitely. I'm not sure that this happens every time that close is called in an Async block, but I can reproduce it reliably. Outside of Async it never hangs.
The UDPSocket connects to a port that has no listener. Looking at the statsd source, the sequence of events should be something like:
# The following three lines are executed outside the Async block (possibly on a different thread)
socket = UDPSocket.new(Socket::AF_INET)
socket.connect('127.0.0.1', 45678) # any port that's not listening
socket.write('...') # first message goes OK (socket doesn't know yet nobody is listening)
# possibly more writes/reconnects outside Async block here (from multiple threads)
Sync do
  Async do
    socket.write('...') rescue nil # second message raises Errno::ECONNREFUSED
    # the error causes statsd to try and reconnect
    socket.flush
    socket.close # hangs here
  end
end
I've tried creating a test case based on the above but so far I can't reproduce it in isolation.
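For reference, the socket-only part of that sequence (no Async involved) can be sketched with just the standard library. This is a minimal sketch, not the statsd code: `probe_unlistened_udp` is a hypothetical helper name, and it assumes nothing is listening on the chosen port. It only records what each write does; whether and when the second write raises Errno::ECONNREFUSED depends on the platform and on timing, so it does not reproduce the hang itself.

```ruby
require 'socket'

# Hypothetical helper (not part of statsd-ruby or async): writes several
# datagrams to a connected UDP socket and records the outcome of each write.
# Assumes no process is listening on host:port.
def probe_unlistened_udp(host = '127.0.0.1', port = 45678, attempts: 3)
  socket = UDPSocket.new(Socket::AF_INET)
  socket.connect(host, port)

  attempts.times.map do
    outcome = begin
      socket.write('ping')
      :ok
    rescue SystemCallError => e
      # e.g. Errno::ECONNREFUSED once the kernel has reported the closed port
      e.class
    end
    sleep 0.05 # give the kernel a moment to deliver the error, if any
    outcome
  end
ensure
  socket&.close
end

p probe_unlistened_udp
```

Running the same helper inside `Sync { Async { ... } }` would be the natural next step towards an isolated repro.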
I'm a bit lost now - I might keep trying to reduce it to an isolated repro. In the meantime, do you have any ideas that could point me in the right direction?
My environment:
async (2.21.1)
io-event (1.7.5) (also tested with 1.7.4)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [aarch64-linux] (also tested with ruby 3.3.1) - via ruby:3.4.1-bullseye Docker image
Docker version 27.3.1, build ce12230
Linux df0dd8712c2d 6.12.5-orbstack-00287-gf8da5d508983 #19 SMP Tue Dec 17 08:07:20 UTC 2024 aarch64 GNU/Linux