Hi gang,
I ran into a really interesting problem involving spurious errors from Socket::accept. After scratching my head for weeks, I finally figured out what's going on.
I've been seeing random errors like this from my application:
    E1106 08:47:24.286124 27054 socket.cpp:777] 0x3089600 accept(12): -1 (0, "Success")
    F1106 08:47:24.289738 27054 scheduler.cpp:427] mordor/socket.cpp(779): Throw in function void Mordor::Socket::accept(Mordor::Socket&)
    Dynamic exception type: boost::exception_detail::clone_impl
    std::exception::what: std::exception
    [Mordor::tag_backtrace*] = /opt/adfin/bin/petabucket(_ZN6Mordor6Socket6acceptERS0_+0xc49) [0xd94279]
    /opt/adfin/bin/petabucket(_ZN6Mordor6Socket6acceptEv+0x83) [0xd97483]
    ...
    [boost::errinfo_errno_*] = 11, "Resource temporarily unavailable"
    [boost::errinfo_api_function_*] = accept
Notice that the first message logs the error as 0, but the exception reports it as EAGAIN (11).
It turns out that when Socket::accept assigns the error value from errno, it can end up reading the errno of the thread that first ran the function, before the fiber switched context (via yield) and was later switched back in.
This is due to how errno is defined on Linux. If you look at the libc headers you'll see the following. The errno macro expands to a function call, and that function is declared with the const attribute (http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html), so the compiler is allowed to aggressively cache its return value, which is the address of the calling thread's errno.
    extern int *__errno_location (void) __THROW __attribute__ ((__const__));
    # if !defined _LIBC || defined _LIBC_REENTRANT
    /* When using threads, errno is a per-thread value. */
    # define errno (*__errno_location ())
    # endif
    # endif /* !__ASSEMBLER__ */
    #endif /* _ERRNO_H */
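To make the effect concrete, here is a minimal sketch that simulates it without relying on compiler optimizations. Everything in it is hypothetical and for illustration only: fakeErrno stands in for errno, fakeErrnoLocation() plays the role of __errno_location(), and a plain std::thread stands in for the thread the fiber resumes on. It reads the "error" once through a cached address and once through a fresh lookup.

```cpp
#include <thread>

// Hypothetical stand-in for errno: one slot per thread.
thread_local int fakeErrno = 0;

// Plays the role of __errno_location(): returns the calling thread's slot.
int *fakeErrnoLocation() { return &fakeErrno; }

struct Observed {
    int viaCachedPointer;  // what code sees if the address was cached
    int viaFreshLookup;    // what code sees if the address is looked up again
};

Observed simulateMigration()
{
    // "Thread A": the address is computed once, as the compiler is allowed
    // to do for a function declared with the const attribute.
    int *cached = fakeErrnoLocation();
    *cached = 0;

    Observed seen{};
    // "Thread B": where the fiber continues after the yield.
    std::thread threadB([&seen, cached] {
        *fakeErrnoLocation() = 11;                   // accept() fails with EAGAIN here
        seen.viaCachedPointer = *cached;             // still thread A's slot
        seen.viaFreshLookup = *fakeErrnoLocation();  // thread B's slot
    });
    threadB.join();
    return seen;
}
```

In this sketch the cached read yields 0 while the fresh lookup yields 11, which mirrors the mismatch between the logged error and the thrown exception above.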
The exception carries the right value because throwing it goes through a few function calls that are not inlined and that call Mordor::getLastError(), so errno is looked up again on the current thread.
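That observation suggests one possible mitigation, sketched here under stated assumptions: read errno through a small helper that the compiler is asked not to inline, so each call looks up the errno address again. The names readErrnoFresh and errnoAfterYield are hypothetical and not part of Mordor, and yieldFn is just a placeholder for the scheduler yield. I have not verified that noinline alone is enough for every compiler and optimization level, so treat this as an idea to test rather than a confirmed fix.

```cpp
#include <cerrno>

// Hypothetical helper (not part of Mordor). Because the call is not
// inlined, the errno address is looked up inside the helper on every call
// instead of being cached in the caller across a context switch.
#if defined(__GNUC__)
__attribute__((noinline))
#endif
int readErrnoFresh()
{
    return errno;
}

// Hypothetical stand-in for the pattern in Socket::accept: read the error,
// yield, then read it again after being resumed.
int errnoAfterYield(void (*yieldFn)())
{
    int before = readErrnoFresh();
    (void)before;             // only the second read matters for this sketch
    yieldFn();                // the fiber may resume on another thread
    return readErrnoFresh();  // looked up again on the current thread
}
```

A caller would replace direct reads of errno around a yield with readErrnoFresh(), for example `error = readErrnoFresh();` after the scheduler returns control.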
We've only experienced this in one place, but it is going to be a problem on any Linux system that performs the following sequence of calls on a scheduler that uses more than one thread, as long as fibers are allowed to migrate among threads. In fact, any function marked with the const or pure attribute whose result depends on the current thread will misbehave this way.
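    error_t error;
    error = errno;        // reads the current thread's errno
    Scheduler::yield();   // the fiber may resume on a different thread
    error = errno;        // may reuse the cached address of the old thread's errno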
I'm not sure what to do about this issue.
Thanks,
Milosz