Wait methods fixes [part 2] #145
Conversation
Force-pushed from 214c640 to aa5e506
Thank you for the patch! I believe it improves connector usability quite a lot.
LOG_DEBUG("waitAny() called on connector without connections"); | ||
return std::nullopt; | ||
} | ||
for (auto *conn : m_Connections) { |
I think `waitAny` should still trigger the net provider to send pending requests (`m_NetProvider.wait(0)`).
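For concreteness, the suggestion seems to amount to a non-blocking poll of the net provider at the top of `waitAny`. A minimal sketch, assuming `wait(0)` flushes queued requests and returns immediately; the helper below is hypothetical and does not exist in the connector:

```cpp
// Hypothetical helper illustrating the suggestion: before waitAny() starts
// checking for ready responses, poll the net provider with a zero timeout
// so that requests queued while the user was processing responses get
// flushed. NetProvider::wait(int timeout) is taken from this comment
// thread; the helper itself is purely illustrative.
template <class NetProvider>
void
flushPendingRequests(NetProvider &provider)
{
	provider.wait(0); /* non-blocking: send what is queued, return at once */
}
```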
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have this logic in other wait methods; why should we have it in this one?
Hm, you are right. I need to think about it.
The problem is:
- You've sent some requests and received 10 responses.
- While processing these responses, you are sending new requests.
- Until you've processed all 10 responses, no new requests will be sent, so you'll observe a latency spike after you're done with those 10 responses.
I'm not sure other methods are affected by this problem. But, IMHO, all wait methods should trigger the connector to send pending requests (or we should send them right when the operation is issued).
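To make the scenario concrete, here is a self-contained toy model of the described behaviour; every name in it is illustrative, nothing comes from the connector itself:

```cpp
#include <iostream>
#include <queue>

// Toy model: requests created while a batch of responses is being processed
// are only "sent" after the whole batch is drained -- the latency spike
// described above.
int main()
{
	std::queue<int> readyResponses;
	for (int i = 0; i < 10; ++i)
		readyResponses.push(i);

	std::queue<int> pendingRequests; /* sends postponed to the next wait call */
	while (!readyResponses.empty()) {
		int resp = readyResponses.front();
		readyResponses.pop();
		pendingRequests.push(resp + 100); /* new request born during processing */
		/* ...but nothing is flushed until the loop ends */
	}
	std::cout << pendingRequests.size()
		  << " requests were delayed until the whole batch was processed\n";
}
```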
But, IMHO, all wait methods should trigger the connector to send pending requests (or we should send them right when the operation is issued).
I don't think you are right here. We should try to minimize the number of system calls where possible, because system calls cause context switches and kernel-to-user-space transitions. This is exactly what this patch is trying to do.
Instead, you propose adding an unconditional system call on each `wait` method call, even when it is not necessary and the user has something to process.
Moreover, this system call may be useless when the user does not have any other pending requests.
Until you've processed all 10 responses, no new requests will be sent, so you'll observe a latency spike after you're done with those 10 responses.
I think the problem you describe should be solved the other way around: APIs that create requests should actively send them rather than postponing the sends to `wait` method calls.
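A self-contained toy sketch of this eager-send alternative; `ToyNetProvider`, `ToyConnection`, and their methods are illustrative, not the connector's API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of "send right on operation": the request-creating call flushes
// the encoded bytes immediately instead of leaving them for the next wait().
struct ToyNetProvider {
	std::vector<std::string> wire;            // what has hit the network
	void send(const std::string &encoded) { wire.push_back(encoded); }
};

struct ToyConnection {
	ToyNetProvider &net;
	std::size_t nextId = 0;

	// Eager variant of a request-creating API: encode and send right away.
	std::size_t call(const std::string &func)
	{
		net.send("call:" + func);         // no postponed send
		return nextId++;
	}
};

int main()
{
	ToyNetProvider net;
	ToyConnection conn{net};
	conn.call("ping");                        // hits the wire immediately
}
```

The trade-off is the one discussed above: eager sending may cost a system call per request, but a request never waits behind unrelated response processing.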
Force-pushed from 14767b4 to 9a9401c
`Connection`s model shared pointers to `ConnectionImpl` and are intended for external use by library consumers. Using them internally in the Connector can cause bugs. For instance, we internally store `Connection` objects in the LibevNetProvider, which adds an extra reference to them and prevents them from being deleted even when all user objects are dead. We also leak connections in `close`, since we do not erase them from the `m_ReadyToSend` and `m_ReadyToDecode` sets.

To fix this, let's internally use `ConnectionImpl` pointers as weak references. We rely on the fact that the lifetime of these weak references is tied to the lifetime of the shared user objects. The ideal solution would be to remove the `Connection` <- `ConnectionImpl` constructor, but we still need to return new `Connection` objects from methods like `waitAny`, so let's just enforce the rule that `Connection` objects must not be used internally.

While we are here, let's also fix all the formatting issues that the linter reports for the refactored code.

Closes tarantool#140
Closes tarantool#147
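As a rough illustration of the ownership model this commit describes (only the `m_ReadyToSend`/`m_ReadyToDecode` names come from the message above; the class layouts are assumptions):

```cpp
#include <memory>
#include <set>

// Illustrative model only: Connection behaves like a shared pointer to
// ConnectionImpl, so storing Connection objects inside the Connector would
// keep the impl alive even after all user handles are gone. Storing raw
// ConnectionImpl pointers internally (non-owning, weak-style references)
// avoids that.
struct ConnectionImpl { /* ... */ };

struct Connection {
	std::shared_ptr<ConnectionImpl> impl;   // owning, user-facing handle
};

struct Connector {
	// Internal bookkeeping: non-owning pointers, so dropping the last user
	// Connection actually destroys the ConnectionImpl.
	std::set<ConnectionImpl *> m_Connections;
	std::set<ConnectionImpl *> m_ReadyToSend;
	std::set<ConnectionImpl *> m_ReadyToDecode;
};
```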
Currently, to avoid a double close of a connection, we check for the `SS_DEAD` status on its stream. However, the `SS_DEAD` status can be set for a multitude of reasons without the connection being closed or even connected. For instance, we set it if a send/recv fails unrecoverably. To fix this, let's rely on the fact that a connection is open iff the file descriptor of its stream is valid (i.e., not -1). This approach also seems portable to the Windows platform.

Closes tarantool#142
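A minimal sketch of the described invariant, assuming the stream's file descriptor is reachable as a plain `int`; `isOpen` and `closeOnce` are hypothetical helpers, not the connector's functions:

```cpp
#include <unistd.h>

// Illustrative only: a connection counts as open iff its stream's file
// descriptor is valid (not -1), independent of status flags like SS_DEAD.
static bool isOpen(int fd) { return fd >= 0; }

static void closeOnce(int &fd)
{
	if (!isOpen(fd))
		return;      // already closed (or never connected): no double close
	::close(fd);
	fd = -1;         // mark closed so a second call becomes a no-op
}
```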
Currently, all connection bookkeeping is done opaquely through the network providers, which, in turn, also do this bookkeeping opaquely through system interfaces (e.g., libev, epoll). Because of this, we cannot handle the cases when `waitAny` is called and there are no connections (tarantoolgh-51) or when a connection already has ready responses (tarantoolgh-132). To improve `waitAny` robustness, we need to add connection bookkeeping to the Connector. We should also move the timer start to the beginning of the waiting loop, since the preceding checking overhead should not be counted toward the waiting time.

Closes tarantool#51
Needed for tarantool#132
Currently, `waitAny` does not account for connections that already have ready responses. Let's handle this by going through the set of owned connections and checking them for ready responses. We should do this before starting the timer, since the checking overhead should not be counted toward the waiting time.

Closes tarantool#132
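Taken together with the previous commit, the resulting `waitAny` flow could look roughly like the sketch below; `hasDataToDecode`, `Timer`, and the `wait`/`timeLeft` signatures are assumptions based on this description, not the actual interfaces:

```cpp
#include <optional>

// Illustrative sketch of the described waitAny() flow. ConnImpl is expected
// to expose hasDataToDecode(), Provider to expose wait(timeout), and Timer
// to wrap the deadline handling; all of these are assumptions.
template <class ConnImpl, class Provider, class Timer, class Container>
std::optional<ConnImpl *>
waitAnySketch(Container &connections, Provider &provider, int timeout)
{
	if (connections.empty())
		return std::nullopt;          /* gh-51: nothing to wait on */
	/* gh-132: return already-ready responses before any waiting. */
	for (ConnImpl *conn : connections)
		if (conn->hasDataToDecode())
			return conn;
	/* Start the timer only now, so the checks above are not counted
	 * toward the waiting time. */
	Timer timer(timeout);
	timer.start();
	do {
		provider.wait(timer.timeLeft());
		for (ConnImpl *conn : connections)
			if (conn->hasDataToDecode())
				return conn;
	} while (!timer.isExpired());
	return std::nullopt;
}
```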
Force-pushed from 9a9401c to 0465dee
@mkostoevr @drewdzzz I got pissed off by #150 and discovered a bug (#151) while fixing it. Please take a look at the two new commits I added to fix those, and at an additional bugfix for #147 that I added in the scope of
Currently, if we time out waiting for `connect` using `poll`, we set `connect_errno` to `ETIMEDOUT`. However, after we exit the `connect` interruption loop, we unconditionally set it to `errno`, which we do not even clear after checking the `connect` result. As a result, the timeout error gets masked by the in-progress connection error. To fix this, let's:
1. Clear `errno` after checking the `connect` result;
2. Check the value of `connect_errno` after exiting the `connect` interruption loop, before setting it to `errno`;
3. Zero out `socket_errno` and `connect_errno` on each `addr_info` loop iteration.

Closes tarantool#151
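A sketch of these three steps in the shape of a typical non-blocking `connect` loop; `try_connect` and `poll_until_writable` are hypothetical stand-ins for the real code, while `socket_errno`/`connect_errno` follow the commit message:

```cpp
#include <cerrno>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stub standing in for the real poll()-based interruption loop; assumed to
 * set *connect_errno = ETIMEDOUT when the wait for writability times out. */
static void poll_until_writable(int /*fd*/, int *connect_errno) { (void)connect_errno; }

/* Illustrative sketch of the described fix, not the actual connector code. */
static int try_connect(struct addrinfo *addr_info, int &connect_errno)
{
	int socket_errno = 0;
	for (struct addrinfo *ai = addr_info; ai != nullptr; ai = ai->ai_next) {
		/* (3) reset the per-address errors on every iteration */
		socket_errno = 0;
		connect_errno = 0;

		int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
		if (fd < 0) {
			socket_errno = errno;
			continue;
		}
		if (connect(fd, ai->ai_addr, ai->ai_addrlen) != 0 && errno != EINPROGRESS) {
			connect_errno = errno;
			close(fd);
			continue;
		}
		/* (1) clear errno right after checking the connect() result, so a
		 * stale EINPROGRESS cannot leak into the error reported later */
		errno = 0;

		poll_until_writable(fd, &connect_errno); /* may record ETIMEDOUT */

		/* (2) only fall back to errno if the interruption loop did not
		 * already record a more specific error such as the timeout */
		if (connect_errno == 0)
			connect_errno = errno;
		if (connect_errno == 0)
			return fd;   /* connected */
		close(fd);
	}
	(void)socket_errno;
	return -1;
}
```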
`ClientMultithreadTest` fails on macOS because the `connect`s are timing out. To fix this, let's increase the `connect_timeout` for `ClientMultithreadTest` to 10 seconds.

Closes tarantool#150
Force-pushed from d19648c to a956eeb
 template <class BUFFER, class NetProvider>
 void
-hasSentBytes(Connection<BUFFER, NetProvider> &conn, size_t bytes)
+hasSentBytes(ConnectionImpl<BUFFER, NetProvider> *conn, size_t bytes)
What do you think about moving all these functions to `ConnectionImpl`?
- It's weird to see non-method functions here at all.
- These utility functions would be hidden in `ConnectionImpl`, so only advanced users would see them (only an implementor of their own NetProvider works with `ConnectionImpl` directly).
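A sketch of what that could look like, assuming `hasSentBytes` simply advances the connection's output buffer; the class body and member names are illustrative, not the real implementation:

```cpp
#include <cstddef>

// Illustrative only: the free helper turned into a ConnectionImpl method,
// visible just to NetProvider implementors who work with ConnectionImpl
// directly. m_OutBuf and its flush() operation are assumed.
template <class BUFFER, class NetProvider>
class ConnectionImpl {
public:
	/* Previously: hasSentBytes(ConnectionImpl *conn, size_t bytes). */
	void hasSentBytes(std::size_t bytes)
	{
		/* advance the output buffer past what the provider has sent */
		m_OutBuf.flush(bytes);
	}

private:
	BUFFER m_OutBuf;   /* assumed send-buffer member */
};
```

Call sites would then read `conn->hasSentBytes(n)` instead of `hasSentBytes(conn, n)`.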
LOG_DEBUG("waitAny() called on connector without connections"); | ||
return std::nullopt; | ||
} | ||
for (auto *conn : m_Connections) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, you are right. I need to think about it.
The problem is:
- You've sent some requests and received 10 responses.
- While processing these responses, you are sending new requests.
- Until you've processed all 10 responses, no new requests will be sent, so you'll observe latency spike after you're done with those 10 responses.
I'm not sure other methods are affected by this problem. But, IMHO, all wait methods should trigger connector to send pending requests (or we should send them right on operation).
This patchset performs a major refactoring of the client code to replace all internal `Connection` usage with `ConnectionImpl`, and fixes several related bugs.

Closes #51
Closes #132
Closes #140
Closes #142
Closes #147
Closes #150
Closes #151