Fixing leaked RPC authentication error #883
|
So far I have made some changes that should help resources be freed on abort. This should again let the agent shut down without holding itself open for too long. I will merge these changes alongside the ones in the issue. |
|
In theory, reworking network authentication to use duplex streams seems to be the more reliable approach, so I will rework the network authentication step accordingly. Separately, if authentication is rejected, the node sometimes does not re-attempt any connections at all. There needs to be a retry limit, and some nodes, such as the seed node, could re-attempt authentication perpetually. I will make an issue tracking this spec and, after the rework, might implement it in this PR as well. |
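As a rough illustration of the retry limit idea, here is a minimal sketch. Every name in it (`RetryPolicy`, `defaultPolicy`, `seedNodePolicy`, `nextRetryDelay`) and every numeric value is hypothetical and not taken from the Polykey codebase; the exponential backoff is also an assumption, since the comment above only asks for a limit and a perpetual mode.

```typescript
// Hypothetical retry policy for re-authentication attempts.
type RetryPolicy = {
  maxAttempts: number; // Infinity means "retry forever"
  baseDelayMs: number; // delay after the first failure
  maxDelayMs: number; // upper bound on the backoff delay
};

// Assumed defaults, chosen only for illustration.
const defaultPolicy: RetryPolicy = {
  maxAttempts: 5,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
};

// Nodes such as the seed node could keep retrying perpetually.
const seedNodePolicy: RetryPolicy = { ...defaultPolicy, maxAttempts: Infinity };

// Given the number of failed attempts so far (>= 1), returns the delay in
// milliseconds before the next attempt, or undefined when the limit is
// reached and the node should give up.
function nextRetryDelay(
  policy: RetryPolicy,
  failedAttempts: number,
): number | undefined {
  if (failedAttempts >= policy.maxAttempts) return undefined;
  const delay = policy.baseDelayMs * 2 ** (failedAttempts - 1);
  return Math.min(delay, policy.maxDelayMs);
}
```

With these assumed values, an ordinary node gives up after its fifth failed attempt, while a node using `seedNodePolicy` keeps retrying at the capped delay.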
|
We are still using unary calls? |
|
I'm still reading and understanding the code, so my understanding might not be complete yet. Yes. When a new connection is added, each node makes a unary RPC call to the other node's authentication handler, which triggers the reverse authentication on the receiving node. If both nodes return a successful reverse authentication response, the connection becomes authenticated. Forward authentication is triggered on both sides at the moment the connection is added, but the two calls can finish at different times. This can lead to a race condition where one node considers the connection authenticated and sends RPC requests to the other node, which has not yet completed its authentication, leading to this issue. With the duplex streaming rework, this would be more like a handshake, and subsequent code will only run after the authentication has either succeeded or failed. |
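The race described above can be modelled in a few lines. This is a simplified, hypothetical sketch: `Side`, `handleRequest` and `sendRequest` are illustrative names and not part of the actual code, and the real flow is asynchronous rather than a pair of boolean flags.

```typescript
// Hypothetical model: each side only knows its own authentication state.
type Side = { name: string; authenticated: boolean };

// The receiver can only serve a request once its own authentication of the
// connection has finished.
function handleRequest(receiver: Side): string {
  if (!receiver.authenticated) {
    throw new Error(
      `${receiver.name} has not finished authenticating the connection`,
    );
  }
  return 'ok';
}

// The sender decides whether to send based on its own view alone.
function sendRequest(sender: Side, receiver: Side): string {
  if (!sender.authenticated) return 'not sent';
  try {
    return handleRequest(receiver);
  } catch (e) {
    return `error: ${(e as Error).message}`;
  }
}

const a: Side = { name: 'A', authenticated: false };
const b: Side = { name: 'B', authenticated: false };

// A's unary call resolves first, so A believes the connection is ready.
a.authenticated = true;
const early = sendRequest(a, b); // B is not ready yet, so this fails

// B's unary call resolves later; the same request now succeeds.
b.authenticated = true;
const late = sendRequest(a, b);
```

The failure in `early` comes purely from ordering: neither side did anything wrong, they just disagree on when the connection became usable.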
|
You should review your understanding with @tegefaulkes first. |
411b3c6 to de01723
|
This is my plan for implementing the duplex stream for authentication. The connecting side calls the authentication RPC method and sends a forward auth token, which the other node verifies by applying reverse auth. After accepting, the other node returns its own token, which the current node reverse-authenticates in turn. Once both nodes have returned a successful authentication result, the connection is authenticated. This prevents race conditions caused by state desync, as the RPC method can only resolve once both sides have authenticated the connection. |
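The plan above could look roughly like the sketch below. It is synchronous for brevity, whereas the real handler would read from and write to a duplex stream. `Peer`, `HandshakeResult`, `duplexHandshake` and `makePeer` are hypothetical names, and the string comparison in `makePeer` is only a stand-in for whatever token verification the nodes actually perform.

```typescript
// Hypothetical peer interface: each side can issue its own auth token and
// verify the token it receives from the other side.
type Peer = {
  name: string;
  issueToken: () => string;
  verifyToken: (token: string) => boolean;
};

type HandshakeResult =
  | { authenticated: true }
  | { authenticated: false; reason: string };

// One call covers both directions, so neither side can consider the
// connection authenticated before the other has finished verifying.
function duplexHandshake(caller: Peer, handler: Peer): HandshakeResult {
  // 1. The caller sends its forward auth token.
  const forwardToken = caller.issueToken();
  // 2. The handler verifies it and stops the handshake on failure.
  if (!handler.verifyToken(forwardToken)) {
    return {
      authenticated: false,
      reason: `${handler.name} rejected ${caller.name}`,
    };
  }
  // 3. The handler replies with its own token.
  const reverseToken = handler.issueToken();
  // 4. The caller verifies the reply.
  if (!caller.verifyToken(reverseToken)) {
    return {
      authenticated: false,
      reason: `${caller.name} rejected ${handler.name}`,
    };
  }
  // 5. Only now is the connection treated as authenticated on both sides.
  return { authenticated: true };
}

// Illustrative helper: a peer that accepts exactly one expected token.
function makePeer(
  name: string,
  ownToken: string,
  expectedPeerToken: string,
): Peer {
  return {
    name,
    issueToken: () => ownToken,
    verifyToken: (token) => token === expectedPeerToken,
  };
}
```

Under these assumptions the exchange is one message in each direction, and a failure on either side ends the handshake with a result rather than leaving one side waiting.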
Can this be done more optimally? Is there a more efficient way to achieve this with minimal back and forth? Are there some crypto tricks to use here, including taking advantage of the context of the comms? |
What do you mean by crypto tricks? I asked Brian and even he was unsure how we could do this more optimally. At minimum, we need the following steps on the caller side:
The RPC handler also does similar steps.
This is the minimum requirement of this communication. How else would you propose we handle this? |
|
https://github.com/MatrixAI/Polykey/actions/runs/14396425288/job/40372816693 @brynblack the linting job fails here, but the workflow still passes. This shouldn't happen. |
|
This PR is otherwise done. I have asked Brian to do a final review before merging it. |
|
Is this before or after esm migration? |
Do you mean this issue or the rpc version bump? This issue was encountered prior to ESM migration. The RPC version bump is after ESM. |
tegefaulkes
left a comment
small stuff but otherwise good.
chore: rebased onto staging for esm; fix: build; wip: cleaning up duplex auth handler [ci skip]
chore: updated documentation; fix: lint; fix: import order; fix: all tests now passing; fix: auth handler hanging until timeout if errored
2ade4ec to 7d4506e
|
I have addressed all review comments and added all the required features. I have also rebased it onto the latest staging, so this PR is now basically done. After the checks pass, this can be merged. |



Description
One node can sometimes finish authentication before its peer and start sending RPC messages before the peer has fully processed the authentication. When that happens, the program crashes. This PR aims to resolve the crash.
Issues Fixed
Tasks
Final checklist