forked from meta-pytorch/monarch
Commit 773266a
Allow NetRx to explicitly reject connections from NetTx (meta-pytorch#962)
Summary:
Pull Request resolved: meta-pytorch#962
The purpose of this diff is to handle the following scenario:
1. Process A starts serving a NetRx.
2. Process B creates a NetTx that connects to process A's NetRx.
3. B sends a few messages to A, and the messages are acked.
4. Process A dies/is killed, while B stays alive.
5. A new process C starts serving a NetRx on the same channel as in step 1.
6. B's NetTx connects to C's NetRx, *with no way of knowing it has connected to a different process than before*.
7. B sends messages to C, starting from where it left off with A.
8. C rejects all of B's messages because of invalid sequence numbers.
9. B's NetTx only fails when it eventually times out, after a long period with no acks.
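The failure in steps 6-8 comes down to the new receiver having no memory of B's session. The sketch below is a simplified, hypothetical model of that mismatch: `RxState`, `Delivery` and `replay_scenario` are illustrative names, not hyperactor's actual types. It assumes a receiver accepts only the next expected sequence number and that a fresh receiver expects 0.

```rust
/// Hypothetical, simplified model of a receiver's per-connection state.
/// A freshly started receiver expects sequence number 0.
struct RxState {
    next_seq: u64,
}

#[derive(Debug, PartialEq)]
enum Delivery {
    /// The message was the next one expected and was accepted.
    Delivered,
    /// The sequence number did not match what the receiver expected,
    /// so the message was dropped.
    OutOfSequence { expected: u64, got: u64 },
}

impl RxState {
    fn new() -> Self {
        RxState { next_seq: 0 }
    }

    /// Accepts `seq` only if it is exactly the next expected number.
    fn receive(&mut self, seq: u64) -> Delivery {
        if seq == self.next_seq {
            self.next_seq += 1;
            Delivery::Delivered
        } else {
            Delivery::OutOfSequence { expected: self.next_seq, got: seq }
        }
    }
}

/// Replays steps 1-8: B delivers seq 0..3 to A, then resumes at seq 3
/// against a fresh receiver C. Returns C's verdict on each resumed message.
fn replay_scenario() -> Vec<Delivery> {
    let mut a = RxState::new();
    for seq in 0..3 {
        // Steps 1-3: A accepts B's messages in order.
        assert_eq!(a.receive(seq), Delivery::Delivered);
    }
    // Steps 4-5: A is gone; C starts with no memory of B's session.
    let mut c = RxState::new();
    // Steps 6-8: B picks up where it left off with A, so C drops everything.
    (3..6).map(|seq| c.receive(seq)).collect()
}
```

Because C never advances past 0, nothing B sends is ever acked, and B's only signal in this model is the eventual ack timeout from step 9.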
This diff expedites the `NetTx` failure from step 9 by allowing `NetRx` to explicitly reject a connection when it sees an out-of-sequence message. Instead of a simple `u64` ack, the `NetRx` response is now an enum with two variants: `Reject` and `Ack(u64)`. The enum is serialized with bincode.
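A minimal sketch of how a two-variant response lets the sender fail fast. Only the variant names `Ack(u64)` and `Reject` come from this commit message; `NetRxResponse`, `rx_respond`, `TxState` and `TxError` are hypothetical names, and bincode serialization is left out to keep the example dependency-free. The receiver policy is also an assumption: it rejects a gap (a sequence number ahead of what it expects, as in step 7) and re-acks duplicates rather than rejecting them.

```rust
use std::collections::VecDeque;

/// Response from the receiver to the sender, mirroring the variants
/// described in the commit message (serialization omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
enum NetRxResponse {
    Ack(u64),
    Reject,
}

/// Hypothetical receiver policy: accept the next expected message,
/// re-ack duplicates, and reject any gap.
fn rx_respond(next_seq: &mut u64, seq: u64) -> NetRxResponse {
    if seq == *next_seq {
        *next_seq += 1;
        NetRxResponse::Ack(seq)
    } else if seq < *next_seq {
        // Assumed: a retransmission of an already-delivered message is re-acked.
        NetRxResponse::Ack(*next_seq - 1)
    } else {
        // A gap, e.g. a sender resuming a session this receiver never saw.
        NetRxResponse::Reject
    }
}

#[derive(Debug, PartialEq)]
enum TxError {
    Rejected,
}

/// Hypothetical sender state: sequence numbers stay buffered until acked.
struct TxState {
    unacked: VecDeque<u64>,
}

impl TxState {
    fn handle(&mut self, resp: NetRxResponse) -> Result<(), TxError> {
        match resp {
            NetRxResponse::Ack(acked) => {
                // Drop every buffered message up to and including `acked`.
                while self.unacked.front().map_or(false, |&s| s <= acked) {
                    self.unacked.pop_front();
                }
                Ok(())
            }
            // Fail immediately instead of waiting for an ack timeout.
            NetRxResponse::Reject => Err(TxError::Rejected),
        }
    }
}
```

With this shape, a fresh receiver answering B's resumed seq 3 with `Reject` turns step 9's long timeout into an immediate error on the sender side.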
ghstack-source-id: 305073647
Reviewed By: mariusae
Differential Revision: D80640441
fbshipit-source-id: 7a32f6538081091e0e852f86427b63f58301c1741
parent 7fd1028
1 file changed, +200 / -77 lines changed
- hyperactor/src/channel