Make NetTx<->NetRx handshake two-way to prevent errant reconnections #793
This pull request was exported from Phabricator. Differential Revision: D79607092
…eta-pytorch#793)

Summary:
The purpose of this diff is to handle the following scenario:

1. Process A starts serving a NetRx.
2. Process B creates a NetTx that connects to process A's NetRx.
3. B sends a few messages to A, and the messages are acked.
4. Process A dies/is killed, while B stays alive.
5. A new Process C starts serving a NetRx on the same channel as from step 1.
6. B's NetTx connects to C's NetRx, *with no way of knowing it has connected to a different process than before*.
7. B sends messages to C, starting from where it left off with A.
8. C rejects all of B's messages because of invalid sequence numbers.
9. B's NetTx eventually times out after a long time with no acks.

In order to distinguish among connections from different NetTx instances to the same NetRx instance, each NetTx generates a random unique session id. This session id gets sent as part of an initial handshake from NetTx -> NetRx before the NetTx starts sending normal messages. Currently, though, NetTx doesn't wait for any handshake before starting to send messages.

To resolve the issue described above, this diff introduces a global (per-process) "rx session id". When a NetTx first connects to a NetRx, the NetRx responds with its rx session id as part of the handshake. The NetTx waits for the handshake response and extracts the rx session id. If this is the first time the NetTx is connecting, the NetTx stores the rx session id. On subsequent connection attempts, the NetTx will validate the rx session id it receives from the handshake against the rx session id it previously stored; if there is a mismatch, the NetTx returns the appropriate error to its caller.

Differential Revision: D79607092
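The two-way handshake described in the summary can be sketched roughly as follows. This is a minimal in-memory model in Python, not the code from this diff: `FakeRxProcess`, `FakeTx`, `RxSessionMismatch`, and the `handshake`/`connect` methods are hypothetical names chosen for illustration, and no real networking, acking, or sequence numbering is modeled.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


class RxSessionMismatch(Exception):
    """Hypothetical error: a reconnect reached a different receiver process."""


@dataclass
class FakeRxProcess:
    """Stands in for one process serving a NetRx (illustrative only)."""

    # One id per process, generated once when the process starts.
    rx_session_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def handshake(self, tx_session_id: str) -> str:
        # Accept the sender's session id and reply with the receiver's own id.
        return self.rx_session_id


@dataclass
class FakeTx:
    """Stands in for a NetTx that validates the receiver on every connect."""

    tx_session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    known_rx_session_id: Optional[str] = None

    def connect(self, rx: FakeRxProcess) -> None:
        # Wait for the handshake response before sending any normal message.
        received = rx.handshake(self.tx_session_id)
        if self.known_rx_session_id is None:
            # First connection: remember which receiver process we talk to.
            self.known_rx_session_id = received
        elif received != self.known_rx_session_id:
            # Reconnect landed on a different process: fail fast instead of
            # resuming with sequence numbers the new receiver would reject.
            raise RxSessionMismatch(
                f"expected rx session {self.known_rx_session_id}, got {received}"
            )


process_a = FakeRxProcess()
tx = FakeTx()
tx.connect(process_a)          # steps 1-3: first connection, id is stored
tx.connect(process_a)          # reconnecting to the same process is accepted

process_c = FakeRxProcess()    # steps 4-5: a new process takes over the channel
try:
    tx.connect(process_c)      # step 6: now detected during the handshake
    mismatch_detected = False
except RxSessionMismatch:
    mismatch_detected = True
```

In this model the mismatch surfaces at connect time, which corresponds to the NetTx returning an error to its caller rather than timing out after a long period with no acks.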
8302f53 to de5a4da
de5a4da to 9b7a8c8
9b7a8c8 to 070b3ad
c817b33 to 2241ac2
2241ac2 to 2794dd2
2794dd2 to 7667e2c
7667e2c to 03c06e8
03c06e8 to efce7a5
Reviewed By: mariusae
efce7a5 to b2495c2
b2495c2 to 4f2b17b
31d9002 to 1c87225
1c87225 to 2711802
2711802 to de46bac
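Summary:
The purpose of this diff is to handle the following scenario:
1. Process A starts serving a NetRx.
2. Process B creates a NetTx that connects to process A's NetRx.
3. B sends a few messages to A, and the messages are acked.
4. Process A dies or is killed, while B stays alive.
5. A new process C starts serving a NetRx on the same channel as in step 1.
6. B's NetTx connects to C's NetRx, *with no way of knowing it has connected to a different process than before*.
7. B sends messages to C, starting from where it left off with A.
8. C rejects all of B's messages because of invalid sequence numbers.
9. B's NetTx eventually times out after a long time with no acks.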
To distinguish connections from different NetTx instances to the same NetRx instance, each NetTx generates a random, unique session id. The NetTx sends this session id to the NetRx in an initial handshake before it starts sending normal messages.
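The failure this diff targets can be replayed with a small model. The sketch below is illustrative only: `Receiver`, `Sender`, and `replay_scenario` are hypothetical names, not the real NetRx/NetTx API, and the rule "accept only the next expected sequence number" is an assumed simplification of how a receiver validates sequence numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical stand-in for a NetRx (not the real type)."""
    next_seq: int = 0
    delivered: list = field(default_factory=list)

    def receive(self, seq: int, payload: str) -> bool:
        # Assumed rule: ack only the next in-order sequence number.
        if seq != self.next_seq:
            return False
        self.delivered.append(payload)
        self.next_seq += 1
        return True


@dataclass
class Sender:
    """Hypothetical stand-in for a NetTx that never checks who it reached."""
    next_seq: int = 0

    def send(self, rx: Receiver, payload: str) -> bool:
        acked = rx.receive(self.next_seq, payload)
        if acked:
            self.next_seq += 1
        return acked


def replay_scenario():
    process_a = Receiver()
    tx = Sender()
    # B sends a few messages to A, and A acks them.
    acks_from_a = [tx.send(process_a, m) for m in ("m0", "m1", "m2")]
    # A dies; C serves the same channel with fresh receiver state.
    process_c = Receiver()
    # B continues from where it left off, so C rejects every message.
    acks_from_c = [tx.send(process_c, m) for m in ("m3", "m4")]
    return acks_from_a, acks_from_c
```

In this model the sender has no signal that the receiver changed, so its only recourse is to wait for acks that never arrive.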
Currently, though, the NetTx does not wait for any handshake response before it starts sending messages. To resolve the issue described above, this diff introduces a global (per-process) "rx session id". When a NetTx connects to a NetRx, the NetRx responds with its rx session id as part of the handshake. The NetTx waits for the handshake response and extracts the rx session id. On its first connection, the NetTx stores the rx session id. On each subsequent connection attempt, the NetTx validates the rx session id it receives from the handshake against the one it previously stored; if they do not match, the NetTx returns the appropriate error to its caller.
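The two-way handshake described above can be sketched as follows. This is a minimal model under stated assumptions, not the implementation in this diff: `Rx`, `Tx`, `connect`, `handshake`, and `SessionMismatchError` are hypothetical names, the ids are modeled as 64-bit integers, and each `Rx` object stands in for one process (in the diff the rx session id is global per process).

```python
import secrets
from typing import Optional


class SessionMismatchError(Exception):
    """Hypothetical error: a reconnect reached a different receiver process."""


class Rx:
    """Hypothetical receiver; one instance models one serving process."""

    def __init__(self, rx_session_id: Optional[int] = None) -> None:
        # Assumption: a random 64-bit id chosen once per process.
        if rx_session_id is None:
            rx_session_id = secrets.randbits(64)
        self.rx_session_id = rx_session_id

    def handshake(self, tx_session_id: int) -> int:
        # Reply to the sender's handshake with this receiver's id.
        return self.rx_session_id


class Tx:
    """Hypothetical sender that remembers the first rx session id it saw."""

    def __init__(self) -> None:
        self.tx_session_id = secrets.randbits(64)
        self.known_rx_session_id: Optional[int] = None

    def connect(self, rx: Rx) -> None:
        # Wait for the handshake response before sending normal messages.
        received = rx.handshake(self.tx_session_id)
        if self.known_rx_session_id is None:
            # First connection: store the receiver's id.
            self.known_rx_session_id = received
        elif received != self.known_rx_session_id:
            # Reconnection landed on a different process: fail fast.
            raise SessionMismatchError(
                f"expected rx session {self.known_rx_session_id}, got {received}"
            )
```

With this check, reconnecting to the same receiver succeeds, while reconnecting to a replacement process on the same channel raises immediately instead of timing out after a long wait for acks.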
Differential Revision: D79607092