Commit e6d71b4

D-Wythe authored and davem330 committed
net/smc: avoid data corruption caused by decline
We found a data corruption issue during testing of SMC-R on Redis applications.

The benchmark has a low probability of reporting a strange error, as shown below.

"Error: Protocol error, got "\xe2" as reply type byte"

Finally, we found that the retrieved error data was as follows:

0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2

This is clearly an SMC DECLINE message, which means that the application received an SMC protocol message. We found that this was caused by the following situation:

    client                  server
            ¦  clc proposal
            ------------->
            ¦  clc accept
            <-------------
            ¦  clc confirm
            ------------->
    wait llc confirm
                            send llc confirm
            ¦failed llc confirm
            ¦   x------
    (after 2s)timeout
                            wait llc confirm rsp

    wait decline

    (after 1s) timeout
                            (after 2s) timeout
            ¦   decline
            -------------->
            ¦   decline
            <--------------

As a result, a decline message was sent in the implementation, and this message was read from TCP by the already-fallback connection.

This patch doubles the client timeout to 2x the server value. With this simple change, the Decline messages should never cross or collide (during Confirm Link timeout).

This issue requires an immediate solution, since the protocol updates involve a more long-term solution.

Fixes: 0fb0b02 ("net/smc: adapt SMC client code to use the LLC flow")
Signed-off-by: D. Wythe <[email protected]>
Reviewed-by: Wen Gu <[email protected]>
Reviewed-by: Wenjia Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
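To make the "this is an SMC DECLINE message" step concrete, here is a minimal sketch that decodes the first bytes of the dump in the commit message. It assumes the CLC header starts with a 4-byte eyecatcher, followed by a 1-byte message type and a 2-byte big-endian length. The helper names (`has_smcr_eyecatcher`, `clc_type_of`, `clc_length_of`) are hypothetical and are not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* CLC message type values (assumed to match the SMC CLC handshake types). */
enum {
    CLC_PROPOSAL = 0x01,
    CLC_ACCEPT   = 0x02,
    CLC_CONFIRM  = 0x03,
    CLC_DECLINE  = 0x04,
};

/* The bytes 0xE2 0xD4 0xC3 0xD9 spell "SMCR" in EBCDIC. */
static const uint8_t smcr_eyecatcher[4] = { 0xE2, 0xD4, 0xC3, 0xD9 };

/* Returns 1 if buf starts with the SMC-R eyecatcher, otherwise 0. */
static int has_smcr_eyecatcher(const uint8_t *buf, size_t len)
{
    size_t i;

    if (len < sizeof(smcr_eyecatcher))
        return 0;
    for (i = 0; i < sizeof(smcr_eyecatcher); i++) {
        if (buf[i] != smcr_eyecatcher[i])
            return 0;
    }
    return 1;
}

/* Returns the message type byte at offset 4, or -1 if this is not an
 * SMC-R header (hypothetical helper, assumed header layout). */
static int clc_type_of(const uint8_t *buf, size_t len)
{
    if (len < 7 || !has_smcr_eyecatcher(buf, len))
        return -1;
    return buf[4];
}

/* Returns the big-endian 16-bit length field at offsets 5..6, or -1. */
static int clc_length_of(const uint8_t *buf, size_t len)
{
    if (len < 7 || !has_smcr_eyecatcher(buf, len))
        return -1;
    return (buf[5] << 8) | buf[6];
}
```

Applied to the first seven bytes of the dump (`0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C`), this sketch reports the SMC-R eyecatcher, type `0x04` (decline), and a length field of `0x002C`. The leading `0xE2` is also the byte Redis rejected as a reply type.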
1 parent 84d2db9 commit e6d71b4

1 file changed

+6
-2
lines changed

net/smc/af_smc.c

Lines changed: 6 additions & 2 deletions
@@ -598,8 +598,12 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
 	struct smc_llc_qentry *qentry;
 	int rc;
 
-	/* receive CONFIRM LINK request from server over RoCE fabric */
-	qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
+	/* Receive CONFIRM LINK request from server over RoCE fabric.
+	 * Increasing the client's timeout by twice as much as the server's
+	 * timeout by default can temporarily avoid decline messages of
+	 * both sides crossing or colliding
+	 */
+	qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
 			      SMC_LLC_CONFIRM_LINK);
 	if (!qentry) {
 		struct smc_clc_msg_decline dclc;
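The effect of doubling the client timeout can be illustrated with a toy timing model. This is a simplified sketch and not kernel code. It assumes that each side, after its CONFIRM LINK wait expires, waits a further period for the peer's decline before sending its own, and that two declines "cross" when they are sent within a small window of each other. The function names and the `window_ms` parameter are hypothetical. The 2 s and 1 s values come from the commit message.

```c
#include <stdlib.h>

/* Time (ms since the handshake stalled) at which a side gives up and
 * sends its own decline: LLC wait, then the wait for a peer decline. */
static long decline_send_time_ms(long llc_wait_ms, long decline_wait_ms)
{
    return llc_wait_ms + decline_wait_ms;
}

/* Returns 1 if both sides would send a decline within window_ms of each
 * other (so neither sees the peer's decline in time), otherwise 0.
 * window_ms stands in for network latency and start-time skew. */
static int declines_cross(long client_llc_wait_ms, long server_llc_wait_ms,
                          long decline_wait_ms, long window_ms)
{
    long client_send = decline_send_time_ms(client_llc_wait_ms,
                                            decline_wait_ms);
    long server_send = decline_send_time_ms(server_llc_wait_ms,
                                            decline_wait_ms);

    return labs(client_send - server_send) <= window_ms;
}
```

Under these assumptions, `declines_cross(2000, 2000, 1000, 100)` models the old behavior, where both sides time out together and the declines cross. `declines_cross(4000, 2000, 1000, 100)` models the patched client, where the server's decline is sent well before the client stops waiting.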
