Description
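As a general rule, all state transitions inside TClientEndpoint are performed by a single thread. However, the CM can receive RDMA_CM_EVENT_DISCONNECTED at about the same time the CQ encounters an error (in the log below, a SEND completing with IBV_WC_WR_FLUSH_ERR). Both paths invoke Disconnect as a result, and the second call fails the VERIFY in ChangeState because the endpoint is already Disconnected rather than Connected.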
Example
2026-02-14T07:24:37.434707Z :BLOCKSTORE_RDMA DEBUG: start client
2026-02-14T07:24:37.434947Z :BLOCKSTORE_RDMA INFO: [10347107252814959233] start endpoint :: [send_magic=8F984F39 recv_magic=A0E75281]
2026-02-14T07:24:37.435243Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] Disconnected -> ResolvingAddress
2026-02-14T07:24:37.435289Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] resolve server address
2026-02-14T07:24:37.448673Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] received RDMA_CM_EVENT_ADDR_RESOLVED
2026-02-14T07:24:37.448733Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] resolve route
2026-02-14T07:24:37.448765Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] ResolvingAddress -> ResolvingRoute
2026-02-14T07:24:37.448808Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] received RDMA_CM_EVENT_ROUTE_RESOLVED
2026-02-14T07:24:37.448846Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] ResolvingRoute -> Connecting
2026-02-14T07:24:37.448969Z :BLOCKSTORE_RDMA INFO: [10347107252814959233] connect [private_data=0x00007FF81CAF5CF0 private_data_len=16 responder_resources=255 initiator_depth=255 flow_control=1 retry_count=7 rnr_retry_count=7 srq=0 qp_num=0]
2026-02-14T07:24:37.449008Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] received RDMA_CM_EVENT_ESTABLISHED
2026-02-14T07:24:37.449054Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] validate [private_data=0x0000602000006210 private_data_len=16 responder_resources=255 initiator_depth=255 flow_control=1 retry_count=7 rnr_retry_count=7 srq=0 qp_num=0]
2026-02-14T07:24:37.449089Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] Connecting -> Connected
2026-02-14T07:24:37.449145Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.9 posted
2026-02-14T07:24:37.449207Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.8 posted
2026-02-14T07:24:37.449243Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.7 posted
2026-02-14T07:24:37.449283Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.6 posted
2026-02-14T07:24:37.449319Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.5 posted
2026-02-14T07:24:37.449351Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.4 posted
2026-02-14T07:24:37.449387Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.3 posted
2026-02-14T07:24:37.449423Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.2 posted
2026-02-14T07:24:37.449453Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.1 posted
2026-02-14T07:24:37.449483Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] RECV A0E75281.0.0 posted
2026-02-14T07:24:37.449599Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] received RDMA_CM_EVENT_DISCONNECTED
2026-02-14T07:24:37.449799Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.9 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.449889Z :BLOCKSTORE_RDMA INFO: [10347107252814959233] disconnect
2026-02-14T07:24:37.449948Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] Connected -> Disconnecting
2026-02-14T07:24:37.450003Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] flush queues
2026-02-14T07:24:37.450074Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.8 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450124Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.7 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450176Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.6 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450230Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.5 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450285Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.4 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450335Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.3 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450385Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.2 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450463Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.1 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450530Z :BLOCKSTORE_RDMA TRACE: [10347107252814959233] SEND A0E75281.0.0 completed with IBV_WC_WR_FLUSH_ERR
2026-02-14T07:24:37.450583Z :BLOCKSTORE_RDMA DEBUG: [10347107252814959233] Disconnecting -> Disconnected
2026-02-14T07:24:37.456892Z :BLOCKSTORE_RDMA INFO: [10347107252814959233] disconnect
VERIFY failed (2026-02-14T07:24:37.456976Z): invalid state transition (new: Disconnecting, expected: Connected, actual: Disconnected)
cloud/blockstore/libs/rdma/impl/client.cpp:617
ChangeState(): requirement actualState == expectedState failed
NPrivate::InternalPanicImpl(int, char const*, char const*, int, int, int, TBasicStringBuf<char, std::__y1::char_traits<char>>, char const*, unsigned long)+1217 (0x15E0E41)
NPrivate::Panic(NPrivate::TStaticBuf const&, int, char const*, char const*, char const*, ...)+705 (0x15CBAB1)
??+0 (0x3BAAD23)
??+0 (0x3BA6E22)
??+0 (0x3B9B819)
??+0 (0x3BACCF0)
??+0 (0x3BAD3FD)
??+0 (0x15F0C04)
??+0 (0x7FF822094AC3)
??+0 (0x7FF8221268D0)
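For discussion, here is a minimal sketch of one possible way to make a duplicate Disconnect harmless: only the caller that wins the Connected -> Disconnecting transition tears the connection down, and any later caller returns without touching the state. This is not the code in client.cpp. `TEndpointSketch`, `EState`, `TryDisconnect`, and the teardown counter are hypothetical names made up for illustration, and the real endpoint may prefer to route both paths through the single state-owning thread instead.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the endpoint states that appear in the log.
enum class EState { Disconnected, Connecting, Connected, Disconnecting };

// Hypothetical sketch, not the actual TClientEndpoint.
class TEndpointSketch {
public:
    explicit TEndpointSketch(EState initial)
        : State(initial)
    {}

    // Returns true only for the caller that wins Connected -> Disconnecting.
    // A second caller (for example the CQ path after the CM path) sees a
    // state other than Connected and returns false instead of panicking.
    bool TryDisconnect() {
        EState expected = EState::Connected;
        if (!State.compare_exchange_strong(expected, EState::Disconnecting)) {
            return false;
        }
        // Placeholder for the real teardown ("flush queues" in the log).
        ++Teardowns;
        State.store(EState::Disconnected);
        return true;
    }

    EState GetState() const { return State.load(); }
    int GetTeardowns() const { return Teardowns.load(); }

private:
    std::atomic<EState> State;
    std::atomic<int> Teardowns{0};
};

// Replays the sequence from the log: two disconnect requests for the
// same connection loss. Returns how many teardowns actually ran.
int SimulateDoubleDisconnect() {
    TEndpointSketch endpoint(EState::Connected);
    endpoint.TryDisconnect();  // first request: performs the teardown
    endpoint.TryDisconnect();  // second request: ignored
    return endpoint.GetTeardowns();
}
```

In this sketch the second request is dropped silently. Whether the real fix should ignore it, log it, or serialize both event sources onto one thread is a design decision for the endpoint owner.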