-
Notifications
You must be signed in to change notification settings - Fork 405
[Bugfix]Fix 'invalid argument' error when setting QP state to INIT vi… #722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…a ibv_modify_qp on some rdma devices.
|
@hezhiqian01 Could you make it specific for Intel E810 RDMA NIC? |
|
@hezhiqian01 An alternative solution is to check whether current is in RESET state. Thanks for your feedback. |
|
@stmatengss My RDMA device is the Intel E810. I'm not sure whether similar issues occur on other RDMA devices, but what is certain is that the QP is already in the RESET state upon creation. Therefore, there's no need to set it to RESET again. If possible, this step can be removed to prevent such errors. |
After testing another RDMA device, I still believe we cannot skip this step (change to RESET status). As @alogfans mentioned, could you check the status first and then modify_qp? |
|
More specifically, you can check the current state using |
|
@hezhiqian01 Could you update the codes based on the suggestions from @alogfans? Thank you for your contribution! |
| // Any state -> RESET (removed) | ||
| // Modifying the QP to RESET is not necessary here, as it is already in the | ||
| // RESET state. This avoids the 'invalid argument' issue on some RDMA | ||
| // devices, such as the Intel E810. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Any state -> RESET (removed) | |
| // Modifying the QP to RESET is not necessary here, as it is already in the | |
| // RESET state. This avoids the 'invalid argument' issue on some RDMA | |
| // devices, such as the Intel E810. | |
| // 1. Query QP State | |
| ibv_qp_attr cur_attr; | |
| ibv_qp_init_attr init_attr; | |
| int cur_state = IBV_QPS_ERR; | |
| if (ibv_query_qp(qp, &cur_attr, IBV_QP_STATE, &init_attr) == 0) { | |
| cur_state = cur_attr.qp_state; | |
| } else { | |
| PLOG(WARNING) << "[Handshake] ibv_query_qp failed, proceed to RESET anyway"; | |
| } | |
| // 2. If current state is not reset | |
| if (cur_state != IBV_QPS_RESET) { | |
| ibv_qp_attr attr{}; | |
| attr.qp_state = IBV_QPS_RESET; | |
| int ret = ibv_modify_qp(qp, &attr, IBV_QP_STATE); | |
| if (ret) { | |
| std::string message = "Failed to modify QP to RESET"; | |
| PLOG(ERROR) << "[Handshake] " << message; | |
| if (reply_msg) *reply_msg = message + ": " + strerror(errno); | |
| return ERR_ENDPOINT; | |
| } | |
| } |
This PR fix the issue as below:
When using the Intel E810 RDMA NIC, calling the ibv_modify_qp function to change the QP state to INIT results in an 'Invalid argument (22)' error.
The error might be caused by the QP already being in the RESET state, and setting it to RESET again could lead to an internal state inconsistency.
The solution is to remove the step that sets the QP to RESET.