
Commit cc668a1

yishaih authored and rleon committed
RDMA/mlx5: Fix a race for DMABUF MR which can lead to CQE with error
This patch addresses a potential race condition for a DMABUF MR that can result in a CQE with an error on the UMR QP.

During the __mlx5_ib_dereg_mr() flow, the following sequence of calls occurs:

mlx5_revoke_mr()
  mlx5r_umr_revoke_mr()
    mlx5r_umr_post_send_wait()

At this point, the lkey is freed from the hardware's perspective.

However, mlx5_ib_dmabuf_invalidate_cb() might be triggered concurrently by another task attempting to invalidate the MR that owns the freed lkey. Since the lkey has already been freed, this can lead to a CQE error, causing the UMR QP to enter an error state.

To resolve this race condition, the dma_resv_lock(), which is already held when mlx5_ib_dmabuf_invalidate_cb() runs, is now also acquired within the mlx5_revoke_mr() scope. Upon a successful revoke, umem_dmabuf->private, which points to that MR, is set to NULL, preventing any further invalidation attempts on its lkey.

Fixes: e6fb246 ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Signed-off-by: Yishai Hadas <[email protected]>
Reviewed-by: Artemy Kovalyov <[email protected]>
Link: https://patch.msgid.link/70617067abbfaa0c816a2544c922e7f4346def58.1738587016.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
Parent: 12d0447
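To make the race concrete, here is a hypothetical, single-threaded userspace model in C; it is a sketch under stated assumptions, not driver code. `struct model_resv` stands in for the dma-buf reservation lock, `model_revoke()` for mlx5_revoke_mr(), and `model_move_notify()` for the exporter invoking mlx5_ib_dmabuf_invalidate_cb() with that lock held. All of these names, and the `fixed` flag that switches between pre-patch and post-patch revoke behaviour, are invented for illustration. Once both paths take the same lock they cannot overlap, so the model only needs to replay the two possible orders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not real kernel types. */
struct model_resv { bool held; };              /* models dmabuf->resv        */
struct model_mr { bool lkey_valid; };          /* false once HW freed lkey   */
struct model_umem_dmabuf {
	struct model_resv resv;
	bool sgt;                              /* models umem_dmabuf->sgt    */
	struct model_mr *private;              /* models umem_dmabuf->private */
};

enum notify_result { NOTIFY_SKIPPED, NOTIFY_ZAPPED, NOTIFY_CQE_ERROR };

static void resv_lock(struct model_resv *r)   { assert(!r->held); r->held = true; }
static void resv_unlock(struct model_resv *r) { assert(r->held); r->held = false; }

/* Models mlx5_ib_dmabuf_invalidate_cb(): must run with the resv lock held. */
static enum notify_result invalidate_cb(struct model_umem_dmabuf *u)
{
	struct model_mr *mr = u->private;

	assert(u->resv.held);                  /* like dma_resv_assert_held() */
	if (!u->sgt || !mr)
		return NOTIFY_SKIPPED;         /* post-patch early return     */
	/* A ZAP UMR on an lkey the HW already freed is modelled as an error CQE. */
	return mr->lkey_valid ? NOTIFY_ZAPPED : NOTIFY_CQE_ERROR;
}

/* Models the exporter calling the importer's callback under its lock. */
static enum notify_result model_move_notify(struct model_umem_dmabuf *u)
{
	enum notify_result r;

	resv_lock(&u->resv);
	r = invalidate_cb(u);
	resv_unlock(&u->resv);
	return r;
}

/* Models mlx5_revoke_mr(); in this model the revoke always succeeds. */
static int model_revoke(struct model_umem_dmabuf *u, bool fixed)
{
	if (fixed)
		resv_lock(&u->resv);
	u->private->lkey_valid = false;        /* UMR revoke frees the lkey   */
	if (fixed) {
		u->private = NULL;             /* detach MR after success     */
		resv_unlock(&u->resv);
	}
	return 0;
}
```

With `fixed` false, calling `model_revoke()` before `model_move_notify()` returns `NOTIFY_CQE_ERROR`, the failure the commit message describes. With `fixed` true, the same order returns `NOTIFY_SKIPPED`, and the opposite order zaps while the lkey is still valid.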

1 file changed (+12 -1 lines) in drivers/infiniband/hw/mlx5


drivers/infiniband/hw/mlx5/mr.c

```diff
@@ -1550,7 +1550,7 @@ static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
 
 	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
 
-	if (!umem_dmabuf->sgt)
+	if (!umem_dmabuf->sgt || !mr)
 		return;
 
 	mlx5r_umr_update_mr_pas(mr, MLX5_IB_UPD_XLT_ZAP);
```
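The first hunk's only change is the added `!mr` test in the callback's early return. The reduced model below contrasts the old and new conditions; `fake_mr`, `posts_zap_old()` and `posts_zap_new()` are hypothetical names, and `mr` is assumed to be the MR read from `umem_dmabuf->private`, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder for struct mlx5_ib_mr; only its address matters. */
struct fake_mr { int unused; };

/*
 * Reduced models of the early return in mlx5_ib_dmabuf_invalidate_cb().
 * Each returns true when the callback would go on to post the ZAP UMR.
 *   sgt_mapped: models umem_dmabuf->sgt != NULL
 *   mr:         models the MR taken from umem_dmabuf->private
 */
static bool posts_zap_old(bool sgt_mapped, const struct fake_mr *mr)
{
	(void)mr;                      /* pre-patch: MR pointer not checked  */
	return sgt_mapped;
}

static bool posts_zap_new(bool sgt_mapped, const struct fake_mr *mr)
{
	if (!sgt_mapped || !mr)        /* post-patch: revoked MR is skipped  */
		return false;
	return true;
}
```

The two versions disagree only when the attachment is mapped but `private` has been cleared, which is exactly the state a successful revoke now leaves behind.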
```diff
@@ -2022,11 +2022,16 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
 	struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
 	struct mlx5_cache_ent *ent = mr->mmkey.cache_ent;
 	bool is_odp = is_odp_mr(mr);
+	bool is_odp_dma_buf = is_dmabuf_mr(mr) &&
+			!to_ib_umem_dmabuf(mr->umem)->pinned;
 	int ret = 0;
 
 	if (is_odp)
 		mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex);
 
+	if (is_odp_dma_buf)
+		dma_resv_lock(to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv, NULL);
+
 	if (mr->mmkey.cacheable && !mlx5r_umr_revoke_mr(mr) && !cache_ent_find_and_store(dev, mr)) {
 		ent = mr->mmkey.cache_ent;
 		/* upon storing to a clean temp entry - schedule its cleanup */
```
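The second hunk takes the reservation lock only when `is_odp_dma_buf` is true, that is, for DMABUF MRs whose umem is not pinned; pinned DMABUF MRs are presumably excluded because their attachment is not expected to receive invalidation callbacks. In the sketch below, `takes_resv_lock()` is a hypothetical helper mirroring that expression. The `pinned` flag is passed by pointer to imitate the fact that `to_ib_umem_dmabuf()` is a `container_of()` cast, which is only meaningful for a real DMABUF umem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the is_odp_dma_buf computation in mlx5_revoke_mr().
 *   is_dmabuf: models is_dmabuf_mr(mr)
 *   pinned:    models &to_ib_umem_dmabuf(mr->umem)->pinned; only valid
 *              (and only dereferenced) when is_dmabuf is true
 */
static bool takes_resv_lock(bool is_dmabuf, const bool *pinned)
{
	/* && short-circuits, so *pinned is never read for non-DMABUF MRs. */
	return is_dmabuf && !*pinned;
}
```

For example, `takes_resv_lock(false, NULL)` returns false without touching the pointer, which is also why the original expression can evaluate `!to_ib_umem_dmabuf(mr->umem)->pinned` without a separate type check.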
```diff
@@ -2054,6 +2059,12 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
 		mutex_unlock(&to_ib_umem_odp(mr->umem)->umem_mutex);
 	}
 
+	if (is_odp_dma_buf) {
+		if (!ret)
+			to_ib_umem_dmabuf(mr->umem)->private = NULL;
+		dma_resv_unlock(to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv);
+	}
+
 	return ret;
 }
 
```
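The third hunk releases the reservation lock on every path but detaches the MR only when the revoke succeeded (`ret == 0`), following the same shape as the ODP `mutex_unlock()` block just above it. A minimal model of that tail, where the struct `model_dmabuf` and the function `revoke_tail()` are hypothetical stand-ins for the real objects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not kernel code. */
struct model_dmabuf {
	bool resv_held;      /* models dmabuf->resv being locked  */
	void *private;       /* models umem_dmabuf->private       */
};

/*
 * Models the tail of mlx5_revoke_mr(); ret is the revoke result (0 on
 * success). private is cleared only on success; the lock is always dropped.
 */
static int revoke_tail(struct model_dmabuf *d, bool is_odp_dma_buf, int ret)
{
	if (is_odp_dma_buf) {
		assert(d->resv_held);
		if (!ret)
			d->private = NULL;      /* MR no longer reachable      */
		d->resv_held = false;           /* models dma_resv_unlock()    */
	}
	return ret;
}
```

For instance, `revoke_tail(&d, true, 0)` leaves `d.private` NULL, whereas a negative `ret` leaves `d.private` unchanged; in both cases `d.resv_held` ends up false, so an error path never leaks the lock.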
