Commit b6e0b6b

shayshd authored and Saeed Mahameed committed
net/mlx5: Fix fatal error handling during device load
Currently, in case of a fatal error during mlx5_load_one(), we cannot enter the error state until mlx5_load_one() has finished, which can take several minutes until the commands time out, because these commands cannot be processed due to the fatal error.

Fix it by setting dev->state to MLX5_DEVICE_STATE_INTERNAL_ERROR before requesting the lock.

Fixes: c1d4d2e ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
1 parent 42ea9f1 commit b6e0b6b

1 file changed, +11 -3 lines changed
  • drivers/net/ethernet/mellanox/mlx5/core

drivers/net/ethernet/mellanox/mlx5/core/health.c

Lines changed: 11 additions & 3 deletions
@@ -192,15 +192,23 @@ static bool reset_fw_if_needed(struct mlx5_core_dev *dev)
 
 void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
 {
+	bool err_detected = false;
+
+	/* Mark the device as fatal in order to abort FW commands */
+	if ((check_fatal_sensors(dev) || force) &&
+	    dev->state == MLX5_DEVICE_STATE_UP) {
+		dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+		err_detected = true;
+	}
 	mutex_lock(&dev->intf_state_mutex);
-	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
-		goto unlock;
+	if (!err_detected && dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
+		goto unlock;/* a previous error is still being handled */
 	if (dev->state == MLX5_DEVICE_STATE_UNINITIALIZED) {
 		dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 		goto unlock;
 	}
 
-	if (check_fatal_sensors(dev) || force) {
+	if (check_fatal_sensors(dev) || force) { /* protected state setting */
 		dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 		mlx5_cmd_flush(dev);
 	}
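
The control flow of the patched function can be modelled in userspace to see the ordering the commit message describes. The sketch below is only an illustration under stated assumptions: `struct mock_dev`, `mock_lock()`, `mock_unlock()`, and `mock_enter_error_state()` are hypothetical stand-ins rather than driver code, the `fatal_sensor` field replaces `check_fatal_sensors()`, and a counter replaces `mlx5_cmd_flush()`. The mock lock records the state a lock holder would observe at the moment the lock is requested.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are not the driver's real types. */
enum dev_state { STATE_UNINITIALIZED, STATE_UP, STATE_INTERNAL_ERROR };

struct mock_dev {
	enum dev_state state;
	bool fatal_sensor;            /* stands in for check_fatal_sensors() */
	enum dev_state state_at_lock; /* state visible when the lock is requested */
	int flushes;                  /* counts stand-in mlx5_cmd_flush() calls */
};

/* Record what other contexts would already see when we ask for the lock. */
static void mock_lock(struct mock_dev *dev)
{
	dev->state_at_lock = dev->state;
}

static void mock_unlock(struct mock_dev *dev)
{
	(void)dev;
}

/* Mirrors the branch structure of the patched hunk, nothing more. */
static void mock_enter_error_state(struct mock_dev *dev, bool force)
{
	bool err_detected = false;

	/* Flag the error before locking, so pending commands can abort early. */
	if ((dev->fatal_sensor || force) && dev->state == STATE_UP) {
		dev->state = STATE_INTERNAL_ERROR;
		err_detected = true;
	}

	mock_lock(dev);
	/* An error flagged by an earlier call is already being handled. */
	if (!err_detected && dev->state == STATE_INTERNAL_ERROR)
		goto unlock;
	if (dev->state == STATE_UNINITIALIZED) {
		dev->state = STATE_INTERNAL_ERROR;
		goto unlock;
	}

	if (dev->fatal_sensor || force) {
		dev->state = STATE_INTERNAL_ERROR; /* state setting under the lock */
		dev->flushes++;
	}
unlock:
	mock_unlock(dev);
}
```

Calling `mock_enter_error_state()` on a device in `STATE_UP` with `fatal_sensor` set leaves `state_at_lock` equal to `STATE_INTERNAL_ERROR`, which is the point of the change: the error is visible before the caller blocks on the lock. A second call on the same device takes the early `goto unlock` path and does not flush again.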
