Commit ae405ea

jiriwiesner authored and gregkh committed
bonding: fix active-backup failover for current ARP slave
[ Upstream commit 0410d07 ]

When the ARP monitor is used for link detection, ARP replies are validated for all slaves (arp_validate=3) and fail_over_mac is set to active, two slaves of an active-backup bond may get stuck in a state where both of them are active and pass packets that they receive to the bond. This state makes IPv6 duplicate address detection fail. The state is reached thus:

1. The current active slave goes down because the ARP target is not reachable.
2. The current ARP slave is chosen and made active.
3. A new slave is enslaved. This new slave becomes the current active slave and can reach the ARP target.

As a result, the current ARP slave stays active after the enslave action has finished and the log is littered with "PROBE BAD" messages:

> bond0: PROBE: c_arp ens10 && cas ens11 BAD

The workaround is to remove the slave with "going back" status from the bond and re-enslave it. This issue was encountered when DPDK PMD interfaces were being enslaved to an active-backup bond.

It would be possible to fix the issue in bond_enslave() or bond_change_active_slave(), but the ARP monitor was fixed instead to keep most of the actions changing the current ARP slave in the ARP monitor code. The current ARP slave is set as inactive and backup during the commit phase. A new state, BOND_LINK_FAIL, has been introduced for slaves in the context of the ARP monitor. This allows administrators to see how slaves are rotated for sending ARP requests and how attempts are made to find a new active slave.

Fixes: b2220ca ("bonding: refactor ARP active-backup monitor")
Signed-off-by: Jiri Wiesner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
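The three steps in the commit message can be replayed in a small userspace model. This is only a sketch, not kernel code: the `toy_*` types and functions are hypothetical, `ens9` is an invented name for the slave that goes down (`ens10` and `ens11` come from the log line), and the check in `toy_probe_is_bad()` is an assumption about when the "PROBE BAD" message is printed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the bonding structures. */
struct toy_slave {
	const char *name;
	bool active;	/* passes received packets up to the bond */
};

struct toy_bond {
	struct toy_slave *curr_active_slave;
	struct toy_slave *current_arp_slave;
};

/* Step 1: the active slave cannot reach the ARP target and goes down. */
static void toy_active_goes_down(struct toy_bond *b)
{
	assert(b != NULL);
	if (b->curr_active_slave != NULL)
		b->curr_active_slave->active = false;
	b->curr_active_slave = NULL;
}

/* Step 2: the ARP monitor picks a slave to probe with and makes it active. */
static void toy_pick_arp_slave(struct toy_bond *b, struct toy_slave *s)
{
	assert(b != NULL && s != NULL);
	s->active = true;
	b->current_arp_slave = s;
}

/* Step 3 (pre-fix behaviour): a newly enslaved interface becomes the
 * current active slave, and nothing resets the current ARP slave.
 */
static void toy_enslave_as_active(struct toy_bond *b, struct toy_slave *s)
{
	assert(b != NULL && s != NULL);
	s->active = true;
	b->curr_active_slave = s;
}

/* Assumed condition behind "PROBE: c_arp ... && cas ... BAD". */
static bool toy_probe_is_bad(const struct toy_bond *b)
{
	return b->current_arp_slave != NULL && b->curr_active_slave != NULL;
}

/* Replays steps 1-3 and returns how many slaves end up active. */
static int toy_stuck_active_count(void)
{
	struct toy_slave ens9 = { "ens9", true };
	struct toy_slave ens10 = { "ens10", false };
	struct toy_slave ens11 = { "ens11", false };
	struct toy_bond bond = { &ens9, NULL };

	toy_active_goes_down(&bond);
	toy_pick_arp_slave(&bond, &ens10);
	toy_enslave_as_active(&bond, &ens11);

	if (!toy_probe_is_bad(&bond))
		return 0;
	return (int)ens9.active + (int)ens10.active + (int)ens11.active;
}
```

In this model, `toy_stuck_active_count()` ends with two active slaves, which is the stuck state that breaks IPv6 duplicate address detection.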
1 parent d676b22 commit ae405ea

1 file changed: +16 -2 lines changed

drivers/net/bonding/bond_main.c

Lines changed: 16 additions & 2 deletions
@@ -2773,6 +2773,9 @@ static int bond_ab_arp_inspect(struct bonding *bond)
 			if (bond_time_in_interval(bond, last_rx, 1)) {
 				bond_propose_link_state(slave, BOND_LINK_UP);
 				commit++;
+			} else if (slave->link == BOND_LINK_BACK) {
+				bond_propose_link_state(slave, BOND_LINK_FAIL);
+				commit++;
 			}
 			continue;
 		}
@@ -2883,6 +2886,19 @@ static void bond_ab_arp_commit(struct bonding *bond)
 
 			continue;
 
+		case BOND_LINK_FAIL:
+			bond_set_slave_link_state(slave, BOND_LINK_FAIL,
+						  BOND_SLAVE_NOTIFY_NOW);
+			bond_set_slave_inactive_flags(slave,
+						      BOND_SLAVE_NOTIFY_NOW);
+
+			/* A slave has just been enslaved and has become
+			 * the current active slave.
+			 */
+			if (rtnl_dereference(bond->curr_active_slave))
+				RCU_INIT_POINTER(bond->current_arp_slave, NULL);
+			continue;
+
 		default:
 			netdev_err(bond->dev, "impossible: new_link %d on slave %s\n",
 				   slave->link_new_state, slave->dev->name);
@@ -2932,8 +2948,6 @@ static bool bond_ab_arp_probe(struct bonding *bond)
 		return should_notify_rtnl;
 	}
 
-	bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);
-
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		if (!found && !before && bond_slave_is_up(slave))
 			before = slave;
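The inspect and commit changes above can be approximated in userspace as a two-phase update. This is a minimal sketch under stated assumptions, not the kernel implementation: the `model_*` names and `LINK_*` constants are hypothetical, locking, RCU and notifications are left out, and the early return for slaves that are already up assumes the surrounding kernel code (not shown in the diff) handles them separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative link states; names and values are not the kernel's. */
enum model_link { LINK_UP, LINK_FAIL, LINK_DOWN, LINK_BACK, LINK_NOCHANGE };

struct model_slave {
	const char *name;
	enum model_link link;		/* committed state */
	enum model_link link_new;	/* state proposed by the inspect phase */
	bool active;			/* passes received packets to the bond */
};

struct model_bond {
	struct model_slave *curr_active_slave;
	struct model_slave *current_arp_slave;
};

/* Inspect phase for a slave that is not up.
 * Returns true when a commit is needed.
 */
static bool model_inspect(struct model_slave *s, bool arp_reply_seen)
{
	assert(s != NULL);
	s->link_new = LINK_NOCHANGE;

	if (s->link == LINK_UP)
		return false;	/* assumption: handled elsewhere */

	if (arp_reply_seen) {
		s->link_new = LINK_UP;
		return true;
	}
	if (s->link == LINK_BACK) {
		/* New branch: a "going back" slave without replies fails. */
		s->link_new = LINK_FAIL;
		return true;
	}
	return false;
}

/* Commit phase, reduced to the new LINK_FAIL case. */
static void model_commit(struct model_bond *b, struct model_slave *s)
{
	assert(b != NULL && s != NULL);
	if (s->link_new != LINK_FAIL)
		return;

	s->link = LINK_FAIL;
	s->active = false;	/* the slave becomes inactive and backup */

	/* Another slave has already become the current active slave. */
	if (b->curr_active_slave != NULL)
		b->current_arp_slave = NULL;
}

/* Starts from the stuck state in the commit message and applies the fix. */
static bool model_fix_scenario(void)
{
	struct model_slave ens10 = { "ens10", LINK_BACK, LINK_NOCHANGE, true };
	struct model_slave ens11 = { "ens11", LINK_UP, LINK_NOCHANGE, true };
	struct model_bond bond = { &ens11, &ens10 };

	if (model_inspect(&ens10, false))
		model_commit(&bond, &ens10);

	return !ens10.active && ens11.active &&
	       ens10.link == LINK_FAIL && bond.current_arp_slave == NULL;
}
```

In the model, `model_fix_scenario()` leaves only `ens11` active and clears the current ARP slave. This mirrors the third hunk, where the unconditional deactivation is dropped from the probe function because the commit phase now performs it.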
