SROS (SR-SIM) DUAL CPM mgmt IP race condition #3093

@jcpvdm

Description

When an SR-SIM CPM container starts up, the following happens (simplified):

  1. A Python script reads the mgmt IP from the Linux eth0 interface.
  2. The bof config is populated with the mgmt IP.
  3. The sros process is started, and the IP is deleted from the Linux eth0 interface.

The problem can occur when a node includes two CPM containers (Dual-CPM). Both CPMs share the same eth0 in the same network namespace, so if one CPM container reaches step 3 before the other CPM finishes step 1, the IP is already gone from eth0 and the second CPM cannot fetch the mgmt IP address. The result is a broken node.

The issue is more likely to happen on a loaded server, where many containers start concurrently.
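
To make the interleaving concrete, here is a toy model of the three steps above. It is not SR-SIM code: `SharedEth0`, `cpm_startup`, `run`, and the IP `172.20.20.2/24` are all made up for illustration, and the real startup script may be structured quite differently. Each CPM is a generator that pauses after every step, so the order in which the two CPMs advance can be chosen explicitly.

```python
# Toy model of the Dual-CPM startup race. All names and the IP value
# are hypothetical; this only mirrors the three steps from the description.

class SharedEth0:
    """Stands in for the eth0 interface shared by both CPM containers."""

    def __init__(self, mgmt_ip):
        self.mgmt_ip = mgmt_ip

    def read_ip(self):
        return self.mgmt_ip

    def delete_ip(self):
        self.mgmt_ip = None


def cpm_startup(eth0):
    """One CPM's startup, pausing after each step so two can be interleaved."""
    ip = eth0.read_ip()        # step 1: read mgmt IP from eth0
    yield ("read", ip)
    bof = {"mgmt_ip": ip}      # step 2: populate bof config
    yield ("bof", bof)
    eth0.delete_ip()           # step 3: sros starts, IP removed from eth0
    yield ("started", bof)


def run(schedule):
    """Advance CPM 'A' or 'B' one step per character in schedule.

    Returns the mgmt IP that ended up in each CPM's bof config.
    """
    eth0 = SharedEth0("172.20.20.2/24")
    cpms = {"A": cpm_startup(eth0), "B": cpm_startup(eth0)}
    result = {}
    for name in schedule:
        step, value = next(cpms[name])
        if step == "bof":
            result[name] = value["mgmt_ip"]
    return result


good = run("ABABAB")  # both CPMs finish step 1 before either reaches step 3
bad = run("AAABBB")   # A reaches step 3 before B has done step 1

print(good)  # {'A': '172.20.20.2/24', 'B': '172.20.20.2/24'}
print(bad)   # {'A': '172.20.20.2/24', 'B': None}
```

In the second schedule CPM B reads eth0 after CPM A has already removed the address, so its bof config ends up without a mgmt IP, which matches the broken-node symptom.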
