GUID leaks after operator restart due to inconsistent podNetworkID computation (init vs add/delete), breaks multi‑interface pods #207

@wackyspellcaster


Summary
I believe there’s a bug in how podNetworkID is computed across init/add/delete in pkg/daemon/daemon.go. The ID is built differently in initGUIDPool(), processNetworkGUID(), and DeletePeriodicUpdate(). As a result, GUIDs leak when the operator restarts, and keys mismatch for pods with multiple interfaces (e.g., annotations like ib-sriov-1, ib-sriov-n). This is based on tag network-operator-v26.1.0-beta.7.

Why this matters
If the operator restarts or crashes, the GUIDs that were already allocated can’t be matched to their pod on delete, so they are never released. In a busy cluster the leaked GUIDs accumulate. The mismatch also shows up for pods with multiple interfaces, because init walks the pod’s whole annotation list and keys on network.Name alone, while add/delete use different identifiers.

Evidence (current code)

  • initGUIDPool() uses:
    • podNetworkID := string(pod.UID) + network.Name
    • No _ separator, no interface identity, no namespace, no stable per‑interface key.
    • pkg/daemon/daemon.go around line 897.
  • processNetworkGUID() uses:
    • GeneratePodNetworkInterfaceID(pod, networkID, interfaceName)
    • Includes interface name or idx_i.
    • pkg/daemon/daemon.go around lines 439–447.
  • DeletePeriodicUpdate() uses:
    • GeneratePodNetworkID(pod, networkName)
    • No interface identity.
    • pkg/daemon/daemon.go around line 746.

So the key in guidPodNetworkMap is inconsistent across code paths.
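
To make the mismatch concrete, here is a minimal sketch of the three key shapes. Only initKey mirrors an expression quoted above. deleteKey and addKey are hypothetical stand-ins for GeneratePodNetworkID and GeneratePodNetworkInterfaceID: the "_" separator and the field order are assumptions, not copied from the repository. The map is a simplified model keyed by podNetworkID, not the real guidPodNetworkMap.

```go
package main

import "fmt"

// initKey mirrors the expression quoted from initGUIDPool():
// string(pod.UID) + network.Name, with no separator.
func initKey(podUID, networkName string) string {
	return podUID + networkName
}

// deleteKey is a HYPOTHETICAL stand-in for GeneratePodNetworkID.
// The "_" separator is an assumption.
func deleteKey(podUID, networkID string) string {
	return podUID + "_" + networkID
}

// addKey is a HYPOTHETICAL stand-in for GeneratePodNetworkInterfaceID.
// The "_" separator and field order are assumptions.
func addKey(podUID, networkID, interfaceName string) string {
	return podUID + "_" + networkID + "_" + interfaceName
}

func main() {
	uid, network := "uid-1234", "ib-sriov-1"

	fmt.Println("init:  ", initKey(uid, network))
	fmt.Println("add:   ", addKey(uid, network, "idx_0"))
	fmt.Println("delete:", deleteKey(uid, network))

	// Simplified model: after a restart, init seeds the map with its own key shape.
	allocated := map[string]string{initKey(uid, network): "guid-A"}

	// Delete looks the entry up with a different key shape and misses it.
	_, found := allocated[deleteKey(uid, network)]
	fmt.Println("delete finds the entry seeded by init:", found)
}
```

Whatever the real separators are, the point is the same: three different key shapes for the same pod and network mean a lookup built in one path cannot hit an entry stored by another.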

Looping difference (init vs add/delete)
There’s also a structural mismatch in how the code iterates networks:

  • initGUIDPool() iterates all networks in the pod annotation in a single pass:

    networks, _ := ParsePodNetworkAnnotation(...)
    for _, network := range networks {
        podNetworkID := string(pod.UID) + network.Name
        ...
    }

    This does not group by network name and does not compute per‑interface identity (InterfaceRequest or idx_i).

  • Add/Delete paths first filter to the given network name, then loop only those matching entries:

    matchingNetworks := GetAllPodNetworks(networks, networkName)
    for i, network := range matchingNetworks {
        interfaceName := network.InterfaceRequest
        if interfaceName == "" {
            interfaceName = fmt.Sprintf("idx_%d", i)
        }
        ...
    }

    Here, the index i is per‑network‑name, not global across the pod’s entire annotation list.

So even if init used an index (it doesn’t today), it would come from a different index domain (global vs per‑network). Per‑interface identities would then still differ across a restart, and the delete path could not release the GUIDs.
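
A small sketch of the two index domains, using a hypothetical attachment struct as a simplified stand-in for a parsed annotation entry (the real type returned by ParsePodNetworkAnnotation is not reproduced here). The same two ib-sriov-1 attachments get different indexes depending on whether the loop runs over the whole list or over the filtered list.

```go
package main

import "fmt"

// attachment is a simplified, HYPOTHETICAL stand-in for one entry
// of the pod's network annotation.
type attachment struct {
	Name             string
	InterfaceRequest string
}

// globalIndexes returns the positions of networkName in the full annotation
// list, which is what a single pass over all networks would see.
func globalIndexes(all []attachment, networkName string) []int {
	var idx []int
	for i, a := range all {
		if a.Name == networkName {
			idx = append(idx, i)
		}
	}
	return idx
}

// perNetworkIndexes filters first and then indexes, mirroring the
// "filter, then range" shape of the add/delete paths.
func perNetworkIndexes(all []attachment, networkName string) []int {
	var matching []attachment
	for _, a := range all {
		if a.Name == networkName {
			matching = append(matching, a)
		}
	}
	var idx []int
	for i := range matching {
		idx = append(idx, i)
	}
	return idx
}

func main() {
	pod := []attachment{
		{Name: "ib-sriov-1"},
		{Name: "ib-sriov-2"},
		{Name: "ib-sriov-1"},
	}
	fmt.Println("global:     ", globalIndexes(pod, "ib-sriov-1"))     // [0 2]
	fmt.Println("per-network:", perNetworkIndexes(pod, "ib-sriov-1")) // [0 1]
}
```

With this annotation order, the second ib-sriov-1 attachment would be idx_2 under a global index but idx_1 under a per‑network index, so the two schemes cannot agree on a key.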

Impact

  • GUID leak after restart: init seeds guidPodNetworkMap with a key that delete will never match, so GUIDs are not released.
  • Multi‑interface network annotations: delete path cannot match per‑interface entries, so it won’t release GUIDs when a pod with repeated attachments is deleted.

Suggested fix

  • Compute a single canonical podNetworkID across init/add/delete:
    GeneratePodNetworkInterfaceID(pod, networkID, interfaceName)
    where interfaceName is InterfaceRequest or idx_N (per‑network occurrence).
  • Ensure initGUIDPool() uses the same logic and per‑network occurrence index.
  • Ensure delete uses per‑interface IDs too (not just GeneratePodNetworkID).
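
A rough sketch of what a shared helper could look like, so that init, add, and delete all derive keys the same way. Everything here is illustrative: canonicalIDs, the attachment struct, and the "_"-joined key format are hypothetical, and the pool map is a simplified model keyed by podNetworkID rather than the daemon's actual data structure.

```go
package main

import "fmt"

// attachment is a simplified, HYPOTHETICAL stand-in for one entry
// of the pod's network annotation.
type attachment struct {
	Name             string
	InterfaceRequest string
}

// canonicalIDs returns one key per attachment of networkName.
// The interface part is InterfaceRequest when set, otherwise idx_N, where N
// is the occurrence index among attachments of that network only.
// The "_"-joined format is an assumption for illustration.
func canonicalIDs(podUID, networkID, networkName string, all []attachment) []string {
	var ids []string
	occurrence := 0
	for _, a := range all {
		if a.Name != networkName {
			continue
		}
		interfaceName := a.InterfaceRequest
		if interfaceName == "" {
			interfaceName = fmt.Sprintf("idx_%d", occurrence)
		}
		ids = append(ids, podUID+"_"+networkID+"_"+interfaceName)
		occurrence++
	}
	return ids
}

func main() {
	pod := []attachment{
		{Name: "ib-sriov-1"},
		{Name: "ib-sriov-2"},
		{Name: "ib-sriov-1", InterfaceRequest: "ib1"},
	}

	// Init after a restart: seed the pool using the shared helper.
	pool := map[string]string{}
	for i, id := range canonicalIDs("uid-1234", "ib-sriov-1", "ib-sriov-1", pod) {
		pool[id] = fmt.Sprintf("guid-%d", i)
	}
	fmt.Println("entries after init:", len(pool))

	// Delete: derive the same keys with the same helper and release them.
	for _, id := range canonicalIDs("uid-1234", "ib-sriov-1", "ib-sriov-1", pod) {
		delete(pool, id)
	}
	fmt.Println("entries after delete:", len(pool))
}
```

Because both phases call the same function over the same annotation list, every entry seeded at init is found again on delete, including the repeated attachment.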
