Summary
I believe there's a bug in how `podNetworkID` is computed across init/add/delete in `pkg/daemon/daemon.go`. The ID is built differently in `initGUIDPool()`, `processNetworkGUID()`, and `DeletePeriodicUpdate()`. As a result, GUIDs leak on operator restarts and keys mismatch for pods with multiple interfaces (e.g., annotations like `ib-sriov-1`, `ib-sriov-n`). This report is based on tag `network-operator-v26.1.0-beta.7`.
Why this matters
If the operator restarts or crashes, the GUIDs that were allocated can't be matched to the pod on delete, so they are never released. In a busy cluster these leaked GUIDs accumulate. The mismatch also affects pods with multiple interfaces: init keys on `network.Name` while iterating the pod's whole annotation list, whereas add and delete use different identifiers.
Evidence (current code)
- `initGUIDPool()` uses `podNetworkID := string(pod.UID) + network.Name`: no `_` separator, no interface identity, no namespace, no stable per-interface key. See `pkg/daemon/daemon.go` around line 897.
- `processNetworkGUID()` uses `GeneratePodNetworkInterfaceID(pod, networkID, interfaceName)`, which includes the interface name or `idx_i`. See `pkg/daemon/daemon.go` around lines 439–447.
- `DeletePeriodicUpdate()` uses `GeneratePodNetworkID(pod, networkName)`, which has no interface identity. See `pkg/daemon/daemon.go` around line 746.
So the key in `guidPodNetworkMap` is inconsistent across the three code paths.
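To make the divergence concrete, here is a small sketch that builds all three keys for the same pod and network. Only `initKey` reproduces an expression quoted above. `deleteKey` and `addKey` are hypothetical stand-ins for `GeneratePodNetworkID` and `GeneratePodNetworkInterfaceID`, whose real formats are not shown here; the sketch also uses the network name where `processNetworkGUID()` actually passes `networkID`. Under any such formats the point holds: one formula has no separator, one has no interface identity, and one does.

```go
package main

import "fmt"

// initKey reproduces the expression quoted from initGUIDPool():
// string(pod.UID) + network.Name, with no separator.
func initKey(podUID, networkName string) string {
	return podUID + networkName
}

// deleteKey is a HYPOTHETICAL stand-in for GeneratePodNetworkID(pod, networkName).
// Its real format is assumed; the issue only states it has no interface identity.
func deleteKey(podUID, networkName string) string {
	return podUID + "_" + networkName
}

// addKey is a HYPOTHETICAL stand-in for
// GeneratePodNetworkInterfaceID(pod, networkID, interfaceName); the format is
// assumed, but like the real helper it includes the interface name or idx_i.
func addKey(podUID, networkName, interfaceName string) string {
	return podUID + "_" + networkName + "_" + interfaceName
}

func main() {
	const uid, network = "1234", "ib-sriov"
	fmt.Println("init:  ", initKey(uid, network))         // key: 1234ib-sriov
	fmt.Println("add:   ", addKey(uid, network, "idx_0")) // key: 1234_ib-sriov_idx_0
	fmt.Println("delete:", deleteKey(uid, network))       // key: 1234_ib-sriov
}
```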
Looping difference (init vs add/delete)
There’s also a structural mismatch in how the code iterates networks:
- `initGUIDPool()` iterates all networks in the pod annotation in a single pass:

  ```go
  networks, _ := ParsePodNetworkAnnotation(...)
  for _, network := range networks {
      podNetworkID := string(pod.UID) + network.Name
      ...
  }
  ```

  This does not group by network name and does not compute a per-interface identity (`InterfaceRequest` or `idx_i`).
- The add/delete paths first filter to the given network name, then loop only over the matching entries:

  ```go
  matchingNetworks := GetAllPodNetworks(networks, networkName)
  for i, network := range matchingNetworks {
      interfaceName := network.InterfaceRequest
      if interfaceName == "" {
          interfaceName = fmt.Sprintf("idx_%d", i)
      }
      ...
  }
  ```

  Here, the index `i` is per network name, not global across the pod's entire annotation list.
So even if init used an index (it doesn't today), that index would come from a different domain (global vs. per-network). Per-interface identities would then still differ across a restart, and delete would be unable to release those GUIDs.
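The two index domains can be compared with a short sketch. `networkSelection` is a hypothetical minimal model of one annotation entry, and `perNetworkIndexNames` assumes `GetAllPodNetworks` preserves annotation order, so the loop index over matching entries equals a running per-name occurrence count. For an annotation `[ib-sriov, other, ib-sriov]` with no `InterfaceRequest`, the second `ib-sriov` attachment gets `idx_2` globally but `idx_1` per network.

```go
package main

import "fmt"

// networkSelection is a HYPOTHETICAL minimal model of one entry in the pod's
// network annotation; only the fields that affect the key are included.
type networkSelection struct {
	Name             string
	InterfaceRequest string
}

// interfaceName mirrors the add/delete rule: InterfaceRequest, else idx_N.
func interfaceName(n networkSelection, idx int) string {
	if n.InterfaceRequest != "" {
		return n.InterfaceRequest
	}
	return fmt.Sprintf("idx_%d", idx)
}

// globalIndexNames uses each entry's position in the full annotation list,
// which is what a single pass over all networks would produce.
func globalIndexNames(networks []networkSelection) []string {
	names := make([]string, len(networks))
	for i, n := range networks {
		names[i] = n.Name + "/" + interfaceName(n, i)
	}
	return names
}

// perNetworkIndexNames counts occurrences per network name, matching the
// index i of the loop over GetAllPodNetworks(networks, networkName).
func perNetworkIndexNames(networks []networkSelection) []string {
	seen := map[string]int{}
	names := make([]string, len(networks))
	for i, n := range networks {
		names[i] = n.Name + "/" + interfaceName(n, seen[n.Name])
		seen[n.Name]++
	}
	return names
}

func main() {
	networks := []networkSelection{{Name: "ib-sriov"}, {Name: "other"}, {Name: "ib-sriov"}}
	fmt.Println(globalIndexNames(networks))     // [ib-sriov/idx_0 other/idx_1 ib-sriov/idx_2]
	fmt.Println(perNetworkIndexNames(networks)) // [ib-sriov/idx_0 other/idx_0 ib-sriov/idx_1]
}
```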
Impact
- GUID leak after restart: init seeds `guidPodNetworkMap` with a key that delete will never match, so GUIDs are not released.
- Multi-interface network annotations: the delete path cannot match per-interface entries, so it won't release GUIDs when a pod with repeated attachments is deleted.
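Both failure modes can be reproduced in a toy model where a plain Go map stands in for `guidPodNetworkMap`. The GUID values are placeholders, and `networkKey`/`interfaceKey` are hypothetical formats for `GeneratePodNetworkID` and `GeneratePodNetworkInterfaceID`; only `initKey` follows the quoted init expression. In each case the delete lookup misses and the GUIDs stay allocated.

```go
package main

import "fmt"

// guidPool is a toy model of guidPodNetworkMap: podNetworkID -> allocated GUID.
type guidPool map[string]string

// release frees the GUID stored under key and reports whether one was found.
func (g guidPool) release(key string) bool {
	if _, ok := g[key]; !ok {
		return false
	}
	delete(g, key)
	return true
}

// Key builders: initKey follows the quoted init expression; the other two are
// HYPOTHETICAL formats standing in for the real Generate* helpers.
func initKey(uid, network string) string            { return uid + network }
func networkKey(uid, network string) string         { return uid + "_" + network }
func interfaceKey(uid, network, iface string) string { return uid + "_" + network + "_" + iface }

func main() {
	const uid, network = "1234", "ib-sriov"

	// Restart: init re-seeds the map, then the pod is deleted.
	restarted := guidPool{initKey(uid, network): "guid-A"}
	freed := restarted.release(networkKey(uid, network))
	fmt.Println("restart: freed =", freed, "leaked =", len(restarted)) // freed = false leaked = 1

	// Multi-interface: add stored one entry per attachment; delete uses a network-level key.
	multi := guidPool{
		interfaceKey(uid, network, "idx_0"): "guid-B",
		interfaceKey(uid, network, "idx_1"): "guid-C",
	}
	freed = multi.release(networkKey(uid, network))
	fmt.Println("multi-interface: freed =", freed, "leaked =", len(multi)) // freed = false leaked = 2
}
```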
Suggested fix
- Compute a single canonical `podNetworkID` across init/add/delete: `GeneratePodNetworkInterfaceID(pod, networkID, interfaceName)`, where `interfaceName` is `InterfaceRequest` or `idx_N` (the per-network occurrence index).
- Ensure `initGUIDPool()` uses the same logic and the same per-network occurrence index.
- Ensure delete also uses per-interface IDs (not just `GeneratePodNetworkID`).
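One way to guarantee a single index domain is to route all three paths through one helper. The sketch below is a hypothetical shape for that fix, not the operator's code: `keysForNetwork` would serve add and delete, and `keysForPod` (for init) walks each distinct network name once and reuses `keysForNetwork`, so init can only produce keys that delete will later compute. `interfaceKey` is a placeholder for `GeneratePodNetworkInterfaceID`, with an assumed format.

```go
package main

import "fmt"

// networkSelection is a HYPOTHETICAL minimal model of one annotation entry.
type networkSelection struct {
	Name             string
	InterfaceRequest string
}

// interfaceKey is a HYPOTHETICAL canonical key standing in for
// GeneratePodNetworkInterfaceID(pod, networkID, interfaceName).
func interfaceKey(podUID, networkName, interfaceName string) string {
	return podUID + "_" + networkName + "_" + interfaceName
}

// keysForNetwork filters to one network name and names each attachment
// InterfaceRequest or idx_N, where N is the position among matching entries.
// Add and delete would both call this.
func keysForNetwork(podUID string, networks []networkSelection, networkName string) []string {
	var keys []string
	i := 0
	for _, n := range networks {
		if n.Name != networkName {
			continue
		}
		iface := n.InterfaceRequest
		if iface == "" {
			iface = fmt.Sprintf("idx_%d", i)
		}
		keys = append(keys, interfaceKey(podUID, networkName, iface))
		i++
	}
	return keys
}

// keysForPod is what init would call: visit each distinct network name once
// (first-seen order) and reuse keysForNetwork so the index domain matches.
func keysForPod(podUID string, networks []networkSelection) []string {
	var keys []string
	done := map[string]bool{}
	for _, n := range networks {
		if done[n.Name] {
			continue
		}
		done[n.Name] = true
		keys = append(keys, keysForNetwork(podUID, networks, n.Name)...)
	}
	return keys
}

func main() {
	networks := []networkSelection{{Name: "ib-sriov"}, {Name: "other", InterfaceRequest: "net2"}, {Name: "ib-sriov"}}
	fmt.Println(keysForPod("1234", networks))                 // [1234_ib-sriov_idx_0 1234_ib-sriov_idx_1 1234_other_net2]
	fmt.Println(keysForNetwork("1234", networks, "ib-sriov")) // [1234_ib-sriov_idx_0 1234_ib-sriov_idx_1]
}
```

The design choice is that init never computes its own index: it delegates to the same per-network loop that add and delete use, so a restart cannot introduce a key that delete would fail to match.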