Concurrent map access in GetFQDNCache can crash antrea-agent #7793

@Ady0333

Describe the bug

Controller.GetFQDNCache() iterates over fqdnController.dnsEntryCache without acquiring fqdnSelectorMutex, while other code paths (such as DNS response handling and FQDN selector updates) modify the same map under that mutex. Because Go maps are not safe for concurrent read/write access, this can result in a fatal runtime error and terminate the antrea-agent process.


To Reproduce

  1. Deploy an AntreaClusterNetworkPolicy with an FQDN rule (for example: matchName: "*.example.com").
  2. Ensure selected Pods generate DNS traffic so that DNS responses are being processed by the agent.
  3. While DNS responses are being handled, run:
    antctl get fqdn-cache
    or query the /fqdncache API endpoint.
  4. Under concurrent DNS activity, the agent may crash with:
    fatal error: concurrent map read and map write

This can occur during normal cluster operation when FQDN policies are active and the FQDN cache is queried.


Expected

Access to dnsEntryCache should be consistently synchronized using fqdnSelectorMutex, preventing concurrent read/write on the underlying map.


Actual behavior

GetFQDNCache() directly ranges over dnsEntryCache without holding fqdnSelectorMutex, while other functions such as onDNSResponse() and cleanupFQDNSelectorItem() modify the same map under lock. This introduces a concurrent map access path and can crash the antrea-agent process.


Versions:

  • Antrea version: current main branch (observed in latest code)
  • Kubernetes version: any
  • Container runtime: any
  • Linux kernel version: any
  • OVS kernel module: any

This issue is code-level and not environment-specific.


Additional context

All other accessors of dnsEntryCache use fqdnSelectorMutex, but GetFQDNCache() does not. Since this method is invoked from the agent API handler, it can execute concurrently with DNS response processing goroutines, introducing a crash path under normal usage.

Labels: kind/bug (categorizes issue or PR as related to a bug)