Nil pointer panic in BackendTrafficPolicy when HTTPRoute has invalid cross-namespace reference #7847

@nicholasklem

Description:

The Envoy Gateway controller panics with a nil pointer dereference when processing BackendTrafficPolicies for HTTPRoutes that have cross-namespace backend references without a matching ReferenceGrant.

The invalid reference is correctly detected and logged as an error, but the controller then panics instead of gracefully skipping the route. This causes the gateway-api reconciliation loop to restart repeatedly (~every 5 seconds), which can delay xDS updates to Envoy proxies.

Expected behavior: When an HTTPRoute has an invalid cross-namespace reference, the controller should log the error, skip applying BackendTrafficPolicy features to that route, and continue processing other routes without panicking.


Repro steps:

  1. Create an HTTPRoute in namespace envoy-gateway-mtls-app that references a Service in namespace default:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: reflector
  namespace: envoy-gateway-mtls-app
spec:
  parentRefs:
    - name: gateway-mtls-reflector
      namespace: envoy-gateway-mtls-app
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: "/"
      backendRefs:
        - name: reflector
          namespace: default  # cross-namespace reference
          port: 80
  2. Do NOT create a ReferenceGrant allowing this cross-namespace reference.

  3. Ensure at least one BackendTrafficPolicy exists in the cluster (it does not need to target this route).

  4. Observe the controller logs: the panic occurs on every reconciliation.


Environment:

  • Envoy Gateway version: v1.6.0
  • Kubernetes: GKE 1.31
  • Go version (from stack trace): 1.25.3
  • Single controller managing multiple Gateways across namespaces

Logs:

First the ReferenceGrant error is logged:

ERROR provider kubernetes/routes.go:269 failed to process BackendRef for HTTPRoute
{"runner": "provider",
 "httpRoute": {"name":"reflector","namespace":"envoy-gateway-mtls-app"},
 "backendRef": {"group":"","kind":"Service","name":"reflector","namespace":"default","port":80},
 "error": "no matching ReferenceGrants found: from HTTPRoute/envoy-gateway-mtls-app to Service/default"}

Then immediately the panic:

ERROR watchable message/watchutil.go:57 observed a panic
{"runner": "gateway-api",
 "error": "runtime error: invalid memory address or nil pointer dereference",
 "stackTrace": "goroutine 216 [running]:
runtime/debug.Stack()
    /opt/hostedtoolcache/go/1.25.3/x64/src/runtime/debug/stack.go:26 +0x5e
github.com/envoyproxy/gateway/internal/message.handleWithCrashRecovery[...].func1()
    /home/runner/work/gateway/gateway/internal/message/watchutil.go:58 +0x1fe
panic({0x3552d00?, 0xb615380?})
    /opt/hostedtoolcache/go/1.25.3/x64/src/runtime/panic.go:783 +0x132
github.com/envoyproxy/gateway/internal/gatewayapi.(*Translator).applyTrafficFeatureToRoute(...)
    /home/runner/work/gateway/gateway/internal/gatewayapi/backendtrafficpolicy.go:765 +0x768
github.com/envoyproxy/gateway/internal/gatewayapi.(*Translator).translateBackendTrafficPolicyForRoute(...)
    /home/runner/work/gateway/gateway/internal/gatewayapi/backendtrafficpolicy.go:635 +0x2ca
github.com/envoyproxy/gateway/internal/gatewayapi.(*Translator).processBackendTrafficPolicyForRoute(...)
    /home/runner/work/gateway/gateway/internal/gatewayapi/backendtrafficpolicy.go:301 +0xa0b
github.com/envoyproxy/gateway/internal/gatewayapi.(*Translator).ProcessBackendTrafficPolicies(...)
    /home/runner/work/gateway/gateway/internal/gatewayapi/backendtrafficpolicy.go:107 +0x197c
github.com/envoyproxy/gateway/internal/gatewayapi.(*Translator).Translate(...)
    /home/runner/work/gateway/gateway/internal/gatewayapi/translator.go:284 +0x848
github.com/envoyproxy/gateway/internal/gatewayapi/runner.(*Runner).subscribeAndTranslate.func1(...)
    /home/runner/work/gateway/gateway/internal/gatewayapi/runner/runner.go:176 +0x571"}

Workaround: Create a ReferenceGrant to allow the cross-namespace reference, or move the HTTPRoute and Service to the same namespace.
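
As a sketch of the first workaround, a ReferenceGrant along these lines, created in the Service's namespace (default), should permit the reference used in the repro. The resource name allow-reflector-from-mtls-app is made up for illustration, and the apiVersion assumes the v1beta1 ReferenceGrant API is installed in the cluster:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-reflector-from-mtls-app   # hypothetical name
  namespace: default                    # must be the namespace of the referenced Service
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: envoy-gateway-mtls-app # namespace of the HTTPRoute from the repro
  to:
    - group: ""                         # core API group
      kind: Service
      name: reflector                   # optional; omit to allow all Services in this namespace
```

This only avoids the trigger; the controller should still not panic when the grant is missing.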

Observability:

The panic is observable via the watchable_panics_recovered_total metric:

rate(watchable_panics_recovered_total{runner="gateway-api", status="failure"}[5m]) > 0

Note that the standard controller-runtime metrics and pod status do not surface this panic:

  • controller_runtime_reconcile_panics_total stays at 0 (different code path)
  • controller_runtime_reconcile_errors_total stays at 0
  • Pod does not restart

The watchable_panics_recovered_total metric only increments when reconciliation is triggered (e.g., on resource changes). A cluster can be in a broken steady state with a flat counter if no changes occur.
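
For anyone who wants to alert on this, the query above could be wrapped in a Prometheus alerting rule roughly like the following. This is a sketch only: the group name, alert name, severity label, and annotation text are illustrative assumptions, and the expression is copied from this report unchanged.

```yaml
groups:
  - name: envoy-gateway-panics                 # hypothetical group name
    rules:
      - alert: EnvoyGatewayTranslatorPanic     # hypothetical alert name
        # Fires while the gateway-api runner has recovered from at least
        # one panic within the last 5 minutes.
        expr: rate(watchable_panics_recovered_total{runner="gateway-api", status="failure"}[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: Envoy Gateway gateway-api runner is recovering from panics
```

The rule deliberately has no `for:` clause. Because the counter only moves when reconciliation is triggered, a hold period could keep the alert from ever firing on a quiet cluster.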

Labels: kind/bug (Something isn't working)
