
Nginx didn't work as expected, all requests failed with 403 #13920

Description

@kingonion

What happened:
All requests failed with 403 for one replica, while there was no such issue for the other replica.

<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>

Found the following logs from the ingress-nginx controller:

1757171414218	I0906 15:10:14.218599       7 controller.go:196] "Configuration changes detected, backend reload required"
1757171414276	I0906 15:10:14.276895       7 controller.go:216] "Backend successfully reloaded"
1757171414277	I0906 15:10:14.277043       7 event.go:377] Event(v1.ObjectReference{Kind:"Pod", Namespace:"test", Name:"nginx-ingress-controller-f454d46db-fl5zk", UID:"c7e7ed06-de43-4330-b3ad-51dc290b9494", APIVersion:"v1", ResourceVersion:"1215888925", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
1757171414283	I0906 15:10:14.283794       7 store.go:645] "secret was updated and it is used in ingress annotations. Parsing" secret="test/client.ca.truststore"
1757171414284	2025/09/06 15:10:14 [emerg] 27#27: cannot load certificate "/etc/ingress-controller/ssl/test-client-certificate.pem": PEM_read_bio_X509_AUX() failed (SSL: error:0480006C:PEM routines::no start line:Expecting: TRUSTED CERTIFICATE)
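
For context, the OpenSSL "no start line" error means nginx could not find a "-----BEGIN ...-----" header in the file, which is exactly what an empty or truncated PEM file produces (see the analysis below). A minimal standalone Go illustration of the same condition (this is not controller code, just a demonstration):

package main

import (
	"encoding/pem"
	"fmt"
)

// An empty (or truncated, header-less) file contains no PEM block at all,
// which is the same condition OpenSSL reports as "no start line".
func main() {
	block, _ := pem.Decode([]byte{}) // what nginx effectively sees in an empty certificate file
	fmt.Println(block == nil)        // prints "true": no PEM block found
}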

The issue was resolved after deleting the pod; the new pod had no such issue.

What you expected to happen:
Nginx should handle requests normally, the same as the other replica.

Our ingress has the following annotations and spec:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
    nginx.ingress.kubernetes.io/auth-tls-secret: test/client.ca.truststore
    nginx.ingress.kubernetes.io/auth-tls-verify-client: optional
  name: test-ingress
  namespace: test
spec:
  rules:
  - host: <host>
    http:
      paths:
      - backend:
          service:
            name: front
            port:
              number: 8080
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - <host>
    secretName: client-certificate

From my analysis, it seems one of the certificate secrets changed, which caused all certificate files to be written again:

https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/store/store.go#L648

https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/store/store.go#L1027

https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/store/backend_ssl.go#L38

https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/store/backend_ssl.go#L76

https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/store/store.go#L958~L1000

while the reload happened due to a configuration change, from https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/controller.go#L213

It looks like a race condition to me. Nginx read an empty certificate file, truncated while it was being rewritten by

err := os.WriteFile(pemFileName, []byte(sslCert.PemCertKey), file.ReadWriteByUser)

and that caused the issue.
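
To illustrate the window I mean, here is a minimal standalone Go sketch (not controller code; the file path is made up for the demo): os.WriteFile opens the target with O_TRUNC before writing the new bytes, so a reader that opens the file in that window (as nginx does when loading certificates during a reload) can observe an empty or partially written PEM file.

package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical path used only for this demonstration.
	const path = "demo.pem"
	pem := []byte("-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n")
	_ = os.WriteFile(path, pem, 0o644)

	// Writer: keeps rewriting the file, like the controller does when a secret
	// changes and every certificate file is written again.
	go func() {
		for {
			_ = os.WriteFile(path, pem, 0o644) // truncate + write: not atomic
		}
	}()

	// Reader: plays the role of nginx loading the certificate during a reload.
	for i := 0; i < 10_000_000; i++ {
		data, _ := os.ReadFile(path)
		if len(data) < len(pem) {
			fmt.Printf("iteration %d: read %d of %d bytes (truncated)\n", i, len(data), len(pem))
			return
		}
	}
	fmt.Println("race not observed in this run")
}

If that is indeed the cause, writing the PEM to a temporary file in the same directory and renaming it over the target would make the replacement atomic and close the window; I am only suggesting that as a possible direction, not describing what the controller currently does.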

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):

/etc/nginx $ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.13.1
  Build:         c8ce0d146a53fbdb94848548068001909767e2de
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.27.1

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"32", GitVersion:"v1.32.7", GitCommit:"158eee9fac884b429a92465edd0d88a43f81de34", GitTreeState:"clean", BuildDate:"2025-07-15T18:00:33Z", GoVersion:"go1.23.10", Compiler:"gc", Platform:"linux/amd64"}

How to reproduce this issue:

It's hard to reproduce. We applied the same change to many clusters (over 200), and only two had this issue.

Anything else we need to know:

Labels: kind/bug, needs-priority, needs-triage