Background
I'm unsure whether I should report a bug or ask for help, but since no one else seems to be hitting this issue, I'll assume I did something wrong.
About two weeks ago I installed Linkerd stable-2.11.4 on a Kubernetes v1.23.3 cluster using Helm 3.5.1.
Because we use cert-manager (1.6.2) on the cluster, I followed the processes outlined here and here to automate Rotating Control Plane TLS Credentials and Rotating Webhook TLS Credentials.
I enabled auto-injection on a few pods and namespaces, and it worked well until yesterday.
Symptoms
Logs from one of our pods with the injected linkerd-proxy sidecar (vem is the application container):
vem {"level":"info","time":1661181478,"message":"commit: d00c1d3, build time: 2022-08-09_07:33:41, release: 1.1.2-alpha"}
vem ⇨ http server started on :8080
vem {"level":"debug","time":1661181507,"message":"Sub_query is : ' Nodes (func: eq(type,namespace)) {uid}'"}
vem {"level":"debug","time":1661181507,"message":"The query is : ' query interface( $uid_target: string, $uid_source: string,$sid: string, $ns: string, $st: string, $tid: string, $tt: string, $sub_type: string, $label:string, $subtype_source:string, $offset: int, $first: int,$attr: string){ Nodes (func: eq(type,namespace)) {uid}} '"}
vem {"level":"debug","time":1661181507,"message":"Sub_query is : ' Nodes (func: eq(type,namespace)) {uid}'"}
vem {"level":"debug","time":1661181507,"message":"The query is : ' query interface( $uid_target: string, $uid_source: string,$sid: string, $ns: string, $st: string, $tid: string, $tt: string, $sub_type: string, $label:string, $subtype_source:string, $offset: int, $first: int,$attr: string){ Nodes (func: eq(type,namespace)) {uid}} '"}
vem {"level":"error","error":"rpc error: code = Unavailable desc = HTTP Server service in fail-fast","endpoint":"health","project":"vem","time":1661181510,"message":"rpc error: code = Unavailable desc = HTTP Server service in fail-fast"}
vem {"level":"error","user":"unknown","remote_ip":"10.4.38.215","host":"10.4.38.215:8080","method":"GET","path":"/health","protocol":"HTTP/1.1","user_agent":"kube-probe/1.23","status":500,"latency":3055.708589,"request":"EM9SSVq8KKGdop9pslw8B9mNcR1G6GNt","time":1661181510}
vem {"level":"error","error":"rpc error: code = Unavailable desc = HTTP Server service in fail-fast","endpoint":"health","project":"vem","time":1661181510,"message":"rpc error: code = Unavailable desc = HTTP Server service in fail-fast"}
vem {"level":"error","user":"unknown","remote_ip":"10.4.38.215","host":"10.4.38.215:8080","method":"GET","path":"/health","protocol":"HTTP/1.1","user_agent":"kube-probe/1.23","status":500,"latency":3089.095059,"request":"jPEl47h5iR7HNXPNQQ5LoRQ5r9iUZXzi","time":1661181510}
linkerd-proxy [ 0.000636s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
linkerd-proxy [ 0.001021s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
linkerd-proxy [ 0.001028s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
linkerd-proxy [ 0.001030s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
linkerd-proxy [ 0.001032s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
linkerd-proxy [ 0.001034s] INFO ThreadId(01) linkerd2_proxy: Local identity is default.default.serviceaccount.identity.linkerd.cluster.local
linkerd-proxy [ 0.001040s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
linkerd-proxy [ 0.001042s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
linkerd-proxy [ 0.021968s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity: default.default.serviceaccount.identity.linkerd.cluster.local
linkerd-proxy [ 29.164678s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: rustls::session: Sending fatal alert BadCertificate
linkerd-proxy [ 29.164762s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_reconnect: Failed to connect error=invalid certificate: CertExpired
linkerd-proxy [ 29.272196s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: rustls::session: Sending fatal alert BadCertificate
linkerd-proxy [ 29.272284s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_reconnect: Failed to connect error=invalid certificate: CertExpired
linkerd-proxy [ 29.481385s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: rustls::session: Sending fatal alert BadCertificate
linkerd-proxy [ 29.481433s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_reconnect: Failed to connect error=invalid certificate: CertExpired
linkerd-proxy [ 29.899839s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: rustls::session: Sending fatal alert BadCertificate
linkerd-proxy [ 29.899882s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_reconnect: Failed to connect error=invalid certificate: CertExpired
linkerd-proxy [ 30.406763s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: rustls::session: Sending fatal alert BadCertificate
linkerd-proxy [ 30.406815s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_reconnect: Failed to connect error=invalid certificate: CertExpired
linkerd-proxy [ 30.912893s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: rustls::session: Sending fatal alert BadCertificate
linkerd-proxy [ 30.912931s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_reconnect: Failed to connect error=invalid certificate: CertExpired
linkerd-proxy [ 31.420402s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: rustls::session: Sending fatal alert BadCertificate
linkerd-proxy [ 31.420593s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_reconnect: Failed to connect error=invalid certificate: CertExpired
linkerd-proxy [ 31.925121s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: rustls::session: Sending fatal alert BadCertificate
linkerd-proxy [ 31.925156s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_reconnect: Failed to connect error=invalid certificate: CertExpired
linkerd-proxy [ 32.161447s] WARN ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}: linkerd_stack::failfast: HTTP Server entering failfast after 3s
linkerd-proxy [ 32.161483s] INFO ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}:rescue{client.addr=10.4.38.215:42374}: linkerd_app_core::errors::respond: Request failed error=HTTP Server service in fail-fast
linkerd-proxy [ 32.161506s] INFO ThreadId(01) outbound:server{orig_dst=10.5.139.182:9080}:rescue{client.addr=10.4.38.215:42374}: linkerd_app_core::errors::respond: Request failed error=HTTP Server service in fail-fast
The logs from the identity and destination pods also look off:
The logs from the linkerd-proxy-injector pod, however, look fine:
time="2022-08-22T15:15:46Z" level=info msg="running version stable-2.11.4"
time="2022-08-22T15:15:46Z" level=info msg="waiting for caches to sync"
time="2022-08-22T15:15:46Z" level=info msg="listening at :8443"
time="2022-08-22T15:15:46Z" level=info msg="caches synced"
time="2022-08-22T15:15:46Z" level=info msg="starting admin server on :9995"
[ 0.000904s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.001579s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.001590s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.001593s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.001596s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.001599s] INFO ThreadId(01) linkerd2_proxy: Local identity is linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.001607s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.001610s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.052824s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity: linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local
Checks seem "okay":
linkerd check --proxy
Linkerd core checks
===================
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks can be verified
√ cluster networks contains all node podCIDRs
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used
linkerd-cni-plugin
------------------
√ cni plugin ConfigMap exists
√ cni plugin ClusterRole exists
√ cni plugin ClusterRoleBinding exists
√ cni plugin ServiceAccount exists
√ cni plugin DaemonSet exists
√ cni plugin pod is running on all nodes
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2022-08-24T14:18:19Z
see https://linkerd.io/2.11/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
√ issuer cert is issued by the trust anchor
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
‼ proxy-injector cert is valid for at least 60 days
certificate will expire on 2022-08-22T18:30:19Z
see https://linkerd.io/2.11/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
√ sp-validator webhook has valid cert
‼ sp-validator cert is valid for at least 60 days
certificate will expire on 2022-08-22T18:30:19Z
see https://linkerd.io/2.11/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints
√ policy-validator webhook has valid cert
‼ policy-validator cert is valid for at least 60 days
certificate will expire on 2022-08-22T18:30:18Z
see https://linkerd.io/2.11/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints
linkerd-identity-data-plane
---------------------------
√ data plane proxies certificate match CA
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
√ control plane proxies are up-to-date
√ control plane proxies and cli versions match
linkerd-data-plane
------------------
√ data plane namespace exists
\ waiting for check to complete
(The check never completes because of the failures on the injected pods.)
Cluster POV
None of the certificates associated with Linkerd resources have expired
The corresponding secrets are present
The issuers (one for control plane and the other for webhooks) are there
Installation Process
Create the linkerd namespace
Generate the trust anchor certificate and key with step and create the linkerd-trust-anchor secret from them:
Apply the yaml templates found here (without modification)
Modify the values.yaml file to match the secrets, certificates, and issuers just created, as described in the docs (e.g. setting the issuer scheme to kubernetes.io/tls instead of linkerd.io/tls)
Values.yaml
I've only included the fields I changed (assume all other fields match the default values.yaml).
linkerdVersion: stable-2.11.4
enablePSP: true
imagePullSecrets:
  - name: <REDACTED>
installNamespace: false
identityTrustAnchorsPEM: |
  -----BEGIN CERTIFICATE-----
  MIIBjTCCATSgAwIBAgIRAIA/moBnVTH/fq2Ra8jYAtEwCgYIKoZIzj0EAwIwJTEj
  MCEGA1UEAxMacm9vdC5saW5rZXJkLmNsdXN0ZXIubG9jYWwwHhcNMjIwODAyMTM1
  --------------------------REDACTED-----------------------------
  c3Rlci5sb2NhbDBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABDBF+Br4v954Ne0h
  --------------------------REDACTED-----------------------------
  sH9AH0GjRTBDMA4GA1UdDwEB/wQEAwIBBjASBgNVHRMBAf8ECDAGAQH/AgEBMB0G
  A1UdDgQWBBQn9M2AyVNBVuAw3N2zUhudM6buLTAKBggqhkjOPQQDAgNHADBEAiBZ
  --------------------------REDACTED-----------------------------
  hYOm1v1l91gaZtX6Pu7Ma4k=
  -----END CERTIFICATE-----
identity:
  # -- If the linkerd-identity-trust-roots ConfigMap has already been created
  externalCA: false
  issuer:
    scheme: kubernetes.io/tls
    # -- Amount of time to allow for clock skew within a Linkerd cluster
    clockSkewAllowance: 20s
    # -- Amount of time for which the Identity issuer should certify identity
    issuanceLifetime: 48h0m0s
# policy validator configuration
policyValidator:
  externalSecret: true
  caBundle: |
    -----BEGIN CERTIFICATE-----
    MIIBkzCCATqgAwIBAgIRAIxayAis3KX9ZzKY/y+1Bf0wCgYIKoZIzj0EAwIwKDEm
    --------------------------REDACTED-----------------------------
    MTQyNzUzWhcNMzIwNzMwMTQyNzUzWjAoMSYwJAYDVQQDEx13ZWJob29rLmxpbmtl
    cmQuY2x1c3Rlci5sb2NhbDBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABH2KoDyZ
    --------------------------REDACTED-----------------------------
    7jMNf5eTE2C+xdSjRTBDMA4GA1UdDwEB/wQEAwIBBjASBgNVHRMBAf8ECDAGAQH/
    --------------------------REDACTED-----------------------------
    ADBEAiBUnHrnplsJ2UQsoesysf54VMh2FyXTSINNpz8BOV+PbQIgNhBTSzFONZbr
    H+sSRWeU3zDH9CdUim5GVoySPYq3hm0=
    -----END CERTIFICATE-----
# proxy injector configuration
proxyInjector:
  externalSecret: true
  caBundle: |
    -----BEGIN CERTIFICATE-----
    MIIBkzCCATqgAwIBAgIRAIxayAis3KX9ZzKY/y+1Bf0wCgYIKoZIzj0EAwIwKDEm
    --------------------------REDACTED-----------------------------
    MTQyNzUzWhcNMzIwNzMwMTQyNzUzWjAoMSYwJAYDVQQDEx13ZWJob29rLmxpbmtl
    cmQuY2x1c3Rlci5sb2NhbDBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABH2KoDyZ
    --------------------------REDACTED-----------------------------
    7jMNf5eTE2C+xdSjRTBDMA4GA1UdDwEB/wQEAwIBBjASBgNVHRMBAf8ECDAGAQH/
    --------------------------REDACTED-----------------------------
    ADBEAiBUnHrnplsJ2UQsoesysf54VMh2FyXTSINNpz8BOV+PbQIgNhBTSzFONZbr
    H+sSRWeU3zDH9CdUim5GVoySPYq3hm0=
    -----END CERTIFICATE-----
# service profile validator configuration
profileValidator:
  externalSecret: true
  caBundle: |
    -----BEGIN CERTIFICATE-----
    MIIBkzCCATqgAwIBAgIRAIxayAis3KX9ZzKY/y+1Bf0wCgYIKoZIzj0EAwIwKDEm
    --------------------------REDACTED-----------------------------
    MTQyNzUzWhcNMzIwNzMwMTQyNzUzWjAoMSYwJAYDVQQDEx13ZWJob29rLmxpbmtl
    cmQuY2x1c3Rlci5sb2NhbDBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABH2KoDyZ
    --------------------------REDACTED-----------------------------
    7jMNf5eTE2C+xdSjRTBDMA4GA1UdDwEB/wQEAwIBBjASBgNVHRMBAf8ECDAGAQH/
    --------------------------REDACTED-----------------------------
    ADBEAiBUnHrnplsJ2UQsoesysf54VMh2FyXTSINNpz8BOV+PbQIgNhBTSzFONZbr
    H+sSRWeU3zDH9CdUim5GVoySPYq3hm0=
    -----END CERTIFICATE-----
nodeSelector:
  beta.kubernetes.io/os: linux
### HA part
controllerReplicas: 2
enablePodAntiAffinity: true
cniEnabled: true
# If set to Fail, will prevent deploying workloads that have the linkerd annotation if the proxy could not be injected
webhookFailurePolicy: Ignore
# proxy configuration
proxy:
  outboundConnectTimeout: 1000ms
  # -- Maximum time allowed for the proxy to establish an inbound TCP
  inboundConnectTimeout: 1000ms
  resources:
    cpu:
      request: 100m
      limit: 200m
    memory:
      limit: 250Mi
      request: 20Mi
Note: identityTrustAnchorsPEM is the certificate generated in step 2.
The caBundle for policyValidator, proxyInjector, and profileValidator is the certificate generated in step 4.
For the two weeks Linkerd was working, it was extremely helpful to us. It helped us meet pod-to-pod encryption requirements from customers and regulators, and it solved gRPC load balancing, which we had struggled with for the past few years. Thanks for the great work!