protocol detection timeout despite opaque port annotation #8761
Unanswered
dwilliams782 asked this question in Help
Replies: 0 comments
Hi all,
We've been working through this on Slack with @mateiidavid here, but I'm also raising a discussion for more input.
I'm seeing instances of
linkerd_detect: Continuing after timeout: linkerd_proxy_http::version::Version protocol detection timed out after 10s
despite having the opaque port annotation on both the service and pod.
Linkerd: edge-22.6.2
GKE: v1.21.11-gke.1100
We have Grafana talking to Thanos Query Frontend via a NodePort service. The NodePort itself isn't actually used; traffic goes to port 9090 instead. The target service and target deployment are both annotated with config.linkerd.io/opaque-ports: "9090".
99.99% of the traffic works fine, but we are seeing very sporadic timeouts from the Grafana linkerd-proxy container:
Where the IP 10.224.44.90 is the ClusterIP of the thanos-query-frontend service. We run a single replica of each service, and the timeout logs do not correlate with application or pod start-up times. They are, as far as I can tell, "random".
Each time one of these logs is emitted, we get a corresponding log on the inbound proxy that suggests the requests are opaque:
However, the logs on the outbound proxy (where the timestamps also correspond to the protocol detection log timestamps) look like the service:port is not opaque, with opaque_protocol: false:
It looks like the service is not being detected as opaque; however, we've tested this using the destination script and get back an opaque port:
It might not be relevant, but in almost all cases of this protocol detection timeout log being emitted, 10 seconds prior, we see logs around the service becoming unavailable:
Note: I also see this "HTTP Balancer service has become unavailable" message far more frequently than we are seeing protocol timeout logs, and I'm unsure why.
There is a gist here which contains the manifests for:
If there's any other information I can provide, please let me know. I have debug logs enabled on the outbound and inbound proxies so I can provide more logs as required.
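To check the "10 seconds prior" pattern described above more systematically, a rough sketch like the following could be run over the proxy logs. It is only an illustration under stated assumptions: it assumes each log line starts with an RFC 3339 UTC timestamp ending in Z (as `kubectl logs --timestamps` prepends), and it matches on substrings of the two messages quoted in this post. The function names, the 10-second gap and the 1-second tolerance are hypothetical choices, not anything defined by Linkerd.

```python
from datetime import datetime, timedelta

# Substrings taken from the two log messages quoted in this discussion.
TIMEOUT_MARKER = "protocol detection timed out"
UNAVAILABLE_MARKER = "service has become unavailable"


def parse_timestamp(line):
    """Parse the leading timestamp of a log line.

    Assumes the first whitespace-separated token is an RFC 3339 UTC
    timestamp ending in 'Z', e.g. 2022-06-29T10:00:00.123456789Z.
    """
    stamp = line.split(" ", 1)[0].rstrip("Z")
    base, _, frac = stamp.partition(".")
    # strptime's %f accepts at most 6 fractional digits, so trim or pad.
    micros = (frac + "000000")[:6]
    return datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")


def correlate(lines, gap=timedelta(seconds=10), tolerance=timedelta(seconds=1)):
    """Return (timestamp, matched) for every detection-timeout line.

    matched is True when some 'unavailable' line was logged roughly
    `gap` before the timeout, within `tolerance`.
    """
    unavailable, timeouts = [], []
    for line in lines:
        if UNAVAILABLE_MARKER in line:
            unavailable.append(parse_timestamp(line))
        elif TIMEOUT_MARKER in line:
            timeouts.append(parse_timestamp(line))
    return [
        (t, any(abs((t - u) - gap) <= tolerance for u in unavailable))
        for t in timeouts
    ]
```

Feeding it the lines of a saved log file (for example, output captured from the Grafana pod's linkerd-proxy container with timestamps enabled) gives one entry per timeout, which makes it easy to count how many timeouts were and were not preceded by an "unavailable" message.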