I'm having some issues with Linkerd and Cortex (https://cortexmetrics.io).
For those unfamiliar with Cortex, it has multiple components that can scale independently. Each component kind uses internal routing to decide who to talk to; it doesn't use Kubernetes Services, meaning a Pod from Component A will talk to a specific Pod from Component B, reaching it by its IP address.
In the case of the Frontends and Queriers, the Queriers register themselves against the Frontend instances and then depending on the queries that the Frontend receives, it decides to which Queriers to talk (and sometimes, multiple ones when it has to split queries).
Linkerd detects traffic between Query Frontend and Queriers based on the diagram shown on the dashboards. However, if I tap into them, I cannot see them talking to each other, and there are no statistics. The component that sends requests to the Query Frontend is seeing responses from it (which is how I know the whole solution is working). This happened with the latest Linkerd stable version (also tried the latest edge).
For instance, note how the Inbound stats against Queriers are empty.
Similarly, note how the Outbound stats against Query Frontends are empty.
The fact that they are listed might indicate that somehow Linkerd is aware of the traffic, but I don't understand why I cannot see the packets or statistics about them. They must have been talking to each other, as I can make queries and see results.
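For anyone who wants to reproduce the comparison, this is roughly the set of checks involved. It is only a sketch: the namespace `cortex` and the workload names `deploy/query-frontend` and `deploy/querier` are placeholders, not the real names from my cluster, and the `run` helper is just a convenience that prints each command and only executes it when `RUN=1` is set and the `linkerd` CLI is available.

```shell
#!/bin/sh
# Placeholder names (assumptions): adjust to your own namespace and workloads.
NS="${NS:-cortex}"
FRONTEND="${FRONTEND:-deploy/query-frontend}"
QUERIER="${QUERIER:-deploy/querier}"

# Print the command; execute it only when RUN=1 and the linkerd CLI exists.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ] && command -v linkerd >/dev/null 2>&1; then
    "$@"
  fi
}

# Edges Linkerd has recorded between deployments in the namespace.
run linkerd viz edges deployment -n "$NS"

# Stats for the Queriers, restricted to traffic coming from the Frontend.
run linkerd viz stat -n "$NS" "$QUERIER" --from "$FRONTEND"

# Live requests in each direction (tap runs until interrupted).
run linkerd viz tap -n "$NS" "$FRONTEND" --to "$QUERIER"
run linkerd viz tap -n "$NS" "$QUERIER" --to "$FRONTEND"
```

Without `RUN=1` the script only prints the commands, which makes it easy to check the names before running anything against the cluster.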
Using the Debug Container on the Frontend Pods, I can see that the communication via gRPC against Queriers is present and encrypted (using `tshark -i any tcp port 9095`, or filtering by Pod IP), and apparently the same for HTTP against our gateway application (using `tshark -i any tcp port 8080`); which is why it is unexpected that there are no stats and the `linkerd viz tap` command cannot see the traffic.
On top of that, I do not see traffic against all the Memcached instances on the dashboards or via `linkerd viz tap` (except requests from Prometheus). In this case, the Frontend stores results on a cache, and the Queriers should have access to the Metadata cache (Store Gateways to the Index, Metadata, and Chunks cache). I would expect to see TCP traffic, but that's not happening (when I tap on any of them), but I can see stats and edges for all of them (unlike the problem described above). I know that's working because the Cortex Dashboards for the Read Path are showing cache hits and misses.
From the Frontend, with the Debug container, Memcached traffic seems encrypted based on what I see via tshark, meaning the proxy seems to be applying mTLS; checked with `tshark -i any -d tcp.port=11211,ssl host 10.244.0.36`, where `10.244.0.36` is the IP of one of the Memcached Pods from a Cortex instance that talks to it.
Other communication scenarios seem to work (talk via Pod IP instead of Kubernetes Service), for instance, communication between Distributors and Ingesters, between Queriers and Ingesters, and between Queriers and Store Gateways.
The problem, for some reason, appears between Queriers and Frontends.
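The packet captures mentioned above can be repeated along these lines. Again this is a sketch under assumptions: the Pod name `query-frontend-0` and the namespace `cortex` are placeholders, and I am assuming the debug sidecar container is named `linkerd-debug`. The tshark arguments are the ones quoted in this post. The `in_debug` helper prints the capture command and only runs it when `RUN=1` is set and `kubectl` is available.

```shell
#!/bin/sh
# Placeholder names (assumptions): adjust to your own namespace and Pod.
NS="${NS:-cortex}"
FRONTEND_POD="${FRONTEND_POD:-query-frontend-0}"
MEMCACHED_IP="${MEMCACHED_IP:-10.244.0.36}"  # Memcached Pod IP from this post

# Print the tshark invocation; run it inside the debug sidecar only when
# RUN=1 and kubectl exists. Each capture runs until interrupted.
in_debug() {
  echo "+ tshark $*"
  if [ "${RUN:-0}" = "1" ] && command -v kubectl >/dev/null 2>&1; then
    kubectl -n "$NS" exec "$FRONTEND_POD" -c linkerd-debug -- tshark "$@"
  fi
}

# gRPC traffic between the Frontend and the Queriers.
in_debug -i any tcp port 9095

# HTTP traffic between the gateway application and the Frontend.
in_debug -i any tcp port 8080

# Memcached traffic to one cache Pod, decoded as TLS to check for mTLS.
in_debug -i any -d tcp.port=11211,ssl host "$MEMCACHED_IP"
```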
Any help would be appreciated.