Hello all! We've been doing some load testing in our cluster, which contains 10+ .NET gRPC microservices, and we're running into an issue when we add Linkerd into the mix. We're seeing "HTTP Logical service in fail-fast" errors once the load test reaches roughly 100 or more concurrent streams.
// Errors seen in the microservice's logs
Message: "One or more errors occurred. (Status(StatusCode=\"Unavailable\", Detail=\"HTTP Logical service in fail-fast\"))", TargetSite: null, Data: [], InnerException: RpcException { Status: Status { StatusCode: Unavailable, Detail: "HTTP Logical service in fail-fast", DebugException: null }, StatusCode: Unavailable <Truncated rest>
// Errors seen in the linkerd-proxy's logs
linkerd_stack::failfast: HTTP Logical service has become unavailable
I believe this might be related to how .NET deals with hitting a concurrent stream limit on a gRPC / HTTP/2 connection, and how that in turn plays with Linkerd. There is a property we can configure in the .NET service called EnableMultipleHttp2Connections, and when set to true it will create additional HTTP/2 connections when the concurrent stream limit is hit on an existing one. Our services use the default stream limit of 100 per connection.
So Microservice A connects to Microservice B, and if it hits the concurrent limit of 100 streams that B has set, then A will create new, additional connections to B. This works when the microservices talk to each other directly but not when Linkerd is added in.
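To put numbers on the direct (no Linkerd) case, here is a rough Python sketch of the arithmetic. It is only a model, not .NET code: the function name `connections_needed` is made up for illustration, and it assumes the client opens another connection as soon as every existing one is at the stream limit.

```python
import math


def connections_needed(concurrent_streams: int, max_streams_per_connection: int = 100) -> int:
    """Connections a client must hold open so that no stream has to wait.

    Toy model: each HTTP/2 connection carries at most
    `max_streams_per_connection` concurrent streams, and the client is
    assumed to open a new connection whenever all existing ones are full.
    """
    if concurrent_streams <= 0:
        return 0
    return math.ceil(concurrent_streams / max_streams_per_connection)
```

With the default limit of 100, 100 concurrent streams fit on one connection, 101 need two, and 250 need three.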
I think what may be happening is that, with Linkerd, Microservice A now connects to its own Linkerd proxy instead, and from what I have gathered, the Linkerd proxy does not set a maximum concurrent stream limit for HTTP/2 connections. Is that correct? This is based on #2390.
So because Microservice A is talking to the proxy now, and the proxy doesn't set a limit, the .NET service never creates multiple HTTP/2 connections since there is no limit for it to hit on a connection - it connects to the proxy over a single one and sends everything over that.
However, Microservice B still has a maximum limit of 100 streams, which its Linkerd proxy has to respect. And so things are getting bottlenecked at that end. The connection between the Linkerd proxy and Microservice B hits the concurrent stream limit, no further streams can be made and Linkerd starts reporting the fail-fast errors.
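The bottleneck I'm describing can be sketched as a small Python model. Everything in it is an assumption for illustration: the `Hop` type, the hop names, and above all the proxy behaviour (no advertised stream limit, no additional connections), which is my hypothesis rather than anything I have confirmed about Linkerd.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hop:
    name: str
    max_streams: Optional[int]      # limit advertised by the receiving side; None = no limit
    opens_extra_connections: bool   # does the sender add connections when the limit is hit?


def blocked_per_hop(hops: List[Hop], offered: int) -> Dict[str, int]:
    """For each hop, count the streams that cannot be opened on it."""
    blocked = {}
    incoming = offered
    for hop in hops:
        if hop.max_streams is None or hop.opens_extra_connections:
            admitted = incoming  # nothing caps this hop
        else:
            admitted = min(incoming, hop.max_streams)  # single connection, hard cap
        blocked[hop.name] = incoming - admitted
        incoming = admitted
    return blocked


# Without Linkerd: B advertises 100, A opens extra connections as needed.
DIRECT = [Hop("A -> B", 100, True)]

# With Linkerd (hypothesised): proxies advertise no limit and never add connections.
MESHED = [
    Hop("A -> A's proxy", None, True),
    Hop("A's proxy -> B's proxy", None, False),
    Hop("B's proxy -> B", 100, False),
]
```

In this model, offering 150 streams blocks nothing on the direct path, while on the meshed path 50 streams are stuck at the "B's proxy -> B" hop, which is where I'd expect the fail-fast errors to come from.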
I made a (probably oversimplified haha) chart that I think more readily visualizes this, which I've attached as an image below. The top portion is without Linkerd, and the bottom is with Linkerd.
Does this make sense as a description of how the Linkerd proxies work? It is all predicated on the notion that the microservices now communicate with the proxies, and the proxies don't set a concurrent stream limit, so the EnableMultipleHttp2Connections feature in .NET no longer works as a solution. And the Linkerd proxies don't have a similar function (as far as I'm aware): when a proxy hits the limit on a connection, it won't create additional ones to use.
Reference for EnableMultipleHttp2Connections: https://learn.microsoft.com/en-us/aspnet/core/grpc/performance?view=aspnetcore-6.0#connection-concurrency