460 Errors on ALB when using spring webflux #34485

@parthutsav9

We have a backend service, referred to as 'Client,' written in Spring WebFlux and Java. The Client interacts with another backend service, 'Server,' which is developed using Spring Boot and Java.

Currently, the Client communicates with the Server using an ingress URL: http://server.example.com:80. However, we want to introduce an AWS Application Load Balancer (ALB) between the Client and the Server to enable canary deployments. The goal is to shift traffic gradually to canary pods during a Server deployment; once 100% of traffic is directed to the canary pods, it will be switched to the main pods.

The ALB URL for the Server is: https://server-alb.example.com:443. When calling the Server through this ALB, we are observing a high number of 4xx (460 HTTP code) errors on the load balancer, along with frequent timeouts between the Client and the Server.

We have configured the WebClient timeouts in the Client as follows:

Response Timeout: 100ms
Connect Timeout: 50ms
Read Timeout: 10ms
Write Timeout: 10ms

Additionally, we are using Mono.timeout() in Spring WebFlux with a 100ms timeout.

Our system handles 1000 TPS with 10 pods (4Gi memory, 8 CPU cores per pod). The Server receives the same 1000 TPS, with 10 pods configured with 8Gi memory and 8 CPU cores per pod. The P999 of the Server's API response time is under 30ms.
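
One way to sanity-check these numbers against each other is to ask which limit would expire first for a given client-observed latency. The sketch below is only a rough model: it treats the read timeout, the response timeout, and Mono.timeout() as independent deadlines that all start when the request is written, which is an assumption (the configuration code is not shown here, and an idle-read timer such as Netty's ReadTimeoutHandler resets whenever data arrives). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TimeoutBudget {

    /** Timeouts from this issue that bound the wait for a response, in milliseconds. */
    static Map<String, Long> issueTimeouts() {
        Map<String, Long> timeouts = new LinkedHashMap<>();
        timeouts.put("read", 10L);
        timeouts.put("response", 100L);
        timeouts.put("Mono.timeout", 100L);
        return timeouts;
    }

    /**
     * Returns the name of the shortest timeout that is strictly below the given
     * latency, or "none" if every timeout would outlast it.
     * Simplification: every timeout is modelled as a deadline from request send.
     */
    static String firstToExpire(Map<String, Long> timeoutsMs, long latencyMs) {
        String winner = "none";
        long shortest = Long.MAX_VALUE;
        for (Map.Entry<String, Long> entry : timeoutsMs.entrySet()) {
            long limit = entry.getValue();
            if (limit < latencyMs && limit < shortest) {
                shortest = limit;
                winner = entry.getKey();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        Map<String, Long> timeouts = issueTimeouts();
        // Sample latencies in ms, chosen for illustration only.
        for (long latency : new long[] {5, 30, 150}) {
            System.out.println(latency + " ms -> " + firstToExpire(timeouts, latency));
        }
    }
}
```

Under this model, a response that takes 30ms already exceeds the 10ms read timeout, well before the 100ms limits come into play. The latency that matters is the one the Client observes, which includes the extra ALB hop and TLS on port 443, not only the Server's own processing time.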

Things we have already investigated:

  1. Some requests are not reaching the Server when the 460 HTTP code is returned.
  2. After consulting with the AWS team, they indicated that the issue appears to be on the Client's side, as the 460 error suggests that the Client is closing the connection.
  3. A FIN signal has been observed from the Client to the Server in the TCP dump.
  4. We suspect that Mono.timeout() could be causing the Client to close the connections prematurely.
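
To illustrate how a client-side timeout can produce the FIN described in points 3 and 4, here is a minimal sketch using plain JDK sockets on loopback. It is an analogy only: it does not use WebFlux, Reactor Netty, or an ALB, the one-byte "request" and "response" are stand-ins, and the class and method names are hypothetical. The client gives up after its read timeout and closes the socket; the peer then sees end-of-stream before it has replied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.CompletableFuture;

public class ClientTimeoutFinDemo {

    /** Returns true if the server saw end-of-stream (the client's FIN) before it could reply. */
    static boolean serverSeesClientClose(int clientReadTimeoutMs, int serverDelayMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            CompletableFuture<Boolean> sawEof =
                    CompletableFuture.supplyAsync(() -> serve(server, serverDelayMs));

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(clientReadTimeoutMs);
                client.getOutputStream().write('Q');   // one-byte stand-in for a request
                client.getOutputStream().flush();
                try {
                    client.getInputStream().read();    // wait for the one-byte "response"
                } catch (SocketTimeoutException timedOut) {
                    // Give up: leaving this block closes the socket, which sends a FIN.
                }
            }
            return sawEof.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean serve(ServerSocket server, int delayMs) {
        try (Socket peer = server.accept()) {
            peer.getInputStream().read();              // consume the request byte
            Thread.sleep(delayMs);                     // simulated processing time
            peer.setSoTimeout(100);                    // short probe for a client-side close
            try {
                if (peer.getInputStream().read() == -1) {
                    return true;                       // client already closed its side
                }
            } catch (SocketTimeoutException stillConnected) {
                // No FIN within the probe window: the client is still waiting.
            }
            peer.getOutputStream().write('R');         // reply normally
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timeout shorter than delay: " + serverSeesClientClose(10, 100));
        System.out.println("timeout longer than delay:  " + serverSeesClientClose(2000, 20));
    }
}
```

With a load balancer in the path, the same early close would arrive at the load balancer rather than at the Server, which is consistent with point 1 (the request never reaches the Server) and with AWS's reading of the 460 code in point 2.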

If anyone has experience working with Spring WebFlux and AWS ALB, could you please share potential reasons for the high ELB 4xx (460 HTTP code) errors and timeouts when calling the Server via the load balancer? Any insights would be greatly appreciated.

Labels

for: stackoverflow (a question that's better suited to stackoverflow.com)
in: web (issues in web modules: web, webmvc, webflux, websocket)
status: invalid (an issue that we don't feel is valid)
