
Index Duplication in Gorouter Leading to Routing to Stale Endpoints #520

@Mrizwanshaik


Current behavior

Description:
Recently, we encountered an issue in Gorouter where a new instance endpoint was registered under the same index as an existing stale endpoint. As a result, Gorouter kept routing requests to the stale, unhealthy endpoint, which produced multiple 502 errors.

Details:

Route-emitter was down and therefore not sending unregister messages. Because route integrity is enabled, Gorouter retained the endpoint information and did not prune the stale endpoints.

In the meantime, Diego recreated the instance on another cell with a new instance_id and a new canonical address (IP:Port). Gorouter treated it as a new endpoint and added it to the routing pool alongside the stale endpoint that already held the same index. The current implementation in Gorouter does not validate the index number; it considers only the canonical address and instance_id when adding endpoints.
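To make the failure mode concrete, here is a minimal Go sketch of a pool keyed only by canonical address and instance_id. The Endpoint and Pool types, the Put and CountIndex methods, and the addresses are hypothetical simplifications for illustration, not Gorouter's actual code. It shows that when the replacement instance registers, nothing evicts the stale entry, so two endpoints end up claiming the same index.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified routing-table entry
// (not Gorouter's real Endpoint type).
type Endpoint struct {
	CanonicalAddr string // IP:Port
	InstanceID    string
	Index         string // application instance index
}

// Pool models the behaviour described above: endpoints are keyed by
// canonical address + instance ID, and the index is never consulted.
type Pool struct {
	byKey map[string]*Endpoint
}

func NewPool() *Pool { return &Pool{byKey: map[string]*Endpoint{}} }

func key(e *Endpoint) string { return e.CanonicalAddr + "|" + e.InstanceID }

// Put adds or updates an endpoint; it ignores Index entirely.
func (p *Pool) Put(e *Endpoint) {
	p.byKey[key(e)] = e
}

// CountIndex reports how many endpoints in the pool claim the given index.
func (p *Pool) CountIndex(index string) int {
	n := 0
	for _, e := range p.byKey {
		if e.Index == index {
			n++
		}
	}
	return n
}

func main() {
	pool := NewPool()
	// Stale endpoint: never unregistered because route-emitter was down.
	pool.Put(&Endpoint{CanonicalAddr: "10.0.0.1:61000", InstanceID: "old-guid", Index: "0"})
	// Replacement instance on another cell, same index, new address and ID.
	pool.Put(&Endpoint{CanonicalAddr: "10.0.0.2:61002", InstanceID: "new-guid", Index: "0"})
	fmt.Println(len(pool.byKey), pool.CountIndex("0")) // 2 2
}
```

Because the two registrations differ in both address and instance ID, they produce distinct keys, and load balancing can keep selecting the stale entry.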

Since mTLS is enabled for Gorouter-to-app container traffic, Gorouter does not prune a stale endpoint unless the failure matches one of the prunableClassifiers. Additionally, the requests were non-idempotent, so Gorouter did not retry them on the healthy endpoint. We eventually observed the prune-endpoint-failed log in Gorouter, but by then the 502 errors had already occurred.

Desired behavior

We propose adding logic to Gorouter that checks whether an endpoint for the same application index already exists in the pool. If a registration message arrives for an index that is already present, the existing endpoint should be updated or replaced, preventing duplicate registrations for that index.
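A minimal sketch of the proposed check, assuming the pool can keep a secondary map from application index to endpoint. The Endpoint and Pool types, the byIndex map, and the Put signature are hypothetical and do not reflect Gorouter's actual route pool API; unregistration is omitted for brevity. Put evicts any different endpoint that already holds the incoming index and returns it so the caller could log the replacement.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified routing-table entry.
type Endpoint struct {
	CanonicalAddr string // IP:Port
	InstanceID    string
	Index         string // application instance index
}

// Pool keeps a secondary index so at most one endpoint exists per
// application index. Names are illustrative, not Gorouter's.
type Pool struct {
	byKey   map[string]*Endpoint // keyed by canonical address + instance ID
	byIndex map[string]string    // application index -> key in byKey
}

func NewPool() *Pool {
	return &Pool{byKey: map[string]*Endpoint{}, byIndex: map[string]string{}}
}

func key(e *Endpoint) string { return e.CanonicalAddr + "|" + e.InstanceID }

// Put registers e. If a different endpoint already holds e.Index, that
// endpoint is evicted and returned; otherwise Put returns nil.
func (p *Pool) Put(e *Endpoint) *Endpoint {
	k := key(e)
	// If this same endpoint previously registered under another index,
	// drop its old index mapping so it cannot be evicted by mistake later.
	if prev, ok := p.byKey[k]; ok && prev.Index != e.Index && p.byIndex[prev.Index] == k {
		delete(p.byIndex, prev.Index)
	}
	var evicted *Endpoint
	if oldKey, ok := p.byIndex[e.Index]; ok && oldKey != k {
		evicted = p.byKey[oldKey]
		delete(p.byKey, oldKey)
	}
	p.byKey[k] = e
	p.byIndex[e.Index] = k
	return evicted
}

func main() {
	pool := NewPool()
	pool.Put(&Endpoint{CanonicalAddr: "10.0.0.1:61000", InstanceID: "old-guid", Index: "0"})
	evicted := pool.Put(&Endpoint{CanonicalAddr: "10.0.0.2:61002", InstanceID: "new-guid", Index: "0"})
	fmt.Println(len(pool.byKey), evicted.InstanceID) // 1 old-guid
}
```

One design point a real change may need to handle: during Diego cell evacuation, an evacuating instance and its replacement can briefly share the same index while both are healthy, so unconditional replacement by index could drop a working endpoint earlier than necessary.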

Affected Version

0.351.0
