Description
As a user, I would like instances registered under the same service name to be filterable by metadata, with the filter configured separately in each route, upstream, and service.
Q. Why do we need metadata filtering?
A. We manage instances of the same service through different metadata. For example, when a recommendation system model changes, we update the model version in the metadata, then switch traffic and run A/B experiments by coloring (tagging) client requests.
Q. Why not use different service names?
A. We create multiple environments from the same service template. They differ only in some metadata, such as env, lane, or model version. As metadata dimensions are added, the number of service names needed to address one specific combination grows multiplicatively. In addition, client traffic is not colored, so we have no way to split it across multiple routes.
Q. What upstream types need to be supported?
A. Consul, Eureka, Nacos, and other registry types. Registries added in the future, or custom ones used inside an enterprise, should inherit this capability directly.
Q. Any other context?
A1. I'm sorry for not following the process. I have implemented filtering at the upstream level internally and submitted PR #12448. During the discussion I conservatively narrowed it to support only Consul, but following the DRY principle, I think it is better placed in the upstream than reimplemented once for each discovery type.
A2. Someone is currently implementing the same function for Nacos (#12392, #12445), but it only supports single-value matching. We would like multi-value matching. User requests often carry no tags, so with single-value matching, letting multiple instances with different metadata serve traffic together requires registering a special metadata value to group them, and that grouping cannot be changed dynamically.
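To make the multi-value request concrete, here is a minimal sketch of the matching rule I have in mind. It is not the code from PR #12448; `filter_by_metadata`, the `rules` shape (metadata key mapped to a list of accepted values), and the instance layout are all hypothetical and only illustrate the semantics: an instance is kept when, for every key in the rule, its metadata value is one of the accepted values.

```python
from typing import Any, Dict, List

Instance = Dict[str, Any]


def filter_by_metadata(instances: List[Instance],
                       rules: Dict[str, List[str]]) -> List[Instance]:
    """Keep instances whose metadata satisfies every rule.

    `rules` maps a metadata key to the list of accepted values
    (hypothetical shape, not an existing APISIX schema).
    An instance that lacks a key named in `rules` is dropped.
    """
    matched = []
    for inst in instances:
        meta = inst.get("metadata", {})
        if all(meta.get(key) in accepted for key, accepted in rules.items()):
            matched.append(inst)
    return matched


# Example registry content for one service name (made-up values).
instances = [
    {"host": "10.0.0.1", "port": 8080, "metadata": {"env": "prod", "version": "v1"}},
    {"host": "10.0.0.2", "port": 8080, "metadata": {"env": "prod", "version": "v2"}},
    {"host": "10.0.0.3", "port": 8080, "metadata": {"env": "staging", "version": "v2"}},
]

# Untagged requests should be served by both prod model versions together.
selected = filter_by_metadata(instances, {"env": ["prod"], "version": ["v1", "v2"]})
```

With single-value matching, the last call would need a dedicated grouping value registered on both prod instances in advance; with a value list, the grouping lives in the gateway configuration and can be edited at any time.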
Real scenarios I have experienced:
To limit the blast radius, we deploy services across multiple data centers, availability zones, and even regions. The gateway, acting as a global load balancer, forwards requests evenly (possibly with a geographic bias). Ideally, every availability zone is identical and deployed as a unit, but in reality various accidents will make a single data center unavailable. Because of human error, this is far more likely than "Snakes in a Facebook Datacenter" or "excavators digging up cables" (which sound absurd, but also happen frequently). When it happens, we need a way to remove that data center from the global load balancer directly to protect the SLA, and removing it from the list is clearly the simplest and most efficient way.
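The data center scenario could look roughly like the sketch below. The `dc` metadata key, the node layout, and the `nodes_in` helper are assumptions made for illustration, not part of any existing discovery module. The point is that the accepted data centers are plain gateway configuration, so taking one out of rotation requires no change in the registry.

```python
from typing import Any, Dict, List, Set

Node = Dict[str, Any]


def nodes_in(nodes: List[Node], accepted_dcs: Set[str]) -> List[Node]:
    """Return the nodes whose (hypothetical) `dc` metadata is accepted."""
    return [n for n in nodes if n.get("metadata", {}).get("dc") in accepted_dcs]


# Nodes as they might be returned by a registry (made-up values).
nodes = [
    {"host": "10.1.0.1", "metadata": {"dc": "dc1"}},
    {"host": "10.2.0.1", "metadata": {"dc": "dc2"}},
    {"host": "10.3.0.1", "metadata": {"dc": "dc3"}},
]

accepted = {"dc1", "dc2", "dc3"}
before = nodes_in(nodes, accepted)   # all three data centers receive traffic

# dc2 becomes unavailable: drop it from the accepted list on the gateway.
accepted.discard("dc2")
after = nodes_in(nodes, accepted)    # only dc1 and dc3 remain candidates
```

Once dc2 recovers, adding it back to the accepted list restores the original distribution.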