Commit b8f30b6

Merge pull request #1222 from sap-contributions/rfc-hash-based-routing
RFC: Hash-based routing
2 parents 9c7183f + 0915866 commit b8f30b6

5 files changed: 220 additions, 0 deletions

@@ -0,0 +1,220 @@
# Meta

[meta]: #meta

- Name: Implementing a Hash-Based Load Balancing Algorithm for CF Routing
- Start Date: 2025-04-07
- Author(s): b1tamara, Soha-Albaghdady
- Status: Draft <!-- Acceptable values: Draft, Approved, On Hold, Superseded -->
- RFC Pull Request: https://github.com/cloudfoundry/community/pull/1222

## Summary

Cloud Foundry uses round-robin and least-connection algorithms for load balancing between Gorouters and backends. While
effective in many scenarios, these algorithms may not be ideal for certain use cases. Therefore, this RFC proposes
introducing hash-based routing on a per-route basis.
The hash-based load balancing algorithm uses the hash of a request header to make routing decisions, focusing on
distributing users across instances rather than individual requests, thereby improving load balancing in specific
scenarios.

## Motivation

Cloud Foundry offers two load balancing algorithms to manage request distribution between Gorouters and backends. The
round-robin algorithm ensures the number of requests is distributed equally across all available backends, and the
least-connection algorithm tries to keep the number of active requests equal across all backends. A recent enhancement
allows these load balancing algorithms to be configured on the application route level.

However, these existing algorithms are not ideal for scenarios that require routing based on specific identifiers.

One use case is optimizing resource management of complex in-memory caches. While 12-factor apps are stateless and can
retrieve necessary information from backing services, it is often useful to cache data and reduce latency. When a cache
is limited in size (e.g., a Least Recently Used cache), exposing each app instance to all users may lead to thrashing
and lower cache efficiency. By "pinning" users to a particular instance, the cache can remain effective. When an
instance is replaced (scaling up or down, evacuation, rolling update), another instance can still provide a response
and fill its cache without interruption for the user. For most users, subsequent requests can be processed at lower
latency by utilizing a warm and effective cache.

Another use case: users from different tenants send requests to application instances that establish connections to
tenant-specific databases.

![](rfc-draft-hash-based-routing/problem.drawio.png)

With the current load balancing algorithms, each tenant eventually creates a connection to
each application instance, which then creates connection pools to every customer database. As a result, the tenants
might span a full mesh, leading to too many open connections to the customer databases and degrading performance. This
limitation highlights a gap in achieving efficient load distribution, particularly when dealing with limited or
memory-intensive resources in backend services, and can be addressed through hash-based routing. In short, hash-based
routing is an algorithm that distributes requests to application instances by using a stable hash derived from request
identifiers, such as headers.

## Proposal

We propose introducing hash-based routing as a load balancing algorithm for use on a per-route basis to address the
issues described in the earlier use cases.

The approach leverages an HTTP header, which is associated with each incoming request and contains the specific
identifier. This identifier is used to compute a hash value, which serves as the basis for routing decisions.

In the previously mentioned use cases, the specific identifier included in the header can serve as the basis for hash
calculation. This hash value determines the appropriate application instance for each request, ensuring
that all requests with this identifier are consistently routed to the same instance, unless that instance is
saturated, in which case they may be routed to another instance. Consequently, the load balancing algorithm effectively
directs requests for a single tenant to a particular application instance, so that instance can minimize database
connection overhead and optimize connection pooling, enhancing efficiency and system performance.

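As an illustration of the first step described above, the following minimal sketch derives a stable hash from a
configured request header. It is written in Python for brevity and is not Gorouter code. The function name
`header_hash`, the choice of SHA-256, and returning `None` when the header is absent are assumptions made for this
example only; the RFC does not prescribe a hash function or a fallback behaviour.

```python
import hashlib
from typing import Mapping, Optional


def header_hash(headers: Mapping[str, str], hash_header: str) -> Optional[int]:
    """Return a stable 64-bit hash of the configured header value, or None if the header is absent.

    Hypothetical helper: the hash function (SHA-256) and the None fallback are assumptions.
    """
    wanted = hash_header.lower()
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare them in lower case.
        if name.lower() == wanted:
            digest = hashlib.sha256(value.encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "big")
    return None


h1 = header_hash({"Tenant-Id": "tenant-a"}, "tenant-id")
h2 = header_hash({"tenant-id": "tenant-a"}, "tenant-id")
print(h1 == h2)                                      # True: same identifier, same hash
print(header_hash({"accept": "*/*"}, "tenant-id"))   # None: header not present
```

A cryptographic digest is used here only because it is stable across processes and machines; a language's built-in
string hash may be randomized per process, which would give each router a different result for the same identifier.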
### Requirements

#### Only Application Per-Route Load Balancing

Hash-based load balancing addresses a particular load pattern, rather than serving as a general-purpose load balancing
algorithm. Consequently, it will be configured exclusively as a per-route option for applications and will not be
offered as a global setting.

#### Minimal rehashing over all Gorouter VMs

Rehashing should be minimized, especially when the number of application instances changes over time.

For the scenario when a new application instance (e.g. app_instance3) is added, Gorouter updates the mapping so that it
maps part of the hashes to the new instance.

| Hash  | Application instance(s) before | Application instance(s) after a new instance is added |
|-------|--------------------------------|-------------------------------------------------------|
| Hash1 | app_instance1                  | app_instance1                                         |
| Hash2 | app_instance1                  | app_instance3                                         |
| Hash3 | app_instance2                  | app_instance2                                         |
| ...   | ...                            | ...                                                   |
| HashN | app_instance2                  | app_instance3                                         |

For the scenario when the application is scaled down, Gorouter updates the mapping immediately after the routes are
updated, so that it remaps the hashes associated with app_instance3:

| Hash  | Application instance(s) before | Application instance(s) after app_instance3 is removed |
|-------|--------------------------------|--------------------------------------------------------|
| Hash1 | app_instance1                  | app_instance1                                          |
| Hash2 | app_instance3                  | app_instance1                                          |
| Hash3 | app_instance2                  | app_instance2                                          |
| ...   | ...                            | ...                                                    |
| HashN | app_instance3                  | app_instance2                                          |


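One common way to obtain this property is a consistent hash ring. The sketch below is a minimal Python illustration
under that assumption; the RFC does not mandate this data structure. The class `HashRing`, the `vnodes` parameter
(virtual points per instance), and the use of SHA-256 are hypothetical choices for the example.

```python
import bisect
import hashlib


def _h(value: str) -> int:
    """Stable 64-bit hash of a string (SHA-256 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with several virtual points per application instance."""

    def __init__(self, instances, vnodes=100):
        if not instances:
            raise ValueError("at least one instance is required")
        # Each instance is placed on the ring at `vnodes` pseudo-random positions.
        self._points = sorted((_h(f"{inst}#{i}"), inst) for inst in instances for i in range(vnodes))
        self._keys = [point for point, _ in self._points]

    def lookup(self, key: str) -> str:
        # The owner is the first point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._keys, _h(key)) % len(self._points)
        return self._points[idx][1]


tenants = [f"tenant-{n}" for n in range(1000)]
before = HashRing(["app_instance1", "app_instance2"])
after = HashRing(["app_instance1", "app_instance2", "app_instance3"])

moved = [t for t in tenants if before.lookup(t) != after.lookup(t)]
# Only part of the keys move, and every key that moves lands on the new instance.
print(0 < len(moved) < len(tenants))
print(all(after.lookup(t) == "app_instance3" for t in moved))
```

Because the positions of existing instances do not change when an instance is added or removed, keys owned by
unaffected instances keep their mapping, which matches the behaviour shown in the two tables.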
#### Considering a balance factor

Before routing a request, the current load on each application instance must be evaluated using a balance factor. This
load is measured by the number of in-flight requests. For example, with a balance factor of 1.5, no application instance
should exceed 150% of the average number of in-flight requests across all application instances. Consequently, requests
must be distributed to different application instances that are not overloaded.

Example:

| Application instance | Current request count | Current request count / Average number of in-flight requests |
|----------------------|-----------------------|--------------------------------------------------------------|
| app_instance1        | 10                    | 20%                                                          |
| app_instance2        | 50                    | 100%                                                         |
| app_instance3        | 90                    | 180%                                                         |

With an average of 50 in-flight requests and, for example, a balance factor of 1.5, the limit per instance is 75
requests, so the current request count of app_instance3 exceeds the limit. As a result, new requests to app_instance3
must be distributed to different application instances.

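The arithmetic of this example can be checked with a short Python sketch. The function name `overloaded_instances`
and the dictionary-based input are hypothetical, and treating a balance factor of 0 as "do not consider load" follows
the `hash_balance` semantics described in the Cloud Controller section of this RFC.

```python
def overloaded_instances(in_flight: dict, balance_factor: float) -> list:
    """Return the instances whose in-flight request count exceeds balance_factor * average."""
    if not in_flight or balance_factor <= 0:
        # A balance factor of 0 means the load situation is not considered.
        return []
    average = sum(in_flight.values()) / len(in_flight)
    limit = balance_factor * average
    return [name for name, count in in_flight.items() if count > limit]


load = {"app_instance1": 10, "app_instance2": 50, "app_instance3": 90}
print(overloaded_instances(load, 1.5))  # ['app_instance3'] because 90 > 1.5 * 50 = 75
print(overloaded_instances(load, 0))    # []
```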
#### Deterministic handling of overflow traffic to the next application instance

An application instance is considered overloaded when its current request load exceeds the limit defined by the balance
factor. Overflow traffic should always be directed to the same next instance rather than to a random one.

One possible representation of deterministic handling is a ring:

![](rfc-draft-hash-based-routing/HashRing.drawio.png)

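A minimal Python sketch of such a deterministic walk is shown below. It assumes the ring is represented as an ordered
list of instance names and that the preferred instance has already been determined from the hash; the function name
`next_available` and the fallback to the preferred instance when every instance is over the limit are assumptions of
this example, not part of the RFC.

```python
def next_available(ring: list, preferred: str, in_flight: dict, balance_factor: float) -> str:
    """Walk the ring clockwise from the preferred instance and return the first one that is not overloaded."""
    if balance_factor <= 0:
        # Load is not considered: always use the preferred instance.
        return preferred
    limit = balance_factor * sum(in_flight[name] for name in ring) / len(ring)
    start = ring.index(preferred)
    for step in range(len(ring)):
        candidate = ring[(start + step) % len(ring)]
        if in_flight[candidate] <= limit:
            return candidate
    # Assumption: if every instance is above the limit, stay on the preferred one.
    return preferred


ring = ["app_instance1", "app_instance2", "app_instance3"]
load = {"app_instance1": 10, "app_instance2": 50, "app_instance3": 90}

print(next_available(ring, "app_instance3", load, 1.5))  # app_instance1 (90 > 75, so overflow wraps around)
print(next_available(ring, "app_instance2", load, 1.5))  # app_instance2 (50 <= 75)
```

Because the walk order depends only on the ring and the preferred instance, overflow traffic for a given identifier
always reaches the same next instance as long as the load situation is unchanged.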
### Required Changes

#### Gorouter

- Gorouter MUST be extended to take a specific identifier via the request header
- Gorouter MUST implement hash calculation, based on the provided header
- Gorouter MAY store the mapping between computed hash values and application instances locally to avoid
  expensive recalculations for each incoming request
- Gorouter SHOULD NOT implement a distributed shared cache
- Gorouter MUST assess the current number of in-flight requests across all application instances mapped to a
  particular route to consider overload situations
- Gorouter MAY update its local hash table following the registration or deregistration of an endpoint, ensuring
  minimal rehashing
- Gorouter SHOULD NOT incur any performance hit when no apps use hash-based routing

For a detailed understanding of the workflows on Gorouter's side, please refer to the [activity diagrams](#diagrams).

#### Cloud Controller

- The `loadbalancing` property of
  the [route object](https://v3-apidocs.cloudfoundry.org/version/3.190.0/index.html#the-route-options-object) MUST be
  updated to include `hash` as an acceptable value
- The [route options object](https://v3-apidocs.cloudfoundry.org/version/3.190.0/index.html#the-route-options-object)
  MUST include two new properties, `hash_header` and `hash_balance`, to configure a request header as the hashing key
  and the balance factor
- Cloud Controller MUST implement validation of the following requirements:
  - The `hash_header` property is mandatory when load balancing is set to hash
  - The `hash_balance` property is optional when load balancing is set to hash. Leaving out `hash_balance` or setting
    it explicitly to 0 means the load situation will not be considered
  - To account for overload situations, `hash_balance` values should be greater than 1.1. During the implementation
    phase, the values will be evaluated to identify the best fit for the recommended range
  - For load balancing algorithms other than hash, the `hash_balance` and `hash_header` properties MUST NOT be set

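The validation requirements above could be expressed roughly as follows. This is a Python sketch for illustration
only, not Cloud Controller code. The function name `validate_route_options`, the error and warning texts, rejecting
negative values, and reporting values between 0 and 1.1 as a warning rather than an error are all assumptions; the RFC
leaves the recommended range open until the implementation phase.

```python
def validate_route_options(options: dict):
    """Return (errors, warnings) for a route options map. Hypothetical helper for illustration."""
    errors, warnings = [], []
    if options.get("loadbalancing") == "hash":
        if not options.get("hash_header"):
            errors.append("hash_header is required when loadbalancing is hash")
        try:
            # A missing hash_balance is treated like 0: the load situation is not considered.
            balance = float(options.get("hash_balance", 0))
        except (TypeError, ValueError):
            errors.append("hash_balance must be a number")
        else:
            if balance < 0:
                errors.append("hash_balance must not be negative")  # assumption, not stated in the RFC
            elif 0 < balance <= 1.1:
                warnings.append("hash_balance should be greater than 1.1")
    else:
        for key in ("hash_header", "hash_balance"):
            if key in options:
                errors.append(f"{key} must not be set unless loadbalancing is hash")
    return errors, warnings


print(validate_route_options({"loadbalancing": "hash", "hash_header": "tenant-id", "hash_balance": 1.25}))
print(validate_route_options({"loadbalancing": "hash"}))
print(validate_route_options({"loadbalancing": "least-connection", "hash_balance": 1.25}))
```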
An example manifest with these properties:

```yaml
version: 1
applications:
  - name: test
    routes:
      - route: test.example.com
        options:
          loadbalancing: hash
          hash_header: tenant-id
          hash_balance: 1.25
      - route: anothertest.example.com
        options:
          loadbalancing: least-connection
```

The decision to introduce plain keys was influenced by the following points:

- It is simple to use
- It allows for easy addition of more load-balancing-related properties if new requirements arise in the future
- It complies with
  the [RFC #0027 that introduced per-route options](https://github.com/cloudfoundry/community/blob/main/toc/rfc/rfc-0027-generic-per-route-features.md#proposal),
  which states that the map must use strings as keys and can use numbers, strings, and the literals true and false as
  values

### Components Where No Changes Are Required

#### CF CLI

The [current implementation of route options in the CF CLI](https://github.com/cloudfoundry/cli/blob/main/resources/options_resource.go)
supports the use of `--option KEY=VALUE`, where the key and value are sent directly to CC for validation. Consequently,
the `create-route`, `update-route`, and `map-route` commands require no modifications, as they already accept the
proposed properties.

Example:

```bash
cf create-route MY-APP example.com -n test -o loadbalancing=hash -o hash_header=tenant-id -o hash_balance=1.25
cf update-route MY-APP example.com -n test -o loadbalancing=hash -o hash_header=tenant-id -o hash_balance=1.25
cf update-route MY-APP example.com -n test -o loadbalancing=hash -o hash_header=tenant-id
cf update-route MY-APP example.com -n test -o loadbalancing=hash -o hash_balance=1.25
cf map-route MY-APP example.com -n test -o loadbalancing=hash -o hash_header=tenant-id -o hash_balance=1.25
```

#### Route-Emitter

The options are raw JSON and will be passed directly to the Gorouter without any modifications.

#### Route-Registrar

Implementing hash-based routing in route-registrar for platform routes is out of scope for this RFC.

### Diagrams

#### An activity diagram for routing decision for an incoming request

![](rfc-draft-hash-based-routing/ActivityDiagram.drawio.png)

#### A simplified activity diagram for Gorouter's endpoint registration process

![](rfc-draft-hash-based-routing/EndpointRegistration.drawio.png)