-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add session algorithm to tablet balancer #19007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
72 commits
Select commit
Hold shift + click to select a range
1732eb7
Update `TabletBalancer.Pick` signature to accept options
mhamza15 f37cdc8
Initial implementation of the hash ring
mhamza15 114fece
Set up initial health check subscription
mhamza15 4742e8d
Rename to `SessionBalancer` and create initial tests
mhamza15 5be7ac7
Improvements and add more tests
mhamza15 80eb92f
Clarify comment
mhamza15 c421c3d
Remove brackets
mhamza15 34ca9e9
Clarify comment
mhamza15 92abecb
Add session hash to session proto
mhamza15 41141d7
Clarify comment
mhamza15 70ed65f
Update `SessionHash` to be optional
mhamza15 0698c2b
Pass in invalid tablets to tablet balancer
mhamza15 7159adf
Build initial hash rings
mhamza15 b77e758
Set up health check subscription first to avoid missing changes
mhamza15 b38c28f
Hash uuid on pick
mhamza15 df48aa1
Add invalid tablets to `PickOpts`
mhamza15 a18bab7
Make session uuid not pointer
mhamza15 f9956e2
Remove unused import
mhamza15 d60c593
Remove old tablets when a tablet's target changes
mhamza15 5833a54
Fetch initial tablet state after subscribing to health check
mhamza15 5bf666d
Revert "Remove old tablets when a tablet's target changes"
mhamza15 d8723b3
Remove primary tablets from rings
mhamza15 479cde4
undo formatting
mhamza15 475408c
Pass session uuid to `withRetry`
mhamza15 98a7c8b
Fix new types
mhamza15 f59a14f
Fix tests
mhamza15 2620b76
Pass `WrapOpts` by value
mhamza15 e62c88f
Pass `PickOpts` by value
mhamza15 0db53cd
Change `ExecuteOptions` in `WrapOpts` as a pointer
mhamza15 cde392c
Fix `balancer-type` help text
mhamza15 eadf1ae
Fix some bugs
mhamza15 8fae28c
Get cell from tablet alias rather than target
mhamza15 cd30d9d
Initial e2e test
mhamza15 37b7e62
Add more e2e tests
mhamza15 0a8ce4b
undo auto fmt
mhamza15 bdc2ba5
remove tablet from old ring if its target changes
mhamza15 fa18203
Switch to rendezvous hashing
mhamza15 b4ecebc
Remove health check
mhamza15 8dc63cc
Merge remote-tracking branch 'upstream/main' into session-balancer
mhamza15 01b5bb5
Update balancer.go
mhamza15 b758326
more conflict changes
mhamza15 b30017c
remove some allocations
mhamza15 89455ec
adjust tests to match previous tests
mhamza15 27a48d8
add benchmarks comparing all balancers
mhamza15 8236141
add session mode to help text
mhamza15 f45a400
remove benchmark
mhamza15 3f4c17e
remove PickOpts.InvalidTablets
mhamza15 6abc3f7
add promptless changelog suggestion
mhamza15 1b9b7d9
temp: add session to queryservice
mhamza15 5f497bf
Revert "temp: add session to queryservice"
mhamza15 dc61e95
remove random changelog file
mhamza15 ee8f2b6
Update `QueryService` to include session
mhamza15 859babc
Merge remote-tracking branch 'upstream/main' into session-balancer
mhamza15 0060112
fix e2e test
mhamza15 c00ef7b
fix import cycle
mhamza15 2ade6a9
try to fix actions
mhamza15 6531827
fix scatter conn test
mhamza15 74dbe47
try to fix actions
mhamza15 61439c5
remove debug message
mhamza15 b3fa8a6
Merge remote-tracking branch 'upstream/main' into session-balancer
mhamza15 62d289d
undo ci changes for flakiness
mhamza15 091aba7
add nil session checks
mhamza15 6f706d6
remove nil safe sessions
mhamza15 53664df
simplify pick opts
mhamza15 de491d1
simplify empty sessions
mhamza15 19e11c1
goimports
mhamza15 a4eb6e1
Update go/vt/vtgate/balancer/balancer.go
mhamza15 f63dcd6
switch pick opts to functional options
mhamza15 a3b213a
Update options.go
mhamza15 ef62ea0
Merge branch 'main' into session-balancer
mhamza15 5d6d428
Test connection stickiness
mhamza15 8fccc83
Merge branch 'main' into session-balancer
mhamza15 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,223 @@ | ||
| /* | ||
| Copyright 2025 The Vitess Authors. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); | ||
| you may not use this file except in compliance with the License. | ||
| You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
| */ | ||
|
|
||
| package tabletbalancer | ||
|
|
||
| import ( | ||
| "context" | ||
| "fmt" | ||
| "testing" | ||
| "time" | ||
|
|
||
| "github.com/stretchr/testify/require" | ||
|
|
||
| "vitess.io/vitess/go/mysql" | ||
| "vitess.io/vitess/go/test/endtoend/cluster" | ||
| "vitess.io/vitess/go/vt/vtgate/planbuilder/plancontext" | ||
| ) | ||
|
|
||
| // TestSessionModeBalancer tests the "session" mode routes each session consistently to the same tablet. | ||
| func TestSessionModeBalancer(t *testing.T) { | ||
| vtgateProcess, vtParams, _, _ := setupCluster(t) | ||
| defer vtgateProcess.TearDown() | ||
|
|
||
| // Create 2 session connections that route to different tablets | ||
| conns := createSessionConnections(t, &vtParams, 2) | ||
| for conn := range conns { | ||
| defer conn.Close() | ||
| } | ||
|
|
||
| verifyStickiness(t, conns, 20) | ||
| } | ||
|
|
||
| // TestSessionModeRemoveTablet tests that when a tablet is killed, connections switch to remaining tablets | ||
| func TestSessionModeRemoveTablet(t *testing.T) { | ||
| vtgateProcess, vtParams, replicaTablets, aliases := setupCluster(t) | ||
| defer vtgateProcess.TearDown() | ||
|
|
||
| // Create 2 connections to different tablets | ||
| conns := createSessionConnections(t, &vtParams, 2) | ||
| for conn := range conns { | ||
| defer conn.Close() | ||
| } | ||
|
|
||
| // Find the first replica tablet that one of our connections is using | ||
| var tabletToKill *cluster.Vttablet | ||
| var affectedConn *mysql.Conn | ||
| var killedServerID int64 | ||
|
|
||
| for _, tablet := range replicaTablets { | ||
| tabletServerID := aliases[tablet.Alias] | ||
|
|
||
| // Check if any connection is using this tablet | ||
| for conn, connServerID := range conns { | ||
| if connServerID != tabletServerID { | ||
| continue | ||
| } | ||
|
|
||
| // We found a connection that's using this tablet, let's kill this tablet | ||
| tabletToKill = tablet | ||
| affectedConn = conn | ||
| killedServerID = tabletServerID | ||
| break | ||
| } | ||
|
|
||
| // We found a tablet, no need to check other tablets | ||
| if tabletToKill != nil { | ||
| break | ||
| } | ||
| } | ||
|
|
||
| require.NotNil(t, tabletToKill, "Should find a tablet to kill") | ||
|
|
||
| // Kill the tablet immediately | ||
| err := tabletToKill.VttabletProcess.Kill() | ||
| require.Error(t, err) | ||
|
|
||
| // Wait for the connection to switch to a new tablet and update the map | ||
| require.Eventually(t, func() bool { | ||
| newServerID := getServerID(t, affectedConn) | ||
| if newServerID != killedServerID { | ||
| conns[affectedConn] = newServerID | ||
| return true | ||
| } | ||
|
|
||
| return false | ||
| }, 10*time.Millisecond, 1*time.Millisecond, "Connection should switch to a different tablet") | ||
|
|
||
| verifyStickiness(t, conns, 20) | ||
| } | ||
|
|
||
| // setupCluster sets up a cluster with a vtgate using the session balancer. | ||
| func setupCluster(t *testing.T) (*cluster.VtgateProcess, mysql.ConnParams, []*cluster.Vttablet, map[string]int64) { | ||
| t.Helper() | ||
|
|
||
| // Start vtgate in cell1 with session mode | ||
| vtgateProcess := cluster.VtgateProcessInstance( | ||
| clusterInstance.GetAndReservePort(), | ||
| clusterInstance.GetAndReservePort(), | ||
| clusterInstance.GetAndReservePort(), | ||
| cell1, | ||
| fmt.Sprintf("%s,%s", cell1, cell2), | ||
| clusterInstance.Hostname, | ||
| replicaStr, | ||
| clusterInstance.TopoProcess.Port, | ||
| clusterInstance.TmpDirectory, | ||
| []string{ | ||
| "--vtgate-balancer-mode", "session", | ||
| }, | ||
| plancontext.PlannerVersion(0), | ||
| ) | ||
| require.NoError(t, vtgateProcess.Setup()) | ||
| require.True(t, vtgateProcess.WaitForStatus()) | ||
|
|
||
| vtParams := mysql.ConnParams{ | ||
| Host: clusterInstance.Hostname, | ||
| Port: vtgateProcess.MySQLServerPort, | ||
| } | ||
|
|
||
| allTablets := clusterInstance.Keyspaces[0].Shards[0].Vttablets | ||
| shardName := clusterInstance.Keyspaces[0].Shards[0].Name | ||
| replicaTablets := replicaTablets(allTablets) | ||
|
|
||
| conn, err := mysql.Connect(context.Background(), &vtParams) | ||
| require.NoError(t, err) | ||
| defer conn.Close() | ||
|
|
||
| // Wait for tablets to be discovered | ||
| err = vtgateProcess.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.primary", keyspaceName, shardName), 1, 30*time.Second) | ||
| require.NoError(t, err) | ||
|
|
||
| err = vtgateProcess.WaitForStatusOfTabletInShard(fmt.Sprintf("%s.%s.replica", keyspaceName, shardName), len(replicaTablets), 30*time.Second) | ||
| require.NoError(t, err) | ||
|
|
||
| aliases := mapTabletAliasToMySQLServerID(t, allTablets) | ||
|
|
||
| // Insert test data | ||
| testValue := fmt.Sprintf("session_test_%d", time.Now().UnixNano()) | ||
| _, err = conn.ExecuteFetch(fmt.Sprintf("INSERT INTO balancer_test (value) VALUES ('%s')", testValue), 1, false) | ||
| require.NoError(t, err) | ||
| waitForReplication(t, replicaTablets, testValue) | ||
|
|
||
| return vtgateProcess, vtParams, replicaTablets, aliases | ||
| } | ||
|
|
||
| // getServerID returns the server ID that the connection is currently routing to. | ||
| func getServerID(t *testing.T, conn *mysql.Conn) int64 { | ||
| t.Helper() | ||
|
|
||
| res, err := conn.ExecuteFetch("SELECT @@server_id", 1, false) | ||
mattlord marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| require.NoError(t, err) | ||
| require.Equal(t, 1, len(res.Rows), "expected one row from server_id query") | ||
|
|
||
| serverID, err := res.Rows[0][0].ToInt64() | ||
| require.NoError(t, err) | ||
|
|
||
| return serverID | ||
| } | ||
|
|
||
| // createSessionConnections creates `n` connections that route to different tablets. | ||
| // Returns a map of mysql.Conn -> serverID. | ||
| func createSessionConnections(t *testing.T, vtParams *mysql.ConnParams, numConnections int) map[*mysql.Conn]int64 { | ||
| t.Helper() | ||
|
|
||
| conns := make(map[*mysql.Conn]int64) | ||
| seenServerIDs := make(map[int64]bool) | ||
|
|
||
| // Try up to 50 times to get numConnections with different server IDs | ||
| for range 50 { | ||
| conn, err := mysql.Connect(context.Background(), vtParams) | ||
| require.NoError(t, err) | ||
|
|
||
| _, err = conn.ExecuteFetch("USE @replica", 1, false) | ||
| require.NoError(t, err) | ||
|
|
||
| // Get the server ID this connection routes to | ||
| serverID := getServerID(t, conn) | ||
|
|
||
| // If this is a new tablet, keep the connection | ||
| if !seenServerIDs[serverID] { | ||
| seenServerIDs[serverID] = true | ||
| conns[conn] = serverID | ||
|
|
||
| // If we have enough connections, return | ||
| if len(conns) == numConnections { | ||
| return conns | ||
| } | ||
|
|
||
| continue | ||
| } | ||
|
|
||
| // Already seen this tablet, close and try again | ||
| conn.Close() | ||
| } | ||
|
|
||
| t.Fatalf("could not create %d connections with different tablets after 50 attempts, only got %d", numConnections, len(conns)) | ||
| return nil | ||
| } | ||
|
|
||
| // verifyStickiness validates whether the given connections remain connected to the same | ||
| // server `n` times in a row. | ||
| func verifyStickiness(t *testing.T, conns map[*mysql.Conn]int64, n uint) { | ||
| t.Helper() | ||
|
|
||
| for conn, expectedServerID := range conns { | ||
| for range n { | ||
| currentServerID := getServerID(t, conn) | ||
| require.Equal(t, expectedServerID, currentServerID, "Connection should stick to tablet %d, got %d", expectedServerID, currentServerID) | ||
| } | ||
| } | ||
| } | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't break the serializable guarantees of the new mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. This test case mimics the case a tablet goes down/goes out of serving, ensuring that the connections move over accordingly. The alternative is to error until the tablet is back, but that is likely less than ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that we shouldn't return an error forever, we should only do so once. The client should know that their session — which provided serializable consistency — has been terminated and that they need to reconnect in order to get a new view with the same serializable property. We can't silently move their session to another replica that could be minutes behind the previous one IMO. Doesn't that defeat the purpose here, since you can never really count on the core property holding true and have to maintain checks and handling for going back in time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can ensure that the newly selected replica is at least at the same GTID snapshot/position that the old one was, then it would be fine (it's OK to go forward in time).
Or do you think I'm missing or misunderstanding something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope you're not missing anything, those are all fair points. I suppose it comes down to how much of a guarantee we'd like to make to clients. In my head at least, I was envisioning it being a large step up from the normal selection logic of a random tablet in the local cell, but not quite a 100% guarantee of consistency across the board. Even if we do stick with the approach I've already taken, we'd definitely need to clarify in the docs that in the case of tablets being added or removed, there is a possibility to break the consistency guarantee.
Do you feel strongly about "going all the way" in terms of consistency? I'll give it some deeper thought on my end as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing that this feature came from your own experience? If so, what would you prefer?
As long as we clearly document it, I think it would be OK. It would also be something we could change later on.
Also may be worth bringing up with the community in the Vitess Slack to see what others think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a desire for this behavior at GitHub from Arthur and I, but I wouldn't quite say there was an explicit need for this, more of a nice to have. I'd imagine we would have initially sacrificed the additional consistency for the ease of adoption/less disruption that the "automatic connection shifting" would've afforded us. I think this can be a start, with eyes on guaranteeing more consistency if the need arises in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified the docs: vitessio/website@f207553
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At GitHub (and I'm sure at many other places), queries made outside of transactions are automatically retried a few times in case of a failure at the connection level.
This means that the application is actually unaware that a query failed, and due to the auto-reconnect that's happening, the consistency guarantee is already sacrificed for replica connections in these cases for better availability / transparent error handling.
A host going down for maintenance or unexpectedly becoming unavailable was a pretty rare occurrence when compared to the frequency of queries being executed.