-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add session algorithm to tablet balancer #19007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add `PickOpts` which allow balancers to accept options specific to their implementation. This allows the `Pick` signature not to get overly long as implementations and their options are added. Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
This reverts commit bd6a1b792d69b238df90b5bf979e0911087ada16. Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Update the signature of the wrapper func to accept a new `WrapOpts` struct, which currently contains `ExecuteOptions`, which now contains the session UUID so that it can be passed into `Pick` for the balancer. Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
| return false | ||
| }, 10*time.Millisecond, 1*time.Millisecond, "Connection should switch to a different tablet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't break the serializable guarantees of the new mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. This test case mimics the case a tablet goes down/goes out of serving, ensuring that the connections move over accordingly. The alternative is to error until the tablet is back, but that is likely less than ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that we shouldn't return an error forever, we should only do so once. The client should know that their session — which provided serializable consistency — has been terminated and that they need to reconnect in order to get a new view with the same serializable property. We can't silently move their session to another replica that could be minutes behind the previous one IMO. Doesn't that defeat the purpose here, since you can never really count on the core property holding true and have to maintain checks and handling for going back in time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can ensure that the newly selected replica is at least at the same GTID snapshot/position that the old one was, then it would be fine (it's OK to go forward in time).
Or do you think I'm missing or misunderstanding something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope you're not missing anything, those are all fair points. I suppose it comes down to how much of a guarantee we'd like to make to clients. In my head at least, I was envisioning it being a large step up from the normal selection logic of a random tablet in the local cell, but not quite a 100% guarantee of consistency across the board. Even if we do stick with the approach I've already taken, we'd definitely need to clarify in the docs that in the case of tablets being added or removed, there is a possibility to break the consistency guarantee.
Do you feel strongly about "going all the way" in terms of consistency? I'll give it some deeper thought on my end as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing that this feature came from your own experience? If so, what would you prefer?
As long as we clearly document it, I think it would be OK. It would also be something we could change later on.
Also may be worth bringing up with the community in the Vitess Slack to see what others think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a desire for this behavior at GitHub from Arthur and I, but I wouldn't quite say there was an explicit need for this, more of a nice to have. I'd imagine we would have initially sacrificed the additional consistency for the ease of adoption/less disruption that the "automatic connection shifting" would've afforded us. I think this can be a start, with eyes on guaranteeing more consistency if the need arises in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified the docs: vitessio/website@f207553
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At GitHub (and I'm sure at many other places), queries made outside of transactions are automatically retried a few times in case of a failure at the connection level.
This means that the application is actually unaware that a query failed, and due to the auto-reconnect that's happening, the consistency guarantee is already sacrificed for replica connections in these cases for better availability / transparent error handling.
A host going down for maintenance or unexpectedly becoming unavailable was a pretty rare occurrence when compared to the frequency of queries being executed.
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
mattlord
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can change the options as noted, that will get rid of a lot more code changes here. Let me know what you think. It's looking good so far!
go/vt/vtgate/balancer/balancer.go
Outdated
| // by a weighted random sample so that over time the system will achieve the | ||
| // desired balanced allocation. | ||
| func (b *flowBalancer) Pick(target *querypb.Target, tablets []*discovery.TabletHealth) *discovery.TabletHealth { | ||
| func (b *flowBalancer) Pick(target *querypb.Target, tablets []*discovery.TabletHealth, _ PickOpts) *discovery.TabletHealth { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use this pattern for these options?
Lines 297 to 344 in 749fe54
| // lockOptions configure a Lock call. lockOptions are set by the LockOption | |
| // values passed to the lock functions. | |
| type lockOptions struct { | |
| lockType LockType | |
| ttl time.Duration | |
| } | |
| // LockOption configures how we perform the locking operation. | |
| type LockOption interface { | |
| apply(*lockOptions) | |
| } | |
| // funcLockOption wraps a function that modifies lockOptions into an | |
| // implementation of the LockOption interface. | |
| type funcLockOption struct { | |
| f func(*lockOptions) | |
| } | |
| func (flo *funcLockOption) apply(lo *lockOptions) { | |
| flo.f(lo) | |
| } | |
| func newFuncLockOption(f func(*lockOptions)) *funcLockOption { | |
| return &funcLockOption{ | |
| f: f, | |
| } | |
| } | |
| // WithTTL allows you to specify how long the underlying topo server | |
| // implementation should hold the lock before releasing it — even if the caller | |
| // has not explicitly released it. This provides a way to override the global | |
| // ttl values that are set via --topo-consul-lock-session-ttl and | |
| // --topo-etcd-lease-ttl. | |
| // Note: This option is ignored by the ZooKeeper implementation as it does not | |
| // support TTLs. | |
| func WithTTL(ttl time.Duration) LockOption { | |
| return newFuncLockOption(func(o *lockOptions) { | |
| o.ttl = ttl | |
| }) | |
| } | |
| // WithType determines the type of lock we take. The options are defined | |
| // by the LockType type. | |
| func WithType(lt LockType) LockOption { | |
| return newFuncLockOption(func(o *lockOptions) { | |
| o.lockType = lt | |
| }) | |
| } |
Then these functions can take a variadic number of options and those that don't use any do not need to be modified. This is always preferable when passing options is not the norm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! I like the functional options pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in f63dcd6
Co-authored-by: Matt Lord <[email protected]> Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
go/vt/vtgate/balancer/options.go
Outdated
| @@ -0,0 +1,62 @@ | |||
| /* | |||
| Copyright 2024 The Vitess Authors. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New files should have the current year in the Copyright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Darn Claude 👊 . Fixed!
mattlord
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nice work on this, @mhamza15 ! ❤️
| // Pick should prefer local cell | ||
| picked1 := b.Pick(target, tablets, WithSessionUUID("a")) | ||
| require.NotNil(t, picked1) | ||
| require.Equal(t, "local", picked1.Target.Cell) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth doing this in a loop 10 times or something, no? To better test the preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep good call, I did it in the end to end tests but it looks like I missed it in the unit tests.
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
|
Website docs: vitessio/website#2043 |
Signed-off-by: Mohamed Hamza <[email protected]>
|
@mhamza15 Does this PR need to be backported to 22.0 or was that label added by mistake? 😅 |
Signed-off-by: Mohamed Hamza <[email protected]>
|
📝 Documentation updates detected! Updated existing suggestion: Add changelog entry for session balancer mode |
vitessio#19007 updated the `QueryService` interface with a new `Session` interface that contains the session UUID and the existing `ExecuteOptions`. Since the `options` parameter was now redundant and can be pulled from the `Session` directly, it was removed. This caused a lot of unnecessary changes, but also introduced a race condition in `scatter_conn.go`: ```diff allErrors := stc.multiGoTransaction( // ... func(rs *srvtopo.ResolvedShard, i int, info *shardActionInfo) (*shardActionInfo, error) { // ... // Goroutine A reading session if session != nil && session.Session != nil { opts = session.Options } // Goroutine B writing session if opts == nil && fetchLastInsertID { opts = &querypb.ExecuteOptions{FetchLastInsertId: fetchLastInsertID} + session = econtext.NewSafeSession(&vtgatepb.Session{Options: opts}) } // ... } ) ``` This PR removes `GetOptions` from the new `Session` interface and adds back the `options` parameter to the `QueryService` interface. This allows each goroutine in that `scatter_conn.go` logic to keep its own local `opts` and not overwrite the shared `session`, reverting the logic to how it was before. Signed-off-by: Mohamed Hamza <[email protected]>
Description
This PR is a continuation of #18552. It's been slightly refactored to match the changes introduced in #18787.
This PR introduces a "session" tablet balancer, which will direct a session to the same tablet for the duration of the session. This protects against cases where a session reads a value from one tablet, and then gets routed to a different, higher-lag tablet and reads an older value.
The session balancer initially used consistent hashing, but was later switched to use rendezvous hashing for simplicity. The tradeoff there is that rendezvous hashing has an
O(n)runtime,nbeing the number of tablets per target (keyspace + shard + tablet type + cell). As the number of tablets per target is expected to be small, the performance should be reasonable.In order for the tablet balancer to have access to the session UUID, the
QueryServiceinterface was updated to include a newSessioninterface, which provides the session UUID and the execute options. The oldoptionsparameter was removed and is now pulled from the session. As a result, you'll see a lot of changes to all the implementations ofQueryServiceto fit the new methods, and to wrap execute options in sessions in cases where a session isn't already provided (mostly tests).Benchmark using sysbench
oltp_read_only(8 threads, 3 replicas per cell):Related Issue(s)
Checklist
Deployment Notes
AI Disclosure