CBG-4747: remove serialisation on critical path #7873
base: main
Pull Request Overview
This PR removes the pre-computed `serialization` field from the `channels.ID` struct to improve performance on the critical path. Instead of serializing channel IDs eagerly during creation, the serialization now happens lazily when the `String()` method is called (typically during logging operations).
Key Changes:
- Removed the `serialization` field from the `ID` struct
- Updated the `String()` method to compute the serialization on demand
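To make the change described in the overview concrete, here is a minimal sketch contrasting the two shapes. `eagerID`, `lazyID`, and `serialize` are hypothetical names, and the `base.UserDataPrefix`/`base.UserDataSuffix` wrapping from the real code is left out so the example is self-contained.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// eagerID mirrors the shape of channels.ID before this PR: the serialization
// is computed once, at construction time, whether or not it is ever used.
type eagerID struct {
	Name          string
	CollectionID  uint32
	serialization string
}

func newEagerID(name string, collectionID uint32) eagerID {
	return eagerID{
		Name:          name,
		CollectionID:  collectionID,
		serialization: serialize(name, collectionID),
	}
}

func (c eagerID) String() string { return c.serialization }

// lazyID mirrors the shape after this PR: only the name and collection ID
// are stored, and String() rebuilds the serialization on every call.
type lazyID struct {
	Name         string
	CollectionID uint32
}

func (c lazyID) String() string { return serialize(c.Name, c.CollectionID) }

// serialize builds "<collectionID>.<name>". The real code also wraps the
// name in base.UserDataPrefix/base.UserDataSuffix, which is omitted here.
func serialize(name string, collectionID uint32) string {
	var sb strings.Builder
	sb.WriteString(strconv.FormatUint(uint64(collectionID), 10))
	sb.WriteByte('.')
	sb.WriteString(name)
	return sb.String()
}

func main() {
	e := newEagerID("chanA", 1)
	l := lazyID{Name: "chanA", CollectionID: 1}
	fmt.Println(e.String(), l.String()) // both print 1.chanA
}
```

The lazy form only saves work for IDs whose `String()` is never called; an ID that is serialized on several paths pays the string-building cost on each call.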
channels/set.go

```go
var sb strings.Builder
sb.WriteString(strconv.FormatUint(uint64(c.CollectionID), 10))
sb.WriteByte('.')
sb.WriteString(base.UserDataPrefix)
sb.WriteString(c.Name)
sb.WriteString(base.UserDataSuffix)
return sb.String()
```
I'd be tempted to use a `sync.Once` here as well; it trades a bit more storage for speed.
A `sync.Once` carries one done flag and one `sync.Mutex`.
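A minimal sketch of the `sync.Once` idea, assuming the ID is handled through a pointer. `cachedID` and its `builds` counter are hypothetical and exist only to show that the string is built once, on the first `String()` call.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// cachedID is a hypothetical variant of channels.ID that builds its
// serialization at most once, on the first String() call.
// It is used through a pointer so the sync.Once is never copied.
type cachedID struct {
	Name         string
	CollectionID uint32

	once          sync.Once
	serialization string
	builds        int // how many times the serialization was built (illustration only)
}

func (c *cachedID) String() string {
	c.once.Do(func() {
		c.builds++
		c.serialization = strconv.FormatUint(uint64(c.CollectionID), 10) + "." + c.Name
	})
	return c.serialization
}

func main() {
	id := &cachedID{Name: "chanA", CollectionID: 1}
	fmt.Println(id.String()) // 1.chanA
	fmt.Println(id.String()) // 1.chanA, served from the cached field
	fmt.Println(id.builds)   // 1
}
```

Two caveats apply if `channels.ID` keeps being passed by value and used as a map key: copying a struct that contains a `sync.Once` is flagged by `go vet`'s copylocks check, and the cached field and `Once` state take part in `==`, so two IDs for the same channel could compare unequal depending on whether `String()` has run. `sync.OnceValue` (Go 1.21+) avoids hand-rolling this but stores a func, which makes the struct non-comparable and therefore unusable as a map key.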
It isn't only logging that uses the channel ID; we also use it for lookups when calling `channelCacheImpl.channelCaches.Get`.
I think testing is the way to find out, but I'd be surprised if this didn't perform worse in your tests, and if it doesn't perform worse, then this probably isn't on a critical path.
One thing I'd do is turn off all logging in your test environment and see when `String()` is actually called, to find out where it is used.
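One way to act on the suggestion of checking where `String()` is actually called is to instrument it temporarily and record each caller with `runtime.Caller`. This is a sketch under assumptions: `callSites`, `chanID`, and `notify` are hypothetical, and in Sync Gateway the equivalent would be a throwaway patch to `channels.ID.String()` during a test run.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"strconv"
	"sync"
)

// callSites counts String() calls per calling function. It is a throwaway
// debugging aid for a test run, not something to keep in production code.
var (
	callSitesMu sync.Mutex
	callSites   = map[string]int{}
)

// chanID is a hypothetical stand-in for channels.ID.
type chanID struct {
	Name         string
	CollectionID uint32
}

// String records its caller, then builds the "<collectionID>.<name>" form.
func (c chanID) String() string {
	if pc, _, _, ok := runtime.Caller(1); ok {
		name := "unknown"
		if fn := runtime.FuncForPC(pc); fn != nil {
			name = fn.Name()
		}
		callSitesMu.Lock()
		callSites[name]++
		callSitesMu.Unlock()
	}
	return strconv.FormatUint(uint64(c.CollectionID), 10) + "." + c.Name
}

// notify is a hypothetical hot-path caller that serializes every ID.
func notify(ids []chanID) {
	for _, id := range ids {
		_ = id.String()
	}
}

func main() {
	notify([]chanID{{"chanA", 1}, {"chanB", 1}})
	fmt.Println(chanID{"chanC", 2}) // fmt also calls String() on a Stringer

	callSitesMu.Lock()
	defer callSitesMu.Unlock()
	names := make([]string, 0, len(callSites))
	for name := range callSites {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d\n", name, callSites[name])
	}
}
```

`runtime.Caller` is relatively expensive, so this belongs only in a throwaway build; a CPU profile from `pprof` is a less invasive way to see whether `String()` shows up on a hot path.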
The reason this serialization format exists is that when the channel cache was extended to be collection-aware, a channel `A` now had to be distinguished across different collections. We wanted to share a single cache so that the cache size wouldn't balloon per collection, but the cache's lookup keys are all strings, hence the serialization.
Ultimately the question is whether there is ever a case where `String()` isn't called, because I'm not sure there is.
Yeah, the channel ID struct is used to key channel caches for lookup, but if we remove the serialisation it will no longer be part of the keys. I am sure that the channel name and collection ID on the struct are enough to keep each channel cache distinct.
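To illustrate the claim that the name and collection ID are enough to keep caches distinct: in Go, a struct whose fields are all comparable can key a map directly, and two keys are equal only when every field matches. In this sketch `chanID` and `cache` are hypothetical stand-ins for `channels.ID` and a per-channel cache, and the string-keyed map mimics the `<collectionID>.<name>` serialization.

```go
package main

import (
	"fmt"
	"strconv"
)

// chanID is a hypothetical stand-in for channels.ID without the
// serialization field: a comparable struct of name and collection ID.
type chanID struct {
	Name         string
	CollectionID uint32
}

// cache is a placeholder for a per-channel cache.
type cache struct{ entries []string }

func main() {
	// Keyed by the struct: no string building is needed for a lookup.
	byStruct := map[chanID]*cache{}
	byStruct[chanID{"chanA", 1}] = &cache{}
	byStruct[chanID{"chanA", 2}] = &cache{} // same name, different collection

	// Keyed by a "<collectionID>.<name>" string, as the serialization did.
	byString := map[string]*cache{}
	for k, v := range byStruct {
		byString[strconv.FormatUint(uint64(k.CollectionID), 10)+"."+k.Name] = v
	}

	fmt.Println(len(byStruct), len(byString)) // 2 2
}
```

Both maps keep `chanA` in collections 1 and 2 apart; the struct key simply skips building a string on every lookup.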
From some discussion offline, I think that if we are constructing a `channels.ID`, then `channels.ID.String()` is always going to be called, notably in the `Notify` function:
sync_gateway/db/change_listener.go, line 254 in 9722cde:

```go
listener.keyCounts[key.String()] = listener.counter
```
That call is in `Notify`; I did not look exhaustively.
The reason it is serialized to a string is that the change waiters use a string-based notification system that was written before collections. When collections support was added, this string-based serialization of the collection ID was only needed for the change waiters (#5815 (comment)).
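A simplified model of why `String()` ends up being called for every notified ID: the change listener stamps string keys with a counter, and waiters compare those keys. `listener`, `changedSince`, and this `Notify` are hypothetical and only loosely follow the `keyCounts[key.String()]` line quoted from change_listener.go; in this sketch the serialization happens while the mutex is held.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// chanID is a hypothetical stand-in for channels.ID.
type chanID struct {
	Name         string
	CollectionID uint32
}

func (c chanID) String() string {
	return strconv.FormatUint(uint64(c.CollectionID), 10) + "." + c.Name
}

// listener is a simplified, hypothetical model of a string-keyed change
// listener: keyCounts records the counter value at which each key last changed.
type listener struct {
	mu        sync.Mutex
	counter   uint64
	keyCounts map[string]uint64
}

// Notify bumps the counter and stamps every key with it. Each key is
// serialized with String() while the mutex is held.
func (l *listener) Notify(keys []chanID) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.counter++
	for _, key := range keys {
		l.keyCounts[key.String()] = l.counter
	}
}

// changedSince reports whether key was notified after counter value since.
func (l *listener) changedSince(key string, since uint64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.keyCounts[key] > since
}

func main() {
	l := &listener{keyCounts: map[string]uint64{}}
	l.Notify([]chanID{{"chanA", 1}})
	fmt.Println(l.changedSince("1.chanA", 0), l.changedSince("2.chanA", 0)) // true false
}
```

Because waiters only ever see strings, any `channels.ID` that reaches `Notify` in this model must be serialized at least once, which is why lazy serialization alone may not remove the cost.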
For logging, we could actually avoid logging the serialization at all. I think that in every logging case we have (or should have) a collection ID on the log context:
sync_gateway/db/change_cache.go, line 364 in 9722cde:

```go
ctx = collection.AddCollectionContext(ctx)
```
Any or all of these options can be used:
- We could use `String()` for logging and have it log only the channel name, without doing an int-to-ASCII conversion. This might even be preferable for log readability, since `1:chanName` doesn't carry much meaning: `1` is the collection ID, which isn't easy to translate back to a real collection name. We have (or should have) the collection name on the log context.
- Continue to use the `1:chanName` serialization for the change waiter, and change the change waiter's types to something like `channelKey`, which is a string (though you could imagine changing it to a struct later).
- Keep the serialization, but compute it under a `sync.Once` so it is delayed until needed.
- Make sure that the channel serialization is constructed outside of any held mutexes (maybe by delaying it until `Notify`).
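The `channelKey` option and the "serialize outside the mutex" option could combine roughly as follows. This is a sketch under assumptions: `channelKey`, `chanID`, `key()`, and `listener` are hypothetical names, and the key uses the `.` separator from the channels/set.go excerpt rather than the informal `1:chanName` spelling.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// channelKey is a hypothetical string type for change-waiter keys. Keeping it
// a distinct type means callers that only pass it around would not need to
// change if it later became a struct.
type channelKey string

// chanID is a hypothetical stand-in for channels.ID with no cached serialization.
type chanID struct {
	Name         string
	CollectionID uint32
}

// key builds the "<collectionID>.<name>" change-waiter key.
func (c chanID) key() channelKey {
	return channelKey(strconv.FormatUint(uint64(c.CollectionID), 10) + "." + c.Name)
}

type listener struct {
	mu        sync.Mutex
	counter   uint64
	keyCounts map[channelKey]uint64
}

// Notify serializes all keys before taking the mutex, so the string
// building no longer happens inside the critical section.
func (l *listener) Notify(ids []chanID) {
	keys := make([]channelKey, len(ids))
	for i, id := range ids {
		keys[i] = id.key()
	}

	l.mu.Lock()
	defer l.mu.Unlock()
	l.counter++
	for _, k := range keys {
		l.keyCounts[k] = l.counter
	}
}

func main() {
	l := &listener{keyCounts: map[channelKey]uint64{}}
	l.Notify([]chanID{{"chanA", 1}, {"chanB", 2}})
	fmt.Println(l.keyCounts["1.chanA"], l.keyCounts["2.chanB"]) // 1 1
}
```

Building the keys first means the mutex only covers the counter update and the map writes, at the cost of one small slice allocation per call.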
I'm also OK with pushing changes, watching what happens in a perf run, and then keeping or reverting the change.
CBG-4747
Pre-review checklist
- (`fmt.Print`, `log.Print`, ...)
- (`base.UD(docID)`, `base.MD(dbName)`)
- `docs/api`

Dependencies (if applicable)
Integration Tests
- `GSI=true,xattrs=true` https://jenkins.sgwdev.com/job/SyncGatewayIntegration/0000/