Skip to content

Conversation

@gregns1
Copy link
Contributor

@gregns1 gregns1 commented Nov 11, 2025

CBG-4747

  • Remove serialisation on channel ID struct, pay for it at logging time
  • This is to get some perf results form Dragos on how this looks

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

  • Link upstream PRs
  • Update Go module dependencies when merged

Integration Tests

Copilot AI review requested due to automatic review settings November 11, 2025 13:28
@gregns1 gregns1 requested review from torcolvin and removed request for Copilot November 11, 2025 13:29
Copilot AI review requested due to automatic review settings November 11, 2025 13:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the pre-computed serialization field from the channels.ID struct to improve performance on the critical path. Instead of serializing channel IDs eagerly during creation, the serialization now happens lazily when the String() method is called (typically during logging operations).

Key Changes:

  • Removed the serialization field from the ID struct
  • Updated String() method to compute serialization on-demand

channels/set.go Outdated
Comment on lines 40 to 46
var sb strings.Builder
sb.WriteString(strconv.FormatUint(uint64(c.CollectionID), 10))
sb.WriteByte('.')
sb.WriteString(base.UserDataPrefix)
sb.WriteString(c.Name)
sb.WriteString(base.UserDataSuffix)
return sb.String()
Copy link
Collaborator

@torcolvin torcolvin Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be tempted by a sync.Once here as well, there's a bit more storage here at the expense of speed.

sync.Once will allocate 1 bool, 1 sync.Mutex to run

It isn't just in logging that the channel id is used, we use this for lookup when you call channelCacheImpl.channelCaches.Get

I think testing is the way to find this, but I'd be surprised if this didn't perform worse in your tests, and if it doesn't perform worse then this probably isn't in a critical path.

One thing I'd do is turn off all logging in your test environment and see when the string function is actually called to see where it is used.

The reason this serialization format exists is when the channel cache was extended to be collection aware a channel A had to now be distinguished in different collections. We wanted to share a single cache so that the cache size wouldn't ballon per collection, but the lookup keys to the cache are all strings, thus the serialization.

Ultimately the question is whether there actually ever is a case where String() isn't called, because I'm not sure there would be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the channel ID struct is used to key channel caches for lookup but if we remove the serialisation this will be removed from the keys. I am sure that channel name and collection id on the struct is enough to keep each channel cache distinct/unique.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From some discussion offline, I think that if we are constructing a channels.ID then channels.ID.String() is always going to be called, notably in the Notify function.

listener.keyCounts[key.String()] = listener.counter
It could be possible that there are places this is called without Notify, I did not look exhaustively.

The reason that it is serialized to a string is that the change waiters use a string based notification system which was written before collections. To add collections support, this string based serialization of collection ID is only needed for the change waiters #5815 (comment)

For the case of logging, we actually could have logging not log the serialization. I think in all cases for logging we have (or should have) a collection ID

ctx = collection.AddCollectionContext(ctx)
.

Any or all of these options can be used:

  • We could use String() for logging and only log the channel name without doing an int to ascii conversion. This might even be preferable anyway for log readbility reasons since 1:chanName doesn't have meaning since 1 represents the collection ID which isn't easy to translate back to a real collection name. We have/should have the collection name on the log context.
  • Continue to use serialization 1:chanName for the change waiter, and change the types for the changeWaiter to be something like channelKey which is a string (but you could imagine potentially changing to a struct later).
  • Keep the serialization but do it under sync.Once so it is delayed until needed.
  • Make sure that the construction of channel serialization is done outside holding any mutexes (maybe delay until Notify.

I'm also OK pushing changes, watching what happens in a perf run, then keeping/reverting the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants