Skip to content

Conversation

@teskje
Copy link
Contributor

@teskje teskje commented Dec 4, 2025

This PR removes a potentially expensive confirm_leadership call on the sequencing thread. The call needs to potentially block for a long time, as it waits for the persist catalog to catch up and then applies all the observed changes. During that time, the environment would be unresponsive to commands, so it seems like a good idea to get rid of the call.

Relevant Slack thread.

Motivation

  • This PR fixes a previously unreported bug.

We've been seeing slow group_commit_initiate messages lately. I hope this removes at least one cause of those.

Tips for reviewer

I don't have enough context to understand why the confirm_leadership call was necessary before, but the comment above it makes it seem fine to delete it now. Let's hope we can trust it 😅

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

This commit removes a potentially expensive `confirm_leadership` call on
the sequencing thread. The call needs to potentially block for a long
time, as it waits for the persist catalog to catch up and then applies
all the observed changes. During that time, the environment would be
unresponsive to commands, so it seems like a good idea to get rid of the
call.
@teskje
Copy link
Contributor Author

teskje commented Dec 4, 2025

Well, this won't be that simple. There is a scary zippy panic:

src/storage-controller/src/persist_handles.rs:365:17: cannot register [...] at 1764850133585 because txns is at 1764850133726

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant