
Conversation

Wondertan
Member

This PR shaves another 30s off pruning the whole chain. It achieves this by applying the same technique as in #328, but for reads. Previously, the underlying datastore would create a read-only transaction for every single read. As shown in pyroscope profiles, this puts pressure on badger: creating new transactions isn't cheap and involves complex synchronization to ensure atomicity between parallel transactions.

With this insight in mind, I also integrated read batching in the other places where it should be beneficial: advancing, receding, range, and singular header reads.
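To illustrate the technique, here is a minimal stdlib-only sketch of why batching reads helps. The `kv`/`txn` types and their methods are hypothetical stand-ins for a transactional store like badger, where every new transaction carries a fixed setup cost; they are not the real go-datastore or badger API.

```go
package main

import "fmt"

// kv simulates a transactional store where creating a transaction
// has a fixed cost we want to amortize (hypothetical, not badger's API).
type kv struct {
	data     map[string]string
	txnCount int // counts how many transactions were created
}

type txn struct{ s *kv }

func (s *kv) newReadTxn() *txn             { s.txnCount++; return &txn{s: s} }
func (t *txn) get(k string) (string, bool) { v, ok := t.s.data[k]; return v, ok }

// getEach opens a fresh read txn per key — the pattern the PR moves away from.
func getEach(s *kv, keys []string) []string {
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		t := s.newReadTxn()
		if v, ok := t.get(k); ok {
			out = append(out, v)
		}
	}
	return out
}

// getBatched reuses one read txn for the whole range — the batched pattern.
func getBatched(s *kv, keys []string) []string {
	out := make([]string, 0, len(keys))
	t := s.newReadTxn()
	for _, k := range keys {
		if v, ok := t.get(k); ok {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	s := &kv{data: map[string]string{"h1": "a", "h2": "b", "h3": "c"}}
	keys := []string{"h1", "h2", "h3"}

	getEach(s, keys)
	perRead := s.txnCount // one txn per read

	s.txnCount = 0
	getBatched(s, keys)
	batched := s.txnCount // a single txn for all reads

	fmt.Println(perRead, batched) // 3 1
}
```

With a real store, the per-transaction cost includes synchronization for snapshot isolation, which is why the savings show up so clearly in profiles.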

More noticeable changes:

  • Fixes a silent bug that wouldn't delete samples on the Pruner side
  • Fixes a bug in the upstream contextds change
  • Adds tracking for missing headers that were not found during deletion

Comment on lines +766 to +769
bds, ok := s.ds.Children()[0].(datastore.Batching)
if !ok {
	return ctx, func() error { return nil }
}
Member Author

A note for an advanced reader: we need to unwrap the underlying datastore from the keytransform wrapper; otherwise, key transformation happens twice — once in the datastore itself and once in the tx wrapper.

Basically, a combination of keytransform and contextds can lead to unexpected and silent double transformation, which is annoying to debug. There might be a general way to untangle these on the library side, but this is the simplest solution on our side I could think of.
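For the record, here is a stdlib-only sketch of the double-transformation failure mode. `Store` and `prefixStore` are hypothetical stand-ins for `datastore.Datastore` and a keytransform wrapper; the names and API are simplified and not the real library's.

```go
package main

import "fmt"

// Store is a minimal key-value store interface, standing in for
// datastore.Datastore (hypothetical simplification).
type Store interface {
	Put(key, value string)
	Get(key string) (string, bool)
}

// mapStore is a plain in-memory store.
type mapStore struct{ m map[string]string }

func newMapStore() *mapStore { return &mapStore{m: map[string]string{}} }

func (s *mapStore) Put(k, v string)             { s.m[k] = v }
func (s *mapStore) Get(k string) (string, bool) { v, ok := s.m[k]; return v, ok }

// prefixStore mimics a keytransform wrapper: it prepends a prefix
// to every key before hitting the child store.
type prefixStore struct {
	child  Store
	prefix string
}

func (p *prefixStore) Put(k, v string)             { p.child.Put(p.prefix+k, v) }
func (p *prefixStore) Get(k string) (string, bool) { return p.child.Get(p.prefix + k) }

func main() {
	raw := newMapStore()
	wrapped := &prefixStore{child: raw, prefix: "/headers"}

	// Correct: write through the wrapper once; the key is prefixed once.
	wrapped.Put("/1", "h1")

	// Buggy: a tx wrapper built on top of the already-wrapped store
	// transforms the key a second time, silently storing under
	// "/headers/headers/2" instead of "/headers/2".
	doubleWrapped := &prefixStore{child: wrapped, prefix: "/headers"}
	doubleWrapped.Put("/2", "h2")

	_, okSingle := raw.Get("/headers/1")
	_, okDouble := raw.Get("/headers/2") // miss: double-prefixed
	fmt.Println(okSingle, okDouble)      // true false
}
```

Unwrapping to the child store before building the tx wrapper — as the snippet above does via `Children()` — ensures the prefix is applied exactly once.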

@Wondertan Wondertan merged commit 38be935 into main Aug 1, 2025
5 checks passed
@Wondertan Wondertan deleted the batch-reads branch August 1, 2025 19:54