
Conversation

Wondertan
Member

Batches and transactions (Txs) allow grouping multiple writes or reads together to optimize speed or provide atomicity. However, there is no way to benefit from this grouping when you work across or outside the datastore interface boundaries. Example:

  ctx := context.Background()
  dstore := ds.NewMapDatastore()
  bs := blockstore.NewBlockstore(dstore)

  // in this case, blocks are deleted independently, and there is no way to group them
  for _, key := range keys {
    bs.DeleteBlock(ctx, key)
  }

The context datastore solves the problem above, enabling the datastore underneath Blockstore to be wrapped. The wrapper then intercepts individual operations and reroutes them into the respective Read or Write. Example:

  ctx := context.Background()
  dstore := contextds.WrapDatastore(ds.NewMapDatastore())
  bs := blockstore.NewBlockstore(dstore)

  batch, _ := dstore.Batch(ctx)
  ctx = contextds.WithWrite(ctx, batch)

  // even if blocks are deleted independently, they are still grouped in the same batch
  for _, key := range keys {
    bs.DeleteBlock(ctx, key)
  }
  
  batch.Commit(ctx)


welcome bot commented Jul 25, 2025

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution guidelines
    if they apply.

Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviews, discussed and approvals as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@Wondertan
Member Author

Wondertan commented Jul 25, 2025

This change allowed us to save minutes (341.9s vs 129.4s) when deleting (pruning) chain data with the Badger datastore.

@gammazero
Contributor

gammazero commented Jul 29, 2025

This is a very nice way to generically handle using the batching and transaction functionality of an underlying database. This could replace the PutMany method in blockstore.NewBlockstore. That would mean that NewBlockstore would not need to take a Batching datastore, and could just take a plain Datastore. This seems more flexible because the same approach also supports transactions.

@lidel Now we need to decide if we want to support the generic approach only and get rid of PutMany and similar, keep both, or something else.

@Wondertan An alternative solution to your delete problem is to add a DeleteMany method to Blockstore and then batch delete using the Batching interface of the internal Datastore, similar to how blockstore.PutMany works.

@Wondertan
Member Author

DeleteMany wouldn't work outside of Blockstore. The blockstore is an example I gave to be familiar to Shipyard folks, but it's not exactly my case, so DeleteMany wouldn't be sufficient.

@Wondertan
Member Author

I made a discovery that currently confuses me. The performance improvement results from this change are wrong: they were actually measured with a hacky PoC of this change. The final version in this PR, which is supposed to be the production one, actually adds ~20s of overhead compared to the baseline.

@Wondertan
Member Author

Wondertan commented Jul 30, 2025

False alarm. I triple checked, and there was a bug on the context datastore usage side, which made this change effectively a noop. After the fix, the results are on par with the hacky version I referenced earlier.

Wondertan added a commit to celestiaorg/go-header that referenced this pull request Jul 31, 2025
Allows batch deleting across API boundaries. 

Depends on #320 
Depends on ipfs/go-datastore#238, but we
shouldn't block on it being released upstream. It should be fine to
temporarily live with the replace so we can roll out pruning on time.

w/o: `{"from_height": 1, "to_height": 6500189, "took": 341.992917359}`
w/:  `{"from_height": 1, "to_height": 6500617, "took": 129.445211431}`
@gammazero gammazero requested review from aschmahmann and removed request for gammazero August 6, 2025 20:01
@gammazero
Contributor

This is a modification to low-level interfaces that will require many potential changes. We need to take more time to consider the larger implications of this change. This is currently outside of the scope of our current maintenance workload agreed with the IPFS foundation in 2025, so we need to defer this until a later time. This can be discussed for inclusion in future dev grant work.

@gammazero gammazero marked this pull request as draft August 12, 2025 14:37
@gammazero gammazero added status/deferred Conscious decision to pause or backlog status/blocked Unable to be worked further until needs are met labels Aug 12, 2025
@Wondertan
Member Author

Wondertan commented Aug 13, 2025

@gammazero, I am a bit confused. How may this change require changes? It's an independent feature that doesn't have any connection to existing code in kubo/boxo/rainbow/libp2p/etc. It doesn't hurt to include it, and later on IPFS representatives can decide whether they want to integrate the feature. Meanwhile, merging and releasing this would allow us to remove the ugly replace in go.mod across our repositories. If you are worried about maintenance overhead, I can reassure you: ping me and I will be there to help with whatever issue there is with this code. However, I highly doubt there will be one unless you actually put effort into integrating this.

@gammazero
Contributor

gammazero commented Aug 15, 2025

@Wondertan It is not that this change itself requires any changes, but about the potential consequences leading to changes. What I mean by this is that we would need to decide how to recommend that implementers use the new context datastore, particularly with other competing APIs like PutMany and such. Do we convert our existing code to use the context datastore so that we provide the preferred way of doing things? We just have not had time to consider the indirect implications of having the context datastore.

Here are some of the issues we have not had sufficient time to consider:

High level

  • Yes, context wrappers are a pretty convenient way to add extensibility for anything that was going to have async or tracing behavior anyway (both of which tend to carry through contexts). They also really help in minimizing changes across codebases (particularly for something super low level like go-datastore).
    • Note: This is one of the reasons why it'd be nice to move towards context-based sessions for data transfer
  • The downsides are generally that:
    • The set of extensibility options and what they can do is hard to track / document because they're everywhere
    • You have to worry a bit about clobbering (e.g. what happens if WithWrite gets called twice?)
  • There's some side awkwardness in Go in general where if you wrap a struct to layer on top of it you have to figure out which pieces to re-expose which makes things harder (e.g. datastores can have batches, but blockstores don't have batches, etc.)

Regarding the PR: it batches reads and writes but leaves commits out of scope. Is this what everyone wants/expects?

Possible alternatives here look like:

  • Exposing batching everywhere up the stack (Get/Put/DeleteMany in Blockstore, Blockservice, DagService, ...)
  • Let (perhaps even encourage with some docs or examples?) people use context hacks everywhere but don't put them in go-datastore
  • Propose changes to the context datastore that make it usable for a wider variety of situations
    • e.g. right now it only covers separating a few read operations from some write ones and then bundling those together. Also it hides any "feature" extensions on the base datastore (there are some wrapper tools and examples in go-datastore that might serve as example here)

Given your interest in this, I am going to put this back on our queue for reconsideration. However, I hope this gives a better explanation of why I was saying this was out of scope, at least for now.

@gammazero gammazero added need/triage Needs initial labeling and prioritization need/maintainer-input Needs input from the current maintainer(s) and removed status/deferred Conscious decision to pause or backlog labels Aug 15, 2025
@gammazero gammazero marked this pull request as ready for review August 15, 2025 08:40
@gammazero gammazero removed the need/maintainer-input Needs input from the current maintainer(s) label Aug 15, 2025
@guillaumemichel guillaumemichel requested a review from lidel August 19, 2025 14:37
@Wondertan
Member Author

Hey @gammazero, appreciate the elaboration. Generally, I agree that there should be a single canonical way of approaching things, and in this case, having PutMany and the context datastore may add ambiguity. Still, in this particular case, I don't think blocking the merging of this PR until we figure out a single, consistent way of doing things is warranted, mainly because of how huge the go-datastore ecosystem is. Even if we all agree on a single approach that covers all the use cases, migrating everyone is an endeavor and has to be done step-by-step, where the first step could be merging this.

What I can say for sure is that the current set of options within the repo does not enable the optimization this PR targets, and simply adding DeleteMany would not work either. While there were ways to solve this selfishly on our side, I decided to opt for a solution that benefits the broader go-datastore ecosystem. It may not be the perfect one, nor would I claim we reached a rough consensus on this being the single canonical solution, but it solves my problem and a large number of adjacent issues in a fairly flexible way, is well tested, and is production-ready.

My proposal is to extract the discussion on seeking the single canonical way of batching things into a dedicated issue and unblock this PR. I am happy to add input on the use case we envision on our side.

P.S. One of many things I've learned from IPFS codebases is how to structure a dependency graph coherently, so that replaces are never needed. Please don't force a replace on us 🙏, we are drowning in them already for orthogonal reasons. I had one confused user today who couldn't figure out where to get the context pkg in his project.

@gammazero
Contributor

@aschmahmann We need to make a decision whether this is within the scope of code we want to maintain and support. We are currently not consuming this code ourselves (in kubo, boxo, rainbow, or IPFS cluster), so we need to consider our policy for accepting things like this into our codebase.

Maybe we should consider turning this into a draft PR until there is something in our software that uses this.

@Wondertan
Member Author

Wondertan commented Sep 16, 2025

@aschmahmann, do you think we could bring clarity about next steps here soon? Should we assume that IPFS Shipyard no longer accepts external contributions? Please let us know.

@aschmahmann
Contributor

@Wondertan apologies for not getting to this earlier.

Should we assume that IPFS Shipyard no longer accepts external contributions?

No, we accept contributions across the various repositories we maintain.


As for this PR in particular, my understanding is the objections to including it are:

  1. You are the only one using it
  2. The only benefit in including it in go-datastore vs elsewhere is that perhaps someone else might discover it. i.e. "Meanwhile, merging and releasing this would allow us to remove the ugly replace in go.mod across our repositories." and "Please, don't force a replace on us 🙏" is not necessary: this code could just live in your codebase with no negative ramifications. If there had been some feature, optionality knob, etc. here that would enable you to implement something in your codebase, that's a different story.
  3. We can see from the set of "helpful utility datastores" in this repository that the majority of them that were envisioned as "lots of people will probably use this" have not in fact ended up this way (you can see this via github search). This includes mount, delayed, retrystore, etc. so adding more in doesn't seem super valuable unless there are actually multiple parties that want to use it (i.e. if Celestia and any other party were interested and wanted to use go-datastore as a place to add docs/bugfixes/etc. that'd probably be fine).

All this being said you've been around the ecosystem enough that if you feel the objections above are unwarranted and this would be useful to people we're fine to take it and see how it plays out. Lmk if you'd still like us to merge and we'll do it.

@Wondertan
Member Author

Yes, we would appreciate a merge here 🙏! Thank you @aschmahmann.

If this datastore causes any form of issue or brings maintenance overhead, don't hesitate to ping me on GH and we will handle it.

@gammazero gammazero merged commit 1ce683b into ipfs:master Sep 18, 2025
10 checks passed