SgtPooki
Collaborator

@SgtPooki SgtPooki commented Sep 26, 2025

Get Pieces from PDPVerifier Contract

Overview

This PR adds the ability to fetch pieces directly from the PDPVerifier contract instead of relying on Curio's API. The contract is the authoritative source of truth for which pieces exist in a data set. See #249 (comment) for details.

Changes

New Methods

  • PDPVerifier.getActivePieces(dataSetId, options) - Low-level contract query with pagination
  • StorageContext.getPieces(options) - Async generator that yields [PieceCID, pieceId] tuples

Updated Methods

  • StorageContext.getDataSetPieces() - DEPRECATED - Now uses getPieces() internally

Migration

// Before
const pieces = await context.getDataSetPieces()

// After
const pieces = []
for await (const [pieceCid] of context.getPieces()) {
  pieces.push(pieceCid)
}

Notes

  • Fetches from PDPVerifier contract (source of truth) instead of Curio
  • Generator provides lazy evaluation for large data sets
  • pieceId included in tuple for operations like deletion
  • No breaking changes - deprecated method still works
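
A minimal sketch of consuming the tuples when the pieceId is needed later (for example, for a deletion). The generator here is a stand-in so the snippet runs without the SDK: `fakeGetPieces` and `indexPieces` are hypothetical names, and plain strings stand in for the PieceCID objects that `context.getPieces()` yields.

```typescript
// Stand-in tuple type: the real generator yields [PieceCID, pieceId];
// strings are used here only so the sketch is self-contained.
type PieceTuple = [string, number]

// Hypothetical stand-in for context.getPieces().
async function* fakeGetPieces(): AsyncGenerator<PieceTuple> {
  yield ['piece-cid-a', 0]
  yield ['piece-cid-b', 1]
  yield ['piece-cid-c', 2]
}

// Build a lookup from piece CID to piece ID so a later operation
// can find the ID that belongs to a given CID.
async function indexPieces(
  pieces: AsyncIterable<PieceTuple>
): Promise<Map<string, number>> {
  const byCid = new Map<string, number>()
  for await (const [pieceCid, pieceId] of pieces) {
    byCid.set(pieceCid, pieceId)
  }
  return byCid
}
```

Collecting into a map gives up the lazy evaluation, so this pattern only fits data sets small enough to hold in memory.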

@github-project-automation github-project-automation bot moved this to 📌 Triage in FS Sep 26, 2025
@rvagg
Collaborator

rvagg commented Sep 26, 2025

Leaf count is nice, and useful for total data set if we can get it, but at the piece level we should have it in the Piece CID because we're using Piece CID v2. The only catch is that the way we currently get the piece list for a data set is a little broken and will be returning the wrong size (but the right multihash).

There are two separate issues to resolve here:

  1. That Curio bug in our pdpv0 branch needs to be fixed regardless, so it returns the Piece CID as it should be. (I had a brief look this week and it wasn't obvious where the counting was wrong; I'm slightly worried it's recording the wrong size on the way in to the database rather than on the way out.) For now I'd just pretend they are correct and use them, with the understanding that we're going to fix this.
  2. We really shouldn't be asking Curio for the piece list for our data set in the first place. PDPVerifier has the piece list; we just need an accessor for it. It'll probably have to be paginated to account for very large data sets, but this is the best version: we go to the chain, which we can trust, not Curio, which we shouldn't. Then we have the v2 CIDs and we can decode them to get the size (there should be some code in Synapse to help with this in piece.ts, or at least it should point to how to do it if someone wants to give it a go).

Total leaf count for a whole data set though would be useful too if we can get that off the chain.

@rjan90 rjan90 moved this from 📌 Triage to 🔎 Awaiting review in FS Sep 29, 2025
@BigLep
Contributor

BigLep commented Sep 29, 2025

My understanding of the situation is that the ball is in @SgtPooki's court to:

  1. Get leaf count from the PieceCIDv2 (just assume it's correct, independent of the Curio bug)
  2. Update the PDPVerifier wrapper in synapse-sdk to expose getActivePieces from the PDPVerifier contract. For now, synapse-sdk would expose a getAllActivePieces which then walks the PDPVerifier.getActivePieces until hasMore=false.
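
The "walk until hasMore=false" loop could look roughly like the sketch below. The page shape (`pieceIds`, `hasMore`) and the `(offset, limit)` signature are assumptions made for illustration, not the confirmed getActivePieces API; `collectAllPieceIds` and `makeFakeFetch` are hypothetical names, and the fetcher is an in-memory stand-in for the contract call.

```typescript
// Assumed page shape; the real getActivePieces result may differ.
interface PiecePage {
  pieceIds: number[]
  hasMore: boolean
}

type FetchPage = (offset: number, limit: number) => Promise<PiecePage>

// Keep requesting pages until the source reports hasMore=false.
async function collectAllPieceIds(
  fetchPage: FetchPage,
  limit = 100
): Promise<number[]> {
  const all: number[] = []
  let offset = 0
  while (true) {
    const page = await fetchPage(offset, limit)
    all.push(...page.pieceIds)
    // Stop on the last page, and also on an empty page so a
    // misbehaving source cannot cause an endless loop.
    if (!page.hasMore || page.pieceIds.length === 0) break
    offset += page.pieceIds.length
  }
  return all
}

// In-memory stand-in for the contract call: piece IDs 0..total-1.
function makeFakeFetch(total: number): FetchPage {
  return async (offset, limit) => {
    const end = Math.min(offset + limit, total)
    const pieceIds: number[] = []
    for (let id = offset; id < end; id++) pieceIds.push(id)
    return { pieceIds, hasMore: end < total }
  }
}
```

Advancing the offset by the number of items actually returned, rather than by `limit`, keeps the walk correct if a page comes back shorter than requested.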

Collaborator Author

@SgtPooki SgtPooki left a comment

self review.. some comments pointing reviewers to places where i'm not sure about things

* @returns The number of leaves for this piece
*/
async getPieceLeafCount(dataSetId: number, pieceId: number): Promise<number> {
// TODO: Do we need to call the contract for leaf count?
Collaborator Author

callout here.. not sure if piece.ts leafCount calculation is enough?

Collaborator

this is ok, but I don't think it's all that useful here; let's remove it for now and ignore leaf counts -- they mostly shouldn't be a concern to the user other than their fairly close relationship to size

Comment on lines 1348 to 1369
// Parse the piece data as a PieceCID
// The contract stores the full PieceCID multihash digest (including height and padding)
// The data comes as a hex string from ethers, we need to decode it as bytes then as a CID
const pieceDataHex = result.pieces[i].data
const pieceDataBytes = ethers.getBytes(pieceDataHex)

const cid = CID.decode(pieceDataBytes)
const pieceCid = asPieceCID(cid)
if (!pieceCid) {
throw createError(
'StorageContext',
'getAllActivePiecesGenerator',
`Invalid PieceCID returned from contract for piece ${result.pieceIds[i]}`
)
}

yield {
pieceId: result.pieceIds[i],
pieceCid,
subPieceCid: pieceCid,
subPieceOffset: 0, // TODO: figure out how to get the sub piece offset
} satisfies DataSetPieceData
Collaborator Author

Note: a core contributor should really eyeball this to make sure i'm doing things properly

Comment on lines 264 to 269
// Expected leaf count is 2^height where height is calculated from size
const expectedHeight = Size.Unpadded.toHeight(BigInt(size))
const expectedLeafCount = 2 ** expectedHeight

assert.isNotNull(leafCount)
assert.strictEqual(leafCount, expectedLeafCount)
Collaborator Author

is this accurate?

Comment on lines 290 to 293
// Expected raw size is leaf count * 32
const expectedHeight = Size.Unpadded.toHeight(BigInt(size))
const expectedLeafCount = 2 ** expectedHeight
const expectedRawSize = expectedLeafCount * 32
Collaborator Author

is this accurate?

@@ -4028,4 +4029,423 @@ describe('StorageService', () => {
assert.isUndefined(status.pieceId)
})
})

describe('getAllActivePieces', () => {
Collaborator Author

lots of tests and mocking setup in here. I tried following existing patterns, but some mock helpers would be nice. especially for contract calls. Something like this would be amazing:

const mockContractContext = createMockContractContext()
mockContractContext.mock(key, singleCallResponse) // key maps to transaction data prefix?

Collaborator

synapse.test.ts has a new mock system that we are trying to move towards

rawSize,
leafCount,
subPieceCid: piece.pieceCid,
subPieceOffset: 0, // TODO: figure out how to get the sub piece offset
Collaborator Author

how do I fulfill this value accurately?

@hugomrdias
Collaborator

Couple of comments here:

  • i dont think we should have any get all method in the sdk that hammers a contract, developer should be able to paginate to their liking and not be forced into the full list of pieces.
  • im not sure if we need to have this in the core sdk seems very filecoin-pin specific

Anyway i dont feel like i can fully review this properly, we probably need rod to answer some of the questions you added

@SgtPooki
Collaborator Author

SgtPooki commented Oct 6, 2025

i dont think we should have any get all method in the sdk that hammers a contract, developer should be able to paginate to their liking and not be forced into the full list of pieces.

That makes sense. Can remove.

im not sure if we need to have this in the core sdk seems very filecoin-pin specific

I'm guessing you're talking about the getAllActivePiecesGenerator, getPiecesWithDetails, and getAllActivePieces methods?

We still need some methods exposed from synapse-sdk that currently aren't. I'm all ears for what we want to handle in the sdk and what we can handle in filecoin-pin instead.

I think we should at least export these methods:

pdp/verifier.ts: getActivePieces, getPieceLeafCount
piece/piece.ts: getLeafCount, getRawSize

Will wait for update from @rvagg before doing anything else

* @param pieceCid - The PieceCID to extract raw size from
* @returns The raw size in bytes or null if invalid
*/
export function getRawSize(pieceCid: PieceCID | CID | string): number | null {
Collaborator

Not quite right because height only gets us the padded size, we need to unpad using the padding that's also encoded. See #283
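
A rough sketch of the arithmetic behind this comment, under stated assumptions: a tree of the given height has 2^height leaves of 32 bytes each, Fr32 expansion carries 127 payload bytes per 128 padded bytes, and the padding value encoded in a v2 Piece CID counts the zero bytes added to the payload before that expansion. Those assumptions reflect one reading of the v2 Piece CID layout and are not taken from this PR; the function names are hypothetical, and the utility in #283 is the place to check the real behaviour.

```typescript
const LEAF_BYTES = 32

// Padded piece size implied by the tree height alone.
function paddedSizeFromHeight(height: number): number {
  return 2 ** height * LEAF_BYTES
}

// Assumed relationship: payload capacity is 127/128 of the padded size,
// and the encoded padding is subtracted to recover the original size.
function rawSizeFromHeightAndPadding(height: number, padding: number): number {
  if (!Number.isInteger(height) || height < 2) {
    throw new Error('height must be an integer >= 2 (padded size of at least 128 bytes)')
  }
  const capacity = (paddedSizeFromHeight(height) / 128) * 127
  if (!Number.isInteger(padding) || padding < 0 || padding > capacity) {
    throw new Error('padding out of range for this height')
  }
  return capacity - padding
}
```

Under these assumptions, height alone can only report the padded size (or the 127/128 capacity), which is why the padding term is needed to recover the actual payload size.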

Comment on lines 1274 to 1275
// TODO: should we call the contract for leaf count? i.e. pdpVerifier.getPieceLeafCount(this._dataSetId, piece.pieceId)
const leafCount = getLeafCount(piece.pieceCid) ?? 0
Collaborator

let's ditch this

@rvagg
Collaborator

rvagg commented Oct 7, 2025

OK, here are my current thoughts about this:

  1. I really like that you're fetching the pieces from the contract, I've been wanting to get rid of the query to the SP to get the pieces (I thought I had an issue for this somewhere?). It's not trustworthy, we should be getting it off the chain like you're doing here.
  2. Leaves are a problem and we should ignore them wherever possible I think. Leaves are just the raw size rounded up to the nearest 32 because that's the proving unit. But our actual sizes don't have to be a multiple of 32. Now that I'm looking at PDPVerifier I can see that the rawSizes it returns is the leaf count multiplied by 32, i.e. actual raw size rounded up to the nearest 32, which is wrong. I've opened an issue about this at pdp#217 (Fix & clarify all uses of "rawSize"), but we can just ignore it entirely for our purposes.
  3. I think that we could just encourage devs to use #283 (feat: getSizeFromPieceCID(cid) to extract size from PieceCIDv2) to work this out themselves from a CID and not bother augmenting here with sizes.
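
To make the rounding in point 2 concrete, here is a small sketch of the arithmetic it describes. The function names are hypothetical and nothing here calls the contract; it only mirrors "leaf count multiplied by 32".

```typescript
const LEAF_SIZE = 32 // bytes per leaf, the proving unit

// Number of leaves needed to cover rawSize bytes.
function leafCountForRawSize(rawSize: number): number {
  return Math.ceil(rawSize / LEAF_SIZE)
}

// The value described above as coming back from PDPVerifier:
// leaf count * 32, i.e. the raw size rounded up to a multiple of 32.
function roundedUpSize(rawSize: number): number {
  return leafCountForRawSize(rawSize) * LEAF_SIZE
}

// A 100-byte piece needs 4 leaves, so the rounded value is 128,
// which overstates the actual size by 28 bytes.
const overstatedBy = roundedUpSize(100) - 100
```

The two values only agree when the actual size happens to be a multiple of 32.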

Which leaves us with just "get pieces from contract". So I think we should pivot here slightly, so here's my suggestion:

  • Leave getActivePieces as the heart of this in pdp/verifier.ts, but don't return rawSizes from there, it's not helpful
  • Rename getAllActivePiecesGenerator to just getPieces, it can return an async generator, that's just how you list them, it's the new getDataSetPieces, but just return AsyncGenerator<PieceCID> - i.e. no need to do anything else but query the contract, yield CIDs and keep going as long as you're asked to.
  • Let's change getDataSetPieces - the name is bad, so let's @deprecate that method and replace the guts of it with what you have for getAllActivePieces, leaving the API stable (no additional args, just use defaults), so it uses getAllActivePiecesGenerator to do a collect-all. Eventually we'll remove this and let the user do collection according to their needs.
  • Remove everything else
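
A sketch of the generator shape suggested in this list: query one page, yield its CIDs, and only fetch the next page if the consumer keeps asking. The page shape and `(offset, limit)` signature are assumptions, strings stand in for PieceCID, and `getPiecesSketch` and `makeCountingFetch` are hypothetical names; the fetcher is an in-memory stand-in that counts calls so the laziness is observable.

```typescript
// Assumed page shape; strings stand in for PieceCID.
interface ActivePiecesPage {
  pieces: string[]
  hasMore: boolean
}

type FetchActivePieces = (
  offset: number,
  limit: number
) => Promise<ActivePiecesPage>

// Lazily yield CIDs page by page for as long as the consumer iterates.
async function* getPiecesSketch(
  fetchPage: FetchActivePieces,
  batchSize = 100
): AsyncGenerator<string> {
  let offset = 0
  let hasMore = true
  while (hasMore) {
    const page = await fetchPage(offset, batchSize)
    for (const cid of page.pieces) yield cid
    // An empty page ends the walk even if hasMore is still set.
    if (page.pieces.length === 0) return
    offset += page.pieces.length
    hasMore = page.hasMore
  }
}

// In-memory stand-in that records how many pages were requested.
function makeCountingFetch(total: number) {
  let calls = 0
  const fetchPage: FetchActivePieces = async (offset, limit) => {
    calls++
    const end = Math.min(offset + limit, total)
    const pieces: string[] = []
    for (let i = offset; i < end; i++) pieces.push(`cid-${i}`)
    return { pieces, hasMore: end < total }
  }
  return { fetchPage, getCalls: () => calls }
}
```

A consumer that breaks out of its `for await` loop early stops the generator before the next page is requested, which is the behaviour a collect-all helper cannot offer.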

Then we just consume PieceCIDs in Filecoin Pin, use https://github.com/FilOzone/synapse-sdk/pull/283 over there to get the sizes we want and encourage devs to use that pattern. We probably should document all of this too.

@rvagg
Collaborator

rvagg commented Oct 8, 2025

I've just realised that we also need pieceId along with the CIDs, so our getPieces should return an async generator of [PieceCID, number] or { cid: PieceCID, id: number }. We'll just need to document that the number is important for certain other operations.

@rvagg
Collaborator

rvagg commented Oct 14, 2025

getSizeFromPieceCID now available as a util in pieces.ts

@SgtPooki
Collaborator Author

working on updating this PR now

cloudflare-workers-and-pages bot commented Oct 20, 2025

Deploying with Cloudflare Workers

Status: ✅ Deployment successful! · Name: synapse-dev · Latest Commit: 732a91f · Updated (UTC): Oct 20 2025, 02:50 PM

@SgtPooki SgtPooki changed the title feat: get piece leaf count fix: get all active pieces for a dataset Oct 20, 2025
@SgtPooki SgtPooki changed the title fix: get all active pieces for a dataset fix: get pieces from contract instead of pdpServer Oct 20, 2025
@SgtPooki SgtPooki requested review from hugomrdias and rvagg October 20, 2025 14:49
@SgtPooki
Collaborator Author

it looks like the failing tests are unrelated:

2:49:02 PM [vite] (ssr) Error when evaluating SSR module D:\a\synapse-sdk\synapse-sdk\docs\astro.config.mjs: localStorage.getItem is not a function
