Skip to content

Conversation

ZenGround0
Copy link
Collaborator

Backporting the indexing and ipni tasks from mkv2 to the pdpv0 branch.

Code is at least not obviously broken but testing is a work in progress.

@rjan90 rjan90 added the team/fs-wg Items being worked on or tracked by the "FS Working Group". See FilOzone/github-mgmt #10 label Oct 1, 2025
@FilOzzy FilOzzy added this to FS Oct 1, 2025
@github-project-automation github-project-automation bot moved this to 📌 Triage in FS Oct 1, 2025
@rjan90 rjan90 moved this from 📌 Triage to ⌨️ In Progress in FS Oct 1, 2025
@rjan90 rjan90 requested a review from LexLuthr October 1, 2025 15:02
Copy link
Contributor

@LexLuthr LexLuthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some DB changes are required to make this work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This task will only produce IPFS style ads. So, there would be no IPNI lookup path for pieces. Which was a requirement confirmed by Jen and Miro both in Dubai.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 That's fine for now. I just want to get ipfs adds working first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

piece CIDs into IPNI is lower priority; if it can be bundled into the same work without too much effort then that's great, but it doesn't buy us very much so it can be treated as a stretch goal

@BigLep
Copy link
Member

BigLep commented Oct 6, 2025

@ZenGround0 : will https://github.com/filecoin-project/curio/blob/feat/index-and-ipni/deps/config/types.go#L116 be updated to include filecoinpin.contact, or will that be somewhere else?

IPNI: IPNIConfig{
ServiceURL: []string{"https://cid.contact"},
DirectAnnounceURLs: []string{"https://cid.contact/ingest/announce"},
DirectAnnounceURLs: []string{"https://cid.contact/ingest/announce", "https://filecoinpin.contact/announce"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the implications of having both in here? I'm guessing it's inviting both of them to hit us? a failure trying to talk to one won't stop us from trying the other?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is common config. So updating here in not an option without making it common between poRep and PDP and publishing all existing PoRep chain to Filecoin pin. I plan to create a tool for basic PDP stuff in Curio, like a init after Curio init or maybe part of guided-setup. We can use that to insert values for PDP only clusters. A more cleaner approach in my opinion and allows for much more complex config changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concretely having both just means we tell both indexers about our latest IPNI head when publishing advertisements. Then I assume both will hit us when looking for retrieval endpoints.

a failure trying to talk to one won't stop us from trying the other?

From reading libipni code and observing a failure to propagate to filecoinpin.contact and success with cid.contact I'm confident that the answer is yes.

A more cleaner approach in my opinion and allows for much more complex config changes.

I don't like this either, I'd rather have people put this into an ipni layer, but I've been told repeatedly to do this for now. We'll happily get rid of it when we go full main branch.

Copy link
Member

@BigLep BigLep Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is common config

Correct me if I'm wrong here @rvagg, but I think that is fine for now. I believe we are assuming someone who is participating in Filecoin Onchain Cloud at this point to be doing so with a Curio instance that is strictly for PDP. It would be unwise to also be doing mainnet PoRep with this instance. So for now, if this common place is the easiest way to specify multiple indexers, lets do it.

what are the implications of having both in here? I'm guessing it's inviting both of them to hit us? a failure trying to talk to one won't stop us from trying the other?

These are good questions. I assume @ZenGround0 or @LexLuthr can answer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(doh, sorry, I typed before seeing @ZenGround0. The key part is answered, in that the curio node us operationally ok with listing two indexers, even if one fails or is unresponsive).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries, classic collision

order_number BIGSERIAL PRIMARY KEY, -- Unique increasing order number
ad_cid TEXT NOT NULL,
context_id BYTEA NOT NULL, -- abi.PieceInfo in Curio
context_id BYTEA NOT NULL, -- abi.PieceInfo || PDPIPNIContext in Curio
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Spark squealing in the distance ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already discussed with Spark about this. They agreed with the new contextID.

indexStore *indexstore.IndexStore
sc *chunker.ServeChunker
keys map[string]*peerInfo // map[peerID String]Private_Key
latest map[string]cid.Cid // map[peerID String]last published head, used to avoid duplicate announce
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes in here wrt latest look like bugfixes, or is it just the fact that 30 seconds now make this task more onerous so it's worth being spare with what we provide?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes its a blend of both. It would have been good to do this with 10 minute interval and non-real time constraints but now its become important.

}

// Create a warm storage service viewer.
warmStorageService, err := NewFilecoinWarmStorageServiceStateView(metadataAddr, ethClient)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should not pull in all of warm storage here, have a look at pdp/contract/ListenerServiceWithViewContract.abi and just do the same with FWSS, call it ListenerServiceWithMetadata and then in that file just have the function you need:

[
  {
    "type": "function",
    "name": "getDataSetMetadata",
    "inputs": [
      {
        "name": "dataSetId",
        "type": "uint256",
        "internalType": "uint256"
      },
      {
        "name": "key",
        "type": "string",
        "internalType": "string"
      }
    ],
    "outputs": [
      {
        "name": "exists",
        "type": "bool",
        "internalType": "bool"
      },
      {
        "name": "value",
        "type": "string",
        "internalType": "string"
      }
    ],
    "stateMutability": "view"
  }
]

generate that, then do a NewListenerServiceWithViewContract(metadataAddr, ethClient) and if that fails, just silently return false, "", nil.

one of the goals of ListenerServiceWithViewContract is to keep Curio relatively agnostic of FWSS, hence the err == nil fall-through if it exists. We should continue that and make it happy if it finds it's talking to something that doesn't have a metadata key. Maybe an info log would be good at that point, but I don't think we need to be strict about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I was just being lazy but this is obviously the right thing to do. I hadn't realized it was clean to do this without publishing a whole new contract interface in pdp repo and it sounds like maybe I can avoid that.

Comment on lines +993 to +994
// Hard to differenctiate between unsupported listener type OR internal error
// So we log on debug and skip indexing attempt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I think you should move up into GetDataSetMetadataAtKey - we can differentiate so let's just do it there

if err != nil {
return false, xerrors.Errorf("parsing announce address domain: %w", err)
}
if build.BuildType != build.BuildMainnet && build.BuildType != build.BuildCalibnet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sad .. I've been using non-TLS on calibnet for testing, but I guess we don't have a way of determining whether we should or shouldn't have an https announce address in the config. I was wondering whether I could get away with testing with IPNI without having to sort out certificates but I think this answers it.

PieceCID cid.Cid

// Payload determines if the IPNI ad is TransportFilecoinPieceHttp or TransportIpfsGatewayHttp
Payload bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who do we expect to consume it this way @LexLuthr? are we still designing for Spark making sense of our contexts? because PieceCID is going to be different for payload vs piece anyway so it should be unique as it is without having to add the additional uniqueness of the bool. Or are we expecting someone to decode this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple reasons:

  1. Early on I was told, we should be able to publish IPFS style and piece multihash only ads for each piece in PDP. So, pieceCID will be same in that case. I am not sure what you mean by "PieceCID is going to be different for payload vs piece anyway"
  2. If we decide to some kind of checker stuff down the line (Spark), this allows decoding and making sense. Better safe then sorry as it will require another ad chain migration which is not cheap.

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing fatal in here, this seems pretty good; but I've left a bunch of inline suggestions, giving approval in the assumption that those will be reasonably considered at least

@github-project-automation github-project-automation bot moved this from ⌨️ In Progress to ✔️ Approved by reviewer in FS Oct 8, 2025
@ZenGround0 ZenGround0 marked this pull request as ready for review October 8, 2025 18:48
@ZenGround0 ZenGround0 requested a review from a team as a code owner October 8, 2025 18:48
CIDGravityTokens: []string{},
},
IPNI: IPNIConfig{
ServiceURL: []string{"https://cid.contact"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZenGround0 : I don't know what this does, but should we also add filecoinpin.contact?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From curio docs this is a curio webUI thing and is not critical for data flowing from place to place. I won't change any defaults here, if curio operators are seeing issues with their view on advertisements they can easily make a config layer including filecoinpin.contact or others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team/fs-wg Items being worked on or tracked by the "FS Working Group". See FilOzone/github-mgmt #10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FS: IPNI advertisements for CIDs in PDP pieces in the "PDP node w/ mkv1" branch
5 participants