@wjmelements
Contributor

@wjmelements wjmelements commented Oct 20, 2025

Reviewer @rvagg
Closes #307
This significantly reduces the size of ServiceProviderRegistry: 21,290 -> 17,751 (-3539)

Motivation

We want all product attributes to be queryable on-chain, so we are removing the encoded productData. The mandatory schema will be loosely enforced on-chain with a bloom filter.

Upgrade guide for synapse and curio

  1. All previously ABI-encoded productInfo is removed; everything is now stored in the capability key-value store. Keys are strings; values are bytes. This affects registerProvider, addProduct, and updateProduct. Unsigned integers should be encoded big-endian. Addresses should be encoded as 20 raw bytes. Strings should be encoded as UTF-8. See the examples in test/PDPOffering.sol. These values are not validated on-chain, so clients should throw if a decoded value does not look right.
  2. getPDPOffering is not how you get PDP product info anymore. Now use getAllProductCapabilities which will return all keys and values for a product.
  3. The ProviderWithProduct return type now contains productCapabilityValues so you don't have to fetch them separately.
  4. getProductCapabilities no longer returns the exists bool array. This array was misleading because it only indicated whether the bytes were empty.
  5. getProductCapability is removed. Use productCapabilities instead, which only differs in not having exists.
  6. updatePDPServiceWithCapabilities is removed. Use updateProduct.
  7. Capability values cannot be empty. Exclude the key to signal that the product does not have the capability.
  8. getProvidersByProductType and getActiveProvidersByProductType are merged into one method, getProvidersByProductType, which now has a boolean flag parameter onlyActive.
  9. getProduct is replaced by getProviderWithProduct, which returns ProviderWithProduct.
  10. storagePricePerTibPerMonth is now storagePricePerTibPerDay

Changes

  • define and configure BloomSet with k=16
  • move PDPOffering struct to a testing helper library
  • redefine required schema as BloomSet16
  • only enforce required schema probabilistically
  • ensure synapse has a good method for fetching these keys all at once
  • remove misleading exists
  • remove ipni capability flags from required schema
  • fix tests
  • add and test BigEndian helper library for encoding and decoding integers
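
To make the probabilistic schema check concrete, here is a small Python sketch of a Bloom-style set with k=16. It is not the contract's BloomSet16: the 256-bit filter width, the use of SHA-256, and the way bit positions are derived from the digest are all assumptions made for illustration.

```python
import hashlib

K = 16  # bits set per key, matching k=16


def key_bits(key: str) -> int:
    """Return a 256-bit mask with up to K bits set for one key."""
    # Assumption: SHA-256 stands in for whatever hash the contract uses.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    mask = 0
    for index in digest[:K]:  # each byte selects one of 256 bit positions
        mask |= 1 << index
    return mask


def bloom_of(keys) -> int:
    """OR together the masks of every key in the collection."""
    mask = 0
    for key in keys:
        mask |= key_bits(key)
    return mask


def may_contain(found: int, required: int) -> bool:
    """True when every required bit is present; false positives are possible."""
    return (found & required) == required


# Hypothetical required schema and a provider's submitted keys.
required = bloom_of(["serviceURL", "minPieceSizeInBytes"])
submitted = bloom_of(["serviceURL", "minPieceSizeInBytes", "location"])
schema_ok = may_contain(submitted, required)
```

The required mask can be precomputed once per product type, so the on-chain check reduces to one AND and one comparison regardless of how many keys are required.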

@wjmelements wjmelements requested a review from rvagg October 20, 2025 22:34
@FilOzzy FilOzzy added this to FS Oct 20, 2025
@github-project-automation github-project-automation bot moved this to 📌 Triage in FS Oct 20, 2025
@wjmelements
Contributor Author

wjmelements commented Oct 23, 2025

I'm changing the type of capabilities values to bytes.

Also, I have noticed that getProductCapabilities and getProductCapability use the bytes length for exists, which is unhelpful and wrong, because a value's length can be 0. Existence is actually determined by membership in the capability keys array. In fact, we have been using empty values in some places.

            bytes memory value = capabilities[keys[i]];
            if (value.length > 0) {
                exists[i] = true;
                values[i] = value;
            }

@wjmelements
Contributor Author

wjmelements commented Oct 23, 2025

Remaining tasks:

  • view helper for fetching all of the key-values for a product
  • blackbox testing of bloom filter failure case

Open questions:

  • can we make some of the schema fields optional such as the ipni flags?
  • can we remove the exists bools since they are only reporting whether the returned bytes are empty?

@rvagg
Collaborator

rvagg commented Oct 23, 2025

I'm fine with bytes, my main concern with bytes has always been:

  1. Ease of decoding in off-chain tooling - SDK, cast, etc. But I think we have the tools we need in those places to do decoding ..?
  2. Consumption by subgraphs - does it make it harder to consume, display and make assumptions about these fields if they are bytes? If someone puts non-utf8 in here, what does a subgraph do and is it disruptive?

peerId is one case where we want bytes.
For the others, where we expect strings, we just need to do a bit more work on the client side to validate. I'm fine with that, but we should document these expectations really well: you're essentially making a schema with this filter, so you should document very clearly what a client can expect and what a client should do.

@wjmelements wjmelements changed the title perf(ServiceProviderRegistry): Bloom Schema perf!(ServiceProviderRegistry): Bloom Schema Oct 23, 2025
@wjmelements wjmelements marked this pull request as ready for review October 23, 2025 21:31
  view
  providerExists(providerId)
- returns (bytes memory productData, string[] memory capabilityKeys, bool isActive)
+ returns (string[] memory capabilityKeys, bool isActive)
Collaborator

can we just return the keys and values here and be done with it? this is one of the most awkward spots - if we don't use key-existence as a signal then we're always going to want the values too

Contributor Author

key existence is a signal

Collaborator

how about we just return ProviderWithProduct here? so this is the single version of getProvidersByProductType

Collaborator

> key existence is a signal

you're the one arguing to make value necessary even for booleans; in that world I can't think of a case where just getting the keys is useful to me, I just want all of it

Contributor Author

I think you still don't understand. If a capability is a boolean, its existence is sufficient. But that existence cannot be signaled with the empty string, because an empty value is indistinguishable from a missing key when doing a single key lookup in a Solidity mapping.

Contributor Author

If a key's absence unambiguously signals the capability is not supported, then the key's presence can unambiguously signal the capability is supported. Any nonzero length is thus truthy.
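
The point about single-key lookups can be sketched in Python. The class below only imitates how a Solidity mapping(string => bytes) reads missing keys as empty bytes; the class, method, and key names are hypothetical and not part of the registry's API.

```python
class CapabilityStore:
    """Mimics a Solidity mapping(string => bytes): missing keys read as empty."""

    def __init__(self):
        self._values = {}

    def set(self, key: str, value: bytes) -> None:
        if len(value) == 0:
            # Mirrors the rule that capability values cannot be empty.
            raise ValueError("empty value: omit the key instead")
        self._values[key] = value

    def get(self, key: str) -> bytes:
        # A mapping has no "key not found" signal, only the default value.
        return self._values.get(key, b"")

    def has(self, key: str) -> bool:
        # One lookup, no loop over the key list.
        return len(self.get(key)) > 0


store = CapabilityStore()
store.set("ipniIpfs", b"\x01")  # hypothetical boolean capability
supported = store.has("ipniIpfs")
unsupported = store.has("ipniPiece")
```

Because empty values are rejected on write, a non-empty read is an unambiguous "present" and an empty read is an unambiguous "absent".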

Contributor Author

I don't think we need many of these methods. I agree they aren't useful on or off chain. I will check how we are using them tomorrow.

Collaborator

> If a key's absence unambiguously signals the capability is not supported, then the key's presence can unambiguously signal the capability is supported. Any nonzero length is thus truthy.

This is what I've been arguing for here, and it is why I want an exists boolean returned any time I ask for a specific key. I don't want to have to put a value in the value map; I just want to know the key exists and then not care about the value, which works around the lack of a null or non-zero sentinel in Solidity. But we agree that the current implementation of that is broken: it should do it properly by iterating over the keys it has and figuring out whether the key exists. I also now think we can do away with that entirely. There may be a case for "tell me the value for this key" or "tell me if you have this key", but with the way this is shaping up, I think all I really want out of this is to be able to get the full product, keys and values, and deal with it on the client side.

Contributor Author

The use case for querying a single key is for onchain lookup. It should not loop over all of the keys to do that.

@rvagg
Collaborator

rvagg commented Oct 24, 2025

getActiveProvidersByProductType -> let's just make an onlyActive bool argument to getProvidersByProductType and ditch a method

@wjmelements
Contributor Author

> getActiveProvidersByProductType -> let's just make an onlyActive bool argument to getProvidersByProductType and ditch a method

I believe I suggested something similar in the original PR. Would you believe that synapse is fetching all of them and then filtering by isActive?

@rvagg
Collaborator

rvagg commented Oct 24, 2025

> Would you believe

Oh yes I would. This whole registry was done way too quick, on both sides.

@rjan90 rjan90 moved this from 📌 Triage to 🔎 Awaiting review in FS Oct 24, 2025
@rvagg
Collaborator

rvagg commented Oct 24, 2025

Trying out my own feedback as a PR: #328

@rvagg
Collaborator

rvagg commented Oct 24, 2025

I'm just coming to terms with the per-day price here versus the per-month price we had before and the per-month price we have in FWSS; it's a bit odd that we have two versions of this. Now an SP has to divide the per-month charge that everyone talks about by the number of days to get the per-day figure. We have a standard for what a "month" is in epochs that we use everywhere; it's not abnormal to encode a month as 30 days.

Anyway, just my 2c, not a big deal but it's jarring as I update Curio to work with this and think through what an SP has to deal with. The default I have to encode is 83333333333333333 to get close to the 2.5 USDFC we have in WarmStorage.
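
The arithmetic behind that default can be checked in a few lines of Python. The 18-decimal base unit is an assumption, chosen because it is consistent with the figure quoted above; the 30-day month follows the comment.

```python
USDFC_DECIMALS = 18  # assumption, consistent with the figure quoted above
DAYS_PER_MONTH = 30

per_month = 25 * 10**USDFC_DECIMALS // 10  # 2.5 USDFC in base units
per_day = per_month // DAYS_PER_MONTH  # integer division truncates

print(per_day)  # 83333333333333333
print(per_month - per_day * DAYS_PER_MONTH)  # 10 base units lost per month
```

The truncation is why the per-day value only gets "close to" 2.5 per month: thirty days at this rate comes to 10 base units less than the monthly figure.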

Original thread is #297 (comment)

@rvagg
Collaborator

rvagg commented Oct 24, 2025

Curio version on top of #328: filecoin-project/curio#736

Contributor

@Kubuxu Kubuxu left a comment

SGWM but needs a rebase

@github-project-automation github-project-automation bot moved this from 🔎 Awaiting review to ✔️ Approved by reviewer in FS Oct 24, 2025
}
}
// Enforce minimum schema
require(BloomSet16.mayContain(foundKeys, requiredKeys), Errors.InsufficientCapabilitiesForProduct(productType));
Contributor

In theory, one can find a set of keys that match this bloom filter, but it is fine, since the final decision is on the client side and in the approval list.

Contributor

If the Bloom filter is used only for that verification, the problem could be avoided by requiring keys to be provided in order, then stepping through the required and provided lists in order while allowing extra provided keys.
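
The in-order check suggested here can be sketched as a single pass over the provided keys. This Python version is only an illustration: the function name is hypothetical, and ordering by Python string comparison is an assumption (an on-chain version would need its own ordering, for example by raw bytes or by hash).

```python
def has_required_in_order(provided, required):
    """Check that every required key appears among the provided keys.

    Both lists must be strictly ascending; extra provided keys are allowed.
    """
    if any(a >= b for a, b in zip(provided, provided[1:])):
        raise ValueError("provided keys must be sorted and unique")
    i = 0  # index of the next required key still to be matched
    for key in provided:
        if i < len(required) and key == required[i]:
            i += 1
    return i == len(required)


# Hypothetical keys: "b" and "d" are required, "a" and "c" are extras.
complete = has_required_in_order(["a", "b", "c", "d"], ["b", "d"])
incomplete = has_required_in_order(["a", "c"], ["b"])
```

Unlike the Bloom filter, this check has no false positives, at the cost of comparing against each required key explicitly.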

Contributor Author

Yes we are ultimately validating these fields off-chain. The filter will help prevent accidental omissions.
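
For a rough sense of how often an accidental omission could slip through, the standard Bloom filter approximation can be evaluated in Python. The filter width m=256 is an assumption (the thread only states k=16), and the formula treats bit positions as independent and uniformly random, so this is an estimate rather than a property of the contract.

```python
def omission_pass_probability(other_keys: int, k: int = 16, m: int = 256) -> float:
    """Estimate the chance that one omitted required key still passes.

    other_keys is the number of keys that were actually provided.
    """
    # Expected fraction of filter bits set by the provided keys.
    fill = 1.0 - (1.0 - 1.0 / m) ** (k * other_keys)
    # All k bits of the missing key must already be set by the others.
    return fill ** k


estimate = omission_pass_probability(10)
```

Under these assumptions the estimate stays small for a handful of keys and grows as more keys are provided, which fits the stated goal of catching accidental omissions rather than resisting deliberately crafted key sets.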

Contributor Author

> the problem could be avoided by requiring keys to be provided in order

That is a good idea, but this contract's code size would then scale with the number of required keys rather than the number of products. That cost can be reduced by using a separate validation library per product. We could then have more specialized on-chain validation of known keys.

@wjmelements wjmelements merged commit 1c54968 into main Oct 24, 2025
6 checks passed
@wjmelements wjmelements deleted the bloom-schema branch October 24, 2025 22:44
@github-project-automation github-project-automation bot moved this from ✔️ Approved by reviewer to 🎉 Done in FS Oct 24, 2025
Kubuxu added a commit to filecoin-project/curio that referenced this pull request Oct 27, 2025
* feat(pdp): deal with new ServiceProviderRegistry changes

Ref: FilOzone/filecoin-services#308
Ref: FilOzone/filecoin-services#328

* fixup! feat(pdp): deal with new ServiceProviderRegistry changes

* fix: treat key presence as truthy for boolean options

Co-authored-by: William Morriss <[email protected]>

* feat(pdp): add IpniPeerID to PDPOfferingData

Signed-off-by: Jakub Sztandera <[email protected]>

* feat(pdp): show IpniPeerID in webui, use IpniPeerID in FSUpdatePDP

Signed-off-by: Jakub Sztandera <[email protected]>

---------

Signed-off-by: Jakub Sztandera <[email protected]>
Co-authored-by: William Morriss <[email protected]>
Co-authored-by: Jakub Sztandera <[email protected]>
rvagg added a commit to filecoin-project/curio that referenced this pull request Oct 29, 2025

Labels

BREAKING enhancement New feature or request

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

perf(ServiceProviderRegistry): efficient productData
Review PDPOffering fields

4 participants