docs(pdp): add per-piece security guarantees section #241
base: main
Added some (minor) comments. Good to go for me.
@lucaniz: FYI, I don't see any comments from you in the PR.
@lucaniz: your comment is "pending", which I think means you haven't submitted your review.
```
p_T = (1-α)^(K×T)
```

**Example detection rates (K=5 challenges per day):**
For me, let's say we want to commit to AFR-1 = 99%. How do we communicate this?
A customer has one data set where they keep adding all of their data (say, files), and it grows to 1,000 files.
Within a year, we want to say they will lose at most 1 file. What DealBot threshold tells us that this SLA can be met?
According to the numbers, for 1 file lost out of 1000 (α = 0.1%, K = 5):
- Daily detection: 1 - (0.999)^5 ≈ 0.5%
- 30-day detection: 1 - (0.999)^150 ≈ 14%
- Annual detection: 1 - (0.999)^1825 ≈ 84%
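These figures all come from the same closed form: the chance that K×T challenges all miss the lost data is (1 - α)^(K×T), so the detection probability is 1 - (1 - α)^(K×T). A minimal Python sketch, assuming each challenge independently lands on lost data with probability α (`detection_probability` is a hypothetical helper name, not part of the PDP codebase):

```python
def detection_probability(alpha: float, k_per_day: int, days: float) -> float:
    """Probability that at least one of k_per_day * days challenges hits lost data.

    Assumes each challenge independently lands on lost data with probability
    alpha, so the chance of escaping detection is (1 - alpha) ** (K * T).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    challenges = k_per_day * days
    return 1.0 - (1.0 - alpha) ** challenges


if __name__ == "__main__":
    alpha = 0.001  # 1 file lost out of 1000
    for label, days in [("daily", 1), ("30-day", 30), ("annual", 365)]:
        p = detection_probability(alpha, k_per_day=5, days=days)
        print(f"{label:>7}: {p:.1%}")
```

Run as a script, this should print about 0.5%, 13.9% and 83.9%, matching the rounded values in the comment.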
I think the detection thresholds for a 1% loss (K=5) would be:
- 30 days: 77.9%
- 90 days: 98.9%
- 180 days: 99.99%
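To answer the DealBot threshold question directly, the formula can be inverted: requiring 1 - (1 - α)^(K×T) ≥ c gives T ≥ ln(1 - c) / (K × ln(1 - α)), where both logarithms are negative. A minimal sketch under the same independence assumption; `days_to_confidence` and its default K=5 are illustrative, not an existing API:

```python
import math


def days_to_confidence(alpha: float, confidence: float, k_per_day: int = 5) -> int:
    """Smallest whole number of days T with 1 - (1 - alpha) ** (k_per_day * T) >= confidence.

    Solving (1 - alpha) ** (K * T) <= 1 - confidence for T gives
    T >= ln(1 - confidence) / (K * ln(1 - alpha)); both logs are negative.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("alpha and confidence must both lie strictly between 0 and 1")
    if k_per_day <= 0:
        raise ValueError("k_per_day must be positive")
    # log1p(-x) computes ln(1 - x) accurately for small x such as alpha = 0.001.
    challenges_needed = math.log1p(-confidence) / math.log1p(-alpha)
    return math.ceil(challenges_needed / k_per_day)


if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.001):
        for confidence in (0.90, 0.99):
            days = days_to_confidence(alpha, confidence)
            print(f"loss={alpha:.1%}  confidence={confidence:.0%}  ->  {days} days")
```

Under this model, 99% confidence takes 18 days for a 5% loss and 92 days for a 1% loss (90 days lands at 98.9%), while a 0.1% loss needs 921 days.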
@timfong888 I think the problem with matching what PDP can do to what AFR asks for is that PDP looks backward: "If data is lost, what's the probability we catch it?" AFR asks for more than that: "What's the probability that data will be lost in the first place?" We don't control enough of the pipeline and infrastructure to answer that without relying on historical track record and assuming things stay consistent going forward. So maybe we can't, and shouldn't, frame this in terms of AFR?
docs/design.md
**What this means for individual pieces:**

If a storage provider has lost any significant fraction of a data set, they will be caught with high probability regardless of which specific pieces are missing. The random challenge selection ensures that:
This is where I could use more clarity, framed around the use case. The text says a "significant fraction" will be caught. I am using 1% because that is the AFR we are using as a base case, meaning the Annual Failure Rate is 1%. We want to approve an SP based on a period of time or a number of successful proofs. How do we reason about this on the customer's behalf?
If we want roughly 99% confidence in an SP before approving, then we should require about 90 days of successful proofs (for a 1% loss at K=5, 90 days gives 98.9% and 92 days crosses 99%). That only shows history, and we have to judge whether it gives us enough confidence in their ability not to lose data going forward.
For the user, we need to express this in terms of data loss detection. It's a curve that looks something like:
- Large loss (5%+): Caught within days
- Medium loss (1-5%): Caught within weeks to months
- Small loss (<1%): May take months to a year
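To sanity-check this curve against the random-selection argument, the sketch below simulates challenges directly and compares the result with the closed form 1 - (1 - α)^(K×T). It rests on a hypothetical model: 1,000 equally weighted pieces, a fraction α of them lost, and K=5 challenges per day, each picking a piece uniformly at random with replacement. PDP's actual challenge selection may weight pieces by size, and `simulate_detection_rate` is an illustrative name only.

```python
import random


def simulate_detection_rate(alpha, days, k_per_day=5, n_pieces=1000, trials=5000, seed=0):
    """Monte Carlo estimate of P(at least one challenge hits a lost piece within `days`).

    Hypothetical model: n_pieces equally weighted pieces, the first
    round(alpha * n_pieces) of which are lost; each day issues k_per_day
    challenges, each picking a piece uniformly at random (with replacement).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    lost = round(alpha * n_pieces)
    detected = 0
    for _ in range(trials):
        for _ in range(k_per_day * days):
            if rng.randrange(n_pieces) < lost:
                detected += 1
                break  # one failed challenge is enough to catch the SP
    return detected / trials


if __name__ == "__main__":
    cases = [("large (5%)", 0.05), ("medium (1%)", 0.01), ("small (0.1%)", 0.001)]
    for label, alpha in cases:
        for days in (7, 30, 90):
            sim = simulate_detection_rate(alpha, days)
            exact = 1 - (1 - alpha) ** (5 * days)
            print(f"{label:<13} {days:>3}d  simulated={sim:.3f}  closed-form={exact:.3f}")
```

The simulated and closed-form columns should agree to within sampling noise, which supports treating detection purely as a function of the lost fraction and the number of challenges, not of which pieces are missing.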
I've made a change on line 194:
As shown in the table above, detection confidence depends on the fraction of data lost and the proving period. For a 1% data loss, detection reaches 77.9% confidence within 30 days and 98.9% within 90 days. Larger losses are caught faster: a 5% loss reaches 99.95% detection in just 30 days.

Also tagging @lucaniz for review here. It's based on https://www.notion.so/filecoindev/PDP-security-and-ProofSet-size-28edc41950c180d088dee430e1249a2a but is user-focused: "My specific piece wasn't challenged in the last X days; how do I know it's still safe?"