Adjust WindowPoSt parameters without damaging network security #537

Pythonac · 2022-11-24T08:07:58Z

Motivation

WindowPoSt is used as a proof that a copy of the data has been continuously maintained over time, which makes it irrational for an SP not to keep a sealed copy of the data (i.e., it is more expensive to seal a copy of the data every time a WindowPoSt challenge has to be answered).
The current WindowPoSt parameters (proving period and challenge count) are overly strict. A deliberate adjustment would be beneficial to the Filecoin network.

Goals

Change proving period from 24 hours to 48 hours.
Use appropriate PoSt challenge count values for the 32 GiB and 64 GiB seal proofs respectively.

Proposals

Extend proving period

Based on current network stats (November 10, 2022):

RawBytePower is about 15.9 EiB (while QualityAdjPower is about 18.7 EiB);
Active miner count is 3,967;
Average RawBytePower per miner is about 4.1 PiB.

Setting the proving period at 24 hours means that a miner who could complete the proof without keeping the data would need the capacity to seal 4.1 PiB/day. Whether or not a miner can reach that capacity, the cost of sealing is much larger than the cost of keeping a copy. The cost gap is so big that a slight change to the proving period would have little impact on network security; only a very large change, for example to 1 week or 1 month, would matter.
At the same time, SubmitWindowedPoSt accounts for about 18% of total network gas fees. Extending the proving period to 48 hours would free up about 10% of network capacity without impacting network security.

Modify challenge count

The 32 GiB and 64 GiB seal proofs share the same PoSt parameters:

NODE_SIZE at 32;
winning post challenge count at 66;
window post challenge count at 10.

Because the leaf count of the 64 GiB seal proof is twice that of the 32 GiB seal proof, the sampling frequency for 32 GiB is effectively doubled. Since the challenge count setting for the 64 GiB seal proof has been proven safe by the network, we should adjust the setting for the 32 GiB seal proof to align with the 64 GiB seal proof.

Extend miner actor’s DeclareFaultsRecovered() method

There are many reasons for missing Window PoSt deadlines. For example, the base fee changes drastically; the SubmitWindowedPoSt message can’t be published; a GPU fails unexpectedly and can’t finish the computation in time, etc. These faults reduce the SP’s power, which leads to block reward loss. SPs are incentivized to fix these faults as quickly as possible but have to wait for the corresponding deadline in the next proving period to recover the sectors.
We can leverage the existing DeclareFaultsRecovered() method to recover the faults once they are fixed, by adding an “EarlyProvePartition” property to DeclareFaultsRecoveredParams. EarlyProvePartition is an array of <partition, deadline> pairs:

The minimum early proving unit is a partition.
The deadline should be no later than the assigned deadline for the partition and no earlier than the first safe deadline when the message is on chain.
The EarlyProvePartition value can be changed by later messages before it takes effect.
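The deadline rule above could be checked roughly as follows. This is only a sketch: the `EarlyProvePartition` struct layout, the `is_valid_early_deadline` helper and the use of plain `u64` indices are hypothetical and are not taken from the builtin-actors code base.

```rust
// Hypothetical sketch: these types are illustrative only and are not the
// builtin-actors definitions of DeclareFaultsRecoveredParams.
#[derive(Debug, Clone, PartialEq)]
struct EarlyProvePartition {
    partition: u64, // partition to prove early (the minimum unit)
    deadline: u64,  // deadline index at which it should be proven
}

/// Checks the rule from the proposal: the requested deadline must be no
/// earlier than the first safe deadline and no later than the deadline the
/// partition is assigned to. Deadline indices are treated as plain
/// increasing numbers; wrap-around across proving periods is ignored here.
fn is_valid_early_deadline(
    req: &EarlyProvePartition,
    assigned_deadline: u64,
    first_safe_deadline: u64,
) -> bool {
    req.deadline >= first_safe_deadline && req.deadline <= assigned_deadline
}
```

For example, a request for deadline 10 would be accepted when the partition is assigned to deadline 20 and the first safe deadline is 5, and rejected when the first safe deadline is 12.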

Discussion

Some policy constants need to be updated to align with the proving period changes:

 pub const WPOST_PROVING_PERIOD: ChainEpoch = EPOCHS_IN_DAY;
 pub const WPOST_CHALLENGE_WINDOW: ChainEpoch = 30 * 60 / EPOCH_DURATION_SECONDS;
 pub const FAULT_MAX_AGE: ChainEpoch = WPOST_PROVING_PERIOD * 42;
 pub const CONTINUED_FAULT_PROJECTION_PERIOD: ChainEpoch =
    (EPOCHS_IN_DAY * CONTINUED_FAULT_FACTOR_NUM) / CONTINUED_FAULT_FACTOR_DENOM;

change to:

 pub const WPOST_PROVING_PERIOD: ChainEpoch = EPOCHS_IN_DAY * 2;
 pub const WPOST_CHALLENGE_WINDOW: ChainEpoch = 60 * 60 / EPOCH_DURATION_SECONDS;
 pub const FAULT_MAX_AGE: ChainEpoch = WPOST_PROVING_PERIOD * 21;
 pub const CONTINUED_FAULT_PROJECTION_PERIOD: ChainEpoch =
    (EPOCHS_IN_DAY * 2 * CONTINUED_FAULT_FACTOR_NUM) / CONTINUED_FAULT_FACTOR_DENOM;
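As a sanity check on the constants above, the derived values can be compared before and after the change. The sketch below assumes 30-second epochs and a continued fault factor of 351/100; both values are assumptions about the current policy and are not stated in this post, and the `derive` helper and `DerivedPolicy` struct are hypothetical names.

```rust
// Sketch only. EPOCH_DURATION_SECONDS = 30 and the 351/100 fault factor are
// assumptions about current policy, not values given in the proposal.
type ChainEpoch = i64;

const EPOCH_DURATION_SECONDS: i64 = 30;
const EPOCHS_IN_DAY: ChainEpoch = 24 * 60 * 60 / EPOCH_DURATION_SECONDS;
const CONTINUED_FAULT_FACTOR_NUM: i64 = 351;
const CONTINUED_FAULT_FACTOR_DENOM: i64 = 100;

/// Values that follow from a proving period and a challenge window.
#[derive(Debug, PartialEq)]
struct DerivedPolicy {
    deadlines_per_period: i64,
    fault_max_age: ChainEpoch,
    continued_fault_projection: ChainEpoch,
}

/// `days` is the proving period in days, `window_minutes` the challenge
/// window, `fault_age_periods` the multiplier used for FAULT_MAX_AGE.
fn derive(days: i64, window_minutes: i64, fault_age_periods: i64) -> DerivedPolicy {
    let proving_period = EPOCHS_IN_DAY * days;
    let challenge_window = window_minutes * 60 / EPOCH_DURATION_SECONDS;
    DerivedPolicy {
        deadlines_per_period: proving_period / challenge_window,
        fault_max_age: proving_period * fault_age_periods,
        continued_fault_projection: (EPOCHS_IN_DAY * days * CONTINUED_FAULT_FACTOR_NUM)
            / CONTINUED_FAULT_FACTOR_DENOM,
    }
}
```

Under these assumptions, `derive(1, 30, 42)` and `derive(2, 60, 21)` both yield 48 deadlines per proving period and the same FAULT_MAX_AGE (42 days of epochs), while the continued fault projection period doubles.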

To update the challenge count for the 32 GiB seal proof, in addition to the code change, the Bls12 Groth params need to be regenerated and published.

lucaniz (Collaborator) · 2022-11-25T16:28:16Z

I'll try to give some context and answer the first 2 points below (cc @irenegia):

Extend WindowPoSt proving period from 24h to 48h

Setting the proving period at 24 hours means that a miner who could complete the proof without keeping the data would need the capacity to seal 4.1 PiB/day. Whether or not a miner can reach that capacity, the cost of sealing is much larger than the cost of keeping a copy. The cost gap is so big that a slight change to the proving period would have little impact on network security; only a very large change, for example to 1 week or 1 month, would matter.

Let me try to give an overview of the whole WindowPoSt mechanism in order to address the points raised.

At the current stage, WindowPoSt relies on the fact that storing a sealed sector is more rational than re-computing (part of) it in order to answer challenges.

What we get is that storing less than 80% of a sector will lead to an expensive recomputation step in order to answer WindowPoSt challenges with high probability.

To give more detail, the analysis shows that, due to the many dependencies each node has in the graph, when storing less than 80% of the sealed sector there is (with some probability) at least one challenged node whose recomputation requires recomputing 80% of one of the layers of the graph. This computation is not parallelizable.

We have been really conservative when we designed the cost model, and we are convinced that this approach is more secure than considering the “average sealing cost” as proposed here.

In essence, this is a lower bound on the computation the prover needs to do in order to answer all challenges correctly. Let’s call this computation C.

The cost analysis (which is based on the cost of SHA and the cost of storage) shows that running C is K times more expensive than storing the entire sealed sector for 24h.

If we want to extend the proving period from 24 to 48 hours, the cost assumption becomes “worse” by a factor of 2: running C would now be only K/2 times more expensive than storing the entire sealed sector for 48h, which would make the level of security worse than what we have.
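One way to write this scaling down, as an illustration only: let $s$ be the cost of storing the entire sealed sector for 24h (a symbol introduced here, not used in the original analysis), and assume storage cost grows linearly with time while the cost of running C is unchanged.

```latex
\frac{\mathrm{cost}(C)}{s} = K
\quad\Longrightarrow\quad
\frac{\mathrm{cost}(C)}{2s} = \frac{K}{2}
```

The numerator stays fixed because the recomputation is the same, while the denominator doubles because the sector has to be kept for twice as long between proofs.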

Given all the above, we stress that:

Moving from the current cost model to a cost model which considers the average sealing cost is a big (and more risky) change.
Doubling the proving period from 24h to 48h is per se a significant change that would make the cost assumption 2x “worse”, and thus we would recommend being really careful about it (we would need to re-run the entire analysis, update the costs and check whether the remaining margins are conservative enough, which is not guaranteed. Note that having comparable costs for storing and reconstruction is not something one wants to end up with, given the rational security model we have).

Modify challenge count (halving the number of challenges for 32GB sectors)

The 32 GiB and 64 GiB seal proofs share the same PoSt parameters:

NODE_SIZE at 32;
winning post challenge count at 66;
window post challenge count at 10.

Because the leaf count of the 64 GiB seal proof is twice that of the 32 GiB seal proof, the sampling frequency for 32 GiB is effectively doubled. Since the challenge count setting for the 64 GiB seal proof has been proven safe by the network, we should adjust the setting for the 32 GiB seal proof to align with the 64 GiB seal proof.

The suggestion here is to halve the number of challenges we ask for 32GB sectors, given that 64GB sectors and 32GB sectors currently share the same number of challenges.

Unfortunately this is not as straightforward as it seems. Here is why:

Let’s recall that we want to catch a provider who is storing less than 80% of a sector (whatever its size) with probability p (in order to apply our cost assumption).

No matter the size of the sector (it is a “percentage check”), with T challenges a provider storing less than 80% of a sector is caught with probability p ≥ 1 - (1 - 0.2)^T, where

(1 - 0.2)^T is the probability that all the challenges fall in the 80% the provider is storing
1 - (1 - 0.2)^T is the probability that at least one of the challenges falls in the portion of the sector which has been deleted, leading to the expensive recomputation step

If we now differentiate between 64GB sectors and 32GB sectors, by asking T/2 challenges for the latter, we would have that:

For 64GB sectors everything stays the same, meaning that a provider storing less than 80% of a sector is caught with probability p ≥ 1 - (1 - 0.2)^T
For 32GB sectors the lower bound on the probability of catching a provider storing less than 80% of a sector goes down to 1 - (1 - 0.2)^(T/2)

Basically, if we do so, providers storing less than 80% of a 32GB sector have a higher probability of passing the WindowPoSt step than providers storing less than 80% of a 64GB one.
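To make the bound concrete, it can be evaluated for the Window PoSt challenge count of 10 quoted above and for the halved count of 5. This is a minimal sketch; the function name `detection_lower_bound` is made up for illustration.

```rust
/// Lower bound on the probability of catching a provider that stores at most
/// the fraction `stored` (between 0.0 and 1.0) of a sector, when `challenges`
/// independent challenges are drawn: 1 - stored^challenges.
fn detection_lower_bound(stored: f64, challenges: i32) -> f64 {
    1.0 - stored.powi(challenges)
}
```

With `stored = 0.8`, the bound is about 0.893 for 10 challenges and about 0.672 for 5 challenges, which is the gap between 64GB and 32GB sectors described above.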

If there is something which is not clear/needs more clarification, please let us know!

jennijuju (Maintainer) · 2022-11-25T16:48:20Z

Thanks @lucaniz @irenegia for the insightful security information. With all those being said,

The cost gap is so big that a slight change to the proving period would have little impact on network security; only a very large change, for example to 1 week or 1 month, would matter.

@Pythonac - if you believe that the Filecoin network security will not be reduced by the proposed change, could you please provide the supporting analysis and data that can back up the statement?

salstorage · 2022-11-25T18:04:26Z

On behalf of Seal Storage Technology, an enterprise-focused Storage Provider, we support this initiative, as it would make maintenance (infrastructure updates, etc.) easier.

Pythonac (Author) · 2022-11-28T02:12:53Z

@lucaniz @jennijuju @salstorage
Thank you all for the thought-provoking comments, and @lucaniz especially for the detailed explanation, which deepened our understanding of the underlying mechanism. I will continue the study and keep you updated with my latest findings.
Meanwhile, we're open to other opinions on feasible ways to make storage on Filecoin more efficient while keeping the network secure.

Pythonac (Author) · 2022-12-19T09:52:39Z

@lucaniz @jennijuju
Sorry for the late update. Please see my latest calculation below.

Proving period

If we want to extend the proving period from 24 to 48 hours, the cost assumption becomes “worse” by a factor of 2: running C would now be only K/2 times more expensive than storing the entire sealed sector for 48h, which would make the level of security worse than what we have.

Consider an SP with storage power P and S active sectors that stores only 80% of each sector.
Given that the Window PoSt challenge count is T, there will be F failed sectors each day:
F = (1- 0.8T) * S

The loss L from these failures includes the instant penalty (FF) and the potential block reward loss:
L = F * (FF + 1 * BR) = F * 4.5 * BR

Supposing the sealing computation cost for each sector is C, the total computation cost each day will be D:
D = F * C

So, the total cost per day of storing only 80% of each sector is G:
G = L + D = F * (4.5 * BR + C)

Supposing the storage price is A, the total cost per day of storing the additional 20% of each sector is K:
K = P * 0.2 * A

Adjusted storage power	Sector size	Sector type	Proving period	L ($)	D ($)	G ($)	K ($)
1PiB	32GiB	Committed Capacity	24h	228	42170	42398	147
1PiB	32GiB	Regular Sector with multiplier 10	24h	228	4217	4445	14.7
1PiB	64GiB	Committed Capacity	24h	228	42170	42398	147
1PiB	64GiB	Regular Sector with multiplier 10	24h	228	4217	4445	14.7
1PiB	32GiB	Committed Capacity	48h	456	42170	42626	294
1PiB	32GiB	Regular Sector with multiplier 10	48h	456	4217	4673	29.4
30PiB	32GiB	Committed Capacity	24h	6840	1265100	1271940	4410
30PiB	32GiB	Committed Capacity	48h	13680	1265100	1278780	8820

Compared with storing 80% of each sector, it seems better for a malicious node to store 80% of its sectors completely and none of the rest. The table below displays the cost comparison.

Adjusted storage power	Sector size	Sector type	Proving period	L ($)	D ($)	G ($)	K ($)
1PiB	32GiB	Committed Capacity	24h	0	9477	9477	147
1PiB	32GiB	Committed Capacity	48h	0	9477	9477	147

Parameter explanation:
T (challenge count): 10
BR (projected block reward for one day): 0.0133 FIL/TiB (Filfox, 2022-12-15)
C (computation cost for sealing a 32 GiB sector): $1.446 (AWS, g4dn.4xlarge, 3-yr Reserved, 3 hours)
A (storage price for storing a 32 GiB sector per day): $0.0224 (AWS, S3 Standard, Over 500 TB / Month)
FIL price: $4.184 (https://coinmarketcap.com/, 2022-12-15)

AWS Pricing:
https://aws.amazon.com/ec2/instance-types/g4/
https://aws.amazon.com/s3/pricing/?trk=27620bf8-ab8b-419f-bab7-179a710770c8&sc_channel=ps&s_kwcid=AL!4422!3!639556541234!e!!g!!aws%20s3%20pricing&ef_id=Cj0KCQiAqOucBhDrARIsAPCQL1b69Y2GZ4YCe7d641EmkjEl0f_o5dyEHUED8L1Gr5Py3ZZ-mScVNUMaAgY7EALw_wcB:G:s&s_kwcid=AL!4422!3!639556541234!e!!g!!aws%20s3%20pricing
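The cost model can be replayed with the parameters listed above. The sketch below is an illustration, not the spreadsheet behind the table: it assumes 32 GiB sectors, takes P as the sector count when computing K, and converts BR to a per-sector value by dividing by 32 sectors per TiB (that conversion is an assumption). The names `DailyCost` and `daily_cost` are hypothetical.

```rust
// Inputs copied from the parameter list; everything else is an assumption
// of this sketch (32 GiB sectors, BR converted to a per-sector value).
const BR_FIL_PER_TIB: f64 = 0.0133; // projected daily block reward
const FIL_USD: f64 = 4.184; // FIL price
const C_USD: f64 = 1.446; // sealing cost per 32 GiB sector
const A_USD: f64 = 0.0224; // storage price per 32 GiB sector per day
const SECTORS_PER_PIB: f64 = 32768.0; // 1 PiB / 32 GiB
const SECTORS_PER_TIB: f64 = 32.0; // 1 TiB / 32 GiB

/// Daily figures for an SP storing only 80% of each sector, in USD
/// (except `f`, which is a sector count).
struct DailyCost {
    f: f64, // expected failed sectors
    l: f64, // penalty plus lost block reward
    d: f64, // recomputation (sealing) cost
    g: f64, // total cost of storing only 80%: L + D
    k: f64, // cost of simply storing the missing 20%
}

fn daily_cost(power_pib: f64, challenges: i32) -> DailyCost {
    let s = power_pib * SECTORS_PER_PIB;
    let f = (1.0 - 0.8f64.powi(challenges)) * s;
    let br_usd = BR_FIL_PER_TIB / SECTORS_PER_TIB * FIL_USD;
    let l = f * 4.5 * br_usd;
    let d = f * C_USD;
    DailyCost { f, l, d, g: l + d, k: s * 0.2 * A_USD }
}
```

For 1 PiB and 10 challenges this gives roughly F ≈ 29250, L ≈ $229, D ≈ $42295 and K ≈ $147. The table's L = 228 and D = 42170 are reproduced when 1 - 0.8^10 is rounded to 0.89, so the small difference appears to come from rounding.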

To sum up, even in the worst case the cost gap is so big that no one would choose to re-seal rather than store. Incentives are the only way to guarantee that SPs handle their sectors rationally. Based on the cost analysis above, a deliberate adjustment of the proving period will not affect the incentive model and will not introduce risk to the network.

Challenge count

If we now differentiate between 64GB sectors and 32GB sectors, by asking T/2 challenges for the latter, we would have that:

For 64GB sectors everything stays the same, meaning that a provider storing less than 80% of a sector is caught with probability p ≥ 1 - (1 - 0.2)^T

For 32GB sectors the lower bound on the probability of catching a provider storing less than 80% of a sector goes down to 1 - (1 - 0.2)^(T/2)

I guess you’re right about this: reducing T for 32 GiB sectors is not as safe as keeping T for 64 GiB sectors, even though the Window PoSt cost for 32 GiB sectors is double that for 64 GiB sectors. Yet based on the analysis above, reducing the challenge count is safe for both 32 GiB and 64 GiB sectors. Let’s assume SP A has 1 PiB of storage power in 32 GiB sectors and stores less than 80% of each sector. If we reduce the challenge count to 5, about 22031 sectors can't pass Window PoSt. The cost of resealing 22031 sectors is about $31857, while the cost of storing the additional 20% of data per day is $147. I don’t think a rational SP would do this.
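The figures in this example can be re-derived with a short calculation. This is a sketch under the same assumptions as the numbers above (1 PiB of 32 GiB sectors, sealing cost $1.446 per sector, storage price $0.0224 per sector per day); the function names are hypothetical.

```rust
const SECTORS_PER_PIB: f64 = 32768.0; // 1 PiB / 32 GiB
const SEAL_COST_USD: f64 = 1.446; // C, per 32 GiB sector
const STORE_COST_USD_PER_DAY: f64 = 0.0224; // A, per 32 GiB sector per day

/// Expected number of sectors failing Window PoSt when 80% of each sector
/// is stored and `challenges` challenges are drawn per sector.
fn expected_failed_sectors(sectors: f64, challenges: i32) -> f64 {
    (1.0 - 0.8f64.powi(challenges)) * sectors
}

/// Ratio between the daily cost of resealing the failed sectors and the
/// daily cost of storing the missing 20% of every sector.
fn reseal_to_storage_ratio(sectors: f64, challenges: i32) -> f64 {
    let reseal = expected_failed_sectors(sectors, challenges) * SEAL_COST_USD;
    let storage = sectors * 0.2 * STORE_COST_USD_PER_DAY;
    reseal / storage
}
```

Calling `expected_failed_sectors(SECTORS_PER_PIB, 5)` gives about 22031 sectors, and `reseal_to_storage_ratio(SECTORS_PER_PIB, 5)` comes out at roughly 217, i.e. resealing would cost about 217 times as much as storing the missing data under this full-sealing-cost assumption.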

lucaniz (Collaborator) · 2023-01-04T11:18:52Z

Hi @Pythonac,
sorry for the late reply and thanks for your explanation.

There are a couple of details I want to highlight, in order to better explain where we are coming from.

I understand your point on the number of failed sectors (in expectation; I think there is a typo and it is F = (1- 0.8^T) * S, but it is really minor :) ).

Let me comment on some parts of your analysis that I'd like to specify better.

Wrt the following:

Supposing the sealing computation cost for each sector is C, the total computation cost each day will be D:
D = F * C

unfortunately this is not really the case. Given the security analysis of the construction, the only thing that we can prove is that the cost for a provider who gets caught by the challenges is at least that of recomputing 80% of a single layer of the graph (that is, ~ C/10, if C is the cost of the whole sealing procedure). The real issue is that we do not really know which nodes of the graph he is storing, and we need to be generic, relying on lower bounds that we can prove in a security analysis for "any adversarial strategy". Of course one can restrict to a precise strategy, rely on different assumptions, or take a less conservative approach, but this is not what we did.

When considering the following

So, the total cost per day of storing only 80% of each sector is G:
G = L + D = F * (4.5 * BR + C)

we actually work under the assumption that either you regenerate or you pay the penalty L. Actually, when dealing with WindowPoSt, we generally consider an adversary who is either storing or regenerating (which means that he is passing the proof in both cases). Again, one can consider a less conservative approach or different assumptions, but it is not what we took into account.

Coming back to adversarial strategies, as far as I can tell you consider 2 specific strategies: the one where the adversary stores 80% of each sector, and the one where the adversary stores 80% of the sectors entirely and does not store the remaining 20% at all, incurring the sealing cost for that 20% (in this particular case I agree with you that regeneration cost = sealing cost). Nevertheless, similarly to what I wrote above, unless we prove that those are the 2 optimal strategies, we should not restrict to those 2 (what if there is a more adaptive strategy that the adversary can exploit with the same "storage budget"? This is actually the reason why we use worst case bounds, and also the reason why these analyses are that complicated).

In terms of cost estimation

T (challenge count): 10
BR (projected block reward for one day): 0.0133 FIL/TiB (Filfox, 2022-12-15)
C (computation cost for sealing a 32 GiB sector): $1.446 (AWS, g4dn.4xlarge, 3-yr Reserved, 3 hours)
A (storage price for storing a 32 GiB sector per day): $0.0224 (AWS, S3 Standard, Over 500 TB / Month)
FIL price: $4.184 (https://coinmarketcap.com/, 2022-12-15)

when we ran our cost analysis we considered the scenario of providers buying their own hardware and using it at max capacity for the entire lifecycle (which gave us lower costs).

With respect to the challenge count, again we should not consider the cost of the "whole" sealing, given that the only thing we can mathematically prove is that if one needs to regenerate, he would incur a cost which is at least that of computing 80% of a single layer of the graph.

I hope this helps; let me know if you have any comments or questions.

cc @irenegia

Pythonac (Author) · 2023-03-02T03:01:49Z

@lucaniz Sorry for the late reply! It took me a long time to think about your thoughtful feedback, which I much appreciate. Yet I have to put things in another way. I understand your concern about network security, but I think you are overestimating the impact of this proposal. The proposed change would only affect a small fraction of miners who are facing temporary difficulties due to external factors, such as network congestion or hardware failures. It would not encourage malicious behavior or reduce the overall quality of service of the Filecoin network. In fact, it would help to maintain a healthy and diverse storage market by preventing unnecessary loss of power and reputation for honest miners. I believe this proposal is aligned with the vision and principles of Filecoin, which aims to create a decentralized, efficient, and robust storage network for humanity’s most important information. Looking forward to your feedback on this perspective. Thank you.
