Replies: 2 comments 2 replies
- Is there a sandbox for a trained model to play with (without the blockchain context)?
- How does Proof-of-Quality stop miners from just generating random data which the algorithm would recognize as high-quality? If the model is public, could one just feed random data through it backwards?
A proposal for Proof-of-Quality
by Extend Labs
Abstract: To promote the storage of meaningful data, Extend Labs suggests that the Filecoin community develop a Proof-of-Quality (PoQ) method and reward the storage miners who pass the proof. The basic process of the proposed PoQ is as follows. In the training phase, storage miners first publish features of their data under differential privacy; model miners then train the PoQ model on the published features using distributed machine learning methods. In the testing phase, storage miners load the latest PoQ model, submit the proof of quality, and receive the “verified clients” reward.
1. Introduction
In the existing Filecoin system, most of the stored data is randomly generated rather than meaningful, which makes it difficult for the community to realize its ambition. Promoting meaningful data storage and applications has therefore become a key issue for the Filecoin community. To this end, the Filecoin community has launched the “verified clients” reward.
The key to the success of the “verified clients” reward is building an efficient, reliable, and adaptable way to verify the quality of the data. In this context, Extend Labs suggests the community develop “Proof-of-Quality” (PoQ). PoQ is a proof method for verifying the quality of the content of the data stored by miners, and it is also the only way for storage miners to obtain the “verified clients” reward.
The main challenges of PoQ include: 1) meaningful data storage places higher requirements on data privacy, which means PoQ methods usually cannot access the data directly; 2) the pattern of meaningful data changes dynamically, so PoQ methods should be able to adjust alongside community development; 3) because the “verified clients” reward is highly attractive, storage miners have a strong incentive to forge meaningful data, which is a core issue that PoQ design must address.
Therefore, an effective PoQ should ensure data privacy, adjust with the development of the community, and prevent malicious attacks by storage miners. Based on these considerations, we propose a PoQ scheme built on federated computing, drawing on existing privacy computing and distributed machine learning techniques.
2. A Proof-of-Quality Solution based on Federated Learning
The proposed PoQ method has two stages: PoQ model training and PoQ model testing. In the PoQ model training phase, federated learning is used to train the PoQ model, and the miners participating in model training receive model training rewards; in the PoQ model testing phase, each storage miner generates a PoQ certificate for its stored data and obtains the “verified clients” reward.
2.1 Training of PoQ model
To train the PoQ model efficiently, we employ differential privacy to publish the features of the data and use distributed machine learning methods to train the PoQ model, under the federated computing framework [1]. The framework is shown in Figure 1 and includes three components:
Differential privacy data publish module: In this module, the features of the data are calculated, desensitized, and finally published under differential privacy. Depending on actual needs, the data features can be histograms of the data’s n-gram segments. A possible solution for this module is the “Differentially private histogram publication” method [2], a widely used histogram publication method with privacy guarantees; a minimal sketch follows this component list.
Local model training module: In this module, a specific part of the PoQ model is trained locally, on a batch of samples or a subspace of the features, over the data published under differential privacy. The locally trained models are reported to the global model training miners.
Global model training module: In this module, the global PoQ model is obtained by merging and tuning the local models. The split strategy between local and global models and the merging strategy for the global model can follow “Scaling distributed machine learning with the parameter server” [3], a widely used method in distributed machine learning; a training and merging sketch appears after the deployment note below.
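As an illustration of the differential privacy data publish module, the sketch below builds an n-gram histogram and releases it under the Laplace mechanism. The hashing-based binning, the noise scale, and the function names are assumptions made for this sketch; the method in [2] is more elaborate.

```python
import hashlib
from collections import Counter

import numpy as np

def ngram_histogram(data: bytes, n: int = 2, bins: int = 256) -> np.ndarray:
    """Count hashed n-gram segments of the raw data into a fixed number of bins."""
    hist = np.zeros(bins)
    counts = Counter(
        int.from_bytes(hashlib.sha256(data[i:i + n]).digest()[:4], "big") % bins
        for i in range(len(data) - n + 1)
    )
    for b, c in counts.items():
        hist[b] = c
    return hist

def publish_dp_histogram(hist: np.ndarray, epsilon: float = 1.0) -> np.ndarray:
    """Release the histogram under epsilon-differential privacy (Laplace mechanism).

    Adding or removing a single n-gram changes one bin by at most 1, so the
    noise scale is 1 / epsilon; negative noisy counts are clipped to zero.
    """
    noisy = hist + np.random.laplace(scale=1.0 / epsilon, size=hist.shape)
    return np.clip(noisy, 0.0, None)
```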
In practice, the differential privacy data publish module is deployed and run by each storage miner, the local model training module is deployed on the storage miner or a model training miner, and the global model training module is completed by several (3, 5, or more) randomly selected model training miners to avoid Sybil attacks.
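The sketch below shows how the local and global training modules could fit together, assuming a simple logistic-regression PoQ scorer trained on the published features and a weighted average as the parameter-server-style merge; the model family and the merging rule are illustrative assumptions, not part of the proposal.

```python
import numpy as np

def train_local_model(features, labels, weights, lr=0.1, epochs=5):
    """Locally train a logistic-regression PoQ scorer on published features.

    Returns the updated weights and the number of samples used, which the
    global model training miners need for weighted merging.
    """
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(features @ weights)))  # sigmoid scores
        grad = features.T @ (preds - labels) / len(labels)   # logistic-loss gradient
        weights = weights - lr * grad
    return weights, len(labels)

def merge_global_model(local_updates):
    """Parameter-server-style merge: average local weights, weighted by the
    number of samples each local miner trained on."""
    total = sum(n for _, n in local_updates)
    return sum(w * n for w, n in local_updates) / total
```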
2.2 Generation of PoQ
The generation of PoQ is shown in Figure 2. As in the training process, this phase also guarantees data privacy. It has the following two modules:
Differential privacy data publish module: this module is the same as in the training phase.
PoQ generation module: In this module, the storage miner first obtains the latest PoQ model from the model training miners; second, it estimates the PoQ score on the features published under differential privacy; finally, it submits the PoQ score to get the “verified clients” reward.
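A minimal sketch of the PoQ generation module, reusing the logistic-regression scorer assumed above; the acceptance threshold and the submission format are hypothetical placeholders.

```python
import numpy as np

# Illustrative acceptance threshold; the real threshold would be set by
# community consensus, not by this sketch.
QUALITY_THRESHOLD = 0.5

def generate_poq(dp_features: np.ndarray, global_weights: np.ndarray) -> dict:
    """Score the differentially private features with the latest global PoQ
    model and package the result for submission."""
    score = float(1.0 / (1.0 + np.exp(-(dp_features @ global_weights))))
    return {"poq_score": score, "passes": bool(score >= QUALITY_THRESHOLD)}
```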
2.3 Training Samples Collection
Training samples with accurate labels are the key to the success of PoQ. Possible ways to collect training samples include:
In the initial stage, samples can be collected according to meaningful-data rules that are based on community consensus.
In the update stage, samples can be collected according to meaningful actions on the data, such as retrievals and updates; a labeling sketch follows this list.
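The two collection stages could translate into labeling functions along the following lines; the rule predicates and the action threshold are placeholders for whatever the community agrees on.

```python
def label_by_rules(dp_features, rules):
    """Initial stage: a sample is labeled meaningful (1) if it satisfies any
    community-agreed rule, each rule being a predicate over published features."""
    return int(any(rule(dp_features) for rule in rules))

def label_by_actions(retrieval_count, update_count, min_actions=1):
    """Update stage: data that is actually retrieved or updated is labeled meaningful."""
    return int(retrieval_count + update_count >= min_actions)
```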
2.4 Rewards
Model training reward: To encourage miners to participate in PoQ model training, model training rewards are given to the miner nodes participating in local model training and global model training. The local and global model training rewards are related to the amount of data involved in the training.
Verified clients reward: storage miners that pass the PoQ get the “verified clients” reward. The score returned by the PoQ can be used to derive the quality-adjusted power.
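One possible way to compute the two rewards, shown only as a sketch; the proportional split of the training reward and the clipping of the PoQ score to [0, 1] are assumptions, not part of the proposal.

```python
def training_reward(samples_contributed: int, total_samples: int, reward_pool: float) -> float:
    """Split the model training reward according to how much data each miner
    contributed to local or global training."""
    return reward_pool * samples_contributed / total_samples

def quality_adjusted_power(raw_bytes: int, poq_score: float) -> float:
    """Scale the sector's raw byte power by the PoQ score, clipped to [0, 1]."""
    return raw_bytes * max(0.0, min(1.0, poq_score))
```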
3. Conclusion
In summary, we propose building a PoQ method based on the latest privacy computing and machine learning technologies. The proposal features:
Use both the rules provided by the community and the meaningful behaviors performed by users as the criteria for meaningful data.
Devise a PoQ model training method with a privacy guarantee. Its characteristics include using differential privacy to publish the features of the data and training the model with distributed large-scale machine learning methods.
Use the trained PoQ model to verify the quality of the data. Its characteristic is that storage miners use the latest PoQ model to verify the quality of their data and get the “verified clients” reward.
References
Yang Q, Liu Y, Chen T, et al. Federated machine learning: Concept and applications[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2019, 10(2): 1-19.
Xu J, Zhang Z, Xiao X, et al. Differentially private histogram publication[J]. The VLDB Journal, 2013, 22(6): 797-822.
Li M, Andersen D G, Park J W, et al. Scaling distributed machine learning with the parameter server[C]//OSDI. 2014: 583-598.