Replies: 2 comments 2 replies
- Is there a sandbox for a trained model to play with (without the blockchain context)?
- How does Proof-of-Quality stop miners from just generating random data which the algorithm would recognize as high-quality? If the model is public, could one just feed random data through it backwards?
A proposal for Proof-of-Quality
by Extend Labs
Abstract: To promote the storage of meaningful data, Extend Labs suggests that the Filecoin community develop a Proof-of-Quality (PoQ) method and reward the storage miners who pass the proof. The basic process of the proposed PoQ is as follows. In the training phase, storage miners first publish features of their data under differential privacy; model miners then train the PoQ model on the published features using distributed machine learning methods. In the testing phase, storage miners load the latest PoQ model, submit the proof of quality, and receive the “verified clients” reward.
1. Introduction
In the existing Filecoin system, most of the stored data is randomly generated rather than meaningful, which makes it difficult for the community to realize its ambition. Promoting meaningful data storage and applications has therefore become a key issue for the Filecoin community. To this end, the Filecoin community has launched the “verified clients” reward.
The key to the success of the “verified clients” reward is building an efficient, reliable, and adaptable way to verify the quality of the data. In this context, Extend Labs suggests the community develop “Proof-of-Quality” (PoQ). PoQ is a proof method for verifying the quality of the content of the data stored by miners, and it is also the only way for storage miners to obtain the “verified clients” reward.
The main challenges of PoQ include: 1) meaningful data storage places higher requirements on data privacy, which means PoQ methods usually cannot access the data directly; 2) the pattern of meaningful data changes dynamically, so PoQ methods should be able to adjust alongside community development; 3) because the “verified clients” reward is highly attractive, storage miners have a strong incentive to forge meaningful data, which is a core issue that PoQ design must address.
Therefore, an effective PoQ should ensure data privacy, adjust with the development of the community, and prevent malicious attacks by storage miners. Based on these considerations, we propose a PoQ scheme built on federated computing, drawing on existing privacy computing and distributed machine learning techniques.
2. A Proof-of-Quality Solution based on Federated Learning
The proposed PoQ method has two stages: PoQ model training and PoQ model testing. In the PoQ model training phase, federated learning is used to train the PoQ model, and the miners participating in model training receive model training rewards; in the PoQ model testing phase, each storage miner generates a PoQ certificate for its stored data and obtains the “verified clients” reward.
2.1 Training of PoQ model
To train the PoQ model efficiently, we employ differential privacy to publish the features of the data and use distributed machine learning methods to train the PoQ model, under the federated computing framework [1]. The framework is shown in Figure 1 and includes three components:
Differential privacy data publish module: In this module, the features of the data are calculated, desensitized, and finally published under differential privacy. Depending on actual needs, the data features can be histograms of the data’s n-gram segments. A possible solution for this module is the “Differentially private histogram publication” method [2], a widely used histogram publication method with privacy guarantees; a minimal sketch follows this component list.
Local model training module: In this module, a specific part of the PoQ model is trained locally, on a batch of samples or a subspace of the features, over the data published under differential privacy. The locally trained models are reported to the global model training miners.
Global model training module: In this module, the global PoQ model is obtained by merging and tuning the local models. The split strategy between local and global models and the merging strategy for the global model can follow “Scaling distributed machine learning with the parameter server” [3], a widely used method in distributed machine learning; a training and merging sketch appears after the deployment note below.
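As an illustration of the differential privacy data publish module, the sketch below builds an n-gram histogram and releases it under the Laplace mechanism. The hashing-based binning, the noise scale, and the function names are assumptions made for this sketch; the method in [2] is more elaborate.

```python
import hashlib
from collections import Counter

import numpy as np

def ngram_histogram(data: bytes, n: int = 2, bins: int = 256) -> np.ndarray:
    """Count hashed n-gram segments of the raw data into a fixed number of bins."""
    hist = np.zeros(bins)
    counts = Counter(
        int.from_bytes(hashlib.sha256(data[i:i + n]).digest()[:4], "big") % bins
        for i in range(len(data) - n + 1)
    )
    for b, c in counts.items():
        hist[b] = c
    return hist

def publish_dp_histogram(hist: np.ndarray, epsilon: float = 1.0) -> np.ndarray:
    """Release the histogram under epsilon-differential privacy (Laplace mechanism).

    Adding or removing a single n-gram changes one bin by at most 1, so the
    noise scale is 1 / epsilon; negative noisy counts are clipped to zero.
    """
    noisy = hist + np.random.laplace(scale=1.0 / epsilon, size=hist.shape)
    return np.clip(noisy, 0.0, None)
```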
In practice, the differential privacy data publish module is deployed and run by each storage miner, the local model training module is deployed on the storage miner or a model training miner, and the global model training module is completed by several (3, 5, or more) randomly selected model training miners to avoid Sybil attacks.
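The sketch below shows how the local and global training modules could fit together, assuming a simple logistic-regression PoQ scorer trained on the published features and a weighted average as the parameter-server-style merge; the model family and the merging rule are illustrative assumptions, not part of the proposal.

```python
import numpy as np

def train_local_model(features, labels, weights, lr=0.1, epochs=5):
    """Locally train a logistic-regression PoQ scorer on published features.

    Returns the updated weights and the number of samples used, which the
    global model training miners need for weighted merging.
    """
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(features @ weights)))  # sigmoid scores
        grad = features.T @ (preds - labels) / len(labels)   # logistic-loss gradient
        weights = weights - lr * grad
    return weights, len(labels)

def merge_global_model(local_updates):
    """Parameter-server-style merge: average local weights, weighted by the
    number of samples each local miner trained on."""
    total = sum(n for _, n in local_updates)
    return sum(w * n for w, n in local_updates) / total
```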
2.2 Generation of PoQ
The generation of PoQ is shown in Figure 2. As in the training process, this phase also guarantees data privacy. It has the following two modules:
Differential privacy data publish module: this module is the same as in the training phase.
PoQ generation module: In this module, the storage miner first obtains the latest PoQ model from the model training miners; second, it estimates the PoQ score on the features published under differential privacy; finally, it submits the PoQ score to get the “verified clients” reward.
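A minimal sketch of the PoQ generation module, reusing the logistic-regression scorer assumed above; the acceptance threshold and the submission format are hypothetical placeholders.

```python
import numpy as np

# Illustrative acceptance threshold; the real threshold would be set by
# community consensus, not by this sketch.
QUALITY_THRESHOLD = 0.5

def generate_poq(dp_features: np.ndarray, global_weights: np.ndarray) -> dict:
    """Score the differentially private features with the latest global PoQ
    model and package the result for submission."""
    score = float(1.0 / (1.0 + np.exp(-(dp_features @ global_weights))))
    return {"poq_score": score, "passes": bool(score >= QUALITY_THRESHOLD)}
```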
2.3 Training Samples Collection
Training samples with accurate labels are the key to the success of PoQ. Possible ways to collect training samples include:
In the initial stage, samples can be collected according to meaningful-data rules that are based on community consensus.
In the update stage, samples can be collected according to meaningful actions on the data, such as retrievals and updates; a labeling sketch follows this list.
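The two collection stages could translate into labeling functions along the following lines; the rule predicates and the action threshold are placeholders for whatever the community agrees on.

```python
def label_by_rules(dp_features, rules):
    """Initial stage: a sample is labeled meaningful (1) if it satisfies any
    community-agreed rule, each rule being a predicate over published features."""
    return int(any(rule(dp_features) for rule in rules))

def label_by_actions(retrieval_count, update_count, min_actions=1):
    """Update stage: data that is actually retrieved or updated is labeled meaningful."""
    return int(retrieval_count + update_count >= min_actions)
```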
2.4 Rewards
Model training reward: To encourage miners to participate in PoQ model training, model training rewards are given to the miner nodes participating in local model training and global model training. The local and global model training rewards are related to the amount of data involved in the training.
Verified clients reward: storage miners that pass the PoQ get the “verified clients” reward. The score returned by the PoQ can be used to derive the quality-adjusted power.
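One possible way to compute the two rewards, shown only as a sketch; the proportional split of the training reward and the clipping of the PoQ score to [0, 1] are assumptions, not part of the proposal.

```python
def training_reward(samples_contributed: int, total_samples: int, reward_pool: float) -> float:
    """Split the model training reward according to how much data each miner
    contributed to local or global training."""
    return reward_pool * samples_contributed / total_samples

def quality_adjusted_power(raw_bytes: int, poq_score: float) -> float:
    """Scale the sector's raw byte power by the PoQ score, clipped to [0, 1]."""
    return raw_bytes * max(0.0, min(1.0, poq_score))
```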
3. Conclusion
In summary, we propose building a PoQ method based on the latest privacy computing and machine learning technologies. The proposal features:
Use both the rules provided by the community and the meaningful behaviors performed by users as the criteria for meaningful data.
Devise a PoQ model training method with a privacy guarantee. Its characteristics include using differential privacy to publish the features of the data and training the model with distributed large-scale machine learning methods.
Use the trained PoQ model to verify the quality of the data. Its characteristic is that storage miners use the latest PoQ model to verify the quality of their data and get the “verified clients” reward.
References
Yang Q, Liu Y, Chen T, et al. Federated machine learning: Concept and applications[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2019, 10(2): 1-19.
Xu J, Zhang Z, Xiao X, et al. Differentially private histogram publication[J]. The VLDB Journal, 2013, 22(6): 797-822.
Li M, Andersen D G, Park J W, et al. Scaling distributed machine learning with the parameter server[C]//OSDI. 2014: 583-598.