Commit c30d9e9

Upload chapter 10 federated_learning
1 parent e388f91 commit c30d9e9
File tree

52 files changed: +656 −0 lines changed
Lines changed: 16 additions & 0 deletions
# Conclusions

This chapter briefly introduced the background, system architecture, FedAvg algorithm, privacy encryption algorithms, and deployment challenges of federated learning. Federated learning is an AI technology on the verge of becoming mainstream: it can be used to establish effective machine learning models under the constraints of data protection and data silos. However, the distinct characteristics of federated learning scenarios (data on the devices is not uploaded, the security and privacy requirements are high, and data is non-IID) increase the difficulty of developing systems and algorithms. Such difficulties include balancing the computing overhead and communication overhead, ensuring that models do not leak privacy, and enabling algorithms to converge in non-IID scenarios. All these difficulties require developers to have a deeper understanding of federated learning scenarios.
Lines changed: 146 additions & 0 deletions
# Horizontal Federated Learning

## Cloud-Cloud Scenarios

In a horizontal federated learning system, multiple participants who have the same data structure collaboratively establish a machine learning model by using a cloud server. A typical assumption is that the participants are honest and that the server is honest-but-curious. This means that no participant is allowed to leak raw gradient information to the server. The training process of such a system generally consists of the following four steps:

1. Each participant computes the training gradient locally, masks the selected gradients by using technologies such as encryption, differential privacy, or secret sharing, and sends the masked result to the server.

2. The server performs secure aggregation without learning the gradient information of any individual participant.

3. The server sends the aggregation result to the participants.

4. The participants decrypt the aggregated gradients and update their own models.
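To make steps 1 and 2 concrete, below is a minimal sketch of one possible masking scheme, pairwise additive masking, in which every pair of participants agrees on a random mask that one adds and the other subtracts. The masks cancel during server-side summation, so the server learns only the aggregate. This is an illustrative toy, not the scheme of any particular system; the function names, and the simplification of generating all masks centrally from one seed, are assumptions (in practice each pair would derive its mask from a key agreement).

```python
import numpy as np

def pairwise_masks(num_clients, dim, seed=0):
    """Toy mask generation: masks[i][j] = -masks[j][i], so they cancel in the sum."""
    rng = np.random.default_rng(seed)  # simplification: real systems derive pair keys via key agreement
    masks = np.zeros((num_clients, num_clients, dim))
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m = rng.normal(size=dim)
            masks[i][j] = m    # client i adds this mask
            masks[j][i] = -m   # client j subtracts it
    return masks

def mask_gradient(grad, client_id, masks):
    """Step 1: each participant uploads only its masked gradient."""
    return grad + masks[client_id].sum(axis=0)

# Step 2: the server sums the masked uploads; all pairwise masks cancel,
# so it learns the aggregate gradient without seeing any individual one.
num_clients, dim = 4, 8
grads = [np.random.randn(dim) for _ in range(num_clients)]
masks = pairwise_masks(num_clients, dim)
uploads = [mask_gradient(g, i, masks) for i, g in enumerate(grads)]
aggregate = np.sum(uploads, axis=0)
assert np.allclose(aggregate, np.sum(grads, axis=0))
```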
In traditional distributed learning, weights on different training nodes are synchronized after each single-step training iteration. However, such synchronization is impossible in federated learning due to unstable training nodes and high communication costs. To improve the computation-to-communication ratio and reduce the high energy consumption caused by frequent communication, Google proposed the federated averaging (FedAvg) algorithm in 2017. Algorithm [\[FedAvg\]](#FedAvg){reference-type="ref" reference="FedAvg"} shows the overall process of FedAvg in cloud-cloud federated learning scenarios. In each round of training, single-step training is performed multiple times on each client. The server then aggregates the weights of multiple clients and computes their weighted average.
:::: algorithm
::: algorithmic
**Input:** $\rm T$ (total number of federated learning iterations), $\rm C$ (number of FL-Clients participating in federated learning in each iteration), $\rm model$ (model)
**Output:** $\rm w$ (parameters of the final model)
**for** each of the $\rm T$ iterations **do**
  Randomly select $\rm C$ FL-Clients.
  // Execution on FL-Clients
  Receive the weight ($\rm w$) from FL-Servers.
  Read the weight ($\rm w$) and input it to the model ($\rm model$).
  train($\rm model$)
  Send the weight ($\rm w$) of the model and the size of the training data to FL-Servers.
  // Execution on FL-Servers
  Receive the weight set ($\rm w_{1,\ldots,C}$) from FL-Clients.
  $\rm w = allreduce(w_{1,\ldots,C})$
  Send the weight ($\rm w$) to FL-Clients.
**end for**
:::
::::
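The server-side aggregation written as allreduce above is, in FedAvg, a weighted average in which each client's contribution is proportional to the size of its training data. A minimal sketch, assuming each client reports a NumPy parameter array together with its local sample count (the function name fedavg_aggregate is illustrative, not a MindSpore Federated API):

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: data-size-weighted average of client parameters.

    client_weights: list of C parameter arrays w_1, ..., w_C
    client_sizes:   list of C local training-set sizes n_1, ..., n_C
    """
    total = sum(client_sizes)
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

# One round with three clients: the result is pulled toward the client
# that holds the most training data.
w1, w2, w3 = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
print(fedavg_aggregate([w1, w2, w3], client_sizes=[100, 200, 700]))
```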
## Device-Cloud Scenarios

The overall process of device-cloud federated learning is similar to that of cloud-cloud federated learning, but with some differences. Specifically, device-cloud federated learning faces difficulties in the following three aspects:
1. **High communication costs.** The communication overhead in device-cloud federated learning stems mainly from the volume of one-time communication traffic, whereas that in cloud-cloud federated learning stems mainly from the high communication frequency. In device-cloud federated learning scenarios, because a WLAN or mobile data network is typically used and its speed may be many orders of magnitude lower than that of local computing, high communication costs become a key bottleneck of federated learning.

2. **System heterogeneity.** Devices that form part of a federated learning network may have varied capabilities in terms of storage, computing, and communication due to differences in the hardware conditions (CPU and memory) of client devices, network connections (3G, 4G, 5G, or Wi-Fi), and power supply (battery level). Because of network and device limitations, only some devices may be active at any given time. In addition, devices may drop offline unexpectedly due to emergencies such as power failures and network access failures. This heterogeneous system architecture affects how the overall federated learning strategy is formulated.

3. **Privacy issues.** Implementing data privacy protection is more difficult in device-cloud federated learning than in other forms of distributed learning because the clients in device-cloud federated learning cannot participate in every iteration. Sensitive information contained in the model updates transferred between devices and the cloud is also at risk of being exposed to third parties or central servers during federated learning. Privacy protection has therefore become a key issue in device-cloud federated learning.
To address the preceding difficulties, MindSpore Federated adopts the distributed FL-Server architecture, which consists of a scheduler module, server module, and client module, as shown in Figure :numref:`ch10-federated-learning-architecture`. The following describes the functions of each module.

![Architecture of a federated learning system](../img/ch10/ch10-federated-learning-architecture.png)
:label:`ch10-federated-learning-architecture`
1. **FL-Scheduler:** Assists in cluster networking and is responsible for delivering management-plane tasks.

2. **FL-Server:** Provides client selection, time-limited communication, and distributed federated aggregation functions. An FL-Server must be able to serve tens of millions of devices in device-cloud scenarios, and must support the access of edge servers and the corresponding security processing logic.

3. **FL-Client:** Responsible for local data training; performs secure encryption on the uploaded weights when communicating with FL-Servers.
MindSpore Federated also provides four key features for device-cloud federated learning:
1. **Time-limited communication:** After connections are established between FL-Servers and FL-Clients, a global timer and counter are started. Aggregation is performed if the proportion of FL-Clients from which FL-Servers receive trained model parameters reaches the preset threshold within the preset time window. If the threshold is not reached within the time window, the next iteration starts. This ensures that, in scenarios with many connected FL-Clients, an excessively long training duration or the disconnection of certain FL-Clients will not cause the entire federated learning process to be suspended. (A minimal sketch of this thresholding logic appears after this list.)

2. **Loosely coupled networking:** Each FL-Server in an FL-Server cluster receives and delivers weights to only some of the FL-Clients in order to ease the bandwidth burden on a single FL-Server. In addition, FL-Clients can be loosely connected, meaning that the disconnection of any FL-Client does not affect the global task, and each FL-Client can obtain all the data required for training by accessing any FL-Server at any time.

3. **Encryption module:** Multiple encryption algorithms can be utilized in the MindSpore Federated framework to prevent model gradient leakage. Such algorithms include secure aggregation algorithms based on local differential privacy (LDP) or MPC, and the Huawei-developed sign-based dimension selection (SignDS) differential privacy algorithm.

4. **Communication compression module:** MindSpore Federated uses quantization and sparsification (two universal compression methods) to compress model parameters into smaller data formats and encode the weights before they are delivered by FL-Servers or uploaded by FL-Clients; at the peer end, it decodes the compressed and encoded data back into raw data. (A quantization sketch also follows this list.)
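As promised above, here is a minimal sketch of the time-limited communication rule: a counter and a timer on the server side decide whether a round aggregates or is skipped. The class and parameter names are hypothetical, not MindSpore Federated's actual API:

```python
import time

class TimeLimitedRound:
    """Aggregate only if enough FL-Clients report within the time window."""

    def __init__(self, num_clients, threshold_ratio=0.8, window_sec=30.0):
        self.threshold = threshold_ratio * num_clients  # preset client proportion
        self.deadline = time.monotonic() + window_sec   # preset time window
        self.updates = []

    def receive(self, update):
        """Accept a client's trained parameters only before the deadline."""
        if time.monotonic() < self.deadline:
            self.updates.append(update)

    def finish(self):
        """Return updates to aggregate, or None to skip to the next iteration."""
        if len(self.updates) >= self.threshold:
            return self.updates  # threshold reached in time: aggregate
        return None              # threshold missed: do not block, start next round
```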
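And here is a generic sketch of the simpler of the two compression methods, uniform 8-bit quantization: weights are linearly mapped to unsigned integers before transmission and decoded back at the peer end. This illustrates the idea only and is not MindSpore Federated's actual codec:

```python
import numpy as np

def quantize(w, bits=8):
    """Encode float weights as unsigned integers plus the decoding parameters."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (2**bits - 1) or 1.0  # guard against constant weights
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    """Peer end: decode the compressed data back into approximate raw weights."""
    return q.astype(np.float32) * scale + lo

w = np.random.randn(5).astype(np.float32)
q, lo, scale = quantize(w)                          # 4x smaller than float32 on the wire
print(np.abs(dequantize(q, lo, scale) - w).max())   # small quantization error
```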
Lines changed: 1 addition & 0 deletions
# Federated Learning System
Lines changed: 191 additions & 0 deletions
# Overview

With the rapid development of artificial intelligence (AI), large-scale and high-quality data is playing an increasingly important role in achieving optimal model performance and user experience. However, further development of AI is restricted by a data utilization bottleneck, whereby data cannot be shared among devices due to issues regarding privacy, supervision, and engineering, resulting in data silos. To resolve this data silo problem, the concept of federated learning was proposed back in 2016. It aims to effectively utilize multi-party data for machine learning modeling while also meeting the requirements of user privacy protection, data security, and government regulations.
## Definition

Centralizing data from multiple parties means that user privacy protection cannot be guaranteed; such an approach would also fail to comply with relevant laws and regulations. The core idea behind federated learning is that models move whereas data stays put. It enables models to move among data parties so that data can be used for modeling without being transferred out of devices. In federated learning, the data of all parties is retained locally, and machine learning models are established by exchanging encrypted parameters or other intermediate information, typically via central servers.
## Application Scenarios

Federated learning can be classified into three categories based on whether samples and features overlap: horizontal federated learning (different samples, overlapping features), vertical federated learning (different features, overlapping samples), and federated transfer learning (no overlapping samples or features).
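This sample/feature distinction can be made concrete with a toy data matrix in which rows are samples and columns are features; the split sizes below are arbitrary, chosen only for illustration:

```python
import numpy as np

# Toy dataset: 6 samples (rows) x 4 features (columns).
X = np.arange(24).reshape(6, 4)

# Horizontal FL: parties share the feature space but hold different samples.
party_a_h, party_b_h = X[:3, :], X[3:, :]  # same 4 columns, disjoint rows

# Vertical FL: parties share the samples but hold different features.
party_a_v, party_b_v = X[:, :2], X[:, 2:]  # same 6 rows, disjoint columns
```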
**Horizontal federated learning** applies to scenarios where different individual participants share an identical feature space. For example, in an advertisement recommendation scenario, algorithm developers use data of specific features (e.g., number of clicks, time on page/site, or frequency of use) relating to different mobile phone users in order to establish a model. Because such feature data cannot be transferred out of devices, horizontal federated learning is used to establish models by combining the feature data of multiple users.
**Vertical federated learning** applies to scenarios with many overlapping samples but few overlapping features. Take two institutions as an example: one is an insurance company and the other is a hospital. The user groups of the two institutions are likely to include many local residents, meaning that the two institutions may have a large intersection of users. However, the insurance company holds data on users' income, expense statements, and credit ratings, whereas the hospital holds data on users' health and medical purchase records, resulting in a small intersection of user features. Vertical federated learning enhances model capabilities by aggregating these different features in an encrypted state.
**Federated transfer learning** aims to find the similarities between the source and target domains. Take another two institutions as an example: one is a bank in country A and the other is an e-commerce company in country B. The user groups of the two institutions have a small intersection due to geographical restrictions. In addition, because the two institutions operate in dissimilar businesses, only a small part of their data features overlap. In this case, federated transfer learning is one of the few effective ways to implement federated learning and improve the model effect, because it can overcome the limitations of having little single-side data and few labeled samples.
## Deployment Scenarios

The architecture of federated learning is similar to that of a parameter server (i.e., distributed learning in data centers). In both architectures, a centralized server and distributed clients are used to build a machine learning model: multiple clients communicate with one server, and there is no communication between clients. Based on the scenario in which it is deployed, federated learning can be classified into cross-silo federated learning and cross-device federated learning. Generally, the users of cross-silo federated learning are enterprises and institutions, whereas cross-device federated learning is oriented to portable electronic devices (PEDs), mobile devices, and the like. Table [\[ch10-federated-learning-different-connection\]](#ch10-federated-learning-different-connection){reference-type="ref" reference="ch10-federated-learning-different-connection"} describes the differences and relationships among distributed learning in data centers, cross-silo federated learning, and cross-device federated learning.

[]{#ch10-federated-learning-different-connection label="ch10-federated-learning-different-connection"}
## Common Frameworks

As users and developers continue to place higher demands on federated learning technologies, more and more federated learning tools and frameworks are emerging. The following lists some of the mainstream federated learning frameworks:
1. **TensorFlow Federated (TFF):** an open-source federated learning framework developed by Google to promote open research and experimentation in federated learning. It is used to implement machine learning and other types of computation on decentralized data. In this framework, a shared global model is trained among many participating clients who keep their training data locally. For example, federated learning has been successfully used to train prediction models for mobile keyboards without uploading sensitive typed data to the server.
2. **PaddleFL:** an open-source federated learning framework proposed by Baidu based on PaddlePaddle. It enables researchers to easily replicate and compare different federated learning algorithms, and allows developers to readily deploy PaddleFL-based federated learning systems in large-scale distributed clusters. The framework provides multiple federated learning strategies (e.g., horizontal federated learning and vertical federated learning) and corresponding applications in fields such as computer vision, natural language processing, and recommendation algorithms. It also supports the application of traditional machine learning training strategies, for example, applying transfer learning to multitask learning and federated learning environments. PaddleFL can be easily deployed on full-stack open-source software, leveraging PaddlePaddle's large-scale distributed training capability and Kubernetes' capability of elastically scheduling training tasks.
3. **Federated AI Technology Enabler (FATE):** the world's first industrial-grade open-source federated learning framework, proposed by WeBank. It enables enterprises and institutions to collaborate on data while also ensuring data security and preventing data privacy leakage. By using secure multi-party computation (MPC) and homomorphic encryption technologies to build low-level secure computation protocols, FATE supports secure computation for different types of machine learning, including logistic regression, tree-based algorithms, deep learning, and transfer learning. The framework was opened to the public for the first time in February 2019 along with the launch of the FATE community, whose members include major cloud computing and financial service enterprises in China.
4. **FedML:** an open-source research library and benchmark proposed by the University of Southern California (USC) for federated learning. It facilitates the development of new federated learning algorithms and fair performance comparison. FedML supports three computing paradigms (i.e., distributed training, training on mobile devices, and standalone simulation) for users to conduct experiments in different system environments. It also implements and promotes diversified algorithm research through flexible and general-purpose API design and reference baselines. To enable fair comparison of federated learning algorithms, FedML provides comprehensive benchmark datasets, including non-independent and identically distributed (non-IID) datasets.
5. **PySyft:** a Python library released by University College London (UCL), DeepMind, and OpenMined for secure and private deep learning. It covers federated learning, differential privacy, and multi-party learning. Differential privacy is a perturbation-based privacy protection method that ensures the influence of any single record on the output is always below a certain threshold, so that a third party cannot infer the change or deletion of a single record from changes in the output; it is considered to offer the strongest security guarantee among current perturbation-based privacy protection methods. (A minimal sketch of this idea follows the list.) PySyft uses differential privacy and encrypted computation (MPC and homomorphic encryption) to decouple private data from model training.
6. **Fedlearner:** a vertical federated learning framework proposed by ByteDance for joint modeling based on data distributed among institutions. It comes with peripheral infrastructure for cluster management, job management, job monitoring, and network proxying. Fedlearner uses a cloud-native deployment solution: it stores data in the Hadoop Distributed File System (HDFS) and manages and starts tasks through Kubernetes. The two parties involved in a Fedlearner training task need to start the task simultaneously by using Kubernetes. All training tasks are managed by the master node in a unified manner, and communication is implemented through Worker nodes.
7. **OpenFL:** a Python framework proposed by Intel for federated learning. OpenFL is designed to be a flexible, scalable, and easy-to-learn tool for data scientists.
8. **Flower:** an open-source federated learning system released by the University of Cambridge, optimized for application scenarios where federated learning algorithms are deployed on large-scale heterogeneous devices.
9. **MindSpore Federated:** an open-source federated learning framework proposed by Huawei. It supports the commercial deployment of tens of millions of stateless devices, and enables all-scenario intelligent applications while user data remains stored locally. MindSpore Federated focuses on horizontal federated learning involving a large number of participants, enabling them to jointly build AI models without sharing local data. It mainly addresses the difficulties of deploying federated learning in industrial scenarios, including difficulties in privacy and security, large-scale federated aggregation, semi-supervised federated learning, communication compression, and cross-platform deployment.
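As referenced in the PySyft entry above, the differential privacy idea can be illustrated with the classic Laplace mechanism on a counting query: noise calibrated to the query's sensitivity bounds how much any single record can shift the output. This is a generic textbook sketch, not PySyft's API:

```python
import numpy as np

def dp_count(records, predicate, epsilon=1.0):
    """Counting query with Laplace noise.

    Adding or removing one record changes the true count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon hides whether
    any single record is present.
    """
    true_count = sum(predicate(r) for r in records)
    return true_count + np.random.laplace(scale=1.0 / epsilon)

ages = [23, 35, 41, 29, 52]
print(dp_count(ages, lambda a: a > 30))  # noisy count of records with age > 30
```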
