Commit c30d9e9

Upload chapter 10 federated_learning
1 parent e388f91 commit c30d9e9
File tree

52 files changed: +656 −0 lines changed
Lines changed: 16 additions & 0 deletions
# Conclusions

This chapter briefly introduced the background, system architecture, FedAvg algorithm, privacy encryption algorithms, and deployment challenges of federated learning. Federated learning is an AI technology on the verge of becoming mainstream: it can be used to establish effective machine learning models under the constraints of data protection and data silos. However, the distinct characteristics of federated learning scenarios (data on the devices is not uploaded, the security and privacy requirements are high, and data is non-IID) increase the difficulty of developing systems and algorithms. Such difficulties include balancing the computing overhead and communication overhead, ensuring that models do not leak privacy, and enabling algorithms to converge in non-IID scenarios. All these difficulties require developers to have a deeper understanding of federated learning scenarios.
Lines changed: 146 additions & 0 deletions
# Horizontal Federated Learning

## Cloud-Cloud Scenarios

In a horizontal federated learning system, multiple participants who have the same data structure collaboratively establish a machine learning model by using a cloud server. A typical assumption is that the participants are honest and that the server is honest-but-curious. This means that no participant is allowed to leak raw gradient information to the server. The training process of such a system generally consists of the following four steps:

1. Each participant computes the training gradient locally, masks the selected gradients by using technologies such as encryption, differential privacy, or secret sharing, and sends the masked result to the server.

2. The server performs secure aggregation without learning the gradient information of any individual participant.

3. The server sends the aggregation result to the participants.

4. The participants decrypt the aggregated gradients and update their own models.
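To make steps 1 and 2 concrete, below is a minimal sketch of one possible masking scheme, pairwise additive masking, in which every pair of participants agrees on a random mask that one adds and the other subtracts. The masks cancel during server-side summation, so the server learns only the aggregate. This is an illustrative toy, not the scheme of any particular system; the function names, and the simplification of generating all masks centrally from one seed, are assumptions (in practice each pair would derive its mask from a key agreement).

```python
import numpy as np

def pairwise_masks(num_clients, dim, seed=0):
    """Toy mask generation: masks[i][j] = -masks[j][i], so they cancel in the sum."""
    rng = np.random.default_rng(seed)  # simplification: real systems derive pair keys via key agreement
    masks = np.zeros((num_clients, num_clients, dim))
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m = rng.normal(size=dim)
            masks[i][j] = m    # client i adds this mask
            masks[j][i] = -m   # client j subtracts it
    return masks

def mask_gradient(grad, client_id, masks):
    """Step 1: each participant uploads only its masked gradient."""
    return grad + masks[client_id].sum(axis=0)

# Step 2: the server sums the masked uploads; all pairwise masks cancel,
# so it learns the aggregate gradient without seeing any individual one.
num_clients, dim = 4, 8
grads = [np.random.randn(dim) for _ in range(num_clients)]
masks = pairwise_masks(num_clients, dim)
uploads = [mask_gradient(g, i, masks) for i, g in enumerate(grads)]
aggregate = np.sum(uploads, axis=0)
assert np.allclose(aggregate, np.sum(grads, axis=0))
```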
In traditional distributed learning, weights on different training nodes are synchronized after each single-step training iteration. However, such synchronization is impossible in federated learning due to unstable training nodes and high communication costs. To improve the computation-to-communication ratio and reduce the high energy consumption caused by frequent communication, Google proposed the federated averaging (FedAvg) algorithm in 2017. Algorithm [\[FedAvg\]](#FedAvg){reference-type="ref" reference="FedAvg"} shows the overall process of FedAvg in cloud-cloud federated learning scenarios. In each round of training, single-step training is performed multiple times on each client. The server then aggregates the weights of multiple clients and computes their weighted average.
:::: algorithm
::: algorithmic
**Input:** $\rm T$ (total number of federated learning iterations), $\rm C$ (number of FL-Clients participating in federated learning in each iteration), $\rm model$ (model)
**Output:** $\rm w$ (parameters of the final model)
**for** each of the $\rm T$ iterations **do**
  Randomly select $\rm C$ FL-Clients.
  // Execution on FL-Clients
  Receive the weight ($\rm w$) from FL-Servers.
  Read the weight ($\rm w$) and input it to the model ($\rm model$).
  train($\rm model$)
  Send the weight ($\rm w$) of the model and the size of the training data to FL-Servers.
  // Execution on FL-Servers
  Receive the weight set ($\rm w_{1,\ldots,C}$) from FL-Clients.
  $\rm w = allreduce(w_{1,\ldots,C})$
  Send the weight ($\rm w$) to FL-Clients.
**end for**
:::
::::
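The server-side aggregation written as allreduce above is, in FedAvg, a weighted average in which each client's contribution is proportional to the size of its training data. A minimal sketch, assuming each client reports a NumPy parameter array together with its local sample count (the function name fedavg_aggregate is illustrative, not a MindSpore Federated API):

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: data-size-weighted average of client parameters.

    client_weights: list of C parameter arrays w_1, ..., w_C
    client_sizes:   list of C local training-set sizes n_1, ..., n_C
    """
    total = sum(client_sizes)
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

# One round with three clients: the result is pulled toward the client
# that holds the most training data.
w1, w2, w3 = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
print(fedavg_aggregate([w1, w2, w3], client_sizes=[100, 200, 700]))
```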
## Device-Cloud Scenarios

The overall process of device-cloud federated learning is similar to that of cloud-cloud federated learning, but with some differences. Specifically, device-cloud federated learning faces difficulties in the following three aspects:
1. **High communication costs.** The communication overhead in device-cloud federated learning stems mainly from the volume of one-time communication traffic, whereas that in cloud-cloud federated learning stems mainly from the high communication frequency. In device-cloud federated learning scenarios, because a WLAN or mobile data network is typically used and its speed may be many orders of magnitude lower than that of local computing, high communication costs become a key bottleneck of federated learning.

2. **System heterogeneity.** Devices that form part of a federated learning network may have varied capabilities in terms of storage, computing, and communication due to differences in the hardware conditions (CPU and memory) of client devices, network connections (3G, 4G, 5G, or Wi-Fi), and power supply (battery level). Because of network and device limitations, only some devices may be active at any given time. In addition, devices may drop offline unexpectedly due to emergencies such as power failures and network access failures. This heterogeneous system architecture affects how the overall federated learning strategy is formulated.

3. **Privacy issues.** Implementing data privacy protection is more difficult in device-cloud federated learning than in other forms of distributed learning because the clients in device-cloud federated learning cannot participate in every iteration. Sensitive information contained in the model updates transferred between devices and the cloud is also at risk of being exposed to third parties or central servers during federated learning. Privacy protection has therefore become a key issue in device-cloud federated learning.
To address the preceding difficulties, MindSpore Federated adopts the distributed FL-Server architecture, which consists of a scheduler module, server module, and client module, as shown in Figure :numref:`ch10-federated-learning-architecture`. The following describes the functions of each module.

![Architecture of a federated learning system](../img/ch10/ch10-federated-learning-architecture.png)
:label:`ch10-federated-learning-architecture`
1. **FL-Scheduler:** Assists in cluster networking and is responsible for delivering management-plane tasks.

2. **FL-Server:** Provides client selection, time-limited communication, and distributed federated aggregation functions. An FL-Server must be able to serve tens of millions of devices in device-cloud scenarios, and must support the access of edge servers and the corresponding security processing logic.

3. **FL-Client:** Responsible for local data training; performs secure encryption on the uploaded weights when communicating with FL-Servers.
MindSpore Federated also provides four key features for device-cloud federated learning:
1. **Time-limited communication:** After connections are established between FL-Servers and FL-Clients, a global timer and counter are started. Aggregation is performed if the proportion of FL-Clients from which FL-Servers receive trained model parameters reaches the preset threshold within the preset time window. If the threshold is not reached within the time window, the next iteration starts. This ensures that, in scenarios with many connected FL-Clients, an excessively long training duration or the disconnection of certain FL-Clients will not cause the entire federated learning process to be suspended. (A minimal sketch of this thresholding logic appears after this list.)

2. **Loosely coupled networking:** Each FL-Server in an FL-Server cluster receives and delivers weights to only some of the FL-Clients in order to ease the bandwidth burden on a single FL-Server. In addition, FL-Clients can be loosely connected, meaning that the disconnection of any FL-Client does not affect the global task, and each FL-Client can obtain all the data required for training by accessing any FL-Server at any time.

3. **Encryption module:** Multiple encryption algorithms can be utilized in the MindSpore Federated framework to prevent model gradient leakage. Such algorithms include secure aggregation algorithms based on local differential privacy (LDP) or MPC, and the Huawei-developed sign-based dimension selection (SignDS) differential privacy algorithm.

4. **Communication compression module:** MindSpore Federated uses quantization and sparsification (two universal compression methods) to compress model parameters into smaller data formats and encode the weights before they are delivered by FL-Servers or uploaded by FL-Clients; at the peer end, it decodes the compressed and encoded data back into raw data. (A quantization sketch also follows this list.)
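As promised above, here is a minimal sketch of the time-limited communication rule: a counter and a timer on the server side decide whether a round aggregates or is skipped. The class and parameter names are hypothetical, not MindSpore Federated's actual API:

```python
import time

class TimeLimitedRound:
    """Aggregate only if enough FL-Clients report within the time window."""

    def __init__(self, num_clients, threshold_ratio=0.8, window_sec=30.0):
        self.threshold = threshold_ratio * num_clients  # preset client proportion
        self.deadline = time.monotonic() + window_sec   # preset time window
        self.updates = []

    def receive(self, update):
        """Accept a client's trained parameters only before the deadline."""
        if time.monotonic() < self.deadline:
            self.updates.append(update)

    def finish(self):
        """Return updates to aggregate, or None to skip to the next iteration."""
        if len(self.updates) >= self.threshold:
            return self.updates  # threshold reached in time: aggregate
        return None              # threshold missed: do not block, start next round
```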
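And here is a generic sketch of the simpler of the two compression methods, uniform 8-bit quantization: weights are linearly mapped to unsigned integers before transmission and decoded back at the peer end. This illustrates the idea only and is not MindSpore Federated's actual codec:

```python
import numpy as np

def quantize(w, bits=8):
    """Encode float weights as unsigned integers plus the decoding parameters."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (2**bits - 1) or 1.0  # guard against constant weights
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    """Peer end: decode the compressed data back into approximate raw weights."""
    return q.astype(np.float32) * scale + lo

w = np.random.randn(5).astype(np.float32)
q, lo, scale = quantize(w)                          # 4x smaller than float32 on the wire
print(np.abs(dequantize(q, lo, scale) - w).max())   # small quantization error
```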
Lines changed: 1 addition & 0 deletions
# Federated Learning System
Lines changed: 191 additions & 0 deletions
# Overview

With the rapid development of artificial intelligence (AI), large-scale and high-quality data is playing an increasingly important role in achieving optimal model performance and user experience. However, further development of AI is restricted by a data utilization bottleneck, whereby data cannot be shared among devices due to issues regarding privacy, supervision, and engineering, resulting in data silos. To resolve this data silo problem, the concept of federated learning was proposed back in 2016. It aims to effectively utilize multi-party data for machine learning modeling while also meeting the requirements of user privacy protection, data security, and government regulations.
## Definition

Centralizing data from multiple parties means that user privacy protection cannot be guaranteed; such an approach would also fail to comply with relevant laws and regulations. The core idea behind federated learning is that models move whereas data stays put. It enables models to move among data parties so that data can be used for modeling without being transferred out of devices. In federated learning, the data of all parties is retained locally, and machine learning models are established by exchanging encrypted parameters or other intermediate information, typically via central servers.
## Application Scenarios

Federated learning can be classified into three categories based on whether samples and features overlap: horizontal federated learning (different samples, overlapping features), vertical federated learning (different features, overlapping samples), and federated transfer learning (no overlapping samples or features).
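This sample/feature distinction can be made concrete with a toy data matrix in which rows are samples and columns are features; the split sizes below are arbitrary, chosen only for illustration:

```python
import numpy as np

# Toy dataset: 6 samples (rows) x 4 features (columns).
X = np.arange(24).reshape(6, 4)

# Horizontal FL: parties share the feature space but hold different samples.
party_a_h, party_b_h = X[:3, :], X[3:, :]  # same 4 columns, disjoint rows

# Vertical FL: parties share the samples but hold different features.
party_a_v, party_b_v = X[:, :2], X[:, 2:]  # same 6 rows, disjoint columns
```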
**Horizontal federated learning** applies to scenarios where different individual participants share an identical feature space. For example, in an advertisement recommendation scenario, algorithm developers use data of specific features (e.g., number of clicks, time on page/site, or frequency of use) relating to different mobile phone users in order to establish a model. Because such feature data cannot be transferred out of devices, horizontal federated learning is used to establish models by combining the feature data of multiple users.
**Vertical federated learning** applies to scenarios with many overlapping samples but few overlapping features. Take two institutions as an example: one is an insurance company and the other is a hospital. The user groups of the two institutions are likely to include many local residents, meaning that the two institutions may have a large intersection of users. However, the insurance company holds data on users' income, expense statements, and credit ratings, whereas the hospital holds data on users' health and medical purchase records, resulting in a small intersection of user features. Vertical federated learning enhances model capabilities by aggregating these different features in an encrypted state.
**Federated transfer learning** aims to find the similarities between the source and target domains. Take another two institutions as an example: one is a bank in country A and the other is an e-commerce company in country B. The user groups of the two institutions have a small intersection due to geographical restrictions. In addition, because the two institutions operate in dissimilar businesses, only a small part of their data features overlap. In this case, federated transfer learning is one of the few effective ways to implement federated learning and improve the model effect, because it can overcome the limitations of having little single-side data and few labeled samples.
## Deployment Scenarios

The architecture of federated learning is similar to that of a parameter server (i.e., distributed learning in data centers). In both architectures, a centralized server and distributed clients are used to build a machine learning model: multiple clients communicate with one server, and there is no communication between clients. Based on the scenario in which it is deployed, federated learning can be classified into cross-silo federated learning and cross-device federated learning. Generally, the users of cross-silo federated learning are enterprises and institutions, whereas cross-device federated learning is oriented to portable electronic devices (PEDs), mobile devices, and the like. Table [\[ch10-federated-learning-different-connection\]](#ch10-federated-learning-different-connection){reference-type="ref" reference="ch10-federated-learning-different-connection"} describes the differences and relationships among distributed learning in data centers, cross-silo federated learning, and cross-device federated learning.

[]{#ch10-federated-learning-different-connection label="ch10-federated-learning-different-connection"}
## Common Frameworks

As users and developers continue to place higher demands on federated learning technologies, more and more federated learning tools and frameworks are emerging. The following lists some of the mainstream federated learning frameworks:
1. **TensorFlow Federated (TFF):** an open-source federated learning framework developed by Google to promote open research and experimentation in federated learning. It is used to implement machine learning and other types of computation on decentralized data. In this framework, a shared global model is trained among many participating clients who keep their training data locally. For example, federated learning has been successfully used to train prediction models for mobile keyboards without uploading sensitive typed data to the server.
2. **PaddleFL:** an open-source federated learning framework proposed by Baidu based on PaddlePaddle. It enables researchers to easily replicate and compare different federated learning algorithms, and allows developers to readily deploy PaddleFL-based federated learning systems in large-scale distributed clusters. The framework provides multiple federated learning strategies (e.g., horizontal federated learning and vertical federated learning) and corresponding applications in fields such as computer vision, natural language processing, and recommendation algorithms. It also supports the application of traditional machine learning training strategies, for example, applying transfer learning to multitask learning and federated learning environments. PaddleFL can be easily deployed on full-stack open-source software, leveraging PaddlePaddle's large-scale distributed training capability and Kubernetes' capability of elastically scheduling training tasks.
3. **Federated AI Technology Enabler (FATE):** the world's first industrial-grade open-source federated learning framework, proposed by WeBank. It enables enterprises and institutions to collaborate on data while also ensuring data security and preventing data privacy leakage. By using secure multi-party computation (MPC) and homomorphic encryption technologies to build low-level secure computation protocols, FATE supports secure computation for different types of machine learning, including logistic regression, tree-based algorithms, deep learning, and transfer learning. The framework was opened to the public for the first time in February 2019 along with the launch of the FATE community, whose members include major cloud computing and financial service enterprises in China.
4. **FedML:** an open-source research library and benchmark proposed by the University of Southern California (USC) for federated learning. It facilitates the development of new federated learning algorithms and fair performance comparison. FedML supports three computing paradigms (i.e., distributed training, training on mobile devices, and standalone simulation) for users to conduct experiments in different system environments. It also implements and promotes diversified algorithm research through flexible and general-purpose API design and reference baselines. To enable fair comparison of federated learning algorithms, FedML provides comprehensive benchmark datasets, including non-independent and identically distributed (non-IID) datasets.
5. **PySyft:** a Python library released by University College London (UCL), DeepMind, and OpenMined for secure and private deep learning. It covers federated learning, differential privacy, and multi-party learning. Differential privacy is a perturbation-based privacy protection method that ensures the influence of any single record on the output is always below a certain threshold, so that a third party cannot infer the change or deletion of a single record from changes in the output; it is considered to offer the strongest security guarantee among current perturbation-based privacy protection methods. (A minimal sketch of this idea follows the list.) PySyft uses differential privacy and encrypted computation (MPC and homomorphic encryption) to decouple private data from model training.
6. **Fedlearner:** a vertical federated learning framework proposed by ByteDance for joint modeling based on data distributed among institutions. It comes with peripheral infrastructure for cluster management, job management, job monitoring, and network proxying. Fedlearner uses a cloud-native deployment solution: it stores data in the Hadoop Distributed File System (HDFS) and manages and starts tasks through Kubernetes. The two parties involved in a Fedlearner training task need to start the task simultaneously by using Kubernetes. All training tasks are managed by the master node in a unified manner, and communication is implemented through Worker nodes.
7. **OpenFL:** a Python framework proposed by Intel for federated learning. OpenFL is designed to be a flexible, scalable, and easy-to-learn tool for data scientists.
8. **Flower:** an open-source federated learning system released by the University of Cambridge, optimized for application scenarios where federated learning algorithms are deployed on large-scale heterogeneous devices.
9. **MindSpore Federated:** an open-source federated learning framework proposed by Huawei. It supports the commercial deployment of tens of millions of stateless devices, and enables all-scenario intelligent applications while user data remains stored locally. MindSpore Federated focuses on horizontal federated learning involving a large number of participants, enabling them to jointly build AI models without sharing local data. It mainly addresses the difficulties of deploying federated learning in industrial scenarios, including difficulties in privacy and security, large-scale federated aggregation, semi-supervised federated learning, communication compression, and cross-platform deployment.
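As referenced in the PySyft entry above, the differential privacy idea can be illustrated with the classic Laplace mechanism on a counting query: noise calibrated to the query's sensitivity bounds how much any single record can shift the output. This is a generic textbook sketch, not PySyft's API:

```python
import numpy as np

def dp_count(records, predicate, epsilon=1.0):
    """Counting query with Laplace noise.

    Adding or removing one record changes the true count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon hides whether
    any single record is present.
    """
    true_count = sum(predicate(r) for r in records)
    return true_count + np.random.laplace(scale=1.0 / epsilon)

ages = [23, 35, 41, 29, 52]
print(dp_count(ages, lambda a: a > 30))  # noisy count of records with age > 30
```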
