Commit 1a53454

Merge pull request #1920 from iArpanPatel/cms/iArpanPatel/hpe-dev-portal/blog/announcing-hpe-swarm-learning-2-0-0
Create Blog “announcing-hpe-swarm-learning-2-0-0”
2 parents c2592f3 + fb071cb commit 1a53454

1 file changed, 46 additions and 0 deletions

@@ -0,0 +1,46 @@
---
title: Announcing HPE Swarm Learning 2.0.0
date: 2023-06-12T16:48:53.923Z
featuredBlog: true
author: HPE Swarm Learning Team
authorimage: /img/Avatar1.svg
disable: false
tags:
  - swarm-learning
---
We’re excited to announce the HPE Swarm Learning 2.0.0 community release!

In the previous version of HPE Swarm Learning, if the sentinel Swarm Network (SN) node went down during Swarm training, the training process stopped and could not be resumed. This release addresses the issue by connecting the SNs in a mesh topology, replacing the previous star topology in which only the sentinel SN was connected to the other SNs.

The release also supports multiple blockchain miners instead of a single miner in the sentinel SN. Because other SNs also function as miners, training continues uninterrupted even if the initial sentinel SN goes down. In addition, when the initial sentinel SN is down, a new SN can still join the Swarm network with the help of any other SN. This **high availability configuration** improves the resilience and robustness of HPE Swarm Learning.

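To illustrate why moving from a star to a mesh topology matters, here is a minimal Python sketch. It is not HPE Swarm Learning code: the node names and helper functions are hypothetical, and it models only link connectivity between SNs, not blockchain mining or training.

```python
from collections import deque


def build_star(nodes, hub):
    """Star: only the hub (the sentinel) is linked to every other node."""
    links = {n: set() for n in nodes}
    for n in nodes:
        if n != hub:
            links[hub].add(n)
            links[n].add(hub)
    return links


def build_mesh(nodes):
    """Full mesh: every node is linked to every other node."""
    return {n: {m for m in nodes if m != n} for n in nodes}


def still_connected(links, failed):
    """Return True if all surviving nodes can still reach one another."""
    alive = [n for n in links if n != failed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        current = queue.popleft()
        for neighbour in links[current]:
            if neighbour != failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(alive)


# Hypothetical node names; "sn1" plays the role of the sentinel SN.
sn_nodes = ["sn1", "sn2", "sn3", "sn4"]
star_survives = still_connected(build_star(sn_nodes, "sn1"), failed="sn1")
mesh_survives = still_connected(build_mesh(sn_nodes), failed="sn1")
print(star_survives, mesh_survives)
```

In this toy model, losing the hub of the star leaves the remaining nodes isolated from one another, while the mesh stays connected after the same failure.
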
In the HPE Swarm Learning sync stage (defined by the sync frequency), when it is time to share what the individual models have learned, one of the Swarm Learning (SL) nodes is designated as the “leader” node. The leader collects the individual models from each peer node and merges them into a single model by combining the parameters of all the individual models. The **Leader Failure Detection and Recovery (LFDR)** feature lets SL nodes continue Swarm training when the SL leader node fails during the merging process: a new SL leader node is selected to continue the merge. If the failed leader comes back online after the new leader has taken over, it is treated as a normal SL node and contributes its learning to the Swarm global model.

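The merge and leader-recovery behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the HPE Swarm Learning implementation: the merge is assumed to be a plain element-wise average, the election rule (lowest node ID) is hypothetical, and `SyncCoordinator`, `merge_models`, and the node names are made up for the example.

```python
def merge_models(models):
    """Combine per-node parameters by element-wise averaging (assumed merge rule)."""
    names = models[0].keys()
    return {
        name: [sum(values) / len(models) for values in zip(*(m[name] for m in models))]
        for name in names
    }


class SyncCoordinator:
    """Tracks the current merge leader across sync rounds (hypothetical helper)."""

    def __init__(self, nodes):
        # Hypothetical election rule: the lowest node ID becomes the leader.
        self.leader = min(nodes)

    def run_round(self, local_models, failed=()):
        """Merge the models of all live nodes, replacing the leader if it failed."""
        alive = {n: m for n, m in local_models.items() if n not in failed}
        if self.leader not in alive:
            # Leader failure detected: elect a new leader among the survivors.
            self.leader = min(alive)
        # A recovered former leader is simply another contributor here.
        return self.leader, merge_models(list(alive.values()))


local_models = {
    "sl1": {"w": [1.0, 2.0], "b": [0.0]},
    "sl2": {"w": [3.0, 4.0], "b": [1.0]},
    "sl3": {"w": [5.0, 6.0], "b": [2.0]},
}

coordinator = SyncCoordinator(local_models)
# Round 1: the initial leader "sl1" fails during the merge.
leader_1, merged_1 = coordinator.run_round(local_models, failed={"sl1"})
# Round 2: "sl1" is back, but it rejoins as an ordinary node.
leader_2, merged_2 = coordinator.run_round(local_models)
print(leader_1, merged_1)
print(leader_2, merged_2)
```

The coordinator keeps the newly elected leader once the old one returns, which mirrors the stale-leader handling described in the paragraph above.
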
With the HPE Swarm Learning v2.0.0 release, users can now extend the Swarm client to support other machine learning platforms as well. Currently, the Swarm client supports machine learning platforms such as PyTorch and Keras (with TensorFlow 2 as the backend). Instructions for extending the Swarm client are available [here](https://github.com/HewlettPackard/swarm-learning/blob/master/lib/src/README.md).

#### **The 2.0.0 release contains the following updates:**

* High availability for SN

  * Handles sentinel node failure
  * Ensures any SN node can act as sentinel while adding a new node
  * Supports mesh topology of the SN network
* High availability for SL leader

  * Elects a new merge leader when a leader failure is detected
  * Handles stale leader recovery
* Swarm Learning Management UI (SLM-UI)

  * Supports Swarm product installation through SLM-UI
  * Deploys and manages Swarm Learning through SLM-UI
* Swarm client library

  * Extends Swarm Learning for new ML platforms
* Improved diagnostics and a utility script for log collection

#### For complete details on this new release, please refer to the following resources:

* [HPE Swarm Learning home page](https://github.com/HewlettPackard/swarm-learning)
* [HPE Swarm Learning client readme](https://github.com/HewlettPackard/swarm-learning/blob/master/lib/src/README.md)

For any questions, start a discussion in our [\#hpe-swarm-learning](https://hpedev.slack.com/archives/C04A5DK9TUK) Slack channel on the [HPE Developer Slack Workspace](https://slack.hpedev.io/).
