Swarm Learning is a decentralized, privacy-preserving Machine Learning framework. This framework utilizes the computing power at, or near, the distributed data sources to run the Machine Learning algorithms that train the models. It uses the security of a blockchain platform to share learnings with peers in a safe and secure manner. In Swarm Learning, training of the model occurs at the edge, where data is most recent, and where prompt, data-driven decisions are mostly necessary. In this completely decentralized architecture, only the insights learned are shared with the collaborating ML peers, not the raw data. This tremendously enhances data security and privacy.
5
5
6
6
A Swarm Learning node works in collaboration with the other Swarm Learning nodes in the network. It regularly shares its learnings with the other nodes and incorporates their insights. This process continues until the Swarm Learning nodes train the model to the desired state. Users can monitor the progress of the current training as shown in the image below. It shows all running Swarm nodes, the loss, the model metric (for example, accuracy), and the overall training progress for each User ML node. Hovering over the progress bar shows the number of completed epochs and the total number of epochs.
@@ -38,7 +38,7 @@ NOTE: The participating nodes must be able to access each other's ports.
38
38
39
39
40
40
## User ML component
41
-
User can transform/modify any Keras or PyTorch based ML program that is written using Python3 into a Swarm Learning ML program by [making a few simple changes](./docs/User/How_to_Swarm_enable_an_ML_algorithm.md) to the model training code by including the `SwarmCallback` API. For more information, see any of the [examples](/examples/README.md) included with the Swarm Learning package.
41
+
Users can transform any Keras, PyTorch, or HuggingFace Trainer class based ML program written in Python 3 into a Swarm Learning ML program by [making a few simple changes](./docs/User/How_to_Swarm_enable_an_ML_algorithm.md) to the model training code to include the `SwarmCallback` API. For more information, see any of the [examples](/examples/README.md) included with the Swarm Learning package.
42
42
43
43
The transformed user Machine Learning \(user ML node\) program can be built as a Docker container or can be run on the host.
44
44
@@ -50,19 +50,20 @@ NOTE: HPE recommends users to build an ML Docker container for easier and automa
50
50
The ML node is responsible for training and iteratively updating the model. For each ML node, there is a corresponding SL node in the Swarm Learning framework, which performs the Swarm training. Each pair of ML and SL nodes must run on the same host. This process continues until the SL nodes train the model to the desired state.
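To make the idea of combining peers' learnings concrete, here is a minimal sketch of merging parameters from several nodes with a mean or a coordinate-wise median. It is an illustration only, not the Swarm Learning implementation: the function name `merge_parameters`, the method labels, and the flat-list parameter layout are assumptions made for this example.

```python
from statistics import mean, median


def merge_parameters(peer_params, method="mean"):
    """Merge equally shaped, flat parameter lists from several peers.

    Hypothetical helper for illustration; real models hold tensors per layer.
    """
    if not peer_params:
        raise ValueError("need parameters from at least one peer")
    combine = {"mean": mean, "median": median}[method]
    # zip(*peer_params) groups the i-th parameter of every peer together.
    return [combine(values) for values in zip(*peer_params)]


# Three peers, each holding three locally trained parameters.
peers = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 8.0, 2.0],
]
merged_mean = merge_parameters(peers)              # [2.0, 4.0, 2.0]
merged_median = merge_parameters(peers, "median")  # [2.0, 2.0, 2.0]
```

The second parameter shows why a median can be preferable when one peer reports an outlier (8.0): the mean is pulled to 4.0 while the median stays at 2.0.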
51
51
52
52
<blockquote>
53
-
NOTE: All the ML nodes must use the same ML platform either Keras (based on TensorFlow 2 backend)or PyTorch. Using Keras for some and PyTorch for the other nodes is not supported.
53
+
NOTE: All the ML nodes must use the same ML platform: Keras (based on the TensorFlow 2 backend), PyTorch, or the HuggingFace Trainer class. Mixing platforms, for example Keras on some nodes and PyTorch on others, is not supported.
54
54
</blockquote>
55
55
56
56
## Quick Start
57
57
1. [Prerequisites](/docs/Install/Prerequisites.md) for Swarm Learning
58
58
2. [Upgrading from earlier versions](/docs/Install/Versioning_and_upgrade.md)
59
59
3. [Download and set up Swarm Learning](/docs/Install/HPE_Swarm_Learning_installation.md) using the SLM-UI installer
60
-
4. Execute a simple predefined example - [MNIST example](/examples/mnist/README.md)
61
-
5.[Running MNIST example using SLM-UI](/docs/User/Running_MNIST_example_using_SLM-UI.md)
62
-
6.[Monitoring & Tracking Swarm Learning training using SLM-UI](/docs/User/Monitoring_Swarm_Learning_training_using_SLM-UI.md)
@@ -104,8 +105,7 @@ NOTE: The examples and scripts that are bundled with the Swarm UI installer **ma
104
105
Refer to [Acronyms and Abbreviations](docs/Generic/acronyms.md) for more information.
105
106
106
107
## Getting in touch
107
-
Feedback and questions are appreciated. You can use the issue tracker to report bugs on GitHub. (Or)
108
-
Join the [HPE Developer Slack Workspace](https://slack.hpedev.io/) and start a discussion in our [#hpe-swarm-learning](https://hpedev.slack.com/archives/C04A5DK9TUK) channel.
108
+
Feedback and questions are appreciated. You can use the issue tracker to report bugs on GitHub.
109
109
110
110
## Contributing
111
111
Refer to [Contributing](docs/Generic/CONTRIBUTING.md) for more information.
docs/Install/Environment_variables.md
1 addition & 0 deletions
@@ -24,6 +24,7 @@ The following environment variables are available to set and modify:
24
24
|`SL_LEADER_FAILURE_BASE_TIMEOUT`|Sets the minimum timeout value \(in seconds\). If Swarm merging does not happen within this timeout, a new SL leader node is selected. The swarm training continues to run, regardless of SL leader node failures. This timeout kicks in after `min_peers` nodes have completed their local training. <br> Default value: 600 seconds. <br>This variable may need tuning depending on the ML application complexity.|
25
25
|`SL_WAIT_FOR_FULL_QUORUM_SECONDS`|Sets the maximum time for an SL leader node to wait for full quorum after minPeers are ready for merge. This parameter lets you maximize the number of peers participating in the merge process.<br>Default value: 30 secs|
26
26
|`SL_RAM_INTENSIVE`|Optimizes the usage of RAM in the SL leader node for coordinate and geometric median merge methods. Unlike the mean merge method, the coordinate and geometric median merge methods involve memory-intensive operations. If the SL leader node has a limited hardware \(RAM\) configuration, then merging the intermediate model parameters using the median methods can result in memory issues. For such scenarios, the user can set the SL\_RAM\_INTENSIVE flag to 'False' to merge the model parameters layer by layer. This 'False' option is based on I/O operations and is time consuming, hence the default option is set to 'True'.<br> The user can pass this parameter in the slenvvars option within the SWOP profile. This option can be different for each SL node depending on its hardware capacity. Example: 'slenvvars : \[SL\_RAM\_INTENSIVE : False\]' <br> Default value: True|
27
+
|`SL_MODEL_PARAMS_COMPRESSION_THRESHOLD_MB`|Adaptive compression threshold for model parameters (in MB). Model parameter files smaller than this threshold will be compressed to reduce network transfer time. Larger files skip compression to avoid disk I/O contention and CPU blocking. <br> Default Value: 250 (MB).|
27
28
|`SWCI_RUN_TASK_MAX_WAIT_TIME`|Specifies a maximum timeout value for the completion of a Run task (RUN_SWARM).<br>This value must be set in minutes, and the default is 120 mins (2 hours).|
28
29
|`SWCI_GENERIC_TASK_MAX_WAIT_TIME`|Specifies a maximum timeout value for the completion of tasks other than RUN_SWARM type task.<br>This value must be set in minutes, and the default is 120 mins (2 hours).|
29
30
|`SWCI_MODE`| Enables the SWCI web interface instead of the command line interface. Allowed values are CLI and WEB.<br> Default value: CLI<br> |
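
The size-based rule described for `SL_MODEL_PARAMS_COMPRESSION_THRESHOLD_MB` can be sketched as follows. This is a hedged illustration of the decision logic only, not the code Swarm Learning runs: the helper names, the use of gzip, and the 1 MB = 1024 * 1024 bytes conversion are all assumptions made for this example.

```python
import gzip
import os
import shutil

DEFAULT_THRESHOLD_MB = 250  # default value stated in the table


def compression_threshold_bytes(env=os.environ):
    """Read the threshold (in MB) from the environment and convert to bytes.

    Assumption: 1 MB is treated as 1024 * 1024 bytes.
    """
    raw = env.get("SL_MODEL_PARAMS_COMPRESSION_THRESHOLD_MB", DEFAULT_THRESHOLD_MB)
    return int(float(raw) * 1024 * 1024)


def should_compress(size_bytes, threshold_bytes):
    """Compress only files smaller than the threshold; larger files are skipped."""
    return size_bytes < threshold_bytes


def prepare_for_transfer(path, threshold_bytes):
    """Hypothetical helper: gzip small files, pass large files through unchanged."""
    if not should_compress(os.path.getsize(path), threshold_bytes):
        return path
    compressed = path + ".gz"
    with open(path, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return compressed
```

Under this sketch, raising the threshold compresses more files at the cost of extra CPU and disk work, while lowering it sends more files uncompressed.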
docs/Install/Install_the_License_Server.md
6 additions & 5 deletions
@@ -1,12 +1,13 @@
1
1
# <a name="GUID-CCE936EF-FB0D-4BF1-B002-3CB9125C55B9"/> Installing the License Server
2
2
3
-
1. After purchasing Swarm Learning from HPE, you will receive an email with a download link **Access Your Products**.
3
+
1. After purchasing Swarm Learning from HPE, you will receive an email with a download link **Access Your Products**. If you are using the free community version,
4
+
then you can skip this and go directly to the MY HPE SOFTWARE CENTER (MSC) link given below.
4
5
5
6
2. From the email, click **Access Your Products**. You are redirected to [MY HPE SOFTWARE CENTER](https://myenterpriselicense.hpe.com/cwp-ui/auth/login).
6
7
7
8
3. If you have the HPE Passport account, enter the credentials and **Sign In**. If you do not have it, create the HPE Passport Account and **Sign In**.
8
9
9
-
After signing in, you should see the Software Notification Message Receipt page listing the products.
10
+
After signing in, you should see the Software Notification Message Receipt page listing the products. If you are using the free community version, then on the MSC page, click Software -> Search -> Product Info and search for "Swarm Learning". In the search results, choose "HPE Swarm Learning Community edition" ver 2.2.0 -> Action (drop-down) -> Product Details -> Installation -> Pre install APLS, and download the APLS software & documentation ZIP file. For quick reference, the APLS container-based steps are given below.
10
11
11
12
4. Download APLS container and run it using the following procedures.
7. Go to the Software Notification Message Receipt page and click **Access Your Products**.
80
+
7. Go to the Software Notification Message Receipt page and click **Access Your Products**.
80
81
81
-
You will be navigated to the [MY HPE SOFTWARE CENTER](https://myenterpriselicense.hpe.com/cwp-ui/auth/login) home page. After signing in with your HPE Passport credentials, you will see the **Activate** page.
82
+
You are taken to the [MY HPE SOFTWARE CENTER](https://myenterpriselicense.hpe.com/cwp-ui/auth/login) home page. After signing in with your HPE Passport credentials, you will see the **Activate** page. If you are using the free community version, then on the MSC page, click Software -> Search -> Product Info and search for "Swarm Learning". In the search results, choose "HPE Swarm Learning Community edition" ver 2.2.0 -> Action (drop-down) -> Get License.
1. Using SLM-UI Installer, you can install the SLM-UI on one host.
3
+
### Prerequisite:
4
+
The APLS license server and the Swarm licenses are installed as detailed in the [License server installation steps](Install_the_License_Server.md).
5
+
6
+
## Manual installation for 2.3.0 version:
7
+
We support **only manual** installation for the 2.3.0 version. You need to:
8
+
1. Either clone or download this Git repo on **each host machine** where you want to install Swarm Learning.
9
+
10
+
2. If you are downloading, navigate to the main page of the repository. To the right of the list of files, click Releases and select the 2.3.0 version. Scroll down to the "Assets" section of the release and click Source code (tar.gz). Copy and extract the tar.gz **on each host machine**.
11
+
12
+
3. Preferably, extract it under /opt/hpe/swarm-learning.
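
As a rough sketch of the extraction step above, the following Python helper unpacks a downloaded Source code (tar.gz) archive into the suggested directory. The function name `extract_release` and the archive file name in the usage note are assumptions for illustration; writing under /opt typically requires elevated permissions.

```python
import tarfile
from pathlib import Path


def extract_release(archive_path, target_dir="/opt/hpe/swarm-learning"):
    """Extract a downloaded tar.gz archive into target_dir and return that path.

    Hypothetical helper; the default target mirrors the path suggested above.
    """
    target = Path(target_dir)
    # Create the target directory (and any missing parents) if needed.
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    return target
```

For example, `extract_release("swarm-learning.tar.gz")` would unpack an archive saved under that (assumed) name; repeat it on each host machine.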
You can skip the rest of the installation steps mentioned below.
27
+
28
+
## Automatic installation for 2.2.0 version:
29
+
Installing Swarm Learning is a two-step process using the GUI.
30
+
31
+
1. Using the SLM-UI Installer GUI, you can install the SLM-UI on one Linux host.
6
32
2. Using SLM-UI, you can install SL on multiple hosts and run the examples.
7
33
8
34
1. Navigate to the [MY HPE SOFTWARE CENTER](https://myenterpriselicense.hpe.com/cwp-ui/auth/login) home page.
9
35
10
36
2. Perform the following actions after signing in with your HPE Passport credentials:
11
37
12
-
1. Go to **My Activations** and select your ordered product.
38
+
1. Go to **My Activations** and select your ordered product. If you are using the free community version, then on the MSC page, click Software -> Search -> Product Info and search for "Swarm Learning". In the search results, choose "HPE Swarm Learning Community edition" ver 2.2.0 -> Action (drop-down).
13
39
14
40
2. Open the **Action** drop-down and then select the **Download/Re-download** page.
15
41
16
42
3. Select and download listed software files.
17
43
18
-
- The tar file containing docs and scripts.
19
-
20
-
- The signature file for the above tar file.
21
-
22
44
- The docker digest hash file \(JSON\).
23
45
24
46
- Download the Swarm Learning SLM-UI installer for your platform, Mac, Windows, or Linux.