Commit 5a2f39e

Merge pull request #116 from iArpanPatel/master
Community refresh 1.1.0
2 parents c0cbf4f + 1586183 commit 5a2f39e

23 files changed: +176 -82 lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# <d></d> <img style="float: right;" src="docs/images/GettyImages-1148109728_EAA-graphic-A_112_0_72_RGB.jpg?raw=true"/> SWARM LEARNING
22

3-
#### Product version: 1.0.0
3+
#### Product version: 1.1.0
44

55
Swarm Learning is a decentralized, privacy-preserving Machine Learning framework. This framework utilizes the computing power at, or near, the distributed data sources to run the Machine Learning algorithms that train the models. It uses the security of a blockchain platform to share learnings with peers in a safe and secure manner. In Swarm Learning, training of the model occurs at the edge, where data is most recent, and where prompt, data-driven decisions are mostly necessary. In this completely decentralized architecture, only the insights learned are shared with the collaborating ML peers, not the raw data. This tremendously enhances data security and privacy.
66

docs/Install/Environment_variables.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,11 +10,13 @@ The environment variables are passed to containers or added to the environment v
1010
|`SN_ETH_PORT_EXT`|Sets an Ethernet port for Swarm Network node.|
1111
|`SN_I_AM_SENTINEL`| Sets a Swarm Network node to become the Sentinel node, only when it is set to true.<br> Default value: False<br> |
1212
|`SN_START_MINING`| Starts mining on non-sentinel nodes. \(Optional\)<br> Default value: False<br> |
13-
|`SL_WAIT_FOR_FULL_QUORUM_SECONDS`|Sets maximum time to wait for full quorum before an SL node, designated as leader node, decides to use minPeers nodes.|
13+
|`SL_WAIT_FOR_FULL_QUORUM_SECONDS`|Sets the maximum time for an SL leader node to wait for full quorum after minPeers are ready for merge. This parameter lets you maximize the number of peers participating in the merge process.<br>Default value: 30 seconds|
14+
|`SWCI_TASK_MAX_WAIT_TIME`|Specifies the maximum timeout for the completion of a task.<br>This value must be set in minutes. Default value: 120 minutes (2 hours).|
1415
|`SWCI_MODE`| Enables SWCIs web interface instead of command line interface. Allowed values are CLI and WEB.<br> Default value: CLI<br> |
1516
|`SWCI_STARTUP_SCRIPT`|This is a default start script of SWCI.|
1617
|`SWCI_WEB_PORT`|Default port on which SWCI-WEB starts server.|
1718
|`SWOP_PROFILE`|Indicates default profile for SWOP.|
19+
|`SWOP_KEEP_CONTAINERS`|By default, SL and ML containers spawned by SWOP are removed. This option can be enabled to retain the stopped containers for debugging.<br>Default value: False|
1820
|`SWARM_ID_CACERT`|Indicates user CA certificates file.|
1921
|`SWARM_ID_CAPATH`|Indicates user CA certificates directory.|
2022
|`SWARM_ID_CERT`|Indicates user certificates file.|
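
The `SL_WAIT_FOR_FULL_QUORUM_SECONDS` row can be read as a simple timing rule. The following is a toy model of that description only, not the actual Swarm Learning implementation: the function name `merge_participants` and the `ready_times` input are hypothetical, introduced purely for illustration.

```python
def merge_participants(ready_times, min_peers, full_quorum_wait=30.0):
    """Toy model: which peers join a merge round.

    ready_times      -- hypothetical dict of peer name -> time (seconds) it became ready
    min_peers        -- the minPeers value
    full_quorum_wait -- models SL_WAIT_FOR_FULL_QUORUM_SECONDS (documented default: 30)
    """
    if min_peers < 1:
        raise ValueError("min_peers must be at least 1")
    if len(ready_times) < min_peers:
        return []  # minPeers is never reached, so no merge happens in this model

    ordered = sorted(ready_times.items(), key=lambda item: item[1])
    # The wait window starts when the minPeers-th peer becomes ready.
    quorum_time = ordered[min_peers - 1][1]
    deadline = quorum_time + full_quorum_wait
    # Every peer that is ready before the deadline participates.
    return [peer for peer, ready_at in ordered if ready_at <= deadline]


ready = {"sl-1": 0, "sl-2": 5, "sl-3": 20, "sl-4": 50}
print(merge_participants(ready, min_peers=2))
```

In this model, a larger wait value lets slower peers join the merge at the cost of a longer round.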

docs/Install/Install_HPE_Swarm_Learning.md

Lines changed: 22 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,44 +1,35 @@
11
# <a name="GUID-60017971-B0A9-4119-AEAF-A21594EE5C1E"/> Install HPE Swarm Learning
22

3-
1. After [completing installation of license server and downloading of the Swarm Learning installation files](/docs/Install/Install_the_License_Server.md), you must run one of the Swarm Learning installer based on your platform (Linux, Windows, or Mac as listed below).
3+
1. After [completing installation of license server and downloading of the Swarm Learning installation files](/docs/Install/Install_the_License_Server.md), you must
4+
run one of the Swarm Learning installers based on your platform (Linux, Windows, or Mac, as listed below).
45

56
- HPE_SWARM_LEARNING_INSTALLER_LINUX_Q2V41-11036
6-
- HPE_SWARM_LEARNING_INSTALLER_WINDOWS_Q2V41-11038.EXE
7+
- HPE_SWARM_LEARNING_INSTALLER_WINDOWS_Q2V41-11038.exe
78
- HPE_SWARM_LEARNING_INSTALLER_MAC_Q2V41-11039
8-
9-
The Swarm Learning Web App is launched in a web browser.
10-
11-
<blockquote>
12-
IMPORTANT: HPE recommends you to run the downloaded Swarm Learning installer from the terminal window only.
13-
</blockquote>
14-
15-
The installer has a few configurable options. To change the default options, run the installer from a command prompt. Use the following optional flags to customize the configuration or behavior of the installer:
16-
17-
-port
18-
: Defines the port for the application to run. The default value is 30302.
199

20-
Example, `-port 30355`
10+
2. The Swarm Learning Web App is launched in a web browser.
2111

22-
-edition
23-
: Configure the Swarm Learning edition that must be installed. The following are the available options:
12+
<blockquote>
13+
IMPORTANT: HPE recommends that you run the downloaded Swarm Learning installer from a terminal window only.
14+
</blockquote>
2415

25-
eval
26-
: This option installs the community edition \(free edition\) of the Swarm Learning.
16+
The installer has a few configurable options. To change the default options, run the installer from a command prompt. Use the following optional flags to
17+
customize the configuration or behavior of the installer:
2718

28-
Example, `-edition eval`
19+
**-port**
20+
: Defines the port for the application to run. The default value is 30302.<br> Example, `-port 30355`
2921

30-
-logs
31-
: If enabled, displays the detail message on the CLI during the installation. To enable, use the command, `-logs verbose`.
22+
**-logs**
23+
: If enabled, displays detailed messages on the CLI during the installation.<br> To enable, use the option `-logs verbose`.
3224

33-
-version
34-
: This option defines the version of docker images that must be installed. The default value is 1.0.0. Example, `-version 0.3.0`
25+
**-version**
26+
: This option defines the version of docker images that must be installed. The default value is 1.1.0.<br> Example, `-version 0.3.0`
3527

36-
-timeoutDuration
37-
: Defines installer timeout duration for individual installation tasks. The default value is 300 seconds.
38-
39-
Example, `-timeoutDuration 600`
28+
**-timeoutDuration**
29+
: Defines installer timeout duration for individual installation tasks. The default value is 300 seconds.<br> Example, `-timeoutDuration 600`
4030

4131
32+
4233
![Overview](GUID-633F271F-2F22-4BB9-91A6-EA50BF8C638A-high.png)
4334

4435
3. Click **Next** in the **Overview** screen.
@@ -57,11 +48,11 @@ The installer has a few configurable options. To change the default options, run
5748
- If a host fails to connect, an error message is displayed.
5849
![Host validation](GUID-60C03DFA-04B4-4884-9CB0-441A3E4351A5=1=en-US=High.png)
5950
5. If there is an error message, click **Click here for more info**. Close the error message dialog, **Retry** or **Configure** the host, and click **Next**.
60-
6. A success message is displayed for all installed hosts. Click **Next**.
61-
62-
<blockquote>
51+
6. A success message is displayed for all installed hosts. Click **Next**.
52+
53+
<blockquote>
6354
NOTE: Unless you configure all the hosts successfully, you cannot go to the next screen.
64-
</blockquote>
55+
</blockquote>
6556

6657
7. Review **Next Steps** and click **Next**.
6758
8. Review the **Summary** screen, which displays all the installed hosts. Click **Finish**.

docs/Install/Install_the_License_Server.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@
2323

2424
Click **Software** (left pane) -> Under **Search** Select "Product Info" -> enter the string "Swarm Learning".
2525

26-
Under the search results, For the product "HPE-SWARM-CMT 1.0.0"-> Click on **Action** -> **Get License**
26+
Under the search results, For the product "HPE-SWARM-CMT 1.1.0"-> Click on **Action** -> **Get License**
2727

2828
6. Enter the lock code you got from the **Install Licenses** page in the HPE Serial Number field and click **Activate**.
2929

docs/Install/Running_Swarm_Learning.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -65,8 +65,10 @@ NOTE: These options do not apply to the `swarm-learning/bin/stop-swarm` script.
6565
|`--keep`|Same as `--no-rm`. Request Docker to preserve the container after it exits.| |
6666
|`--no-keep`|Same as `--rm`. Request Docker to automatically remove the container when it exits.| |
6767
|`-h, --help`|This \(helpful\) message.| |
68-
|`--apls-ip <IP address or DNS name>`|The IP address on which APLS is serving license requests.|172.1.1.1|
69-
|`--apls-port <port number>`|The port number on which APLS is serving license requests.|5814|
68+
|`--primary-apls-ip <IP address or DNS name>`|The IP address on which the primary Autopass License Server is serving license requests.|None|
69+
|`--secondary-apls-ip <IP address or DNS name>`|The IP address on which the secondary Autopass License Server is serving license requests.|None|
70+
|`--primary-apls-port <port number>`|The port number on which the primary Autopass License Server is serving license requests.|5814|
71+
|`--secondary-apls-port <port number>`|The port number on which the secondary Autopass License Server is serving license requests.|The value assigned to --primary-apls-port|
7072
|`--apls-pdf <path to license PD file>`|The path to the license PD file to be used.|None|
7173
|`--cacert <path to certificates file>`|The path to the file containing the list of CA certificates.|None|
7274
|`--capath <path to certificates directory>`|The path to the directory containing CA certificate files.|None|
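
The defaulting rule for the secondary license server port can be sketched as follows. This is a minimal illustration that assumes the flags are passed in the space-separated form shown in the table; the helper name `build_apls_args` is hypothetical and is not part of the Swarm Learning scripts.

```python
def build_apls_args(primary_ip, secondary_ip=None, primary_port=5814, secondary_port=None):
    """Build the APLS-related arguments for a Swarm Learning run script.

    Mirrors the table: the primary port defaults to 5814, and the secondary
    port falls back to whatever value the primary port has.
    """
    if secondary_port is None:
        secondary_port = primary_port

    args = [
        "--primary-apls-ip", primary_ip,
        "--primary-apls-port", str(primary_port),
    ]
    # Secondary flags are added only when a secondary server is configured.
    if secondary_ip is not None:
        args += [
            "--secondary-apls-ip", secondary_ip,
            "--secondary-apls-port", str(secondary_port),
        ]
    return args


# Addresses below are placeholders.
print(build_apls_args("10.0.0.5", secondary_ip="10.0.0.6", primary_port=6000))
```

The resulting list could be appended to a `subprocess.run` call that invokes one of the run scripts.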

docs/Install/Starting_SWCI_nodes.md

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,4 +23,14 @@ The run-swci script accepts the following parameters:
2323
|`--usr-dir <dir>`|The host directory that must be used as the user directory by this SWCI node.|None|
2424
|`--init-script-name <swci-init file>`*|Name of the init script file that has SWCI commands to be executed at the start of SWCI. <br> This file must be located inside the user directory at the top level.<br>|`swci-init`|
2525

26-
*If init option is provided, all SWCI commands within this script file are processed before it enters the interactive mode and waits for users commands. Users can simulate a non-interactive SWCI run by having a bunch of SWCI commands and an SWCI `EXIT` command at the end of the `swci-init` file. This could be used for automation.
26+
*If the init script option is provided, all SWCI commands within this script file are processed before SWCI enters interactive mode and waits for user commands. Users can simulate a non-interactive SWCI run by placing a series of SWCI commands, followed by an SWCI `EXIT` command, in the `swci-init` file. This can be used for automation.
27+
28+
<blockquote>
29+
30+
NOTE:
31+
- If you need to use the swci-init script file as-is (default), the --usr-dir option must be specified; SWCI looks for this default script file under that user directory.
32+
- If you want to run a script file with a different filename, you must explicitly specify the --init-script-name
33+
and --usr-dir.
34+
- If the --usr-dir is not specified, the SWCI runs in an interactive mode.
35+
36+
</blockquote>
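
The three cases in the note can be summarized as a small decision function. This is a sketch of the documented behavior only; the name `resolve_init_script` is hypothetical, and the real `run-swci` script may apply further checks (for example, whether the file exists).

```python
import posixpath

DEFAULT_INIT_SCRIPT = "swci-init"  # default script name, as listed in the table


def resolve_init_script(usr_dir=None, init_script_name=None):
    """Return the init script path SWCI would look for, or None.

    None means no init script is processed and SWCI starts directly
    in interactive mode.
    """
    if usr_dir is None:
        # Without --usr-dir there is no directory to search.
        return None
    script = init_script_name or DEFAULT_INIT_SCRIPT
    # The script must sit at the top level of the user directory.
    return posixpath.join(usr_dir, script)


# The directory and script names below are placeholders.
print(resolve_init_script())                               # None
print(resolve_init_script("/home/user/swci"))              # /home/user/swci/swci-init
print(resolve_init_script("/home/user/swci", "my-script"))  # /home/user/swci/my-script
```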
Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,7 @@
11
# <a name="GUID-2E350669-7E5A-47BC-AB15-58AC4CFAD9C1"/> Upgrading from earlier evaluation version
2+
The current HPE Swarm Learning release is not compatible with the Eval or community 0.3.0 release. **You must** uninstall the 0.3.0 release before installing the current HPE Swarm Learning version. See [Uninstalling Swarm Learning](Uninstalling_the_Swarm_Learning_package.md).
23

3-
The current Swarm Learning release is not compatible with the earlier Eval.0.3.0 release. **You must** uninstall the earlier version before installing the current HPE Swarm Learning version.
4+
To upgrade to the latest version, see [Install HPE Swarm Learning](Install_HPE_Swarm_Learning.md).
5+
<br>The installer deletes existing Swarm Learning files and directories (`docs`, `examples`, `lib`, and `scripts`) from the location and copies updated Swarm Learning files and directories.
6+
<br>Any directory or file created by users outside of the Swarm Learning directories is preserved during the upgrade process.
47

docs/User/Frequently_asked_questions.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -170,3 +170,16 @@ Yes. New nodes can be added in the network at any point in the training. Just li
170170

171171
Swarm Learning uses averaging as the merge algorithm. Currently, users cannot specify the merge algorithm. This will be supported in a later release.
172172

173+
## Before enabling Swarm Learning, how to confirm the standalone user application has no issues and runs?
174+
175+
Run the user container with `SWARM_LOOPBACK` set to `TRUE`. This bypasses Swarm Learning so that you can quickly develop, integrate, and test your model code with the Swarm Learning package. If your code runs to completion and saves the local model, the ML application most likely has no issues.
176+
177+
If `SWARM_LOOPBACK` is set to TRUE, all Swarm functionality is bypassed, except parameter validation.
178+
179+
This can help you to verify and test integration of the model code with Swarm without spawning any Swarm Learning
180+
containers.
181+
182+
## How to run user container as non-root?
183+
184+
By default, when the user ML container is run through SWOP or using the `run-sl` script, it runs with the UID and GID of the current user on the host machine. If the current user on the host is non-root, the user container also runs as non-root.
185+
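
For a quick standalone check outside of SWOP or `run-sl`, both answers above can be combined when the user container is started directly with Docker. The following is a sketch under assumptions: the image name is a placeholder, the helper `loopback_docker_cmd` is hypothetical, and only standard `docker run` options (`--rm`, `-e`, `--user`) are used.

```python
import os


def loopback_docker_cmd(image, uid=None, gid=None):
    """Build a `docker run` command for a loopback test of the user ML container.

    SWARM_LOOPBACK=TRUE bypasses Swarm functionality (except parameter
    validation); --user runs the container with the given UID and GID.
    """
    # Default to the current host user (POSIX only).
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "-e", "SWARM_LOOPBACK=TRUE",
        "--user", f"{uid}:{gid}",
        image,
    ]


# "user-ml-image:latest" is a placeholder image name.
print(" ".join(loopback_docker_cmd("user-ml-image:latest", uid=1000, gid=1000)))
```

The list form can be passed to `subprocess.run` without shell quoting concerns.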

docs/User/How_to_Swarm_enable_an_ML_algorithm.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -38,13 +38,13 @@ swarmCallback.logger.setLevel(logging.DEBUG)
3838
|---------|-----------|
3939
|`syncFrequency`|Specifies the number of batches of local training to be performed between two swarm sync rounds. If adaptive sync enabled, this is the frequency to be used at the start.|
4040
|`minPeers`|Specifies the minimum number of SL peers required during each synchronization round for Swarm Learning to proceed further.|
41-
|`useAdaptiveSync`|Modulate the next syncFrequency value post each synchronization round based on performance on validation data.|
41+
|`useAdaptiveSync`|Modulate the next syncFrequency value post each synchronization round based on performance on validation data. The default value is false.<br>**Note**: As of now, this option is implemented only for KERAS platform.|
4242
|`adsValData`|Specifies the dataset for generating metrics for adaptive sync logic. It can be either an \(x\_val, y\_val\) tuple or a generator.|
4343
|`adsValBatchSize`|Specifies the batch size for `adsValData`. This is used when `useAdaptiveSync` is turned ON.|
4444
|`checkinModelOnTrainEnd`|Specifies the merge behavior of a SL node after it has achieved stopping criterion and it is waiting for all other peers to complete their training. During this period this SL node does not train the model with local data. This parameter decides the nature of the weights that this SL node contributes to the merge process.Allowed values are:<br>`inactive`: Node does not contribute its weights in the merge process but participates as non-contributing peer in the merge process.<br>`snapshot`: Node always contributes the weights that it had when it reached the stopping criterion, it does not accept merged weights.<br>`active`: Node behaves as if it is in active training, but it does not train merged model with local data as mentioned above.<br>`snapshot` is the default value.<br>|
4545
|`trainingContract`|Training contract associated with this learning. It is a user-defined string. Default value is `defaultbb.cqdb.sml.hpe`. <br> **NOTE**: This parameter enables a user to run <strong>concurrent</strong> swarm learning trainings, within the same swarm network. User must create this training contract using SWCI and then use it as the parameter value.|
4646
|`nodeWeightage`|A number between 1–100 to indicate the relative importance of this node compared with others during the parameter merge process.By default, all nodes are equal and have the same weight-age of one.|
47-
|`mlPlatform`|Specifies ML platform. Allowed values are either TF, KERAS or PYTORCH.|
47+
|`mlPlatform`|Specifies ML platform. Allowed values are either TF, KERAS or PYTORCH. If TF platform is used, the default value is KERAS. If PYTORCH platform is used, the default value is PYTORCH.|
4848
|`logger`|Provides information about Python logger. `SwarmCallback` class invokes info, debug, and error methods of this logger for logging. If no logger is passed, then `SwarmCallback` class creates its own logger from basic python logger. If required, user can get hold of this logger instance to change the log level as follows: <br> `import logging` <br> `from swarmlearning.tf import SwarmCallback` <br> `swCallback = SwarmCallback(syncFrequency=128, minPeers=3)` <br> `swCallback.logger.setLevel(logging.DEBUG)`|
4949

5050
3. Use the `SwarmCallback` object for training the model.
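
The constraints in the parameter table can be checked before the callback is constructed. The helper below is a hypothetical sketch, not part of the `swarmlearning` package; it encodes only the allowed values and defaults stated in the table and returns a keyword dictionary.

```python
ALLOWED_PLATFORMS = {"TF", "KERAS", "PYTORCH"}
ALLOWED_CHECKIN = {"inactive", "snapshot", "active"}


def swarm_callback_kwargs(syncFrequency, minPeers, mlPlatform,
                          useAdaptiveSync=False,
                          checkinModelOnTrainEnd="snapshot",
                          nodeWeightage=1):
    """Validate SwarmCallback parameters against the documented constraints."""
    if mlPlatform not in ALLOWED_PLATFORMS:
        raise ValueError(f"mlPlatform must be one of {sorted(ALLOWED_PLATFORMS)}")
    if checkinModelOnTrainEnd not in ALLOWED_CHECKIN:
        raise ValueError(f"checkinModelOnTrainEnd must be one of {sorted(ALLOWED_CHECKIN)}")
    if not 1 <= nodeWeightage <= 100:
        raise ValueError("nodeWeightage must be between 1 and 100")
    if useAdaptiveSync and mlPlatform != "KERAS":
        # Per the table, adaptive sync is currently implemented only for KERAS.
        raise ValueError("useAdaptiveSync is only implemented for the KERAS platform")
    return {
        "syncFrequency": syncFrequency,
        "minPeers": minPeers,
        "mlPlatform": mlPlatform,
        "useAdaptiveSync": useAdaptiveSync,
        "checkinModelOnTrainEnd": checkinModelOnTrainEnd,
        "nodeWeightage": nodeWeightage,
    }


kwargs = swarm_callback_kwargs(syncFrequency=128, minPeers=3, mlPlatform="KERAS")
print(kwargs)
# With the package installed, the dictionary could then be unpacked:
#   from swarmlearning.tf import SwarmCallback
#   swCallback = SwarmCallback(**kwargs)
```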

docs/User/SWCI_APIs.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,42 @@ The python3 program needs to import the SWCI class, and then use the below APIs.
4343
|`registerTask()`|This method registers a task into the SN network and finalizes it, if the task is valid.|`yamlFileName, finalize=True`|
4444
|`resetTaskRunner()`|This method resets the state of the taskrunner contract to an uninitialized state.<br><strong>WARNING:</strong>This action cannot be undone, reset only completed Taskrunner contracts. Resetting the active taskrunner contract can result in unexpected behavior.|`trName='defaulttaskbb.taskdb.sml.hpe'`|
4545
|`resetTrainingContract()`|This method resets the state of the training contract to an uninitialized state.<br><strong>WARNING:</strong>This action cannot be undone, reset only completed Swarm Learning contracts. Resetting the active contracts can result in unexpected behavior.|`ctName='defaultbb.cqdb.sml.hpe'`|
46+
|`sleep()`|This method sleeps for a specified time before executing the subsequent commands.<br>&nbsp;<br>For example, in between a `WAIT FOR TASKRUNNER` and `RESET TASKRUNNER`, one can use a `SLEEP 10`, to give a grace time of 10 secs, before the `RESET` command cleans up the SL and user container.<br>&nbsp;<br>This would be required to allow the user ML code to save the model or do any inference of the model, after the Swarm training is over.<br>&nbsp;<br>For more information, see the example SWCI scripts in the `swarmlearning/examples/` directory.|`time in seconds`|
4647
|`setLogLevel()`|This method sets the logging level for the SWCI container.|logLv One of `{ logging.CRITICAL, logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG }`<br>|
4748
|`uploadTaskDefintion()`|This method uploads the local task definition file to the SWCI container.|`taskFilePath`|
4849

50+
51+
## Example snippet of an API
52+
53+
```
54+
##################################################################################################################
55+
# This code snippet shows how a user can use the SWCI APIs
56+
#
57+
# We assume the following things before running this script:
58+
# 1. Swarm Learning infrastructure is set up and ready.
59+
# 2. SWCI container is running in WEB mode (-e SWCI_MODE='WEB')
60+
# 3. There should be explicit port forwarding for SWCI_WEB_PORT while running the SWCI container (ex: -p 30306:30306)
61+
# 4. Swarm learning wheel package should be installed in the environment where we run this file.
62+
##################################################################################################################
63+
```
64+
65+
```
66+
# Import swci from the swarmlearning whl package
67+
import swarmlearning.swci as sw
68+
69+
swciSrvName = 'SWCI Server Name or IP'
70+
snServerName = 'SN Server Name or IP'
71+
72+
# Connect to the SWCI via SWCI_WEB_PORT
73+
s = sw.Swci(swciSrvName,port=30306) #30306 is the default port
74+
# Connect to SN and create context
75+
print(s.createContext('testContext', snServerName))
76+
# Switches the context to testContext
77+
print(s.switchContext('testContext'))
78+
# Creates a training contract
79+
print(s.createTrainingContract('testContract'))
80+
# Lists all the created Contexts
81+
print(s.listContexts())
82+
# Lists all the tasks that includes root task
83+
print(s.listTasks())
84+
```
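
The grace period described for `sleep()` can be sketched in the same style. The helper `reset_after_grace` and the `FakeSwci` stand-in are hypothetical; the stand-in exists only so that the sketch runs without a Swarm network. The method names `sleep()` and `resetTaskRunner()` and the default `trName` value are taken from the API table; their return values are not documented here, so none are assumed.

```python
def reset_after_grace(swci, grace_seconds=10, tr_name="defaulttaskbb.taskdb.sml.hpe"):
    """Pause before resetting a completed taskrunner contract.

    The pause gives the user ML code time to save the model or run
    inference before the reset cleans up the SL and user containers.
    """
    swci.sleep(grace_seconds)
    return swci.resetTaskRunner(trName=tr_name)


class FakeSwci:
    """Stand-in for a connected SWCI object; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def sleep(self, seconds):
        self.calls.append(("sleep", seconds))

    def resetTaskRunner(self, trName="defaulttaskbb.taskdb.sml.hpe"):
        self.calls.append(("resetTaskRunner", trName))


fake = FakeSwci()
reset_after_grace(fake, grace_seconds=10)
print(fake.calls)
```

With a real connection, the object returned by `sw.Swci(...)` would be passed in place of `FakeSwci()`. Only completed taskrunner contracts should be reset, as the warning in the table states.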
