
Commit 7723116

Merge pull request #257 from NayantaraK/examples/huggingface-integration
HuggingFace examples, Python version change, Updated wheel file
2 parents 935ed6e + 0a1dd08 commit 7723116


50 files changed: +600,435 additions, -2,813 deletions

.gitignore

Lines changed: 39 additions & 1 deletion
```diff
@@ -1,4 +1,42 @@
 .*swo
 .*swp
 **/__pycache__
-workspace/
+workspace/
+
+# Large data files and models
+*.bin
+*.json
+*.txt
+
+# Model files and checkpoints
+pytorch_model.bin
+model.safetensors
+config.json
+tokenizer.json
+vocab.txt
+
+# Data directories
+results/
+saved_models/
+experiments*/
+logs/
+
+# Python virtual environments
+venv/
+env/
+swarm_env/
+
+# Wheel files
+*.whl
+
+# Jupyter notebooks checkpoints
+.ipynb_checkpoints/
+
+# OS generated files
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
```
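A quick way to sanity-check the simple glob entries added above is Python's `fnmatch` — a hedged sketch only, since real gitignore matching (directory suffixes like `logs/`, `**` rules, negations) is richer than plain globbing:

```python
import fnmatch

# A few of the newly ignored glob patterns (plain-file patterns only).
patterns = ["*.bin", "*.whl", ".DS_Store", "pytorch_model.bin", "Thumbs.db"]

def ignored(name: str) -> bool:
    """True if the file name matches any of the glob patterns above."""
    return any(fnmatch.fnmatch(name, p) for p in patterns)

print(ignored("swarmlearning-client.whl"))  # True: matches *.whl
print(ignored("train.py"))                  # False: no pattern covers it
```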

README.md

Lines changed: 11 additions & 11 deletions
```diff
@@ -1,6 +1,6 @@
 # <d></d> <img style="float: right;" src="docs/images/GettyImages-1148109728_EAA-graphic-A_112_0_72_RGB.jpg?raw=true"/> SWARM LEARNING
 
-#### Product version: 2.2.0
+#### Product version: 2.3.0
 Swarm Learning is a decentralized, privacy-preserving Machine Learning framework. This framework utilizes the computing power at, or near, the distributed data sources to run the Machine Learning algorithms that train the models. It uses the security of a blockchain platform to share learnings with peers in a safe and secure manner. In Swarm Learning, training of the model occurs at the edge, where data is most recent, and where prompt, data-driven decisions are mostly necessary. In this completely decentralized architecture, only the insights learned are shared with the collaborating ML peers, not the raw data. This tremendously enhances data security and privacy.
 
 Swarm Learning nodes works in collaboration with other Swarm Learning nodes in the network. It regularly shares its learnings with the other nodes and incorporates their insights. This process continues until the Swarm Learning nodes train the model to desired state. User can monitor the progress of the current training as shown in the below image. It shows all running Swarm nodes, loss, model metric (for example, accuracy) and overall training progress for each User ML node. On hovering over the "progress bar", one can see the number of completed epochs and the total number of epochs.
@@ -38,7 +38,7 @@ NOTE: The participating nodes must be able to access each other's ports.
 
 
 ## User ML component
-User can transform/modify any Keras or PyTorch based ML program that is written using Python3 into a Swarm Learning ML program by [making a few simple changes](./docs/User/How_to_Swarm_enable_an_ML_algorithm.md) to the model training code by including the `SwarmCallback` API. For more information, see any of the [examples](/examples/README.md) included with the Swarm Learning package.
+User can transform/modify any Keras or PyTorch or HuggingFace Trainer class based ML program that is written using Python3 into a Swarm Learning ML program by [making a few simple changes](./docs/User/How_to_Swarm_enable_an_ML_algorithm.md) to the model training code by including the `SwarmCallback` API. For more information, see any of the [examples](/examples/README.md) included with the Swarm Learning package.
 
 The transformed user Machine Learning \(user ML node\) program can be built as a Docker container or can be run on the host.
 
@@ -50,19 +50,20 @@ NOTE: HPE recommends users to build an ML Docker container for easier and automa
 The ML node is responsible to train and iteratively update the model. For each ML node, there is a corresponding SL node in the Swarm Learning framework, which performs the Swarm training. Each pair of ML and SL nodes must run on the same host. This process continues until the SL nodes train the model to the desired state.
 
 <blockquote>
-NOTE: All the ML nodes must use the same ML platform either Keras (based on TensorFlow 2 backend) or PyTorch. Using Keras for some and PyTorch for the other nodes is not supported.
+NOTE: All the ML nodes must use the same ML platform either Keras (based on TensorFlow 2 backend), PyTorch, or HuggingFace Trainer class. Using Keras for some and PyTorch for the other nodes is not supported.
 </blockquote>
 
 ## Quick Start
 1. [Prerequisites](/docs/Install/Prerequisites.md) for Swarm Learning
 2. [Upgrading from earlier versions](/docs/Install/Versioning_and_upgrade.md)
 3. [Download and setup Swarm Learning](/docs/Install/HPE_Swarm_Learning_installation.md) using the SLM-UI installer
-4. Execute a simple predefined example - [MNIST example](/examples/mnist/README.md)
-5. [Running MNIST example using SLM-UI](/docs/User/Running_MNIST_example_using_SLM-UI.md)
-6. [Monitoring & Tracking Swarm Learning training using SLM-UI](/docs/User/Monitoring_Swarm_Learning_training_using_SLM-UI.md)
-7. [Frequently Asked Questions](/docs/User/Frequently_asked_questions.md)
-8. [Troubleshooting](/docs/User/Troubleshooting.md)
-9. [Release Notes](/docs/HPE_Swarm_learning_2.2.0_Release_Notes.pdf)
+4. Execute a simple example - [MNIST example](/examples/mnist/README.md)
+5. Execute a mini LLM fine-tuning example - [HuggingFace Trainer LoRA](/examples/huggingface-peft/README.md)
+6. [Running MNIST example using SLM-UI](/docs/User/Running_MNIST_example_using_SLM-UI.md)
+7. [Monitoring & Tracking Swarm Learning training using SLM-UI](/docs/User/Monitoring_Swarm_Learning_training_using_SLM-UI.md)
+8. [Frequently Asked Questions](/docs/User/Frequently_asked_questions.md)
+9. [Troubleshooting](/docs/User/Troubleshooting.md)
+10. [Release Notes](/docs/HPE_Swarm_learning_2.2.0_Release_Notes.pdf)
 
 <blockquote>
 
@@ -104,8 +105,7 @@ NOTE: The examples and scripts that are bundled with the Swarm UI installer **ma
 Refer to [Acronyms and Abbreviations](docs/Generic/acronyms.md) for more information.
 
 ## Getting in touch
-Feedback and questions are appreciated. You can use the issue tracker to report bugs on GitHub. (Or)
-Join the [HPE Developer Slack Workspace](https://slack.hpedev.io/) and start a discussion in our [#hpe-swarm-learning](https://hpedev.slack.com/archives/C04A5DK9TUK) channel.
+Feedback and questions are appreciated. You can use the issue tracker to report bugs on GitHub.
 
 ## Contributing
 Refer to [Contributing](docs/Generic/CONTRIBUTING.md) for more information.
```

docs/Install/Environment_variables.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -24,6 +24,7 @@ The following environment variables are available to set and modify:
 |`SL_LEADER_FAILURE_BASE_TIMEOUT`|Sets the minimum timeout value \(in seconds\). If Swarm merging does not happen within this timeout, a new SL leader node is selected. The swarm training continues to run, regardless of SL leader node failures. This timeout will kickin after `min_peers` nodes have completed their local training. <br> Default value: 600 seconds. <br>This variable may need tunning depending on the ML application complexity.|
 |`SL_WAIT_FOR_FULL_QUORUM_SECONDS`|Sets the maximum time for an SL leader node to wait for full quorum after minPeers are ready for merge. This parameter lets you to maximize the number of peers participating in the merge process.<br>Default value: 30 secs|
 |`SL_RAM_INTENSIVE`|Optimizes the usage of RAM in the SL leader node for coordinate and geometric median merge methods. Unlike mean merge method, coordinate and geometric median merge methods involve memory intensive operations. If SL Leader node has limited hardware \(RAM\) configuration, then merging the intermediate model parameters using the median methods can result in memory issues. For such scenarios, user can set up the SL\_RAM\_INTENSIVE flag to 'False' for merging the model parameters layer by layer. This 'False' option is based on I/O operations and is time consuming, hence the default option is set to 'True'.<br> User can pass this parameter in slenvvars option within SWOP profile. This option can be different for each SL node depending on its hardware capacity. Example: 'slenvvars : \[SL\_RAM\_INTENSIVE : False\]' <br> Default value: True|
+|`SL_MODEL_PARAMS_COMPRESSION_THRESHOLD_MB`|Adaptive compression threshold for model parameters (in MB). Model parameter files smaller than this threshold will be compressed to reduce network transfer time. Larger files skip compression to avoid disk I/O contention and CPU blocking. <br> Default value: 250 (MB).|
 |`SWCI_RUN_TASK_MAX_WAIT_TIME`|Specifies a maximum timeout value for the completion of a Run task (RUN_SWARM).<br>This value must be set in minutes, and the default is 120 mins (2 hours).|
 |`SWCI_GENERIC_TASK_MAX_WAIT_TIME`|Specifies a maximum timeout value for the completion of tasks other than RUN_SWARM type task.<br>This value must be set in minutes, and the default is 120 mins (2 hours).|
 |`SWCI_MODE`| Enables SWCIs web interface instead of command line interface. Allowed values are CLI and WEB.<br> Default value: CLI<br> |
```
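The threshold behaviour described for the new `SL_MODEL_PARAMS_COMPRESSION_THRESHOLD_MB` variable can be sketched in a few lines — a hedged illustration of the decision rule only; the function and names below are invented for the example, not Swarm Learning's internals:

```python
import gzip

DEFAULT_THRESHOLD_MB = 250  # matches the documented default

def prepare_params(payload: bytes, threshold_mb: int = DEFAULT_THRESHOLD_MB):
    """Return (blob, compressed?) for a model-parameter payload."""
    size_mb = len(payload) / (1024 * 1024)
    if size_mb < threshold_mb:
        # Small payload: gzip it to cut network transfer time.
        return gzip.compress(payload), True
    # Large payload: skip compression to avoid disk I/O contention and CPU blocking.
    return payload, False

blob, was_compressed = prepare_params(b"\x00" * 4096)
print(was_compressed)  # True: 4 KiB is far below the 250 MB default
```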

docs/Install/Install_the_License_Server.md

Lines changed: 6 additions & 5 deletions
````diff
@@ -1,12 +1,13 @@
 # <a name="GUID-CCE936EF-FB0D-4BF1-B002-3CB9125C55B9"/> Installing the License Server
 
-1. After purchasing Swarm Learning from HPE, you will receive an email with a download link **Access Your Products**.
+1. After purchasing Swarm Learning from HPE, you will receive an email with a download link **Access Your Products**. If you are using the free community version,
+then you can skip this and directly click the MY HPE SOFTWARE CENTER (MSC) link given below.
 
 2. From the email, click **Access Your Products**. You are redirected to [MY HPE SOFTWARE CENTER](https://myenterpriselicense.hpe.com/cwp-ui/auth/login).
 
 3. If you have the HPE Passport account, enter the credentials and **Sign In**. If you do not have it, create the HPE Passport Account and **Sign In**.
 
-    After signing in, you should see the Software Notification Message Receipt page listing the products.
+    After signing in, you should see the Software Notification Message Receipt page listing the products. If you are using the free community version, then in the MSC page, click Software -> Search -> Product Info -> "Swarm Learning" (as search term). In the search results, choose "HPE Swarm Learning Community edition" ver 2.2.0 -> Action (drop down) -> Product Details -> Installation -> Pre-install APLS and download the APLS software & documentation ZIP file. For quick reference, APLS container-based steps are mentioned below.
 
 4. Download APLS container and run it using the following procedures.
 
@@ -25,7 +26,7 @@
 3. Pull the image with a tag.
 
 ```
-docker pull hub.myenterpriselicense.hpe.com/hpe_eval/autopass/apls:9.14
+docker pull hub.myenterpriselicense.hpe.com/hpe_eval/autopass/apls:9.15
 ```
 
 4. Configure Data persistence.
@@ -76,9 +77,9 @@
 
 ![Lock code](GUID-A37C5798-B8B7-4B93-B786-A2682797AB37-high.png)
 
-7. Go to the Software Notification Message Receipt page and click **Access Your Products**.
+7. Go to the Software Notification Message Receipt page and click **Access Your Products**.
 
-    You will be navigated to the [MY HPE SOFTWARE CENTER](https://myenterpriselicense.hpe.com/cwp-ui/auth/login) home page. After signing in with your HPE Passport credentials, you will see the **Activate** page.
+    You will be navigated to the [MY HPE SOFTWARE CENTER](https://myenterpriselicense.hpe.com/cwp-ui/auth/login) home page. After signing in with your HPE Passport credentials, you will see the **Activate** page. If you are using the free community version, then in the MSC page, click Software -> Search -> Product Info -> "Swarm Learning" (as search term). In the search results, choose "HPE Swarm Learning Community edition" ver 2.2.0 -> Action (drop down) -> Get License.
 
 8. Activate the license:
 
````
Lines changed: 34 additions & 8 deletions
```diff
@@ -1,24 +1,50 @@
 # Installing HPE Swarm Learning Management UI \(SLM-UI\)
 
-Installing Swarm Learning is a two-step process.
-
-1. Using SLM-UI Installer, you can install the SLM-UI on one host.
+### Pre-requisite:
+The APLS license server is installed and Swarm licenses are installed as detailed in the [License server installation steps](Install_the_License_Server.md).
+
+## Manual installation for 2.3.0 version:
+We support **only manual** installation for the 2.3.0 version. You need to:
+1. Either clone or download this git repo on **each host machine** where you want to install Swarm Learning.
+
+2. If you are downloading, then navigate to the main page of the repository. To the right of the list of files, click Releases and select the 2.3.0 version. Scroll down to the "Assets" section of the release, click Source code (tar.gz). Copy and extract the tar.gz **on each host machine**.
+
+3. It is preferable to extract it under /opt/hpe/swarm-learning.
+
+4. Do a Docker login from your host:
+
+       docker login hub.myenterpriselicense.hpe.com -u <YOUR-HPE-PASSPORT-EMAIL> -p hpe
+5. Pull the signed Swarm Learning images from HPE's Docker Trust Registry (DTR):
+
+       docker pull hub.myenterpriselicense.hpe.com/hpe/swarm-learning/sn:2.3.0
+       docker pull hub.myenterpriselicense.hpe.com/hpe/swarm-learning/sl:2.3.0
+       docker pull hub.myenterpriselicense.hpe.com/hpe/swarm-learning/swci:2.3.0
+       docker pull hub.myenterpriselicense.hpe.com/hpe/swarm-learning/swop:2.3.0
+       docker pull hub.myenterpriselicense.hpe.com/hpe/swarm-learning/slm-ui:2.2.0
+       docker pull hub.myenterpriselicense.hpe.com/hpe/swarm-learning/slm-ui-postgres:2.2.0
+       docker pull hello-world
+   You can skip the rest of the installation steps mentioned below.
+
+## Automatic installation for 2.2.0 version:
+Installing Swarm Learning is a two-step process using the GUI.
+
+1. Using the SLM-UI Installer GUI, you can install the SLM-UI on one Linux host.
 2. Using SLM-UI, you can install SL in multiple hosts and run the examples.
 
 1. Navigate to the [MY HPE SOFTWARE CENTER](https://myenterpriselicense.hpe.com/cwp-ui/auth/login) home page.
 
 2. Perform the following actions after signing in with your HPE Passport credentials:
 
-    1. Go to **My Activations** and select your ordered product.
+    1. Go to **My Activations** and select your ordered product. If you are using the free community version, then in the MSC page, click Software -> Search -> Product Info -> "Swarm Learning" (as search term). In the search results, choose "HPE Swarm Learning Community edition" ver 2.2.0 -> Action (drop down).
 
     2. Go to **Action** pull down and then select **Download/Re-download** page.
 
     3. Select and download listed software files.
 
-        - The tar file containing docs and scripts.
-
-        - The signature file for the above tar file.
-
         - The docker digest hash file \(JSON\).
 
         - Download the Swarm Learning SLM-UI installer for your platform, Mac, Windows, or Linux.
+
+        - The tar file containing docs and scripts.
+
+        - The signature file for the above tar file.
```
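The per-image pull commands in step 5 of the manual installation above can also be generated in a loop rather than typed one by one — a small sketch that only builds and prints the commands (run them with your shell or `subprocess` after `docker login`); image names and tags are copied from the list in the steps:

```python
# Build one `docker pull` command per 2.3.0-tagged Swarm Learning component.
REG = "hub.myenterpriselicense.hpe.com/hpe/swarm-learning"
cmds = [f"docker pull {REG}/{img}:2.3.0" for img in ("sn", "sl", "swci", "swop")]
for cmd in cmds:
    print(cmd)
```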

docs/Install/Prerequisites.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -51,7 +51,7 @@ Qualified with Keras 2.9.0 \(TensorFlow 2 backend\) and PyTorch 1.5 based Machin
 
 <blockquote>
 
-NOTE: Python version must be between 3.6 to 3.9.
+NOTE: Python version must be between 3.8 and 3.9.
 
 </blockquote>
 
```
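The version constraint in the note above can be checked at startup — a minimal sketch, assuming only that the supported range is 3.8 through 3.9 as stated:

```python
import sys

def supported(version=(sys.version_info[0], sys.version_info[1])) -> bool:
    """True when the interpreter's major.minor is in the documented 3.8-3.9 range."""
    return (3, 8) <= version <= (3, 9)

print(supported((3, 8)))   # True
print(supported((3, 10)))  # False: 3.10 is outside the qualified range
```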

examples/README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,6 +4,8 @@ Several examples of using Swarm Learning are provided. These examples use differ
 
 For details of running each example, see the below:
 
+- [LLM fine-tuning](examples/huggingface/README.md)
+- [LLM fine-tuning with LoRA](examples/huggingface-peft/README.md)
 - [MNIST](/examples/mnist/README.md)
 - [MNIST-PYT](/examples/mnist-pyt/README.md)
 - [CIFAR-10](/examples/cifar10/README.md)
```

examples/fraud-detection/swci/taskdefs/user_env_tf_build_task.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -15,7 +15,9 @@ Body:
   - RUN pip3 install --upgrade pip && pip3 install \
   - ' keras matplotlib opencv-python pandas protobuf==3.15.6 '
   - ' '
+  - RUN pip3 install pip==23.3.2
   - RUN mkdir -p /tmp/hpe-swarmcli-pkg
   - COPY swarmlearning-client-py3-none-manylinux_2_24_x86_64.whl /tmp/hpe-swarmcli-pkg/swarmlearning-client-py3-none-manylinux_2_24_x86_64.whl
   - RUN pip3 install /tmp/hpe-swarmcli-pkg/swarmlearning-client-py3-none-manylinux_2_24_x86_64.whl
+  - RUN pip3 install --upgrade pip
```
