* Clean up README and fix a bash-ism.
* Moved PyTorch process affinity mapping and GPU memory limits into the
torch __init__ function so that they are set when hpc_launcher.torch
is imported. This should make it easier to use the library without
calling the launcher from the command line.
* Make the env variable a string in the shell test operation.
* Capture the original command line and record it in the launch script.
* Add a guard to make sure that the active system parameters are defined.
* Apply suggestions from code review
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
* Fixed status of NERSC systems
* Add an example of how to use HPC-Launcher within an existing PyTorch code.
The HPC launcher repository contains a set of helpful scripts and
Python bindings for launching LBANN 2.0 (PyTorch-core) on multiple
leadership-class HPC systems. There are optimized routines for the FLUX,
SLURM, and LSF launchers. Systems are currently supported at the following sites:
- LLNL Livermore Computing (LC)
- LBL NERSC (Pending)
- ORNL OLCF (Pending)
- RIKEN (Pending)

## Example Usage

Using the `launch` command to execute a command in parallel:
```
launch -N1 -n1 hostname
```
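When a Python workflow script needs to drive `launch` instead of a user typing it, the command above can be assembled and run with `subprocess`. The following is a minimal sketch. It assumes `launch` is on `PATH` and that `-N` takes the node count and `-n` the processes per node, mirroring the `torchrun-hpc` example below. The helpers `build_launch_cmd` and `run_launch` are hypothetical and are not part of HPC-Launcher.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_launch_cmd(nodes: int, procs_per_node: int, command: Sequence[str]) -> List[str]:
    """Assemble a `launch` invocation such as `launch -N1 -n1 hostname`.

    Hypothetical helper: assumes -N is the node count and -n the number of
    processes per node, as in the torchrun-hpc example of this README.
    """
    if nodes < 1 or procs_per_node < 1:
        raise ValueError("nodes and procs_per_node must be positive")
    # Flags are written in the same concatenated form the README uses (-N1, -n1).
    return ["launch", f"-N{nodes}", f"-n{procs_per_node}", *command]


def run_launch(nodes: int, procs_per_node: int, command: Sequence[str]) -> int:
    """Run the assembled command and return its exit status (needs `launch` on PATH)."""
    cmd = build_launch_cmd(nodes, procs_per_node, command)
    print("Running:", shlex.join(cmd))  # echo the equivalent shell command line
    return subprocess.run(cmd, check=False).returncode
```

Under those assumptions, `run_launch(1, 1, ["hostname"])` should behave like the shell command above. Building the argument list separately from running it keeps the command easy to log or unit-test.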

Using the `torchrun-hpc` command to execute a PyTorch Python file in parallel on two nodes with four processes per node (eight in total):
```
torchrun-hpc -N2 -n4 file.py [arguments to Python file]
```
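Inside `file.py`, each of the eight processes (two nodes times four processes per node) needs to know which process it is. The following is a minimal sketch. It assumes that `torchrun-hpc` exports the same `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables as PyTorch's `torchrun` (not confirmed by this README) and that ranks are numbered node by node. The helpers `rank_info` and `rank_layout` are hypothetical.

```python
import os
from typing import List, Mapping, NamedTuple, Tuple


class RankInfo(NamedTuple):
    rank: int        # global rank, 0 .. world_size - 1
    local_rank: int  # rank within this node
    world_size: int  # total number of processes


def rank_info(env: Mapping[str, str] = os.environ) -> RankInfo:
    # Assumption: torchrun-style variables; the defaults describe a single-process run.
    return RankInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


def rank_layout(nodes: int, procs_per_node: int) -> List[Tuple[int, int]]:
    # (node, local_rank) for every global rank, assuming node-major numbering:
    # with -N2 -n4, ranks 0-3 land on node 0 and ranks 4-7 on node 1.
    return [(r // procs_per_node, r % procs_per_node) for r in range(nodes * procs_per_node)]


if __name__ == "__main__":
    info = rank_info()
    print(f"rank {info.rank} of {info.world_size} (local rank {info.local_rank})")
```

If the launcher uses a different rank ordering, only `rank_layout` would need to change. Reading the variables through a mapping argument lets the function be checked without a real launch.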

Using HPC-Launcher within existing PyTorch code without explicitly invoking it from the command line (CLI). Within the top-level Python file, import `hpc_launcher.torch` first to ensure that `torch` is configured per HPC-Launcher's specification:
```
import hpc_launcher.torch  # imported first so that torch is configured by HPC-Launcher
import torch
```
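The settings are applied as a side effect of importing `hpc_launcher.torch`. A larger code base may therefore want to detect cases where `torch` was already imported somewhere else first. The following is a minimal sketch of such a check. The names `torch_imported_early` and `import_hpc_launcher_torch` are hypothetical helpers, not part of HPC-Launcher. The check inspects only `sys.modules`, and it returns `None` when HPC-Launcher is not installed.

```python
import importlib
import importlib.util
import sys
import warnings
from types import ModuleType
from typing import Mapping, Optional


def torch_imported_early(modules: Mapping[str, object] = sys.modules) -> bool:
    """True if torch is already loaded but hpc_launcher.torch is not."""
    return "torch" in modules and "hpc_launcher.torch" not in modules


def import_hpc_launcher_torch() -> Optional[ModuleType]:
    """Import hpc_launcher.torch for its side effects, warning if it comes too late."""
    if torch_imported_early():
        warnings.warn(
            "torch was imported before hpc_launcher.torch; "
            "import hpc_launcher.torch first in the top-level file",
            RuntimeWarning,
        )
    # Skip gracefully when HPC-Launcher is not installed in this environment.
    if importlib.util.find_spec("hpc_launcher") is None:
        return None
    return importlib.import_module("hpc_launcher.torch")
```

Calling `import_hpc_launcher_torch()` at the top of the entry-point script keeps the ordering the README asks for and makes any violation visible as a warning rather than a silent misconfiguration.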

# LBANN: Livermore Big Artificial Neural Network Toolkit

The Livermore Big Artificial Neural Network toolkit (LBANN) is an