Commit 4adb831

doc: add the example of shell (#190)
1 parent 5ec9b39 commit 4adb831

File tree

6 files changed, +54 -6 lines


doc/examples/shell.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+## Running multiple MD tasks on a GPU workstation
+
+In this example, we show how to run multiple MD tasks on a GPU workstation. The workstation does not have any job scheduling system installed, so we use `Shell` as the `batch_type`.
+
+```{literalinclude} ../../examples/machine/mandu.json
+:language: json
+:linenos:
+```
+
+The workstation has 48 CPU cores and 8 RTX 3090 cards. We want each card to run 6 tasks at the same time, since a single task does not consume many GPU resources. Thus, `strategy/if_cuda_multi_devices` is set to `true` and `para_deg` is set to 6.
+
+```{literalinclude} ../../examples/resources/mandu.json
+:language: json
+:linenos:
+```
+
+Note that `group_size` should be set large enough that all tasks are packed into a single job, so that multiple jobs do not run at the same time.
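
As a rough illustration of the packing described in the documentation above (not DPDispatcher's actual scheduling code), the sketch below shows how 8 cards with 6 tasks each add up to 48 concurrent tasks, one per CPU core, and how a simple round-robin mapping of tasks onto CUDA devices would look. Everything in it is illustrative.

```python
# Illustrative only: the counting behind examples/resources/mandu.json.
gpu_per_node = 8   # RTX 3090 cards in the workstation
para_deg = 6       # tasks intended to share each card
cpu_per_node = 48  # CPU cores available

max_concurrent_tasks = gpu_per_node * para_deg
assert max_concurrent_tasks == cpu_per_node  # 8 * 6 = 48 tasks, one core each

# A simple round-robin picture of how tasks could be spread over the cards
# (the actual CUDA_VISIBLE_DEVICES assignment is handled by DPDispatcher).
for task_id in range(max_concurrent_tasks):
    print(f"task {task_id:2d} -> CUDA device {task_id % gpu_per_node}")
```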

doc/index.rst

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ DPDispatcher will monitor (poke) until these jobs finish and download the result
    :caption: Examples
    :glob:
 
-   examples/expanse
+   examples/*
 
 Indices and tables
 ==================

examples/machine/expanse.json

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@
     "remote_profile": {
         "hostname": "login.expanse.sdsc.edu",
         "username": "njzjz",
-        "port": "22"
+        "port": 22
     }
 }

examples/machine/mandu.json

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{
+    "batch_type": "Shell",
+    "local_root": "./",
+    "remote_root": "/data2/jinzhe/dpgen_workdir",
+    "clean_asynchronously": true,
+    "context_type": "SSHContext",
+    "remote_profile": {
+        "hostname": "mandu.iqb.rutgers.edu",
+        "username": "jz748",
+        "port": 22
+    }
+}

examples/resources/expanse_cpu.json

Lines changed: 4 additions & 4 deletions
@@ -1,9 +1,9 @@
 {
-    "number_node": "1",
-    "cpu_per_node": "1",
-    "gpu_per_node": "0",
+    "number_node": 1,
+    "cpu_per_node": 1,
+    "gpu_per_node": 0,
     "queue_name": "shared",
-    "group_size": "1",
+    "group_size": 1,
     "custom_flags": [
         "#SBATCH -c 32",
         "#SBATCH --mem=16G",

examples/resources/mandu.json

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+{
+    "number_node": 1,
+    "cpu_per_node": 48,
+    "gpu_per_node": 8,
+    "queue_name": "shell",
+    "group_size": 9999,
+    "strategy": {
+        "if_cuda_multi_devices": true
+    },
+    "source_list": [
+        "activate /home/jz748/deepmd-kit"
+    ],
+    "envs": {
+        "OMP_NUM_THREADS": 1,
+        "TF_INTRA_OP_PARALLELISM_THREADS": 1,
+        "TF_INTER_OP_PARALLELISM_THREADS": 1
+    },
+    "para_deg": 6
+}
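
For completeness, here is a minimal sketch of how the two `mandu.json` files added in this commit could be driven from Python, assuming the `Machine.load_from_json`/`Resources.load_from_json` helpers and the `Task`/`Submission` classes of a recent DPDispatcher release; the MD command, work paths, and file names are hypothetical placeholders, not part of this commit.

```python
from dpdispatcher import Machine, Resources, Task, Submission

# Load the machine and resources definitions added in this commit.
machine = Machine.load_from_json("examples/machine/mandu.json")
resources = Resources.load_from_json("examples/resources/mandu.json")

# A hypothetical MD task; the command, work path, and file lists are placeholders.
task = Task(
    command="lmp -i input.lammps",
    task_work_path="task_000/",
    forward_files=["input.lammps", "conf.lmp"],
    backward_files=["log.lammps"],
)

# With a large group_size, all tasks are packed into one Shell job,
# which is submitted over SSH and monitored until it finishes.
submission = Submission(
    work_base="md_workdir/",
    machine=machine,
    resources=resources,
    task_list=[task],
)
submission.run_submission()
```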
