docs/blog/posts/mpi.md (3 additions, 3 deletions)
@@ -86,10 +86,10 @@ resources:

</div>

-The first worker node (`DSTACK_NODE_RANK=0`) generates a `hostfile` listing all node IPs and waits until all nodes are
+The master node (`DSTACK_NODE_RANK=0`) generates a `hostfile` listing all node IPs and waits until all nodes are
reachable via MPI. Once confirmed, it launches the `/root/nccl-tests/build/all_reduce_perf` benchmark across all available GPUs in the cluster.

-The other worker nodes remain blocked until they receive a termination signal from the master node via a FIFO pipe.
+Non-master nodes remain blocked until they receive a termination signal from the master node via a FIFO pipe.

With this, you can now use such a task to run both NCCL and RCCL tests on both cloud and SSH fleets,
as well as use MPI for other tasks.
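To make the pattern concrete, here is a rough sketch of such a task configuration. The image, the GPU/slot count, the benchmark flags, and the FIFO path are illustrative assumptions, not the exact configuration from the post:

```yaml
type: task
name: nccl-tests-mpi
# Run the task container on every node of the cluster fleet
nodes: 2

# Illustrative image; assumes MPI and nccl-tests are already built under /root/nccl-tests
image: nvcr.io/nvidia/pytorch:24.03-py3

commands:
  - |
    if [ "$DSTACK_NODE_RANK" = "0" ]; then
      # Master node: write one host per line, assuming 8 GPUs (slots) per node
      echo "$DSTACK_NODES_IPS" | sed '/./ s/$/ slots=8/' > hostfile
      # Wait until every node is reachable via MPI
      until mpirun --allow-run-as-root --hostfile hostfile -npernode 1 true; do
        sleep 5
      done
      # Run the benchmark across all GPUs in the cluster (2 nodes x 8 GPUs)
      mpirun --allow-run-as-root --hostfile hostfile -np 16 \
        /root/nccl-tests/build/all_reduce_perf -b 8 -e 8G -f 2 -g 1
      # Signal the other nodes that the run is finished
      mpirun --allow-run-as-root --hostfile hostfile -npernode 1 \
        sh -c 'echo done > /tmp/mpirun_done || true'
    else
      # Non-master nodes block on a FIFO pipe until the master signals completion
      mkfifo /tmp/mpirun_done
      cat /tmp/mpirun_done
    fi
```

See the NCCL tests example linked below for the complete, working configuration.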
@@ -102,4 +102,4 @@ as well as use MPI for other tasks.
!!! info "What's next?"
1. Learn more about [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md), and [fleets](../../docs/concepts/fleets.md)
2. Check the [NCCL tests](../../examples/clusters/nccl-tests/index.md) example
If the `aws` backend config has `public_ips: false` set, `dstack` enables the maximum number of interfaces supported by the instance.
-Otherwise, if instances have public IPs, only one EFA interface is enabled per instance due to AWS limitations.
+When you create a cloud fleet with AWS, [Elastic Fabric Adapter networking :material-arrow-top-right-thin:{ .external }](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html){:target="_blank"} is automatically configured if it’s supported for the corresponding instance type.
+Note, EFA requires the `public_ips` to be set to `false` in the `aws` backend configuration.
+Otherwise, instances are only connected by the default VPC subnet.
+
+Refer to the [EFA](../../blog/posts/efa.md) example for more details.
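For reference, this is roughly what the relevant part of the `dstack` server's `config.yml` could look like. The project name and credentials are placeholders; `public_ips: false` is the setting discussed above:

```yaml
projects:
  - name: main
    backends:
      - type: aws
        creds:
          type: default
        # Required for EFA-enabled cluster fleets
        public_ips: false
```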
+
+??? info "GCP"
+When you create a cloud fleet with GCP, for the A3 Mega and A3 High instance types, [GPUDirect-TCPXO and GPUDirect-TCPX :material-arrow-top-right-thin:{ .external }](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot){:target="_blank"} networking is automatically configured.
+
+!!! info "Backend configuration"
+Note, GPUDirect-TCPXO and GPUDirect-TCPX require `extra_vpcs` to be configured in the `gcp` backend configuration.
+Refer to the [A3 Mega](../../examples/clusters/a3mega/index.md) and
+[A3 High](../../examples/clusters/a3high/index.md) examples for more details.
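And a comparable sketch for the `gcp` backend, assuming `extra_vpcs` takes a list of VPC network names (the project ID and VPC names are placeholders; see the A3 Mega and A3 High examples for the exact VPC setup they require):

```yaml
projects:
  - name: main
    backends:
      - type: gcp
        project_id: my-gcp-project
        creds:
          type: default
        # Additional VPCs used for the GPUDirect-TCPXO/TCPX data-path NICs
        extra_vpcs:
          - dstack-gpu-data-net-1
          - dstack-gpu-data-net-2
```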

??? info "Nebius"
-`dstack` automatically creates an [InfiniBand cluster](https://docs.nebius.com/compute/clusters/gpu)
-if all instances in the fleet support it.
+When you create a cloud fleet with Nebius, [InfiniBand networking :material-arrow-top-right-thin:{ .external }](https://docs.nebius.com/compute/clusters/gpu){:target="_blank"} is automatically configured if it’s supported for the corresponding instance type.
Otherwise, instances are only connected by the default VPC subnet.

-An InfiniBand fabric for the cluster is selected automatically.
-If you prefer to use some specific fabrics, configure them in the
+An InfiniBand fabric for the cluster is selected automatically. If you prefer to use some specific fabrics, configure them in the
A cluster is a fleet with its `placement` set to `cluster`. This configuration ensures that the instances within the fleet are interconnected, enabling fast inter-node communication—crucial for tasks such as efficient distributed training.
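For example, a minimal cloud fleet configuration with cluster placement might look like the following sketch (the name, node count, and GPU spec are illustrative):

```yaml
type: fleet
name: my-cluster
# Provision interconnected nodes
nodes: 2
placement: cluster
resources:
  gpu: H100:8
```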
+
+## Fleets
+
+Ensure a fleet is created before you run any distributed task. This can be either an SSH fleet or a cloud fleet.
+
+### SSH fleets
+
+SSH fleets can be used to create a fleet out of existing bare-metal servers or VMs, e.g. if they are already pre-provisioned or set up on-premises.
+
+> For SSH fleets, fast interconnect is supported provided that the hosts are pre-configured with the appropriate interconnect drivers.
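A minimal SSH fleet sketch along these lines (the user, key path, and host IPs are placeholders, and `placement: cluster` is shown on the assumption that the hosts share a fast network):

```yaml
type: fleet
name: my-ssh-fleet
# Mark the hosts as interconnected so they can be used for distributed tasks
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.100.1
    - 192.168.100.2
```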
+
+### Cloud fleets
+
+Cloud fleets allow you to provision interconnected clusters across supported backends.
+For cloud fleets, fast interconnect is currently supported only on the `aws`, `gcp`, and `nebius` backends.
+
+=== "AWS"
+When you create a cloud fleet with AWS, [Elastic Fabric Adapter :material-arrow-top-right-thin:{ .external }](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html){:target="_blank"} networking is automatically configured if it’s supported for the corresponding instance type.
+
+!!! info "Backend configuration"
+Note, EFA requires the `public_ips` to be set to `false` in the `aws` backend configuration.
+Refer to the [EFA](../../blog/posts/efa.md) example for more details.
+
+=== "GCP"
+When you create a cloud fleet with GCP, for the A3 Mega and A3 High instance types, [GPUDirect-TCPXO and GPUDirect-TCPX :material-arrow-top-right-thin:{ .external }](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot){:target="_blank"} networking is automatically configured.
+
+!!! info "Backend configuration"
+Note, GPUDirect-TCPXO and GPUDirect-TCPX require `extra_vpcs` to be configured in the `gcp` backend configuration.
+Refer to the [A3 Mega](../../examples/clusters/a3mega/index.md) and
+[A3 High](../../examples/clusters/a3high/index.md) examples for more details.
+
+=== "Nebius"
+When you create a cloud fleet with Nebius, [InfiniBand :material-arrow-top-right-thin:{ .external }](https://docs.nebius.com/compute/clusters/gpu){:target="_blank"} networking is automatically configured if it’s supported for the corresponding instance type.
+
+> To request fast interconnect support for other backends,
+file an [issue :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues){:target="_blank"}.
+
+## NCCL/RCCL tests
+
+To test the interconnect of a created fleet, ensure you run [NCCL](../../examples/clusters/nccl-tests/index.md)
+(for NVIDIA) or [RCCL](../../examples/clusters/rccl-tests/index.md) (for AMD) tests.
+
+## Distributed tasks
+
+A distributed task is a task with `nodes` set to `2` or more. In this case, `dstack` first ensures a
+suitable fleet is available, then starts the master node and runs the task container on it. Once the master is up,
+`dstack` starts the rest of the nodes and runs the task container on each of them.
+
+Within the task's `commands`, it's possible to use `DSTACK_MASTER_NODE_IP`, `DSTACK_NODES_IPS`, `DSTACK_NODE_RANK`, and other
+[system environment variables](../concepts/tasks.md#system-environment-variables) for inter-node communication.
+
+Refer to [distributed tasks](../concepts/tasks.md#distributed-tasks) for an example.
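As a sketch of how these variables are typically wired into a distributed task (the training script and the `torchrun` parameters are illustrative, not a prescribed configuration):

```yaml
type: task
name: train-distrib
nodes: 2
python: "3.12"
commands:
  - pip install torch
  # Each node launches 8 local workers and joins the master via the dstack-provided variables
  - torchrun
    --nnodes=2
    --nproc_per_node=8
    --node_rank=$DSTACK_NODE_RANK
    --master_addr=$DSTACK_MASTER_NODE_IP
    --master_port=29500
    train.py
resources:
  gpu: H100:8
```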
+
+!!! info "Retry policy"
+By default, if any of the nodes fails, `dstack` terminates the entire run. Configure a [retry policy](../concepts/tasks.md#retry-policy) to restart the run if any node fails.
+
+## Volumes
+
+### Network volumes
+
+Currently, no backend supports multi-attach network volumes for distributed tasks. However, single-attach volumes can be used by leveraging volume name [interpolation syntax](../concepts/volumes.md#distributed-tasks). This approach mounts a separate single-attach volume to each node.
+
+### Instance volumes
+
+Instance volumes enable mounting any folder from the host into the container, allowing data persistence during distributed tasks.
+
+Instance volumes can be used to mount:
+
+* Regular folders (data persists only while the fleet exists)
+* Folders that are mounts of shared filesystems (e.g., manually mounted shared filesystems).
+
+Refer to [instance volumes](../concepts/volumes.md#instance) for an example.
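For instance, mounting a host directory into each node's container could look like this sketch (the paths and the command are illustrative):

```yaml
type: task
name: train-with-shared-data
nodes: 2
commands:
  - python train.py --data /data
volumes:
  # Mount a host folder (e.g. a manually mounted NFS share) into the container on every node
  - instance_path: /mnt/shared
    path: /data
```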
+
+!!! info "What's next?"
+1. Read about [distributed tasks](../concepts/tasks.md#distributed-tasks), [fleets](../concepts/fleets.md), and [volumes](../concepts/volumes.md)
+2. Browse the [Clusters](../../examples.md#clusters) examples
docs/docs/guides/protips.md (2 additions, 2 deletions)
@@ -36,9 +36,9 @@ unlimited).
## Volumes

To persist data across runs, it is recommended to use volumes.
-`dstack` supports two types of volumes: [network](../concepts/volumes.md#network-volumes)
+`dstack` supports two types of volumes: [network](../concepts/volumes.md#network)
(for persisting data even if the instance is interrupted)
-and [instance](../concepts/volumes.md#instance-volumes) (useful for persisting cached data across runs while the instance remains active).
+and [instance](../concepts/volumes.md#instance) (useful for persisting cached data across runs while the instance remains active).

> If you use [SSH fleets](../concepts/fleets.md#ssh), you can mount network storage (e.g., NFS or SMB) to the hosts and access it in runs via instance volumes.
examples/clusters/nccl-tests/README.md (2 additions, 2 deletions)
@@ -63,10 +63,10 @@ resources:

!!! info "MPI"
NCCL tests rely on MPI to run on multiple processes. The master node (`DSTACK_NODE_RANK=0`) generates `hostfile` (using `DSTACK_NODES_IPS`)
-and waits until worker nodes are accessible via MPI.
+and waits until other nodes are accessible via MPI.
Then, it executes `/nccl-tests/build/all_reduce_perf` across all GPUs.

-Worker nodes use a `FIFO` pipe to wait for until the MPI run is finished.
+Non-master nodes use a `FIFO` pipe to wait until the MPI run is finished.

There is an open [issue :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues/2467){:target="_blank"} to simplify the use of MPI with distributed tasks.