docs/source/finetune.md
Lines changed: 13 additions & 5 deletions
@@ -75,6 +75,8 @@ This enables scaling training across multiple nodes.
Use servers with compatible or identical network interfaces (e.g., Ethernet).
+
Multi-node finetuning is currently supported only on Linux servers. Use servers connected to the same switch to get the best time savings when scaling.
+
```
PYTHONUNBUFFERED: makes Python prints unbuffered, which is especially useful for tracking progress (or lack thereof) in distributed tasks. This setting is optional.
```
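As a minimal sketch, the variable described above could be set in the shell before launching training. The value `1` is an illustrative choice, not one taken from this guide; Python treats any non-empty string the same way.

```shell
# Any non-empty value turns on unbuffered stdout/stderr in Python.
# The value 1 is an assumption chosen for illustration.
export PYTHONUNBUFFERED=1

# Confirm the variable is visible to child processes started from this shell.
echo "PYTHONUNBUFFERED=$PYTHONUNBUFFERED"
```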
@@ -102,13 +104,13 @@ Steps to run Multi Node Finetuning:
Run the following Docker setup commands on both machines (server and client).
-
# Expose QAIC accelerator devices
+
#### Expose QAIC accelerator devices
```
devices=(/dev/accel/*)
```
-
# Start Docker container
+
#### Start Docker container
```
sudo docker run -it \
@@ -127,18 +129,24 @@ In distributed ML setups, all nodes must resolve each other’s hostnames. If DN
2. Set QAIC Device Visibility
-
```export QAIC_VISIBLE_DEVICES=$(seq -s, 0 63)
+
```
+
export QAIC_VISIBLE_DEVICES=$(seq -s, 0 63)
+
```
-
This exposes devices 0–63 to the training process.
+
For example, this command exposes devices 0–63 to the training process.
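To see what the `seq` expression expands to, the following sketch runs it on a smaller range. It assumes GNU `seq` (standard on Linux), and the 8-device range is a hypothetical allocation, not one taken from this guide.

```shell
# Small range so the expansion is easy to read; GNU seq is assumed.
# -s, joins the generated numbers with commas instead of newlines.
device_list=$(seq -s, 0 3)
echo "$device_list"   # prints 0,1,2,3

# The same pattern for a hypothetical 8-device allocation.
export QAIC_VISIBLE_DEVICES=$(seq -s, 0 7)
echo "$QAIC_VISIBLE_DEVICES"   # prints 0,1,2,3,4,5,6,7
```

Adjusting the last argument to `seq` is enough to match the number of devices actually present on a node.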
3. Activate the TORCH_QAIC Environment Inside the Container
```
source /opt/torch-qaic-env/bin/activate
```
-
4. Verify that the Qefficient Library is installed
+
4. Verify that the Qefficient Library is installed: