Skip to content

Commit 02e2f71

Browse files
committed
Merge remote-tracking branch 'origin/master' into unity-machine-view-integration
2 parents 85eded6 + 5abb225 commit 02e2f71

File tree

1 file changed

+87
-1
lines changed

1 file changed

+87
-1
lines changed

docs/SAPLING.md

Lines changed: 87 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,89 @@
11
# Setup Guide for sapling
22

3-
TODO (tracked in [#1572](https://github.com/flexflow/flexflow-train/issues/1572))
3+
1. ssh into the sapling head node.
4+
5+
2. Install [DavHau/nix-portable](https://github.com/DavHau/nix-portable).
6+
7+
```bash
8+
USERBIN="${XDG_BIN_HOME:-$HOME/.local/bin}"
9+
mkdir -p "$USERBIN"
10+
wget 'https://github.com/DavHau/nix-portable/releases/download/v010/nix-portable' -O "$USERBIN/nix-portable"
11+
chmod u+x "$USERBIN/nix-portable"
12+
ln -sf "$USERBIN/nix-portable" "$USERBIN/nix"
13+
```
14+
15+
3. Configure the nix-portable store.
16+
17+
```bash
18+
cat >>"$HOME/.bashrc" <<EOF
19+
mkdir -p "/tmp/\$USER"
20+
export NP_LOCATION="/tmp/\$USER/"
21+
EOF
22+
```
23+
24+
4. Clone the repo.
25+
26+
```bash
27+
SSH_URL="git@github.com:flexflow/flexflow-train.git"
28+
git clone --recursive "$SSH_URL" "$HOME/ff"
29+
```
30+
31+
5. Enter the nix-provided `default` development environment[^1]
32+
33+
[^1]: aka "dev shell"
34+
35+
```bash
36+
cd "$HOME/ff"
37+
nix develop --accept-flake-config
38+
```
39+
40+
6. Build and run the non-GPU-required tests.
41+
42+
```
43+
(ff) $ proj cmake
44+
...
45+
(ff) $ proj test --skip-gpu-tests
46+
...
47+
```
48+
If everything is correctly configured, you should see a bunch of build messages followed by something like
49+
```
50+
(ff) $ proj test --skip-gpu-tests
51+
421/421 Test #441: get_transformer_computation_graph
52+
100% tests passed, 0 tests failed out of 421
53+
54+
Label Time Summary:
55+
compiler-tests = 6.13 sec*proc (19 tests)
56+
local-execution-tests = 0.13 sec*proc (3 tests)
57+
models-tests = 0.05 sec*proc (4 tests)
58+
op-attrs-tests = 0.48 sec*proc (59 tests)
59+
pcg-tests = 0.33 sec*proc (33 tests)
60+
substitution-generator-tests = 0.06 sec*proc (2 tests)
61+
substitutions-tests = 0.10 sec*proc (9 tests)
62+
utils-tests = 1.20 sec*proc (293 tests)
63+
64+
Total Test time (real) = 8.64 sec
65+
```
66+
67+
7. Exit the `default` dev shell
68+
```
69+
exit
70+
```
71+
72+
8. Allocate and ssh into a GPU node.
73+
74+
9. Enter the gpu-enabled dev shell.
75+
```
76+
cd "$HOME/ff"
77+
NIXPKGS_ALLOW_UNFREE=1 nix develop .#gpu --accept-flake-config --impure
78+
```
79+
80+
10. Run the gpu tests
81+
```
82+
(ff) $ proj test
83+
...
84+
```
85+
You should see the additional GPU tests run. If you instead see a message like
86+
87+
> `Error: ... Pass --skip-gpu-tests to skip running tests that require a GPU`
88+
89+
Double check that you are correctly in the `gpu` devshell, not the `default` devshell.

0 commit comments

Comments
 (0)