| 1 | 1 | # Setup Guide for sapling |
| 2 | 2 | |
| 3 | | -TODO (tracked in [#1572](https://github.com/flexflow/flexflow-train/issues/1572)) |
| 3 | +1. ssh into the sapling head node. |
| 4 | + |
| 5 | +2. Install [DavHau/nix-portable](https://github.com/DavHau/nix-portable). |
| 6 | + |
| 7 | +```bash |
| 8 | +USERBIN="${XDG_BIN_HOME:-$HOME/.local/bin}" |
| 9 | +mkdir -p "$USERBIN" |
| 10 | +wget 'https://github.com/DavHau/nix-portable/releases/download/v010/nix-portable' -O "$USERBIN/nix-portable" |
| 11 | +chmod u+x "$USERBIN/nix-portable" |
| 12 | +ln -sf "$USERBIN/nix-portable" "$USERBIN/nix" |
| 13 | +``` |
| 14 | + |
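The `nix` symlink created in step 2 is only usable if the install directory is on `PATH`, which the guide does not state. A minimal sketch of a check, assuming a POSIX-compatible shell; the helper name `dir_on_path` is hypothetical and not part of the repo:

```shell
# Hypothetical helper: succeed if the given directory is an entry in $PATH.
dir_on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;  # exact match against one colon-delimited entry
    *) return 1 ;;
  esac
}
```

For example, `dir_on_path "${XDG_BIN_HOME:-$HOME/.local/bin}" || echo "add the install directory to PATH"` would flag a missing entry before step 5 fails with `nix: command not found`.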
| 15 | +3. Configure the nix-portable store. |
| 16 | + |
| 17 | +```bash |
| 18 | +cat >>"$HOME/.bashrc" <<EOF |
| 19 | +mkdir -p "/tmp/\$USER" |
| 20 | +export NP_LOCATION="/tmp/\$USER/" |
| 21 | +EOF |
| 22 | +``` |
| 23 | + |
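The lines appended in step 3 only take effect in shells started after the change (or after running `source "$HOME/.bashrc"`). A minimal sketch for confirming the setting is active, assuming only the `NP_LOCATION` variable from the snippet above; the function name `check_np_location` is hypothetical:

```shell
# Hypothetical helper: report whether NP_LOCATION is set and points at a directory.
check_np_location() {
  if [ -z "${NP_LOCATION:-}" ]; then
    echo "NP_LOCATION is not set; start a new shell or source ~/.bashrc"
    return 1
  fi
  if [ ! -d "$NP_LOCATION" ]; then
    echo "NP_LOCATION points at a missing directory: $NP_LOCATION"
    return 1
  fi
  echo "nix-portable store location: $NP_LOCATION"
}
```

Running `check_np_location` before step 5 should print the store location. Because the store is placed under `/tmp`, the check may also be worth repeating on each node you log into.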
| 24 | +4. Clone the repo. |
| 25 | + |
| 26 | +```bash |
| 27 | +SSH_URL="git@github.com:flexflow/flexflow-train.git" |
| 28 | +git clone --recursive "$SSH_URL" "$HOME/ff" |
| 29 | +``` |
| 30 | + |
| 31 | +5. Enter the nix-provided `default` development environment[^1]. |
| 32 | + |
| 33 | +[^1]: aka "dev shell" |
| 34 | + |
| 35 | +```bash |
| 36 | +cd "$HOME/ff" |
| 37 | +nix develop --accept-flake-config |
| 38 | +``` |
| 39 | + |
| 40 | +6. Build and run the non-GPU-required tests. |
| 41 | + |
| 42 | +``` |
| 43 | +(ff) $ proj cmake |
| 44 | +... |
| 45 | +(ff) $ proj test --skip-gpu-tests |
| 46 | +... |
| 47 | +``` |
| 48 | +If everything is correctly configured, you should see build messages followed by output like the following: |
| 49 | +``` |
| 50 | +(ff) $ proj test --skip-gpu-tests |
| 51 | +421/421 Test #441: get_transformer_computation_graph |
| 52 | +100% tests passed, 0 tests failed out of 421 |
| 53 | + |
| 54 | +Label Time Summary: |
| 55 | +compiler-tests = 6.13 sec*proc (19 tests) |
| 56 | +local-execution-tests = 0.13 sec*proc (3 tests) |
| 57 | +models-tests = 0.05 sec*proc (4 tests) |
| 58 | +op-attrs-tests = 0.48 sec*proc (59 tests) |
| 59 | +pcg-tests = 0.33 sec*proc (33 tests) |
| 60 | +substitution-generator-tests = 0.06 sec*proc (2 tests) |
| 61 | +substitutions-tests = 0.10 sec*proc (9 tests) |
| 62 | +utils-tests = 1.20 sec*proc (293 tests) |
| 63 | + |
| 64 | +Total Test time (real) = 8.64 sec |
| 65 | +``` |
| 66 | + |
| 67 | +7. Exit the `default` dev shell. |
| 68 | +```bash |
| 69 | +exit |
| 70 | +``` |
| 71 | + |
| 72 | +8. Allocate and ssh into a GPU node. |
| 73 | + |
| 74 | +9. Enter the GPU-enabled dev shell. |
| 75 | +```bash |
| 76 | +cd "$HOME/ff" |
| 77 | +NIXPKGS_ALLOW_UNFREE=1 nix develop .#gpu --accept-flake-config --impure |
| 78 | +``` |
| 79 | + |
| 80 | +10. Run the GPU tests. |
| 81 | +``` |
| 82 | +(ff) $ proj test |
| 83 | +... |
| 84 | +``` |
| 85 | +You should see the additional GPU tests run. If you instead see a message like |
| 86 | + |
| 87 | +> `Error: ... Pass --skip-gpu-tests to skip running tests that require a GPU` |
| 88 | + |
| 89 | +double-check that you are in the `gpu` dev shell, not the `default` dev shell. |
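If it is unclear whether the current node has a GPU at all, a quick check before entering a dev shell may help. The following is a rough sketch, not something the guide or the repo provides: it assumes that `nvidia-smi` is on `PATH` on GPU nodes, and both function names are hypothetical.

```shell
# Hypothetical helper: succeed if the named command can be found on PATH.
has_command() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print which dev shell is likely appropriate on this node.
# Assumption: GPU nodes expose nvidia-smi on PATH; other nodes do not.
suggest_dev_shell() {
  if has_command nvidia-smi; then
    echo "gpu"
  else
    echo "default"
  fi
}
```

Calling `suggest_dev_shell` on the head node would be expected to print `default`. On an allocated GPU node it should print `gpu` if the assumption above holds.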