> **Note:** Access to Llama models on Hugging Face requires accepting the Community License Agreement and awaiting approval before you can download and serve them.
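Once approved, your environment also needs to be authenticated with a Hugging Face token before the gated weights can be downloaded. A minimal sketch, assuming you use the `huggingface_hub` CLI (the token value is a placeholder):

```bash
# Install the Hugging Face Hub CLI and log in so the gated Llama
# weights can be downloaded. The token below is a placeholder.
pip install -U huggingface_hub
huggingface-cli login --token hf_your_token_here
```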
## Step 0: Install `gcloud CLI`
You can reproduce this experiment from your dev environment
(e.g. your laptop).
You need to install `gcloud` locally to complete this tutorial.
To install the `gcloud CLI`, please follow this guide:
[Install the gcloud CLI](https://cloud.google.com/sdk/docs/install#mac)
Once it is installed, you can log in to GCP from your terminal with this
command: `gcloud auth login`.
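After logging in, you may also want to set a default project so later commands can omit an explicit `--project` flag. A minimal sketch (the project ID placeholder matches the one used in Step 1):

```bash
# Authenticate your local gcloud installation with your Google account.
gcloud auth login

# Optional: set a default project; replace with your GCP project ID.
gcloud config set project your-tpu-project
```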
## Step 1: Create a v6e TPU instance
We create a single VM. For Llama3.1-8B, 1 chip is sufficient; for the 70B
models, at least 8 chips are required. If you need a different number of
chips, you can set a different value for `--topology` such as `1x1`,
`2x4`, etc.
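If you are not sure which slice sizes your zone offers, you can list the available accelerator types first (this uses the `ZONE` and `PROJECT` variables exported in the next snippet):

```bash
# List the TPU accelerator types (and hence the valid v6e sizes)
# available in your zone.
gcloud compute tpus accelerator-types list --zone=${ZONE} --project=${PROJECT}
```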
Set these environment variables for the request:

```bash
export ZONE=your-tpu-zone
export PROJECT=your-tpu-project
export QR_ID=your-queued-resource-id # e.g. my-qr-request

# This command requests a v6e-8 (8 chips). Adjust accelerator-type for different sizes. For 1 chip (Llama3.1-8B), use --accelerator-type v6e-1.
```
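The request command itself is cut off in this excerpt. A plausible sketch, assuming the standard `gcloud compute tpus queued-resources create` interface; the node ID and runtime version below are assumptions, so verify them against the current TPU v6e documentation:

```bash
# Sketch of the queued-resource request; the node ID and runtime
# version are assumptions to check against the TPU v6e docs.
gcloud compute tpus queued-resources create ${QR_ID} \
  --node-id=${QR_ID}-node \
  --project=${PROJECT} \
  --zone=${ZONE} \
  --accelerator-type=v6e-8 \
  --runtime-version=v2-alpha-tpuv6e
```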