> **Note:** Access to Llama models on Hugging Face requires accepting the Community License Agreement and awaiting approval before you can download and serve them.
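
Once access is granted, authenticate to Hugging Face so model downloads work from your machine. A minimal sketch, assuming the `huggingface_hub` CLI is installed and `hf_xxxxxxxx` stands in for your real token:

```bash
# Log in with your Hugging Face access token (placeholder shown).
huggingface-cli login --token hf_xxxxxxxx

# Or export it so tools that read HF_TOKEN can authenticate non-interactively.
export HF_TOKEN=hf_xxxxxxxx
```
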
## Step 0: Install `gcloud CLI`
You can reproduce this experiment from your dev environment
(e.g. your laptop).
You need to install `gcloud` locally to complete this tutorial.
To install the `gcloud CLI`, please follow this guide:
[Install the gcloud CLI](https://cloud.google.com/sdk/docs/install#mac)
Once it is installed, you can log in to GCP from your terminal with this
command: `gcloud auth login`.
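
Optionally, set defaults so later commands can omit `--project` and `--zone`; a small sketch with placeholder values:

```bash
# Placeholder project and zone; substitute your own values.
gcloud config set project your-tpu-project
gcloud config set compute/zone your-tpu-zone
```
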
## Step 1: Create a v6e TPU instance
We create a single VM. For Llama3.1-8B, 1 chip is sufficient, and for the 70B
models, at least 8 chips are required. If you need a different number of
chips, you can set a different value for `--topology` such as `1x1`,
`2x4`, etc.

```bash
export ZONE=your-tpu-zone
export PROJECT=your-tpu-project
export QR_ID=your-queued-resource-id # e.g. my-qr-request

# This command requests a v6e-8 (8 chips). Adjust accelerator-type for different sizes. For 1 chip (Llama3.1-8B), use --accelerator-type v6e-1.
```
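
A minimal sketch of the queued-resource request that the comment above describes, using the variables exported above and assuming the `gcloud alpha compute tpus queued-resources` command group; the node ID `my-tpu-node` and runtime version `v2-alpha-tpuv6e` are placeholders to verify against the current TPU documentation:

```bash
# ${QR_ID}, ${PROJECT}, and ${ZONE} come from the exports above.
# `my-tpu-node` is a hypothetical node name; pick your own.
gcloud alpha compute tpus queued-resources create ${QR_ID} \
  --node-id my-tpu-node \
  --project ${PROJECT} \
  --zone ${ZONE} \
  --accelerator-type v6e-8 \
  --runtime-version v2-alpha-tpuv6e
```
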