docs/source/en/tei_cloud_run.md (+7, -6)
@@ -25,10 +25,10 @@ On Google Cloud, there are 3 main options for deploying TEI (or any other Docker
This guide explains how to deploy TEI on Cloud Run, a fully managed service by Google. Cloud Run is a so-called serverless offering: the server infrastructure is handled by Google, and you only need to provide a Docker container. The benefit is that you only pay for compute when there is demand for your application; Cloud Run automatically spins up servers when there is demand and scales down to zero when there is none.
- We will showcase how to deploy the model with or without a GPU.
+ We will showcase how to deploy any text embedding model with or without a GPU.
> [!NOTE]
- > GPU support on Cloud Run was just made generally available. If you're interested in using it, [request a quota increase](https://cloud.google.com/run/quotas#increase) for `Total Nvidia L4 GPU allocation, per project per region`. At the time of writing this example, NVIDIA L4 GPUs (24GiB VRAM) are the only available GPUs on Cloud Run; enabling automatic scaling up to 7 instances by default (more available via quota), as well as scaling down to zero instances when there are no requests.
+ > At the time of writing, GPU support on Cloud Run is generally available in 4 regions. If you're interested in using it, [request a quota increase](https://cloud.google.com/run/quotas#increase) for `Total Nvidia L4 GPU allocation, per project per region`. So far, NVIDIA L4 GPUs (24GiB VRAM) are the only GPUs available on Cloud Run; they allow automatic scaling up to 7 instances by default (more available via quota), as well as scaling down to zero instances when there are no requests.
## Setup / Configuration
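Before the deployment steps below, the `gcloud` CLI needs to be authenticated and pointed at a project and region. A minimal sketch (not part of the original diff); the project ID is a placeholder, and `us-central1` is assumed to be one of the regions with Cloud Run GPU availability:

```shell
# Authenticate and select the project (placeholder ID) and a region
# where Cloud Run GPUs are available (us-central1 is an assumption).
gcloud auth login
gcloud config set project my-project-id
gcloud config set run/region us-central1

# Make sure the Cloud Run API is enabled for the project.
gcloud services enable run.googleapis.com
```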
@@ -92,11 +92,12 @@ The command needs you to specify the following parameters:
Finally, you can run the `gcloud run deploy` command to deploy TEI on Cloud Run as:
```bash
- export SERVICE_NAME="text-embedding-server" # or choose another name for your service
+ export SERVICE_NAME="text-embedding-server" # choose a name for your service
+ export MODEL_ID="ibm-granite/granite-embedding-278m-multilingual" # choose any embedding model