* Renamed `examples/cloud-run/tgi-deployment`
  The rename is needed because most of the use cases / examples revolve around TGI deployments, which made the `tgi-deployment` name vague; instead, the naming now follows the same format as in `examples/vertex-ai/notebooks`.
* Add `examples/cloud-run/deploy-gemma-2-on-cloud-run/*` (WIP)
* Update `deploy-llama-3-1-on-cloud-run/README.md`
* Rename `SERVICE_ACCOUNT_NAME` to `tgi-invoker`
Co-authored-by: Wietse Venema <[email protected]>
* Add `imgs` to `deploy-gemma-2-on-cloud-run` example
* Update `deploy-gemma-2-on-cloud-run/README.md`
* Add notes on Cloud NAT and VPC network
* Update example listings
  Automatically updated via `python scripts/internal/update_example_tables.py`
* Add `current_git_branch` fn in `auto-generate-examples.py`
* Update and fix `current_git_branch` function
* Update and fix `current_git_branch` function
* Temporary patch for docs to be live with images
* Update `docs/scripts/auto-generate-examples.py`
---------
Co-authored-by: Wietse Venema <[email protected]>
README.md (+2 −1)
@@ -67,7 +67,8 @@ The [`examples`](./examples) directory contains examples for using the container
 | GKE |[examples/gke/tgi-deployment](./examples/gke/tgi-deployment)| Deploy Meta Llama 3 8B with TGI DLC on GKE |
 | GKE |[examples/gke/tgi-from-gcs-deployment](./examples/gke/tgi-from-gcs-deployment)| Deploy Qwen2 7B with TGI DLC from GCS on GKE |
 | GKE |[examples/gke/tei-deployment](./examples/gke/tei-deployment)| Deploy Snowflake's Arctic Embed with TEI DLC on GKE |
-| Cloud Run |[examples/cloud-run/tgi-deployment](./examples/cloud-run/tgi-deployment)| Deploy Meta Llama 3.1 8B with TGI DLC on Cloud Run |
+| Cloud Run |[examples/cloud-run/deploy-gemma-2-on-cloud-run](./examples/cloud-run/deploy-gemma-2-on-cloud-run)| Deploy Gemma2 9B with TGI DLC on Cloud Run |
+| Cloud Run |[examples/cloud-run/deploy-llama-3-1-on-cloud-run](./examples/cloud-run/deploy-llama-3-1-on-cloud-run)| Deploy Llama 3.1 8B with TGI DLC on Cloud Run |
docs/source/resources.mdx (+2 −1)
@@ -66,4 +66,5 @@ Learn how to use Hugging Face in Google Cloud by reading our blog posts, present
 
 - Inference
 
-- [Deploy Meta Llama 3.1 8B with TGI DLC on Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run/tgi-deployment)
+- [Deploy Gemma2 9B with TGI DLC on Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run/deploy-gemma-2-on-cloud-run)
+- [Deploy Llama 3.1 8B with TGI DLC on Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run/deploy-llama-3-1-on-cloud-run)
examples/cloud-run/deploy-llama-3-1-on-cloud-run/README.md (+6 −6)
@@ -1,13 +1,13 @@
 ---
-title: Deploy Meta Llama 3.1 8B with TGI DLC on Cloud Run
+title: Deploy Llama 3.1 8B with TGI DLC on Cloud Run
 type: inference
 ---
 
-# Deploy Meta Llama 3.1 8B with TGI DLC on Cloud Run
+# Deploy Llama 3.1 8B with TGI DLC on Cloud Run
 
-Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024. Meta Llama 3.1 comes in three sizes: 8B for efficient deployment and development on consumer-size GPU, 70B for large-scale AI native applications, and 405B for synthetic data, LLM as a Judge or distillation; among other use cases. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. Google Cloud Run is a serverless container platform that allows developers to deploy and manage containerized applications without managing infrastructure, enabling automatic scaling and billing only for usage.
+Llama 3.1 is the latest open LLM from Meta, released in July 2024. Llama 3.1 comes in three sizes: 8B for efficient deployment and development on consumer-size GPU, 70B for large-scale AI native applications, and 405B for synthetic data, LLM as a Judge or distillation; among other use cases. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. Google Cloud Run is a serverless container platform that allows developers to deploy and manage containerized applications without managing infrastructure, enabling automatic scaling and billing only for usage.
 
-This example showcases how to deploy an LLM from the Hugging Face Hub, in this case Meta Llama 3.1 8B Instruct model quantized to INT4 using AWQ, with the Hugging Face DLC for TGI on Google Cloud Run with GPU support ([in preview](https://cloud.google.com/products#product-launch-stages)).
+This example showcases how to deploy an LLM from the Hugging Face Hub, in this case Llama 3.1 8B Instruct model quantized to INT4 using AWQ, with the Hugging Face DLC for TGI on Google Cloud Run with GPU support ([in preview](https://cloud.google.com/products#product-launch-stages)).
 
 > [!NOTE]
 > GPU support on Cloud Run is only available as a waitlisted public preview. If you're interested in trying out the feature, [request a quota increase](https://cloud.google.com/run/quotas#increase) for `Total Nvidia L4 GPU allocation, per project per region`. At the time of writing this example, NVIDIA L4 GPUs (24GiB VRAM) are the only available GPUs on Cloud Run; enabling automatic scaling up to 7 instances by default (more available via quota), as well as scaling down to zero instances when there are no requests.
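To make the deployment step the README builds toward concrete, below is a minimal, hedged sketch of deploying the TGI DLC on Cloud Run with an L4 GPU. The service name, region, container URI placeholder, model ID, and resource flags are illustrative assumptions rather than the exact values pinned in the example, and GPU support required the `gcloud beta` command surface while it was in preview.

```bash
# Hedged sketch only; consult the example's README for the exact values.
# CONTAINER_URI should point at the Hugging Face TGI DLC; the model ID below
# (an AWQ INT4 quantization of Llama 3.1 8B Instruct) is an assumption.
export SERVICE_NAME=text-generation-inference    # assumed service name
export LOCATION=us-central1                      # any region with L4 quota
export CONTAINER_URI="<huggingface-tgi-dlc-uri>" # placeholder, not a real URI

gcloud beta run deploy $SERVICE_NAME \
    --image=$CONTAINER_URI \
    --args="--model-id=hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4,--quantize=awq" \
    --port=8080 \
    --cpu=8 \
    --memory=32Gi \
    --no-cpu-throttling \
    --gpu=1 \
    --gpu-type=nvidia-l4 \
    --max-instances=3 \
    --region=$LOCATION \
    --no-allow-unauthenticated
```

`--no-allow-unauthenticated` keeps the service private, which is why the example then grants an invoker Service Account and mints an access token in the steps below.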
@@ -216,7 +216,7 @@ The recommended approach is to use a Service Account (SA), as the access can be
 - Set the `SERVICE_ACCOUNT_NAME` environment variable for convenience:
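Given the rename called out in this PR's changelog, that step presumably reduces to something like the following, where `tgi-invoker` is the value this PR standardizes on:

```bash
# Name of the Service Account that will be granted permission to invoke the
# Cloud Run service; `tgi-invoker` is taken from this PR's changelog.
export SERVICE_ACCOUNT_NAME=tgi-invoker
```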
@@ -241,7 +241,7 @@ The recommended approach is to use a Service Account (SA), as the access can be
 > [!WARNING]
-> The access token is short-lived and will expire, by default after 1 hour. If you want to extend the token lifetime beyond the default, you must create and organization policy and use the `--lifetime` argument when createing the token. Refer to (Access token lifetime)[[https://cloud.google.com/resource-manager/docs/organization-policy/restricting-service-accounts#extend_oauth_ttl]] to learn more. Otherwise, you can also generate a new token by running the same command again.
+> The access token is short-lived and will expire, by default after 1 hour. If you want to extend the token lifetime beyond the default, you must create an organization policy and use the `--lifetime` argument when creating the token. Refer to [Access token lifetime](https://cloud.google.com/resource-manager/docs/organization-policy/restricting-service-accounts#extend_oauth_ttl) to learn more. Otherwise, you can also generate a new token by running the same command again.
 
 Now you can already dive into the different alternatives for sending requests to the deployed Cloud Run Service using the `SERVICE_URL` and `ACCESS_TOKEN` as described above.
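As a rough illustration of one of those alternatives, here is a hedged sketch of calling the service directly with `curl`. It assumes `SERVICE_URL` already points at the deployed Cloud Run service, that the authenticated account (or the `tgi-invoker` Service Account) is allowed to invoke it, and that TGI's OpenAI-compatible chat completions route is exposed.

```bash
# Hedged sketch: query the deployed TGI service with a short-lived token.
# SERVICE_URL is assumed to be set already, as described in the README.
export ACCESS_TOKEN=$(gcloud auth print-access-token)

curl "$SERVICE_URL/v1/chat/completions" \
    -X POST \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is Cloud Run?"}],
        "max_tokens": 128
    }'
```

If the token has expired, rerunning `gcloud auth print-access-token` issues a new one, as the warning above notes.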