4 changes: 2 additions & 2 deletions docs/source/_toctree.yml
@@ -1,14 +1,14 @@
- sections:
- local: index
title: Hugging Face on Google Cloud
- local: features
title: Features & benefits
- local: resources
title: Other Resources
title: Getting Started
- sections:
- local: containers/introduction
title: Introduction
- local: containers/features
title: Features & benefits
- local: containers/available
title: Available DLCs on Google Cloud
title: Deep Learning Containers (DLCs)
10 changes: 9 additions & 1 deletion docs/source/containers/available.mdx
@@ -1,6 +1,6 @@
# DLCs on Google Cloud

Below you can find a listing of all the Deep Learning Containers (DLCs) available on Google Cloud.
Below you can find a listing of all the Deep Learning Containers (DLCs) available on Google Cloud. Containers are created for each supported combination of use case (training, inference), accelerator type (CPU, GPU, TPU), and framework (PyTorch, TGI, TEI).

<Tip>

@@ -10,26 +10,34 @@ The listing below only contains the latest version of each one of the Hugging Face DLCs

## Text Generation Inference (TGI)

The Text Generation Inference (TGI) DLC is available for high-performance text generation of Large Language Models (LLMs) on GPU, with TPU support coming soon. The TGI DLC enables you to deploy [any of the 140,000+ text generation models supported on the Hugging Face Hub](https://huggingface.co/models?other=text-generation-inference&sort=trending), or any custom model as long as [its architecture is supported within TGI](https://huggingface.co/docs/text-generation-inference/supported_models).

| Container URI | Path | Accelerator |
| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310 | [text-generation-inference-gpu.2.2.0](./containers/tgi/gpu/2.2.0/Dockerfile) | GPU |
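
For illustration, the GPU DLC listed above can be run locally on a machine with an NVIDIA GPU. This is a minimal sketch, assuming the image honors the upstream TGI `MODEL_ID` and `NUM_SHARD` environment variables and the 8080 port mapping; [`google/gemma-2-9b-it`](https://huggingface.co/google/gemma-2-9b-it) is just an illustrative model choice:

```bash
# Sketch: serve a text generation model locally with the TGI GPU DLC.
# MODEL_ID, NUM_SHARD, and the port mapping are assumptions based on
# upstream TGI conventions; check the Dockerfile linked above for specifics.
docker run --gpus all -ti -p 8080:8080 \
  -e MODEL_ID=google/gemma-2-9b-it \
  -e NUM_SHARD=1 \
  -e HF_TOKEN=$HF_TOKEN \
  us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310
```

Once the container is up, it should expose the standard TGI REST API, so a request against the `/generate` route returns completions.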

## Text Embeddings Inference (TEI)

The Text Embeddings Inference (TEI) DLC is available for high-performance serving of embedding models on both CPU and GPU. The TEI DLC enables you to deploy [any of the 10,000+ supported embedding, re-ranking, or sequence classification models from the Hugging Face Hub](https://huggingface.co/models?other=text-embeddings-inference&sort=trending), or any custom model as long as [its architecture is supported within TEI](https://huggingface.co/docs/text-embeddings-inference/en/supported_models).

| Container URI | Path | Accelerator |
| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204 | [text-embeddings-inference-gpu.1.4.0](./containers/tei/gpu/1.4.0/Dockerfile) | GPU |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cpu.1-4 | [text-embeddings-inference-cpu.1.4.0](./containers/tei/cpu/1.4.0/Dockerfile) | CPU |
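
As a quick sketch, the CPU DLC listed above can be run locally with Docker. The `MODEL_ID` environment variable and port mapping below are assumptions based on upstream TEI conventions, and [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5) is an illustrative model choice:

```bash
# Sketch: serve an embedding model locally with the TEI CPU DLC.
docker run -ti -p 8080:8080 \
  -e MODEL_ID=BAAI/bge-large-en-v1.5 \
  us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cpu.1-4
```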

## PyTorch Inference

The PyTorch Inference DLC is available for serving PyTorch models via 🤗 Transformers, including models trained with 🤗 TRL, Sentence Transformers, or 🧨 Diffusers, on both CPU and GPU.

| Container URI | Path | Accelerator |
| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311 | [huggingface-pytorch-inference-gpu.2.2.2.transformers.4.44.0.py311](./containers/pytorch/inference/gpu/2.2.2/transformers/4.44.0/py311/Dockerfile) | GPU |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cpu.2-2.transformers.4-44.ubuntu2204.py311 | [huggingface-pytorch-inference-cpu.2.2.2.transformers.4.44.0.py311](./containers/pytorch/inference/cpu/2.2.2/transformers/4.44.0/py311/Dockerfile) | CPU |
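
The PyTorch inference DLCs ship with [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit), which is configured through the `HF_MODEL_ID` and `HF_TASK` environment variables. A minimal local sketch with the CPU image; the model choice and port mapping are illustrative assumptions:

```bash
# Sketch: serve a Hub model locally with the PyTorch inference CPU DLC.
# HF_MODEL_ID and HF_TASK follow huggingface-inference-toolkit conventions;
# the port mapping is an assumption.
docker run -ti -p 8080:8080 \
  -e HF_MODEL_ID=distilbert-base-uncased-finetuned-sst-2-english \
  -e HF_TASK=text-classification \
  us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cpu.2-2.transformers.4-44.ubuntu2204.py311
```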

## PyTorch Training

The PyTorch Training DLC is available for training PyTorch models via 🤗 Transformers. It includes support for training with libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers, on GPUs, with TPU support coming soon.

| Container URI | Path | Accelerator |
| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310 | [huggingface-pytorch-training-gpu.2.3.0.transformers.4.42.3.py310](./containers/pytorch/training/gpu/2.3.0/transformers/4.42.3/py310/Dockerfile) | GPU |
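
A common way to use the training DLC is as the container image of a Vertex AI custom job. Below is a minimal sketch with the `gcloud` CLI; the region, machine type, and accelerator values are illustrative assumptions, and you would normally also pass your training command and its arguments:

```bash
# Sketch: launch a Vertex AI custom job on a single A100 with the PyTorch
# training DLC. Region, machine type, and accelerator are illustrative.
gcloud ai custom-jobs create \
  --region=us-central1 \
  --display-name=hf-pytorch-training \
  --worker-pool-spec=machine-type=a2-highgpu-1g,replica-count=1,accelerator-type=NVIDIA_TESLA_A100,accelerator-count=1,container-image-uri=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310
```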
@@ -1,6 +1,6 @@
# 🔥 Features & benefits
# Features & benefits

The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models. They can be used in combination with Google Cloud offerings including Google Kubernetes Engine (GKE) and Vertex AI. GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure. Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs).
The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models.

## One command is all you need

@@ -10,7 +10,7 @@ With the new Hugging Face DLCs, train cutting-edge Transformers-based NLP models

In addition to the Hugging Face DLCs, we created a first-class Hugging Face library for inference, [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit), that comes with the Hugging Face PyTorch DLCs for inference, with full support for serving any PyTorch model on Google Cloud.

Deploy your trained models for inference with just one more line of code or select [any of the 170,000+ publicly available models from the model Hub](https://huggingface.co/models?library=pytorch,transformers&sort=trending) and deploy them on either Vertex AI or GKE.
Deploy your trained models for inference with just one more line of code or select [any of the 170,000+ publicly available models from the model Hub](https://huggingface.co/models?library=pytorch,transformers&sort=trending).

## High-performance text generation and embedding

@@ -30,6 +30,3 @@ The Hugging Face Training DLCs are fully integrated with Google Cloud, enabling

Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your Google Cloud environment, built-in monitoring, and a ton of enterprise features.

---

Read more about both Vertex AI in [their official documentation](https://cloud.google.com/vertex-ai/docs) and GKE in [their official documentation](https://cloud.google.com/kubernetes-engine/docs).
Comment on lines -33 to -35
Contributor: why remove?

10 changes: 8 additions & 2 deletions docs/source/containers/introduction.mdx
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
# Introduction

[Hugging Face Deep Learning Containers for Google Cloud](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) are a set of Docker images for training and deploying Transformers, Sentence Transformers, and Diffusers models on Google Cloud Vertex AI and Google Kubernetes Engine (GKE).
Hugging Face built Deep Learning Containers (DLCs) for Google Cloud customers to run any of their machine learning workloads in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any model, skipping the complicated process of building and optimizing your serving and training environments from scratch.
Member @alvarobartt (Sep 25, 2024):
Here we should mention TGI and TEI too right? We can phrase it as the following (but with better wording)

"DLCs are Docker images pre-installed with deep learning solutions such as TGI and TEI for inference; or frameworks as Transformers for both training and inference."

Contributor:
We don't use "🤗 Transformers" emojis anymore.

Contributor:
FYI, those are not frameworks; we have libraries (transformers) and solutions (TGI).

Contributor:
I would keep the direct link; we can replace it with one on our side if we have one. I would not use corporate "blurb", let's keep it direct and simple.

Hugging Face Deep Learning Containers for Google Cloud are optimized Docker containers for training and deploying Generative AI models, including deep learning libraries like Transformers, Datasets, Tokenizers, or Diffusers, and purpose-built versions of Hugging Face Text Generation Inference (TGI) and Text Embeddings Inference (TEI).
DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.


The [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository contains the container files for building Hugging Face-specific Deep Learning Containers (DLCs), examples on how to train and deploy models on Google Cloud. The containers are publicly maintained, updated and released periodically by Hugging Face and the Google Cloud Team and available for all Google Cloud Customers within the [Google Cloud's Artifact Registry](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face). For each supported combination of use-case (training, inference), accelerator type (CPU, GPU, TPU), and framework (PyTorch, TGI, TEI) containers are created.
The containers are publicly maintained, updated, and released periodically by Hugging Face and the Google Cloud team, and are available to all Google Cloud customers within [Google Cloud's Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io). They can be used from any Google Cloud service, such as the following (a minimal pull sketch follows the list):

- [Vertex AI](https://cloud.google.com/vertex-ai/docs): Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs).
- [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/docs) (GKE): GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure.
- [Cloud Run](https://cloud.google.com/run/docs) (in preview): Cloud Run is a serverless managed compute platform that enables you to run containers that are invocable via requests or events.
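
As a minimal sketch of the "pull it yourself" path, you can authenticate Docker against the registry host and pull one of the published images; the TGI GPU image below is just an illustrative choice:

```bash
# Sketch: configure Docker credentials for Artifact Registry, then pull a DLC.
gcloud auth configure-docker us-docker.pkg.dev
docker pull us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310
```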

We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmatically train and deploy models on these Google Cloud services.
Member:
Not only notebooks, and fixed a typo in programmatically

Suggested change
We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmaticaly train and deploy models on these Google Cloud services.
We are curating a list of [examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmatically train and deploy models on these Google Cloud services.

Contributor:
Don't mix tenses: "curated".

64 changes: 53 additions & 11 deletions docs/source/index.mdx
@@ -4,23 +4,65 @@

Hugging Face collaborates with Google across open science, open source, cloud, and hardware to enable companies to build their own AI with the latest open models from Hugging Face and the latest cloud and hardware features from Google Cloud.

Hugging Face enables new experiences for Google Cloud customers. They can easily train and deploy Hugging Face models on Google Kubernetes Engine (GKE) and Vertex AI, on any hardware available in Google Cloud using Hugging Face Deep Learning Containers (DLCs).
Hugging Face enables new experiences for Google Cloud customers. They can easily train and deploy Hugging Face models on Google Kubernetes Engine (GKE), Vertex AI, and Cloud Run, on any hardware available in Google Cloud using Hugging Face Deep Learning Containers (DLCs) or our no-code integrations.

If you have any issues using Hugging Face on Google Cloud, you can get community support by creating a new topic in the [Forum](https://discuss.huggingface.co/c/google-cloud/69/l/latest) dedicated to Google Cloud usage.
## Deploy Models on Google Cloud

### With Hugging Face DLCs

For advanced scenarios, you can pull any of the Hugging Face DLCs from the Google Cloud Artifact Registry directly into your environment. We are curating a list of notebook examples on how to deploy models with Hugging Face DLCs in the following services (a Cloud Run deployment sketch follows the list):
- [Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai#inference-examples)
- [GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke#inference-examples)
- [Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run#inference-examples) (preview)
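
For Cloud Run specifically (still in preview, so the flags below may change), a deployment can be sketched with the `gcloud beta` CLI; the service name, region, resource sizes, and the `MODEL_ID` environment variable are illustrative assumptions:

```bash
# Sketch: deploy the TGI DLC to Cloud Run with an attached GPU (preview).
# Service name, region, and resource values are illustrative assumptions.
gcloud beta run deploy tgi-gemma-2-9b \
  --image=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310 \
  --set-env-vars=MODEL_ID=google/gemma-2-9b-it,HF_TOKEN=$HF_TOKEN \
  --port=8080 \
  --cpu=8 --memory=32Gi \
  --gpu=1 --gpu-type=nvidia-l4 \
  --region=us-central1 \
  --no-allow-unauthenticated
```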

### From the Hub Model Page
Member:
Suggested change
### From the Hub Model Page
### From the Hub


#### On Vertex AI or GKE

If you want to deploy a model from the Hub in your Google Cloud account on Vertex AI or GKE, you can use our no-code integrations. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):
1. On the model page, open the “Deploy” menu, and select “Google Cloud”. This will bring you straight into the Google Cloud Console.
2. Select Vertex AI or GKE as a deployment option.
3. Paste a [Hugging Face Token](https://huggingface.co/docs/hub/en/security-tokens) with the "Read access to contents of all public gated repos you can access" permission.
4. If Vertex AI is selected, click on "Deploy". If GKE is selected, paste the manifest code and apply it to your GKE cluster.

Alternatively, you can follow this short video.
<video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/google-cloud/deploy-google-cloud.mp4" controls autoplay muted loop />

## Train and Deploy Models on Google Cloud with Hugging Face Deep Learning Containers
#### On Hugging Face Inference Endpoints

Hugging Face built Deep Learning Containers (DLCs) for Google Cloud customers to run any of their machine learning workload in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.
If you want to deploy a model from the Hub but you don't have a Google Cloud environment, you can use Hugging Face [Inference Endpoints](https://huggingface.co/inference-endpoints/dedicated) on Google Cloud. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):
1. On the model page, open the "Deploy" menu, and select "Inference Endpoints (dedicated)". This will bring you to the Inference Endpoints deployment page.
2. Select Google Cloud Platform, scroll down and click on "Create Endpoint".

For training, our DLCs are available for PyTorch via 🤗 Transformers. They include support for training on both GPUs and TPUs with libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers.
Alternatively, you can follow this short video.
<video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/google-cloud/inference-endpoints.mp4" controls autoplay muted loop />
Comment on lines +31 to +38
Contributor:
Not sure if we should add an Inference Endpoints section here. We should rather have that in the Inference Endpoints docs. We don't use any of the containers or solutions in IE.


For inference, we have a general-purpose PyTorch inference DLC, for serving models trained with any of those frameworks mentioned before on both CPU and GPU. There is also the Text Generation Inference (TGI) DLC for high-performance text generation of LLMs on both GPU and TPU. Finally, there is a Text Embeddings Inference (TEI) DLC for high-performance serving of embedding models on both CPU and GPU.
### From Vertex AI Model Garden
Member:
I would move this section up, between DLCs and Hub


The DLCs are hosted in [Google Cloud Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io) and can be used from any Google Cloud service such as Google Kubernetes Engine (GKE), Vertex AI, or Cloud Run (in preview).
#### On Vertex AI or GKE

Hugging Face DLCs are open source and licensed under Apache 2.0 within the [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository. For premium support, our [Expert Support Program](https://huggingface.co/support) gives you direct dedicated support from our team.
If you are used to browsing models directly from Vertex AI Model Garden, we have brought more than 4,000 models from the Hugging Face Hub into it. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):
1. On the [Vertex AI Model Garden landing page](https://console.cloud.google.com/vertex-ai/model-garden), you can browse Hugging Face models:
1. by clicking “Deploy From Hugging Face” at the top left
2. by scrolling down to see our curated list of 12 open source models
3. by clicking on "Hugging Face" in the Featured Partner section to access a catalog of 4000+ models hosted on the Hub.
2. Once you have found the model that you want to deploy, select Vertex AI or GKE as a deployment option.
3. Paste a [Hugging Face Token](https://huggingface.co/docs/hub/en/security-tokens) with the "Read access to contents of all public gated repos you can access" permission.
4. If Vertex AI is selected, click on "Deploy". If GKE is selected, paste the manifest code and apply it to your GKE cluster.

You have two options to take advantage of these DLCs as a Google Cloud customer:
Alternatively, you can follow this short video.
<video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/google-cloud/vertex-ai-model-garden.mp4" controls autoplay muted loop />

## Train models on Google Cloud

### With Hugging Face DLCs

For advanced scenarios, you can pull the containers from the Google Cloud Artifact Registry directly into your environment. We are curating a list of notebook examples on how to train models with Hugging Face DLCs in:
- [Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai#training-examples)
- [GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke#training-examples)

## Support

If you have any issues using Hugging Face on Google Cloud, you can get community support by creating a new topic in the [Forum](https://discuss.huggingface.co/c/google-cloud/69/l/latest) dedicated to Google Cloud usage.

1. To [get started](https://huggingface.co/blog/google-cloud-model-garden), you can use our no-code integrations within Vertex AI or GKE.
2. For more advanced scenarios, you can pull the containers from the Google Cloud Artifact Registry directly in your environment. [Here](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) is a list of notebooks examples.
Hugging Face DLCs are open source and licensed under Apache 2.0 within the [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository. For premium support, our [Expert Support Program](https://huggingface.co/support) gives you direct dedicated support from our team.
2 changes: 1 addition & 1 deletion docs/source/resources.mdx
@@ -1,4 +1,4 @@
# 📄 Other Resources
# Other Resources

Learn how to use Hugging Face in Google Cloud by reading our blog posts, Google documentation and examples below.
