4 changes: 2 additions & 2 deletions docs/source/_toctree.yml
@@ -1,14 +1,14 @@
- sections:
- local: index
title: Hugging Face on Google Cloud
- local: features
title: Features & benefits
- local: resources
title: Other Resources
title: Getting Started
- sections:
- local: containers/introduction
title: Introduction
- local: containers/features
title: Features & benefits
- local: containers/available
title: Available DLCs on Google Cloud
title: Deep Learning Containers (DLCs)
10 changes: 9 additions & 1 deletion docs/source/containers/available.mdx
@@ -1,6 +1,6 @@
# DLCs on Google Cloud

Below you can find a listing of all the Deep Learning Containers (DLCs) available on Google Cloud.
Below you can find a listing of all the Deep Learning Containers (DLCs) available on Google Cloud. Containers are created for each supported combination of use case (training, inference), accelerator type (CPU, GPU, TPU), and framework (PyTorch, TGI, TEI).

<Tip>

@@ -10,26 +10,34 @@ The listing below only contains the latest version of each one of the Hugging Face DLCs

## Text Generation Inference (TGI)

The Text Generation Inference (TGI) DLC is available for high-performance text generation of Large Language Models (LLMs) on GPU, with TPU support coming soon. The TGI DLC enables you to deploy [any of the 140,000+ text generation models supported on the Hugging Face Hub](https://huggingface.co/models?other=text-generation-inference&sort=trending), or any custom model as long as [its architecture is supported within TGI](https://huggingface.co/docs/text-generation-inference/supported_models).

| Container URI | Path | Accelerator |
| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310 | [text-generation-inference-gpu.2.2.0](./containers/tgi/gpu/2.2.0/Dockerfile) | GPU |
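
For illustration, the GPU DLC listed above can be run locally on a machine with an NVIDIA GPU. This is a minimal sketch, assuming the image honors the upstream TGI `MODEL_ID` and `NUM_SHARD` environment variables and the 8080 port mapping; [`google/gemma-2-9b-it`](https://huggingface.co/google/gemma-2-9b-it) is just an illustrative model choice:

```bash
# Sketch: serve a text generation model locally with the TGI GPU DLC.
# MODEL_ID, NUM_SHARD, and the port mapping are assumptions based on
# upstream TGI conventions; check the Dockerfile linked above for specifics.
docker run --gpus all -ti -p 8080:8080 \
  -e MODEL_ID=google/gemma-2-9b-it \
  -e NUM_SHARD=1 \
  -e HF_TOKEN=$HF_TOKEN \
  us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310
```

Once the container is up, it should expose the standard TGI REST API, so a request against the `/generate` route returns completions.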

## Text Embeddings Inference (TEI)

The Text Embeddings Inference (TEI) DLC is available for high-performance serving of embedding models on both CPU and GPU. The TEI DLC enables you to deploy [any of the 10,000+ supported embedding, re-ranking, or sequence classification models from the Hugging Face Hub](https://huggingface.co/models?other=text-embeddings-inference&sort=trending), or any custom model as long as [its architecture is supported within TEI](https://huggingface.co/docs/text-embeddings-inference/en/supported_models).

| Container URI | Path | Accelerator |
| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204 | [text-embeddings-inference-gpu.1.4.0](./containers/tei/gpu/1.4.0/Dockerfile) | GPU |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cpu.1-4 | [text-embeddings-inference-cpu.1.4.0](./containers/tei/cpu/1.4.0/Dockerfile) | CPU |
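
As a quick sketch, the CPU DLC listed above can be run locally with Docker. The `MODEL_ID` environment variable and port mapping below are assumptions based on upstream TEI conventions, and [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5) is an illustrative model choice:

```bash
# Sketch: serve an embedding model locally with the TEI CPU DLC.
docker run -ti -p 8080:8080 \
  -e MODEL_ID=BAAI/bge-large-en-v1.5 \
  us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cpu.1-4
```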

## PyTorch Inference

The PyTorch Inference DLC is available for serving PyTorch models via 🤗 Transformers, including models trained with 🤗 TRL, Sentence Transformers, or 🧨 Diffusers, on both CPU and GPU.

| Container URI | Path | Accelerator |
| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311 | [huggingface-pytorch-inference-gpu.2.2.2.transformers.4.44.0.py311](./containers/pytorch/inference/gpu/2.2.2/transformers/4.44.0/py311/Dockerfile) | GPU |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cpu.2-2.transformers.4-44.ubuntu2204.py311 | [huggingface-pytorch-inference-cpu.2.2.2.transformers.4.44.0.py311](./containers/pytorch/inference/cpu/2.2.2/transformers/4.44.0/py311/Dockerfile) | CPU |
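
The PyTorch inference DLCs ship with [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit), which is configured through the `HF_MODEL_ID` and `HF_TASK` environment variables. A minimal local sketch with the CPU image; the model choice and port mapping are illustrative assumptions:

```bash
# Sketch: serve a Hub model locally with the PyTorch inference CPU DLC.
# HF_MODEL_ID and HF_TASK follow huggingface-inference-toolkit conventions;
# the port mapping is an assumption.
docker run -ti -p 8080:8080 \
  -e HF_MODEL_ID=distilbert-base-uncased-finetuned-sst-2-english \
  -e HF_TASK=text-classification \
  us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cpu.2-2.transformers.4-44.ubuntu2204.py311
```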

## PyTorch Training

The PyTorch Training DLC is available for training PyTorch models via 🤗 Transformers. It includes support for training with libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers, on GPUs, with TPU support coming soon.

| Container URI | Path | Accelerator |
| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310 | [huggingface-pytorch-training-gpu.2.3.0.transformers.4.42.3.py310](./containers/pytorch/training/gpu/2.3.0/transformers/4.42.3/py310/Dockerfile) | GPU |
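
A common way to use the training DLC is as the container image of a Vertex AI custom job. Below is a minimal sketch with the `gcloud` CLI; the region, machine type, and accelerator values are illustrative assumptions, and you would normally also pass your training command and its arguments:

```bash
# Sketch: launch a Vertex AI custom job on a single A100 with the PyTorch
# training DLC. Region, machine type, and accelerator are illustrative.
gcloud ai custom-jobs create \
  --region=us-central1 \
  --display-name=hf-pytorch-training \
  --worker-pool-spec=machine-type=a2-highgpu-1g,replica-count=1,accelerator-type=NVIDIA_TESLA_A100,accelerator-count=1,container-image-uri=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310
```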
@@ -1,6 +1,6 @@
# 🔥 Features & benefits
# Features & benefits

The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models. They can be used in combination with Google Cloud offerings including Google Kubernetes Engine (GKE) and Vertex AI. GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure. Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs).
The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models.

## One command is all you need

@@ -10,7 +10,7 @@ With the new Hugging Face DLCs, train cutting-edge Transformers-based NLP models

In addition to the Hugging Face DLCs, we created a first-class Hugging Face library for inference, [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit), that comes with the Hugging Face PyTorch DLCs for inference, with full support for serving any PyTorch model on Google Cloud.

Deploy your trained models for inference with just one more line of code or select [any of the 170,000+ publicly available models from the model Hub](https://huggingface.co/models?library=pytorch,transformers&sort=trending) and deploy them on either Vertex AI or GKE.
Deploy your trained models for inference with just one more line of code or select [any of the 170,000+ publicly available models from the model Hub](https://huggingface.co/models?library=pytorch,transformers&sort=trending).

## High-performance text generation and embedding

@@ -30,6 +30,3 @@ The Hugging Face Training DLCs are fully integrated with Google Cloud, enabling

Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your Google Cloud environment, built-in monitoring, and a ton of enterprise features.

---

Read more about both Vertex AI in [their official documentation](https://cloud.google.com/vertex-ai/docs) and GKE in [their official documentation](https://cloud.google.com/kubernetes-engine/docs).
Comment on lines -33 to -35
Contributor: why remove?

10 changes: 8 additions & 2 deletions docs/source/containers/introduction.mdx
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
# Introduction

[Hugging Face Deep Learning Containers for Google Cloud](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) are a set of Docker images for training and deploying Transformers, Sentence Transformers, and Diffusers models on Google Cloud Vertex AI and Google Kubernetes Engine (GKE).
Hugging Face built Deep Learning Containers (DLCs) for Google Cloud customers to run any of their machine learning workloads in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any model, skipping the complicated process of building and optimizing your serving and training environments from scratch.
Member @alvarobartt (Sep 25, 2024):
Here we should mention TGI and TEI too right? We can phrase it as the following (but with better wording)

"DLCs are Docker images pre-installed with deep learning solutions such as TGI and TEI for inference; or frameworks as Transformers for both training and inference."

Contributor:
We don't use "🤗 Transformers" emojis anymore.

Contributor:
FYI, those are not frameworks; we have libraries (transformers) and solutions (TGI).

Contributor:
I would keep the direct link; we can replace it with one on our side if we have one. I would not use corporate "blurb", let's keep it direct and simple.

Hugging Face Deep Learning Containers for Google Cloud are optimized Docker containers for training and deploying Generative AI models, including deep learning libraries like Transformers, Datasets, Tokenizers, or Diffusers, and purpose-built versions of Hugging Face Text Generation Inference (TGI) and Text Embeddings Inference (TEI).
DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.


The [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository contains the container files for building Hugging Face-specific Deep Learning Containers (DLCs), examples on how to train and deploy models on Google Cloud. The containers are publicly maintained, updated and released periodically by Hugging Face and the Google Cloud Team and available for all Google Cloud Customers within the [Google Cloud's Artifact Registry](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face). For each supported combination of use-case (training, inference), accelerator type (CPU, GPU, TPU), and framework (PyTorch, TGI, TEI) containers are created.
The containers are publicly maintained, updated, and released periodically by Hugging Face and the Google Cloud team, and are available to all Google Cloud customers within [Google Cloud's Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io). They can be used from any Google Cloud service, such as the following (a minimal pull sketch follows the list):

- [Vertex AI](https://cloud.google.com/vertex-ai/docs): Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs).
- [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/docs) (GKE): GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure.
- [Cloud Run](https://cloud.google.com/run/docs) (in preview): Cloud Run is a serverless managed compute platform that enables you to run containers that are invocable via requests or events.
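
As a minimal sketch of the "pull it yourself" path, you can authenticate Docker against the registry host and pull one of the published images; the TGI GPU image below is just an illustrative choice:

```bash
# Sketch: configure Docker credentials for Artifact Registry, then pull a DLC.
gcloud auth configure-docker us-docker.pkg.dev
docker pull us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310
```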

We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmatically train and deploy models on these Google Cloud services.
Member:
Not only notebooks, and fixed a typo in programmatically

Suggested change
We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmaticaly train and deploy models on these Google Cloud services.
We are curating a list of [examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmatically train and deploy models on these Google Cloud services.

Contributor:
Don't mix tenses: "curated".

64 changes: 53 additions & 11 deletions docs/source/index.mdx
@@ -4,23 +4,65 @@

Hugging Face collaborates with Google across open science, open source, cloud, and hardware to enable companies to build their own AI with the latest open models from Hugging Face and the latest cloud and hardware features from Google Cloud.

Hugging Face enables new experiences for Google Cloud customers. They can easily train and deploy Hugging Face models on Google Kubernetes Engine (GKE) and Vertex AI, on any hardware available in Google Cloud using Hugging Face Deep Learning Containers (DLCs).
Hugging Face enables new experiences for Google Cloud customers. They can easily train and deploy Hugging Face models on Google Kubernetes Engine (GKE), Vertex AI, and Cloud Run, on any hardware available in Google Cloud using Hugging Face Deep Learning Containers (DLCs) or our no-code integrations.

If you have any issues using Hugging Face on Google Cloud, you can get community support by creating a new topic in the [Forum](https://discuss.huggingface.co/c/google-cloud/69/l/latest) dedicated to Google Cloud usage.
## Deploy Models on Google Cloud

### With Hugging Face DLCs

For advanced scenarios, you can pull any of the Hugging Face DLCs from the Google Cloud Artifact Registry directly into your environment. We are curating a list of notebook examples on how to deploy models with Hugging Face DLCs in the following services (a Cloud Run deployment sketch follows the list):
- [Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai#inference-examples)
- [GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke#inference-examples)
- [Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run#inference-examples) (preview)
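
For Cloud Run specifically (still in preview, so the flags below may change), a deployment can be sketched with the `gcloud beta` CLI; the service name, region, resource sizes, and the `MODEL_ID` environment variable are illustrative assumptions:

```bash
# Sketch: deploy the TGI DLC to Cloud Run with an attached GPU (preview).
# Service name, region, and resource values are illustrative assumptions.
gcloud beta run deploy tgi-gemma-2-9b \
  --image=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310 \
  --set-env-vars=MODEL_ID=google/gemma-2-9b-it,HF_TOKEN=$HF_TOKEN \
  --port=8080 \
  --cpu=8 --memory=32Gi \
  --gpu=1 --gpu-type=nvidia-l4 \
  --region=us-central1 \
  --no-allow-unauthenticated
```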

### From the Hub Model Page
Member:
Suggested change
### From the Hub Model Page
### From the Hub


#### On Vertex AI or GKE

If you want to deploy a model from the Hub in your Google Cloud account on Vertex AI or GKE, you can use our no-code integrations. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):
1. On the model page, open the “Deploy” menu, and select “Google Cloud”. This will bring you straight into the Google Cloud Console.
2. Select Vertex AI or GKE as a deployment option.
3. Paste a [Hugging Face Token](https://huggingface.co/docs/hub/en/security-tokens) with the "Read access to contents of all public gated repos you can access" permission.
4. If Vertex AI is selected, click on "Deploy". If GKE is selected, paste the manifest code and apply it to your GKE cluster.

Alternatively, you can follow this short video.
<video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/google-cloud/deploy-google-cloud.mp4" controls autoplay muted loop />

## Train and Deploy Models on Google Cloud with Hugging Face Deep Learning Containers
#### On Hugging Face Inference Endpoints

Hugging Face built Deep Learning Containers (DLCs) for Google Cloud customers to run any of their machine learning workload in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.
If you want to deploy a model from the Hub but you don't have a Google Cloud environment, you can use Hugging Face [Inference Endpoints](https://huggingface.co/inference-endpoints/dedicated) on Google Cloud. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):
1. On the model page, open the "Deploy" menu, and select "Inference Endpoints (dedicated)". This will bring you to the Inference Endpoints deployment page.
2. Select Google Cloud Platform, scroll down and click on "Create Endpoint".

For training, our DLCs are available for PyTorch via 🤗 Transformers. They include support for training on both GPUs and TPUs with libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers.
Alternatively, you can follow this short video.
<video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/google-cloud/inference-endpoints.mp4" controls autoplay muted loop />
Comment on lines +31 to +38
Contributor:
Not sure if we should add an Inference Endpoints section here. We should rather have that in the Inference Endpoints docs. We don't use any of the containers or solutions in IE.


For inference, we have a general-purpose PyTorch inference DLC, for serving models trained with any of those frameworks mentioned before on both CPU and GPU. There is also the Text Generation Inference (TGI) DLC for high-performance text generation of LLMs on both GPU and TPU. Finally, there is a Text Embeddings Inference (TEI) DLC for high-performance serving of embedding models on both CPU and GPU.
### From Vertex AI Model Garden
Member:
I would move this section up, between DLCs and Hub


The DLCs are hosted in [Google Cloud Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io) and can be used from any Google Cloud service such as Google Kubernetes Engine (GKE), Vertex AI, or Cloud Run (in preview).
#### On Vertex AI or GKE

Hugging Face DLCs are open source and licensed under Apache 2.0 within the [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository. For premium support, our [Expert Support Program](https://huggingface.co/support) gives you direct dedicated support from our team.
If you are used to browsing models directly from Vertex AI Model Garden, we have brought more than 4,000 models from the Hugging Face Hub into it. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):
1. On the [Vertex AI Model Garden landing page](https://console.cloud.google.com/vertex-ai/model-garden), you can browse Hugging Face models:
1. by clicking “Deploy From Hugging Face” at the top left
2. by scrolling down to see our curated list of 12 open source models
3. by clicking on "Hugging Face" in the Featured Partner section to access a catalog of 4000+ models hosted on the Hub.
2. Once you have found the model that you want to deploy, select Vertex AI or GKE as a deployment option.
3. Paste a [Hugging Face Token](https://huggingface.co/docs/hub/en/security-tokens) with the "Read access to contents of all public gated repos you can access" permission.
4. If Vertex AI is selected, click on "Deploy". If GKE is selected, paste the manifest code and apply it to your GKE cluster.

You have two options to take advantage of these DLCs as a Google Cloud customer:
Alternatively, you can follow this short video.
<video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/google-cloud/vertex-ai-model-garden.mp4" controls autoplay muted loop />

## Train models on Google Cloud

### With Hugging Face DLCs

For advanced scenarios, you can pull the containers from the Google Cloud Artifact Registry directly into your environment. We are curating a list of notebook examples on how to train models with Hugging Face DLCs in:
- [Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai#training-examples)
- [GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke#training-examples)

## Support

If you have any issues using Hugging Face on Google Cloud, you can get community support by creating a new topic in the [Forum](https://discuss.huggingface.co/c/google-cloud/69/l/latest) dedicated to Google Cloud usage.

1. To [get started](https://huggingface.co/blog/google-cloud-model-garden), you can use our no-code integrations within Vertex AI or GKE.
2. For more advanced scenarios, you can pull the containers from the Google Cloud Artifact Registry directly in your environment. [Here](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) is a list of notebooks examples.
Hugging Face DLCs are open source and licensed under Apache 2.0 within the [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository. For premium support, our [Expert Support Program](https://huggingface.co/support) gives you direct dedicated support from our team.
2 changes: 1 addition & 1 deletion docs/source/resources.mdx
@@ -1,4 +1,4 @@
# 📄 Other Resources
# Other Resources

Learn how to use Hugging Face in Google Cloud by reading our blog posts, Google documentation and examples below.
