diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index c5371cfd..5d285fdf 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -1,14 +1,14 @@
 - sections:
   - local: index
     title: Hugging Face on Google Cloud
-  - local: features
-    title: Features & benefits
   - local: resources
     title: Other Resources
   title: Getting Started
 - sections:
   - local: containers/introduction
     title: Introduction
+  - local: containers/features
+    title: Features & benefits
   - local: containers/available
     title: Available DLCs on Google Cloud
   title: Deep Learning Containers (DLCs)
diff --git a/docs/source/containers/available.mdx b/docs/source/containers/available.mdx
index 5a79f5a9..a294a208 100644
--- a/docs/source/containers/available.mdx
+++ b/docs/source/containers/available.mdx
@@ -1,6 +1,6 @@
 # DLCs on Google Cloud
 
-Below you can find a listing of all the Deep Learning Containers (DLCs) available on Google Cloud.
+Below you can find a listing of all the Deep Learning Containers (DLCs) available on Google Cloud. Containers are created for each supported combination of use case (training, inference), accelerator type (CPU, GPU, TPU), and framework (PyTorch, TGI, TEI).
@@ -10,12 +10,16 @@ The listing below only contains the latest version of each one of the Hugging Fa
 ## Text Generation Inference (TGI)
 
+The Text Generation Inference (TGI) DLC is available for high-performance text generation of Large Language Models on both GPU and TPU (soon). The TGI DLC enables you to deploy [any of the 140,000+ text generation models supported on the Hugging Face Hub](https://huggingface.co/models?other=text-generation-inference&sort=trending), or any custom model as long as [its architecture is supported within TGI](https://huggingface.co/docs/text-generation-inference/supported_models).
+
 | Container URI                                                                                                            | Path                                                                         | Accelerator |
 | ------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------- | ----------- |
 | us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310 | [text-generation-inference-gpu.2.2.0](./containers/tgi/gpu/2.2.0/Dockerfile) | GPU |
 
 ## Text Embeddings Inference (TEI)
 
+The Text Embeddings Inference (TEI) DLC is available for high-performance serving of embedding models on both CPU and GPU. The TEI DLC enables you to deploy [any of the 10,000+ embedding, re-ranking, or sequence classification models supported on the Hugging Face Hub](https://huggingface.co/models?other=text-embeddings-inference&sort=trending), or any custom model as long as [its architecture is supported within TEI](https://huggingface.co/docs/text-embeddings-inference/en/supported_models).
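+
+As a minimal sketch of how this DLC might be used (assuming a GPU machine with the NVIDIA Container Toolkit and the `docker` Python SDK installed; the model ID and host port are illustrative placeholders, and the in-container serving port may differ per image), the TEI image listed below could be started locally as follows:
+
+```python
+import docker
+
+client = docker.from_env()
+
+# Serve an embedding model with the TEI DLC; MODEL_ID can be any TEI-supported model.
+container = client.containers.run(
+    "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204",
+    environment={"MODEL_ID": "BAAI/bge-large-en-v1.5"},
+    ports={"80/tcp": 8080},  # map the container's HTTP port to localhost:8080
+    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
+    detach=True,
+)
+```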
+
 | Container URI                                                                                                      | Path                                                                          | Accelerator |
 | ------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------ | ----------- |
 | us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204 | [text-embeddings-inference-gpu.1.4.0](./containers/tei/gpu/1.4.0/Dockerfile) | GPU |
@@ -23,6 +27,8 @@ The listing below only contains the latest version of each one of the Hugging Fa
 ## PyTorch Inference
 
+The PyTorch Inference DLC is available for serving models with PyTorch via 🤗 Transformers, including models trained with 🤗 TRL, Sentence Transformers, or 🧨 Diffusers, on both CPU and GPU.
+
 | Container URI                                                                                                                        | Path                                                                                                                                               | Accelerator |
 | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
 | us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311 | [huggingface-pytorch-inference-gpu.2.2.2.transformers.4.44.0.py311](./containers/pytorch/inference/gpu/2.2.2/transformers/4.44.0/py311/Dockerfile) | GPU |
@@ -30,6 +36,8 @@ The listing below only contains the latest version of each one of the Hugging Fa
 ## PyTorch Training
 
+The PyTorch Training DLC is available for training models with PyTorch via 🤗 Transformers. It includes support for training with libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers, on both GPU and TPU (soon).
+
 | Container URI                                                                                                                       | Path                                                                                                                                             | Accelerator |
 | ------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
 | us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310 | [huggingface-pytorch-training-gpu.2.3.0.transformers.4.42.3.py310](./containers/pytorch/training/gpu/2.3.0/transformers/4.42.3/py310/Dockerfile) | GPU |
diff --git a/docs/source/features.mdx b/docs/source/containers/features.mdx
similarity index 79%
rename from docs/source/features.mdx
rename to docs/source/containers/features.mdx
index 95eab807..22f926a9 100644
--- a/docs/source/features.mdx
+++ b/docs/source/containers/features.mdx
@@ -1,6 +1,6 @@
-# 🔥 Features & benefits
+# Features & benefits
 
-The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models. They can be used in combination with Google Cloud offerings including Google Kubernetes Engine (GKE) and Vertex AI. GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure. Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs).
+The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models.
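+
+As an illustration of what "ready-to-use" means in practice, the sketch below (assuming a Google Cloud project with Vertex AI enabled, the `google-cloud-aiplatform` SDK installed, and placeholder project, bucket, and entrypoint values; the DLC tag comes from the available DLCs listing and may change between releases) submits a training job on the PyTorch Training DLC:
+
+```python
+from google.cloud import aiplatform
+
+aiplatform.init(project="my-project", location="us-central1", staging_bucket="gs://my-bucket")
+
+# Run a custom training job on top of the Hugging Face PyTorch Training DLC.
+job = aiplatform.CustomContainerTrainingJob(
+    display_name="hf-dlc-training",
+    container_uri="us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310",
+    command=["python", "-m", "trainer.task"],  # hypothetical training entrypoint
+)
+
+job.run(replica_count=1, machine_type="n1-standard-8", accelerator_type="NVIDIA_TESLA_T4", accelerator_count=1)
+```
+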
 ## One command is all you need
 
@@ -10,7 +10,7 @@ With the new Hugging Face DLCs, train cutting-edge Transformers-based NLP models
 In addition to Hugging Face DLCs, we created a first-class Hugging Face library for inference, [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit), that comes with the Hugging Face PyTorch DLCs for inference, with full support on serving any PyTorch model on Google Cloud.
 
-Deploy your trained models for inference with just one more line of code or select [any of the 170,000+ publicly available models from the model Hub](https://huggingface.co/models?library=pytorch,transformers&sort=trending) and deploy them on either Vertex AI or GKE.
+Deploy your trained models for inference with just one more line of code, or select [any of the 170,000+ publicly available models from the model Hub](https://huggingface.co/models?library=pytorch,transformers&sort=trending).
 
 ## High-performance text generation and embedding
@@ -30,6 +30,3 @@ The Hugging Face Training DLCs are fully integrated with Google Cloud, enabling
 Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your Google Cloud environment, built-in monitoring, and a ton of enterprise features.
 
----
-
-Read more about both Vertex AI in [their official documentation](https://cloud.google.com/vertex-ai/docs) and GKE in [their official documentation](https://cloud.google.com/kubernetes-engine/docs).
diff --git a/docs/source/containers/introduction.mdx b/docs/source/containers/introduction.mdx
index 538de157..278f9345 100644
--- a/docs/source/containers/introduction.mdx
+++ b/docs/source/containers/introduction.mdx
@@ -1,5 +1,11 @@
 # Introduction
 
-[Hugging Face Deep Learning Containers for Google Cloud](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) are a set of Docker images for training and deploying Transformers, Sentence Transformers, and Diffusers models on Google Cloud Vertex AI and Google Kubernetes Engine (GKE).
+Hugging Face built Deep Learning Containers (DLCs) for Google Cloud customers to run any of their machine learning workloads in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any model, skipping the complicated process of building and optimizing your serving and training environments from scratch.
 
-The [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository contains the container files for building Hugging Face-specific Deep Learning Containers (DLCs), examples on how to train and deploy models on Google Cloud. The containers are publicly maintained, updated and released periodically by Hugging Face and the Google Cloud Team and available for all Google Cloud Customers within the [Google Cloud's Artifact Registry](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face). For each supported combination of use-case (training, inference), accelerator type (CPU, GPU, TPU), and framework (PyTorch, TGI, TEI) containers are created.
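+As a quick, hypothetical illustration of what "pre-installed" means (run inside any of the PyTorch DLCs; the exact versions printed depend on the container tag):
+
+```python
+# The DLCs ship with pinned, mutually compatible versions of the core Hugging Face libraries.
+import datasets
+import tokenizers
+import transformers
+
+print(transformers.__version__, datasets.__version__, tokenizers.__version__)
+```
+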
+The containers are publicly maintained, updated, and released periodically by Hugging Face and the Google Cloud team, and are available to all Google Cloud customers within [Google Cloud's Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io). They can be used from any Google Cloud service, such as:
+
+- [Vertex AI](https://cloud.google.com/vertex-ai/docs): Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs).
+- [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/docs) (GKE): GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure.
+- [Cloud Run](https://cloud.google.com/run/docs) (in preview): Cloud Run is a serverless managed compute platform that enables you to run containers that are invocable via requests or events.
+
+We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmatically train and deploy models on these Google Cloud services.
diff --git a/docs/source/index.mdx b/docs/source/index.mdx
index 6a3c7264..3582ac23 100644
--- a/docs/source/index.mdx
+++ b/docs/source/index.mdx
@@ -4,23 +4,65 @@
 Hugging Face collaborates with Google across open science, open source, cloud, and hardware to enable companies to build their own AI with the latest open models from Hugging Face and the latest cloud and hardware features from Google Cloud.
 
-Hugging Face enables new experiences for Google Cloud customers. They can easily train and deploy Hugging Face models on Google Kubernetes Engine (GKE) and Vertex AI, on any hardware available in Google Cloud using Hugging Face Deep Learning Containers (DLCs).
+Hugging Face enables new experiences for Google Cloud customers. They can easily train and deploy Hugging Face models on Google Kubernetes Engine (GKE), Vertex AI, and Cloud Run, on any hardware available in Google Cloud using Hugging Face Deep Learning Containers (DLCs) or our no-code integrations.
 
-If you have any issues using Hugging Face on Google Cloud, you can get community support by creating a new topic in the [Forum](https://discuss.huggingface.co/c/google-cloud/69/l/latest) dedicated to Google Cloud usage.
+## Deploy Models on Google Cloud
+
+### With Hugging Face DLCs
+
+For advanced scenarios, you can pull any Hugging Face DLC from the Google Cloud Artifact Registry directly in your environment. We are curating a list of notebook examples on how to deploy models with Hugging Face DLCs in:
+- [Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai#inference-examples)
+- [GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke#inference-examples)
+- [Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run#inference-examples) (preview)
+
+### From the Hub Model Page
+
+#### On Vertex AI or GKE
+
+If you want to deploy a model from the Hub in your Google Cloud account on Vertex AI or GKE, you can use our no-code integrations. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):
+1. On the model page, open the “Deploy” menu, and select “Google Cloud”. This will bring you straight into the Google Cloud Console.
+2. Select Vertex AI or GKE as a deployment option.
+3. Paste a [Hugging Face Token](https://huggingface.co/docs/hub/en/security-tokens) with the “Read access to contents of all public gated repos you can access” permission.
+4. If Vertex AI is selected, click on “Deploy”. If GKE is selected, copy the manifest and apply it to your GKE cluster.
+
+Alternatively, you can follow this short video.
+
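+For readers who prefer code over the no-code flow, the sketch below shows roughly how an equivalent deployment could be done programmatically on Vertex AI (assuming the `google-cloud-aiplatform` SDK, placeholder project and region values, and the PyTorch Inference DLC tag from the available DLCs listing; the `HF_MODEL_ID` and `HF_TASK` environment variables follow the `huggingface-inference-toolkit` conventions):
+
+```python
+from google.cloud import aiplatform
+
+aiplatform.init(project="my-project", location="us-central1")
+
+# Upload a Hub model, served by the Hugging Face PyTorch Inference DLC.
+model = aiplatform.Model.upload(
+    display_name="distilbert-sst2",
+    serving_container_image_uri="us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311",
+    serving_container_environment_variables={
+        "HF_MODEL_ID": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
+        "HF_TASK": "text-classification",
+    },
+)
+
+# Deploy to a GPU-backed endpoint and run a test prediction.
+endpoint = model.deploy(machine_type="n1-standard-8", accelerator_type="NVIDIA_TESLA_T4", accelerator_count=1)
+print(endpoint.predict(instances=["I love this product!"]))
+```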