diff --git a/README.md b/README.md
index 3719173b..3992dd83 100644
--- a/README.md
+++ b/README.md
@@ -42,12 +42,12 @@ The [`examples`](./examples) directory contains examples for using the container
 
 ### Training Examples
 
-| Service   | Example                                                                                                                                      | Title                                                                       |
-| --------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
-| Vertex AI | [examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai](./examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai)   | Fine-tune Gemma 2B with PyTorch Training DLC using SFT + LoRA on Vertex AI  |
-| Vertex AI | [examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai](./examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai)   | Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI  |
-| GKE       | [examples/gke/trl-full-fine-tuning](./examples/gke/trl-full-fine-tuning)                                                                     | Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE               |
-| GKE       | [examples/gke/trl-lora-fine-tuning](./examples/gke/trl-lora-fine-tuning)                                                                     | Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE |
+| Service   | Example                                                                                                                                      | Title                                                                      |
+| --------- | -------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
+| Vertex AI | [examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai](./examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai)   | Fine-tune Gemma 2B with PyTorch Training DLC using SFT + LoRA on Vertex AI |
+| Vertex AI | [examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai](./examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai)   | Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI |
+| GKE       | [examples/gke/trl-full-fine-tuning](./examples/gke/trl-full-fine-tuning)                                                                     | Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE              |
+| GKE       | [examples/gke/trl-lora-fine-tuning](./examples/gke/trl-lora-fine-tuning)                                                                     | Fine-tune Gemma 2 2B with PyTorch Training DLC using SFT + LoRA on GKE     |
 
 ### Inference Examples
 
diff --git a/docs/source/resources.mdx b/docs/source/resources.mdx
index 6f70c7e6..c88b824e 100644
--- a/docs/source/resources.mdx
+++ b/docs/source/resources.mdx
@@ -60,7 +60,7 @@ Learn how to use Hugging Face in Google Cloud by reading our blog posts, present
 
 - Training
   - [Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/trl-full-fine-tuning)
-  - [Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/trl-lora-fine-tuning)
+  - [Fine-tune Gemma 2 2B with PyTorch Training DLC using SFT + LoRA on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/trl-lora-fine-tuning)
 
 ### (Preview) Cloud Run
 
diff --git a/examples/cloud-run/README.md b/examples/cloud-run/README.md
index 20cca87f..b92e7fcd 100644
--- a/examples/cloud-run/README.md
+++ b/examples/cloud-run/README.md
@@ -14,5 +14,4 @@ This directory contains usage examples of the Hugging Face Deep Learning Contain
 
 ## Training Examples
 
-Coming soon!
-
+Coming soon!
diff --git a/examples/gke/README.md b/examples/gke/README.md
index 8bb0057a..60dca91d 100644
--- a/examples/gke/README.md
+++ b/examples/gke/README.md
@@ -4,10 +4,10 @@ This directory contains usage examples of the Hugging Face Deep Learning Contain
 
 ## Training Examples
 
-| Example                                        | Title                                                                       |
-| ---------------------------------------------- | --------------------------------------------------------------------------- |
-| [trl-full-fine-tuning](./trl-full-fine-tuning) | Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE               |
-| [trl-lora-fine-tuning](./trl-lora-fine-tuning) | Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE |
+| Example                                        | Title                                                                  |
+| ---------------------------------------------- | ---------------------------------------------------------------------- |
+| [trl-full-fine-tuning](./trl-full-fine-tuning) | Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE          |
+| [trl-lora-fine-tuning](./trl-lora-fine-tuning) | Fine-tune Gemma 2 2B with PyTorch Training DLC using SFT + LoRA on GKE |
 
 ## Inference Examples
 
diff --git a/examples/gke/trl-lora-fine-tuning/README.md b/examples/gke/trl-lora-fine-tuning/README.md
index 60eacf2c..c7e49433 100644
--- a/examples/gke/trl-lora-fine-tuning/README.md
+++ b/examples/gke/trl-lora-fine-tuning/README.md
@@ -1,13 +1,13 @@
 ---
-title: Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE
+title: Fine-tune Gemma 2 2B with PyTorch Training DLC using SFT + LoRA on GKE
 type: training
 ---
 
-# Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE
+# Fine-tune Gemma 2 2B with PyTorch Training DLC using SFT + LoRA on GKE
 
-Mistral is a family of models with varying sizes, created by the Mistral AI team; the Mistral 7B v0.3 LLM is a Mistral 7B v0.2 with extended vocabulary. TRL is a full stack library to fine-tune and align Large Language Models (LLMs) developed by Hugging Face. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure.
+Gemma 2 is an advanced, lightweight open model that enhances performance and efficiency while building on the research and technology of its predecessor and the Gemini models developed by Google DeepMind and other teams across Google. TRL is a full-stack library developed by Hugging Face to fine-tune and align Large Language Models (LLMs). Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure.
 
-This example showcases how to fine-tune Mistral 7B v0.3 with TRL via Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA) in a single GPU on a GKE Cluster.
+This example showcases how to fine-tune Google Gemma 2 2B with TRL via Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA) on a single GPU on a GKE Cluster.
 
 ## Setup / Configuration
 
@@ -22,6 +22,14 @@ Optionally, to ease the usage of the commands within this tutorial, you need to
 export PROJECT_ID=your-project-id
 export LOCATION=your-location
 export CLUSTER_NAME=your-cluster-name
+export BUCKET_NAME=your-bucket-name
 ```
 
 Then you need to login into your GCP account and set the project ID to the one you want to use for the deployment of the GKE Cluster.
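+
+For reference, this can be done with the following standard `gcloud` commands, reusing the `PROJECT_ID` value exported above:
+
+```bash
+gcloud auth login
+gcloud config set project $PROJECT_ID
+```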
@@ -88,6 +89,20 @@ Once the GKE Cluster is created, you can get the credentials to access it via `k
 gcloud container clusters get-credentials $CLUSTER_NAME --location=$LOCATION
 ```
 
+## Optional: Create GCS Bucket
+
+> Unless you already have a GCS Bucket, please follow the instructions below in order to create a new bucket where the generated fine-tuning artifacts will be uploaded.
+
+To create the bucket on Google Cloud Storage (GCS), you first need to check whether a bucket with the desired name already exists, since bucket names are globally unique; the snippet below only creates the bucket if it does not exist yet.
+
+```bash
+gcloud components install gsutil
+
+if [ -z "$(gsutil ls | grep gs://$BUCKET_NAME)" ]; then
+  gcloud storage buckets create gs://$BUCKET_NAME --project=$PROJECT_ID --location=$LOCATION --default-storage-class=STANDARD --uniform-bucket-level-access
+fi
+```
+
 ## Configure IAM for GCS
 
 Before you run the fine-tuning job of the Hugging Face PyTorch DLC for training on the GKE Cluster, you need to set the IAM permissions for the GCS bucket so that the pod in the GKE Cluster can access the bucket, that will be mounted into the running container and use to write the generated artifacts so that those are automatically uploaded to the GCS Bucket. To do so, you need to create a namespace and a service account in the GKE Cluster, and then set the IAM permissions for the GCS Bucket.
@@ -117,7 +132,7 @@ gcloud storage buckets add-iam-policy-binding \
 
 ## Optional: Set Secrets in GKE
 
-As [`mistralai/Mistral-7B-v0.3`](https://huggingface.co/mistralai/Mistral-7B-v0.3) is a gated model, you need to set a Kubernetes secret with the Hugging Face Hub token via `kubectl`.
+As [`google/gemma-2-2b-it`](https://huggingface.co/google/gemma-2-2b-it) is a gated model, you need to set a Kubernetes secret with the Hugging Face Hub token via `kubectl`.
 
 To generate a custom token for the Hugging Face Hub, you can follow the instructions at <https://huggingface.co/docs/hub/en/security-tokens>; and the recommended way of setting it is to install the `huggingface_hub` Python SDK as follows:
 
@@ -151,22 +166,12 @@ kubectl create secret generic hf-secret \
 
 More information on how to set Kubernetes secrets in a GKE Cluster at <https://kubernetes.io/docs/concepts/configuration/secret/>.
 
-## Define Job Configuration
-
-Before proceeding into the Kubernetes deployment of the batch job via the Hugging Face PyTorch DLC for training, you need to define first the configuration required for the job to run successfully i.e. which GPU is capable of fine-tuning [`mistralai/Mistral-7B-v0.3`](https://huggingface.co/mistralai/Mistral-7B-v0.3) in `bfloat16` using LoRA.
-
-As a rough calculation, you could assume that the amount of GPU VRAM required to fine-tune a model in half precision is about four times the model size (read more about it in [Eleuther AI - Transformer Math 101](https://blog.eleuther.ai/transformer-math/)).
-
-Alternatively, if your model is uploaded to the Hugging Face Hub, you can check the numbers in the community space [`Vokturz/can-it-run-llm`](https://huggingface.co/spaces/Vokturz/can-it-run-llm), which does those calculations for you, based the model to fine-tune and the available hardware.
-
-![`Vokturz/can-it-run-llm` for `mistralai/Mistral-7B-v0.3`](./imgs/can-it-run-llm.png)
-
 ## Run Job
 
-Now you can already run the Kubernetes job in the Hugging Face PyTorch DLC for training on the GKE Cluster via `kubectl` from the [`job.yaml`](./job.yaml) configuration file, that contains the job specification for running the command `trl sft` provided by the TRL CLI for the SFT LoRA fine-tuning of [`mistralai/Mistral-7B-v0.3`](https://huggingface.co/mistralai/Mistral-7B-v0.3) in `bfloat16` using [`timdettmers/openassistant-guanaco`](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), which is a subset from [`OpenAssistant/oasst1`](https://huggingface.co/datasets/OpenAssistant/oasst1) with ~10k samples in a single L4 24GiB GPU, storing the generated artifacts into a volume mount under `/data` linked to a GCS Bucket.
+Now you can run the Kubernetes job in the Hugging Face PyTorch DLC for training on the GKE Cluster via `kubectl`, using the [`job.yaml`](./job.yaml) configuration file. It contains the job specification for running the `trl sft` command provided by the TRL CLI, which fine-tunes [`google/gemma-2-2b-it`](https://huggingface.co/google/gemma-2-2b-it) via SFT + LoRA in `bfloat16` on a single L4 24GiB GPU, using [`google-cloud-partnership/Magicoder-Gemma2`](https://huggingface.co/datasets/google-cloud-partnership/Magicoder-Gemma2), a ~10k-sample dataset formatted with the Gemma 2 chat template and originally derived from [`ise-uiuc/Magicoder-OSS-Instruct-75K`](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K); the generated artifacts are stored in a volume mount under `/data`, linked to a GCS Bucket.
 
 ```bash
-git clone https://github.com/huggingface/Google-Cloud-Containers
+git clone --branch gke-lora-ft-gemma https://github.com/huggingface/Google-Cloud-Containers
 kubectl apply -f Google-Cloud-Containers/examples/gke/trl-lora-fine-tuning/job.yaml
 ```
 
diff --git a/examples/gke/trl-lora-fine-tuning/job.yaml b/examples/gke/trl-lora-fine-tuning/job.yaml
index 19660d3c..487dee70 100644
--- a/examples/gke/trl-lora-fine-tuning/job.yaml
+++ b/examples/gke/trl-lora-fine-tuning/job.yaml
@@ -9,8 +9,8 @@ spec:
       name: trl
       labels:
         app: trl
-        hf.co/model: mistralai--Mistral-7B-v0.3
-        hf.co/dataset: timdettmers--openassistant-guanaco
+        hf.co/model: google--gemma-2-2b-it
+        hf.co/dataset: google-cloud-partnership--Magicoder-Gemma2
       annotations:
         gke-gcsfuse/volumes: "true"
         gke-gcsfuse/ephemeral-storage-request: 200Gi
@@ -19,8 +19,8 @@ spec:
         cloud.google.com/gke-accelerator: nvidia-l4
         cloud.google.com/compute-class: Accelerator
       containers:
-        - name: trl-container
-          image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310:latest
+        - name: trl
+          image: "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310"
           command:
             - "/bin/bash"
            - "-c"
@@ -28,34 +28,35 @@ spec:
            - "--"
          args:
            # MODEL
-            - "--model_name_or_path=mistralai/Mistral-7B-v0.3"
+            - "--model_name_or_path=google/gemma-2-2b-it"
             - "--torch_dtype=bfloat16"
-            - "--attn_implementation=flash_attention_2"
+            - "--attn_implementation=eager"
            # DATASET
-            - "--dataset_name=timdettmers/openassistant-guanaco"
+            - "--dataset_name=google-cloud-partnership/Magicoder-Gemma2"
             - "--dataset_text_field=text"
            # PEFT
             - "--use_peft"
-            - "--lora_r=16"
-            - "--lora_alpha=32"
-            - "--lora_dropout=0.1"
+            - "--lora_r=8"
+            - "--lora_alpha=16"
+            - "--lora_dropout=0.01"
             - "--lora_target_modules=all-linear"
            # TRAINER
             - "--bf16"
-            - "--max_seq_length=1024"
-            - "--per_device_train_batch_size=2"
-            - "--gradient_accumulation_steps=8"
+            - "--max_seq_length=512"
             - "--gradient_checkpointing"
+            - "--gradient_accumulation_steps=4"
+            - "--per_device_train_batch_size=2"
+            - "--per_device_eval_batch_size=2"
             - "--learning_rate=0.0002"
             - "--lr_scheduler_type=cosine"
             - "--optim=adamw_bnb_8bit"
             - "--num_train_epochs=3"
-            - "--logging_steps=10"
             - "--do_eval"
-            - "--eval_steps=100"
+            - "--eval_strategy=epoch"
+            - "--logging_steps=10"
             - "--report_to=none"
             - "--save_strategy=epoch"
-            - "--output_dir=/data/Mistral-7B-v0.3-SFT-LoRA"
+            - "--output_dir=/data/gemma-2-2b-it-SFT-LoRA"
             - "--overwrite_output_dir"
             - "--seed=42"
             - "--log_level=info"