This repository contains two Jupyter notebooks created for the GCP Guild Association, providing hands-on guidance for deploying AI models to Google Cloud Run. Each notebook walks through the steps required to set up, containerize, and deploy AI workloads effectively.
The first notebook provides a step-by-step guide to deploying the Gemma 2 2B-parameter model on GCP. It covers:
- Environment setup and configuration
- Containerizing the model
- Uploading to Google Artifact Registry
- Deploying to Cloud Run
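The containerize, upload, and deploy steps above can be sketched with `gcloud` commands. This is a minimal sketch, not the notebook's exact commands: the project ID, region, repository name, image tag, and resource limits are all placeholder assumptions.

```shell
# Placeholder values -- replace with your own project and region.
export PROJECT_ID=my-gcp-project
export REGION=us-central1

# Build the container image (Dockerfile in the current directory) and
# push it to Artifact Registry in a single step via Cloud Build.
gcloud builds submit \
  --tag "${REGION}-docker.pkg.dev/${PROJECT_ID}/ai-models/gemma2-2b:latest"

# Deploy the image to Cloud Run. The CPU/memory values are an assumed
# starting point for a 2B-parameter model, not a tuned recommendation.
gcloud run deploy gemma2-2b \
  --image "${REGION}-docker.pkg.dev/${PROJECT_ID}/ai-models/gemma2-2b:latest" \
  --region "${REGION}" \
  --cpu 4 --memory 16Gi \
  --no-allow-unauthenticated
```

Note that `gcloud builds submit` assumes an Artifact Registry repository (here named `ai-models`) already exists in the target region.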
The second notebook demonstrates deploying Ollama as a sidecar container alongside Open WebUI, which serves as the frontend ingress container on Cloud Run. The guide covers:
- Configuring a multi-container Cloud Run deployment
- Allocating resources for efficient model inference
- Integrating Open WebUI for a user-friendly interface
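A multi-container Cloud Run service like the one above is typically declared in a service YAML and applied with `gcloud run services replace service.yaml`. The sketch below is illustrative only: the service name, image tags, port, and resource limits are assumptions, not the notebook's exact configuration.

```yaml
# Hypothetical service.yaml -- Open WebUI as ingress, Ollama as sidecar.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ollama-webui
spec:
  template:
    metadata:
      annotations:
        # Start the Ollama sidecar before the WebUI ingress container.
        run.googleapis.com/container-dependencies: '{"webui": ["ollama"]}'
    spec:
      containers:
        # Ingress container: receives external traffic on the service port.
        - name: webui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            # Sidecars share localhost, so the UI reaches Ollama directly.
            - name: OLLAMA_BASE_URL
              value: http://localhost:11434
        # Sidecar: serves model inference on localhost:11434 (no public port).
        - name: ollama
          image: ollama/ollama
          resources:
            limits:
              cpu: "4"
              memory: 16Gi
```

Only the ingress container declares a `containerPort`; the sidecar is reachable solely over localhost inside the same Cloud Run instance.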
Before running these notebooks, ensure you have the following:
- A Google Cloud Platform (GCP) account
- Billing enabled, with access to the Cloud Run and Vertex AI APIs
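If the required APIs are not yet enabled on your project, they can be turned on from the `gcloud` CLI. A minimal sketch, assuming the CLI is installed and authenticated (`gcloud auth login`); the project ID is a placeholder:

```shell
# Point the CLI at your project (placeholder ID).
gcloud config set project my-gcp-project

# Enable the services the notebooks rely on: Cloud Run, Artifact Registry,
# Vertex AI, and Cloud Build (used to build and push the container images).
gcloud services enable \
  run.googleapis.com \
  artifactregistry.googleapis.com \
  aiplatform.googleapis.com \
  cloudbuild.googleapis.com
```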