Commit c4e5c0d

Adding ShieldGemma 2 notebook to Responsible AI Toolkit docs

1 parent 7868875 commit c4e5c0d
File tree

1 file changed: +214 -0 lines changed
@@ -0,0 +1,214 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "cLCmbOz_5tWH"
},
"source": [
"##### Copyright 2025 Google LLC"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "vdPaBz5y5LHW"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3Zd1278P5wt_"
},
"source": [
"# Evaluating content safety with ShieldGemma 2 and Hugging Face Transformers"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4IlgEYUj7xdW"
},
"source": [
"The **ShieldGemma 2** model is trained to detect the key harms detailed in the [model card](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2). This guide demonstrates how to use ShieldGemma 2 with Hugging Face Transformers to help you build robust data and models.\n",
"\n",
"Note that `ShieldGemma 2` is trained to classify only one harm type at a time, so you will need to make a separate call to `ShieldGemma 2` for each harm type you want to check against. If you have additional harm types to check for, you can use model tuning techniques to adapt `ShieldGemma 2` to them."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RhlnMQoK9fZG"
},
"source": [
"# Supported safety checks\n",
"\n",
"**ShieldGemma 2** is built on Gemma 3's 4B IT checkpoint and is trained to detect and predict violations of the key harm types listed below:\n",
"\n",
"* **Dangerous Content**: The image shall not contain content that facilitates or encourages activities that could cause real-world harm (e.g., building firearms and explosive devices, promotion of terrorism, instructions for suicide).\n",
"\n",
"* **Sexually Explicit**: The image shall not contain content that depicts explicit or graphic sexual acts (e.g., pornography, erotic nudity, depictions of rape or sexual assault).\n",
"\n",
"* **Violence/Gore**: The image shall not contain content that depicts shocking, sensational, or gratuitous violence (e.g., excessive blood and gore, gratuitous violence against animals, extreme injury or moment of death).\n",
"\n",
"These policies serve as a foundation, but users can provide customized safety policies as input to the model, allowing for fine-grained control and support for specific use-case requirements."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "t3aq-ToeAmRM"
},
"source": [
"# Supported Use Case\n",
"\n",
"**We recommend using `ShieldGemma 2` as an input filter to vision language models, as an output filter of image generation systems, or both.** ShieldGemma 2 offers the following key advantages:\n",
"\n",
"* **Policy-Aware Classification**: ShieldGemma 2 accepts both a user-defined safety policy and an image as input, providing classifications for both real and generated images, tailored to the specific policy guidelines.\n",
"* **Probability-Based Output and Thresholding**: ShieldGemma 2 outputs a probability score for its predictions, allowing downstream users to flexibly tune the classification threshold based on their specific use cases and risk tolerance. This enables a more nuanced and adaptable approach to safety classification.\n",
"\n",
"The input/output format is as follows:\n",
"* **Input**: Image + prompt instruction with policy definition\n",
"* **Output**: Probability of the 'Yes'/'No' tokens, where 'Yes' means that the image violates the specified policy. The higher the score, the higher the model's confidence that the image violates the specified policy."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0WhRozADVJos"
},
"source": [
"# Usage example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "K_XERopLUZhk"
},
"outputs": [],
"source": [
"! pip install -q 'transformers>=4.50.0'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Qg-Hy0ffbwvE"
},
"outputs": [],
"source": [
"! huggingface-cli login"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "40Rm46Xt7wqW"
},
"outputs": [],
"source": [
"from transformers import AutoProcessor, AutoModelForImageClassification\n",
"import torch\n",
"\n",
"model_id = \"google/shieldgemma-2-4b-it\"\n",
"\n",
"processor = AutoProcessor.from_pretrained(model_id)\n",
"model = AutoModelForImageClassification.from_pretrained(model_id)\n",
"model.to(torch.device(\"cuda\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from PIL import Image\n",
"import requests\n",
"\n",
"# The image included in this Colab is benign and will result in the prediction\n",
"# of a `No` token for all policies, meaning the image does not violate any\n",
"# content policies. Change this URL or otherwise update this code to use an\n",
"# image that may be violative.\n",
"url = \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg\"\n",
"image = Image.open(requests.get(url, stream=True).raw)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "AK1PrHnYz4fv"
},
"outputs": [],
"source": [
"inputs = processor(images=[image], return_tensors=\"pt\").to(torch.device(\"cuda\"))\n",
"\n",
"with torch.no_grad():\n",
"    scores = model(**inputs)\n",
"\n",
"# `scores` is a `ShieldGemma2ImageClassifierOutputWithNoAttention` instance\n",
"# containing the logits and probabilities associated with the model predicting\n",
"# the `Yes` or `No` token as the response to the prompt batch, captured in the\n",
"# following properties.\n",
"#\n",
"# * `logits` (`torch.Tensor` of shape `(batch_size, 2)`): The first position\n",
"#   along dim=1 is the logit for the `Yes` token and the second position\n",
"#   along dim=1 is the logit for the `No` token.\n",
"# * `probabilities` (`torch.Tensor` of shape `(batch_size, 2)`): The first\n",
"#   position along dim=1 is the probability of predicting the `Yes` token\n",
"#   and the second position along dim=1 is the probability of predicting the\n",
"#   `No` token.\n",
"#\n",
"# When used with the `ShieldGemma2Processor`, the `batch_size` will be equal to\n",
"# `len(images) * len(policies)`, and the order within the batch will be\n",
"# img1_policy1, ... img1_policyN, ... imgM_policyN.\n",
"print(scores.logits)\n",
"print(scores.probabilities)\n",
"\n",
"# ShieldGemma prompts are constructed such that predicting the `Yes` token means\n",
"# the content does violate the policy. If you are only interested in the\n",
"# violative condition, slice the first column from the output tensors.\n",
"p_violated = scores.probabilities[:, 0]\n",
"print(p_violated)\n"
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "A100",
"machine_shape": "hm",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
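The final cell of the notebook flattens the batch as img1_policy1, ..., img1_policyN, ..., imgM_policyN. Mapping those flat scores back to per-image, per-policy verdicts is plain bookkeeping; here is a minimal sketch in pure Python. The policy names, image names, and the 0.5 threshold are illustrative placeholders, not part of the notebook, and the scores are made up rather than real model output.

```python
def unflatten_verdicts(p_violated, images, policies, threshold=0.5):
    """Map a flat list of P(Yes) scores, ordered img1_policy1 ... imgM_policyN,
    back to {image: {policy: bool}}, where True means "violates the policy"."""
    assert len(p_violated) == len(images) * len(policies)
    verdicts = {}
    for i, name in enumerate(images):
        # Each image owns a contiguous run of len(policies) scores.
        row = p_violated[i * len(policies):(i + 1) * len(policies)]
        verdicts[name] = {pol: p >= threshold for pol, p in zip(policies, row)}
    return verdicts

# Illustrative scores for 2 images x 3 policies (not real model output).
scores = [0.02, 0.91, 0.10, 0.40, 0.05, 0.77]
verdicts = unflatten_verdicts(
    scores,
    images=["img1", "img2"],
    policies=["dangerous", "sexually_explicit", "violence_gore"],
    threshold=0.5,
)
print(verdicts["img1"]["sexually_explicit"])  # True: 0.91 >= 0.5
```

In a real pipeline, `scores` would be `p_violated.tolist()` from the notebook's last cell, and the threshold would be tuned to your own risk tolerance, as the "Probability-Based Output and Thresholding" section suggests.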
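The output object exposes both `logits` and `probabilities` over the Yes/No token pair. The natural reading, which the field shapes suggest, is that each probability row is a softmax over the corresponding pair of logits; the sketch below illustrates that relationship with made-up logit values, and is not the library's internal code.

```python
import math

def yes_no_probabilities(logit_yes, logit_no):
    """Softmax over a two-token (Yes, No) logit pair."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    total = e_yes + e_no
    return e_yes / total, e_no / total

# Illustrative logits; a large Yes-No gap yields a confident "violates" score.
p_yes, p_no = yes_no_probabilities(2.0, -1.0)
print(round(p_yes, 4))  # 0.9526 -- P(Yes), i.e. P(image violates the policy)
print(round(p_yes + p_no, 4))  # 1.0 -- the pair is a proper distribution
```

This is also why thresholding `probabilities[:, 0]` (rather than raw logits) is convenient: the values are directly comparable across images and policies.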
