Commit 8383275

docs: moving three tutorials from Pruna Pro to Pruna (#539)

* adding tutorials from pruna pro
* small changes to ring_attn tutorial
* make tutorials toctree explicit
* fixing colab link
* fixed grid cards

1 parent dd44897

File tree

4 files changed: +590 −0 lines changed
docs/tutorials/computer_vision.ipynb

Lines changed: 177 additions & 0 deletions
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Blazingly Fast Computer Vision Models"
      ]
    },
    {
      "cell_type": "raw",
      "metadata": {
        "vscode": {
          "languageId": "raw"
        }
      },
      "source": [
        "<a target=\"_blank\" href=\"https://colab.research.google.com/github/PrunaAI/pruna/blob/v|version|/docs/tutorials/computer_vision.ipynb\">\n",
        "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
        "</a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This tutorial demonstrates how to use the `pruna` package to optimize any custom computer vision model. We use the `vit_b_16` model as an example; all execution times given below were measured on a T4 GPU."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 1. Loading the CV Model\n",
        "\n",
        "First, load your ViT model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import torchvision\n",
        "\n",
        "model = torchvision.models.vit_b_16(weights=\"ViT_B_16_Weights.DEFAULT\").cuda()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 2. Initializing the Smash Config\n",
        "\n",
        "Next, initialize the `smash_config`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from pruna import SmashConfig\n",
        "\n",
        "# Initialize the SmashConfig with the x_fast compiler\n",
        "smash_config = SmashConfig([\"x_fast\"])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 3. Smashing the Model\n",
        "\n",
        "Now, you can smash the model, which takes around 5 seconds."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from pruna import smash\n",
        "\n",
        "# Smash the model\n",
        "smashed_model = smash(\n",
        "    model=model,\n",
        "    smash_config=smash_config,\n",
        ")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 4. Preparing the Input"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import numpy as np\n",
        "from torchvision import transforms\n",
        "\n",
        "# Generate a random image and move it to the GPU\n",
        "image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)\n",
        "input_tensor = transforms.ToTensor()(image).unsqueeze(0).cuda()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 5. Running the Model\n",
        "\n",
        "After the model has been compiled, run inference for a few iterations as a warm-up. This takes around 8 seconds."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Run some warm-up iterations to trigger compilation\n",
        "for _ in range(5):\n",
        "    smashed_model(input_tensor)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, run the model with accelerated inference."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Run the smashed model and display the result\n",
        "smashed_model(input_tensor)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Wrap Up\n",
        "\n",
        "Congratulations! You have successfully smashed a CV model and can now use the `pruna` package to optimize any custom CV model. The only parts you should modify to fit your use case are steps 1, 4, and 5."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "pruna",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.10.15"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
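The notebook reports rough wall-clock estimates ("around 5 seconds", "around 8 seconds") but never shows how to quantify the speedup itself. Below is a minimal, library-agnostic sketch of a latency benchmark; the `measure_latency` helper is our own illustration, not part of the `pruna` API, and for CUDA models you would additionally call `torch.cuda.synchronize()` around the timed region so that asynchronously launched kernels are fully counted.

```python
import time


def measure_latency(fn, inputs, warmup=5, runs=20):
    """Call fn(inputs) repeatedly and return the median latency in milliseconds.

    Warm-up iterations are excluded from timing: they absorb one-off costs
    such as compilation and cache population. For CUDA models, synchronize
    the device before reading each timestamp.
    """
    for _ in range(warmup):
        fn(inputs)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(inputs)
        times.append((time.perf_counter() - start) * 1000.0)
    times.sort()
    return times[len(times) // 2]  # median is robust to stragglers
```

With `smashed_model` and `input_tensor` from the tutorial, `measure_latency(smashed_model, input_tensor)` returns the median forward-pass latency in milliseconds; running the same call on the original `model` gives a baseline for computing the actual speedup on your hardware.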

docs/tutorials/index.rst

Lines changed: 32 additions & 0 deletions

@@ -75,6 +75,7 @@ These tutorials will guide you through the process of using |pruna| to optimize
       :link: ./sd_deepcache.ipynb
 
       Optimize your ``diffusion`` model with ``deepcache`` ``caching``.
+
    .. grid-item-card:: Optimize and Deploy Sana diffusers with Pruna and Hugging Face
       :text-align: center
       :link: ./deploying_sana_tutorial.ipynb
@@ -87,10 +88,41 @@ These tutorials will guide you through the process of using |pruna| to optimize
 
       Learn how to use the ``target_modules`` parameter to target specific modules in your model.
 
+   .. grid-item-card:: Blazingly Fast Computer Vision
+      :text-align: center
+      :link: ./computer_vision.ipynb
+
+      Optimize any ``computer vision`` model with ``x_fast`` ``compilation``.
+
+   .. grid-item-card:: Recover Quality after Quantization
+      :text-align: center
+      :link: ./recovery.ipynb
+
+      Recover quality using ``text_to_image_perp`` after ``diffusers_int8`` ``quantization``.
+
+   .. grid-item-card:: Distribute across GPUs with Ring Attention
+      :text-align: center
+      :link: ./ring_attn.ipynb
+
+      Distribute your ``Flux`` model across multiple GPUs with ``ring_attn`` and ``torch_compile``.
+
+   .. grid-item-card:: Reducing Warm-up Time for Compilation
+      :text-align: center
+      :link: ./portable_compilation.ipynb
+
+      Reduce warm-up time significantly when re-loading a ``torch_compile`` compiled model on a new machine.
+
+   .. grid-item-card:: Quantize and Speedup any LLM
+      :text-align: center
+      :link: ./llm_quantization_compilation_acceleration.ipynb
+
+      Optimize latency and memory footprint of any LLM with ``hqq`` ``quantization`` and ``torch_compile`` ``compilation``.
+
 .. toctree::
    :hidden:
    :maxdepth: 1
    :caption: Pruna
    :glob:
 
    ./*
+