Add Sesame_CSM_1B_TTS notebook

Dhivya-Bharathy · Dhivya-Bharathy · commit 390ddeb476ef · 2025-06-18T07:37:58.000Z
diff --git a/examples/cookbooks/Sesame_CSM_1B_TTS.ipynb b/examples/cookbooks/Sesame_CSM_1B_TTS.ipynb
@@ -0,0 +1,146 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "id": "a050dfc0",
+      "metadata": {
+        "id": "a050dfc0"
+      },
+      "source": [
+        "# Sesame CSM (1B) - Text to Speech"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "7c2c61ec",
+      "metadata": {
+        "id": "7c2c61ec"
+      },
+      "source": [
+        "This notebook shows how to generate speech from text with the Sesame CSM 1B model, for example to turn instructional or conversational content into spoken audio."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/Sesame_CSM_1B_TTS.ipynb)\n"
+      ],
+      "metadata": {
+        "id": "5SA94wTb9AH-"
+      },
+      "id": "5SA94wTb9AH-"
+    },
+    {
+      "cell_type": "markdown",
+      "id": "ab27fd8d",
+      "metadata": {
+        "id": "ab27fd8d"
+      },
+      "source": [
+        "## 🧩 Dependencies"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "ac3a627f",
+      "metadata": {
+        "id": "ac3a627f"
+      },
+      "outputs": [],
+      "source": [
+        "!pip install -q \"transformers>=4.52.1\"\n",
+        "!pip install -q torchaudio\n",
+        "!pip install -q soundfile"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "a382e4d0",
+      "metadata": {
+        "id": "a382e4d0"
+      },
+      "source": [
+        "## 🛠️ Tools\n",
+        "- transformers for loading the model\n",
+        "- torchaudio for audio processing\n",
+        "- soundfile for writing the generated audio to a WAV file"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "7ff6eb6d",
+      "metadata": {
+        "id": "7ff6eb6d"
+      },
+      "source": [
+        "## 🧾 YAML Prompt\n",
+        "```yaml\n",
+        "task: \"Text to Speech\"\n",
+        "style: \"Clear, educational\"\n",
+        "language: \"en\"\n",
+        "```"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "0ab9cda8",
+      "metadata": {
+        "id": "0ab9cda8"
+      },
+      "source": [
+        "## 🧠 Main"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "6459f8bd",
+      "metadata": {
+        "id": "6459f8bd"
+      },
+      "outputs": [],
+      "source": [
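+        "from transformers import AutoProcessor, CsmForConditionalGeneration\n",
+        "import torchaudio\n",
+        "import soundfile as sf\n",
+        "import torch\n",
+        "\n",
+        "# Assumption: the notebook title refers to the `sesame/csm-1b` checkpoint on the\n",
+        "# Hugging Face Hub. It needs a transformers release with CSM support and may\n",
+        "# require accepting the model license before download.\n",
+        "model_id = \"sesame/csm-1b\"\n",
+        "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+        "\n",
+        "processor = AutoProcessor.from_pretrained(model_id)\n",
+        "model = CsmForConditionalGeneration.from_pretrained(model_id).to(device)\n",
+        "\n",
+        "# The `[0]` prefix selects speaker id 0.\n",
+        "text = \"[0]Welcome to the world of voice synthesis using open models!\"\n",
+        "inputs = processor(text, add_special_tokens=True).to(device)\n",
+        "\n",
+        "# output_audio=True returns decoded waveforms (one tensor per input)\n",
+        "# instead of audio codebook tokens.\n",
+        "with torch.no_grad():\n",
+        "    audio = model.generate(**inputs, output_audio=True)\n",
+        "\n",
+        "# CSM decodes audio at 24 kHz, so the file is written at that sample rate.\n",
+        "speech = audio[0].squeeze().cpu().float().numpy()\n",
+        "sf.write(\"output.wav\", speech, 24000)"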
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "376fe112",
+      "metadata": {
+        "id": "376fe112"
+      },
+      "source": [
+        "## 📤 Output\n",
+        "The cell above converts the text prompt to speech and writes the result to `output.wav`.\n",
+        "\n",
+        "🎧 The output audio will say: 'Welcome to the world of voice synthesis using open models!'\n",
+        "\n",
+        "This demonstrates how Sesame CSM 1B can be used to build TTS applications."
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "provenance": []
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+}