{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "de42c49d",
   "metadata": {},
   "source": [
    "## Notebook 2: Transcript Writer\n",
    "\n",
    "This notebook uses the `Llama-3.1-70B-Instruct` model to take the cleaned-up text from the previous notebook and convert it into a podcast transcript.\n",
    "\n",
    "`SYSTEM_PROMPT` sets the model's context or profile for the task at hand. Here we prompt it to act as a great podcast transcript writer assisting with our task."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e576ea9",
   "metadata": {},
   "source": [
    "Experimenting with the `SYSTEM_PROMPT` below is encouraged; this version worked best for the few examples the flow was tested with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,

    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "549aaccb",
   "metadata": {},
   "source": [
    "Readers who want to flex their money are welcome to try the 405B model here.\n",
    "\n",
    "For our GPU-poor friends, you're encouraged to test with a smaller model as well; the 8B model should work well out of the box for this example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,

    "MODEL = \"meta-llama/Llama-3.1-70B-Instruct\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fadc7eda",
   "metadata": {},
   "source": [
    "Import the necessary frameworks:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,

    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7865ff7e",
   "metadata": {},
   "source": [
    "Read in the file generated earlier.\n",
    "\n",
    "The explicit encoding handling avoids issues with text extracted from arbitrary PDFs:"
   ]
  },
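  {
   "cell_type": "markdown",
   "id": "a1f3b7c0",
   "metadata": {},
   "source": [
    "(The reader helper in the next cell is elided in this view. Purely as a rough illustration of what an encoding-tolerant reader can look like: the function name mirrors the notebook, but the specific encoding list and fallback behavior here are assumptions, not the notebook's exact implementation.)"
   ]
  },

A minimal sketch, assuming a plain-text input file and a small set of candidate encodings:

```python
# Hedged sketch of an encoding-tolerant reader; the encoding list and
# error handling are assumptions, not the notebook's exact implementation.
def read_file_to_string(path):
    # Try a few common encodings so text extracted from arbitrary PDFs
    # still loads; latin-1 acts as a catch-all since it never fails to decode.
    for encoding in ("utf-8", "cp1252", "latin-1"):
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.read()
        except (UnicodeDecodeError, FileNotFoundError):
            continue
    # Signal failure to the caller rather than raising.
    return None
```

Returning `None` on failure (rather than raising) lets downstream cells check the result before prompting the model.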
  {
   "cell_type": "code",
   "execution_count": 4,

    " return None"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66093561",
   "metadata": {},
   "source": [
    "Since we defined the system role earlier, we can now pass the entire file as `INPUT_PROMPT` to the model and have it generate the podcast transcript:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,

    "INPUT_PROMPT = read_file_to_string('./clean_extracted_text.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9be8dd2c",
   "metadata": {},
   "source": [
    "Hugging Face's `pipeline()` method makes generating text from LLMs easy.\n",
    "\n",
    "We will set `temperature` to 1 to encourage creativity and `max_new_tokens` to 8126:"
   ]
  },
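  {
   "cell_type": "markdown",
   "id": "b2c4d8e1",
   "metadata": {},
   "source": [
    "(The actual `pipeline()` call lives in the next cell, which is elided in this view. As an illustration only, the variable names and placeholder prompt strings below are assumptions, not the notebook's exact code.)"
   ]
  },

A sketch of the chat-style message list and sampling settings described above, assuming placeholder prompt strings:

```python
# SYSTEM_PROMPT and INPUT_PROMPT are placeholders standing in for the
# notebook's real variables.
SYSTEM_PROMPT = "You are a world-class podcast transcript writer."
INPUT_PROMPT = "<cleaned text from Notebook 1 goes here>"

# Chat-style messages: the system prompt sets the persona, the user
# message carries the source text to rewrite.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": INPUT_PROMPT},
]

# Sampling settings matching the text above.
generation_kwargs = {
    "temperature": 1.0,      # encourage creative phrasing
    "max_new_tokens": 8126,  # room for a long transcript
}
# outputs = pipe(messages, **generation_kwargs)  # the real call is elided here
```

Passing the message list directly to a `text-generation` pipeline relies on the model's chat template to format the system and user turns.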
  {
   "cell_type": "code",
   "execution_count": 6,

    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6349e7f3",
   "metadata": {},
   "source": [
    "This is great: we can now save and verify the output generated by the model before moving to the next notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,

    "print(outputs[0][\"generated_text\"][-1]['content'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e1414fe",
   "metadata": {},
   "source": [
    "Let's save the output as a pickle file and continue to Notebook 3:"
   ]
  },
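  {
   "cell_type": "markdown",
   "id": "c5e9f2a7",
   "metadata": {},
   "source": [
    "(The save cell itself follows, elided in this view. The filename and transcript variable in the sketch below are illustrative placeholders, not the notebook's exact code.)"
   ]
  },

A minimal sketch of the pickle round trip, assuming a placeholder transcript string and filename:

```python
import os
import pickle
import tempfile

# Placeholder transcript standing in for the model output; the filename
# is likewise illustrative.
transcript = "Speaker 1: Welcome to the show..."
path = os.path.join(tempfile.mkdtemp(), "podcast_transcript.pkl")

# Serialize the transcript so the next notebook can load it unchanged.
with open(path, "wb") as f:
    pickle.dump(transcript, f)

# Reload immediately to verify the round trip before handing off.
with open(path, "rb") as f:
    restored = pickle.load(f)
assert restored == transcript
```

Pickle preserves the Python string exactly, so Notebook 3 can `pickle.load()` it without re-parsing any text format.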
  {
   "cell_type": "code",
   "execution_count": 8,

   "id": "d9bab2f2-f539-435a-ae6a-3c9028489628",
   "metadata": {},
   "outputs": [],
   "source": [
    "#fin"
   ]
  }
 ],
 "metadata": {