{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "de42c49d",
   "metadata": {},
   "source": [
    "## Notebook 2: Transcript Writer\n",
    "\n",
    "This notebook uses the `Llama-3.1-70B-Instruct` model to take the cleaned-up text from the previous notebook and convert it into a podcast transcript.\n",
    "\n",
    "`SYSTEM_PROMPT` sets the model's context or profile for the task at hand. Here we prompt it to act as a great podcast transcript writer assisting with our task."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e576ea9",
   "metadata": {},
   "source": [
    "Experimenting with the `SYSTEM_PROMPT` below is encouraged; this version worked best for the few examples the flow was tested with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,

    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "549aaccb",
   "metadata": {},
   "source": [
    "Readers who want to flex their money are welcome to try the 405B model here.\n",
    "\n",
    "For our GPU-poor friends, you're encouraged to test with a smaller model as well; the 8B model should work well out of the box for this example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,

    "MODEL = \"meta-llama/Llama-3.1-70B-Instruct\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fadc7eda",
   "metadata": {},
   "source": [
    "Import the necessary frameworks:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,

    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7865ff7e",
   "metadata": {},
   "source": [
    "Read in the file generated earlier.\n",
    "\n",
    "The explicit encoding handling avoids issues with text extracted from arbitrary PDFs:"
   ]
  },
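  {
   "cell_type": "markdown",
   "id": "a1f3b7c0",
   "metadata": {},
   "source": [
    "(The reader helper in the next cell is elided in this view. Purely as a rough illustration of what an encoding-tolerant reader can look like: the function name mirrors the notebook, but the specific encoding list and fallback behavior here are assumptions, not the notebook's exact implementation.)"
   ]
  },

A minimal sketch, assuming a plain-text input file and a small set of candidate encodings:

```python
# Hedged sketch of an encoding-tolerant reader; the encoding list and
# error handling are assumptions, not the notebook's exact implementation.
def read_file_to_string(path):
    # Try a few common encodings so text extracted from arbitrary PDFs
    # still loads; latin-1 acts as a catch-all since it never fails to decode.
    for encoding in ("utf-8", "cp1252", "latin-1"):
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.read()
        except (UnicodeDecodeError, FileNotFoundError):
            continue
    # Signal failure to the caller rather than raising.
    return None
```

Returning `None` on failure (rather than raising) lets downstream cells check the result before prompting the model.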
  {
   "cell_type": "code",
   "execution_count": 4,

    " return None"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66093561",
   "metadata": {},
   "source": [
    "Since we defined the system role earlier, we can now pass the entire file as `INPUT_PROMPT` to the model and have it generate the podcast transcript:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,

    "INPUT_PROMPT = read_file_to_string('./clean_extracted_text.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9be8dd2c",
   "metadata": {},
   "source": [
    "Hugging Face's `pipeline()` method makes generating text from LLMs easy.\n",
    "\n",
    "We will set `temperature` to 1 to encourage creativity and `max_new_tokens` to 8126:"
   ]
  },
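  {
   "cell_type": "markdown",
   "id": "b2c4d8e1",
   "metadata": {},
   "source": [
    "(The actual `pipeline()` call lives in the next cell, which is elided in this view. As an illustration only, the variable names and placeholder prompt strings below are assumptions, not the notebook's exact code.)"
   ]
  },

A sketch of the chat-style message list and sampling settings described above, assuming placeholder prompt strings:

```python
# SYSTEM_PROMPT and INPUT_PROMPT are placeholders standing in for the
# notebook's real variables.
SYSTEM_PROMPT = "You are a world-class podcast transcript writer."
INPUT_PROMPT = "<cleaned text from Notebook 1 goes here>"

# Chat-style messages: the system prompt sets the persona, the user
# message carries the source text to rewrite.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": INPUT_PROMPT},
]

# Sampling settings matching the text above.
generation_kwargs = {
    "temperature": 1.0,      # encourage creative phrasing
    "max_new_tokens": 8126,  # room for a long transcript
}
# outputs = pipe(messages, **generation_kwargs)  # the real call is elided here
```

Passing the message list directly to a `text-generation` pipeline relies on the model's chat template to format the system and user turns.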
  {
   "cell_type": "code",
   "execution_count": 6,

    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6349e7f3",
   "metadata": {},
   "source": [
    "This is great: we can now save and verify the output generated by the model before moving to the next notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,

    "print(outputs[0][\"generated_text\"][-1]['content'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e1414fe",
   "metadata": {},
   "source": [
    "Let's save the output as a pickle file and continue to Notebook 3:"
   ]
  },
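  {
   "cell_type": "markdown",
   "id": "c5e9f2a7",
   "metadata": {},
   "source": [
    "(The save cell itself follows, elided in this view. The filename and transcript variable in the sketch below are illustrative placeholders, not the notebook's exact code.)"
   ]
  },

A minimal sketch of the pickle round trip, assuming a placeholder transcript string and filename:

```python
import os
import pickle
import tempfile

# Placeholder transcript standing in for the model output; the filename
# is likewise illustrative.
transcript = "Speaker 1: Welcome to the show..."
path = os.path.join(tempfile.mkdtemp(), "podcast_transcript.pkl")

# Serialize the transcript so the next notebook can load it unchanged.
with open(path, "wb") as f:
    pickle.dump(transcript, f)

# Reload immediately to verify the round trip before handing off.
with open(path, "rb") as f:
    restored = pickle.load(f)
assert restored == transcript
```

Pickle preserves the Python string exactly, so Notebook 3 can `pickle.load()` it without re-parsing any text format.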
  {
   "cell_type": "code",
   "execution_count": 8,

   "id": "d9bab2f2-f539-435a-ae6a-3c9028489628",
   "metadata": {},
   "outputs": [],
   "source": [
    "#fin"
   ]
  }
 ],
 "metadata": {