|
1 | 1 | {
|
2 | 2 | "cells": [
|
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "d0b5beda", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "## Notebook 3: Transcript Re-writer\n", |
| 9 | + "\n", |
| 10 | + "In the previous notebook, we got a great podcast transcript from the raw file we uploaded earlier. \n", |
| 11 | + "\n", |
| 12 | + "In this one, we will use the `Llama-3.1-8B-Instruct` model to re-write the output from the previous pipeline and make it more dramatic or realistic." |
| 13 | + ] |
| 14 | + }, |
| 15 | + { |
| 16 | + "cell_type": "markdown", |
| 17 | + "id": "fdc3d32a", |
| 18 | + "metadata": {}, |
| 19 | + "source": [ |
| 20 | + "We will again set the `SYSTEM_PROMPT` and remind the model of its task. \n", |
| 21 | + "\n", |
| 22 | + "Note: We can even prompt the model like so to encourage creativity:\n", |
| 23 | + "\n", |
| 24 | + "> Your job is to use the podcast transcript written below to re-write it for an AI Text-To-Speech Pipeline. A very dumb AI had written this so you have to step up for your kind.\n" |
| 25 | + ] |
| 26 | + }, |
| 27 | + { |
| 28 | + "cell_type": "markdown", |
| 29 | + "id": "c32c0d85", |
| 30 | + "metadata": {}, |
| 31 | + "source": [ |
| 32 | + "Note: We will prompt the model to return a list of tuples, which will make our life easier in the next stage, when we use them for Text-To-Speech generation" |
| 33 | + ] |
| 34 | + }, |
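As an illustration of the format this note asks for (not part of the notebook itself), a response following the list-of-tuples convention could be parsed like so; the speaker names and lines here are hypothetical:

```python
import ast

# Hypothetical model output following the requested list-of-tuples format
model_output = """[
    ("Speaker 1", "Welcome to the show! Today we are talking about Llama models."),
    ("Speaker 2", "Umm, wait, what exactly is a Llama model?"),
]"""

# ast.literal_eval safely parses the string into a Python list of tuples,
# ready to be iterated over by a Text-To-Speech stage.
transcript = ast.literal_eval(model_output)

for speaker, line in transcript:
    print(f"{speaker}: {line}")
```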
3 | 35 | {
|
4 | 36 | "cell_type": "code",
|
5 | 37 | "execution_count": 1,
|
|
51 | 83 | "\"\"\""
|
52 | 84 | ]
|
53 | 85 | },
|
| 86 | + { |
| 87 | + "cell_type": "markdown", |
| 88 | + "id": "8ee70bee", |
| 89 | + "metadata": {}, |
| 90 | + "source": [ |
| 91 | + "This time, we will use the smaller 8B model" |
| 92 | + ] |
| 93 | + }, |
54 | 94 | {
|
55 | 95 | "cell_type": "code",
|
56 | 96 | "execution_count": 2,
|
|
61 | 101 | "MODEL = \"meta-llama/Llama-3.1-8B-Instruct\""
|
62 | 102 | ]
|
63 | 103 | },
|
| 104 | + { |
| 105 | + "cell_type": "markdown", |
| 106 | + "id": "f7bc794b", |
| 107 | + "metadata": {}, |
| 108 | + "source": [ |
| 109 | + "Let's import the necessary libraries" |
| 110 | + ] |
| 111 | + }, |
64 | 112 | {
|
65 | 113 | "cell_type": "code",
|
66 | 114 | "execution_count": 3,
|
|
79 | 127 | "warnings.filterwarnings('ignore')"
|
80 | 128 | ]
|
81 | 129 | },
|
| 130 | + { |
| 131 | + "cell_type": "markdown", |
| 132 | + "id": "8020c39c", |
| 133 | + "metadata": {}, |
| 134 | + "source": [ |
| 135 | + "We will load the pickle file saved from the previous notebook\n", |
| 136 | + "\n", |
| 137 | + "This time, the `INPUT_PROMPT` to the model will be the output from the previous stage" |
| 138 | + ] |
| 139 | + }, |
82 | 140 | {
|
83 | 141 | "cell_type": "code",
|
84 | 142 | "execution_count": 4,
|
|
92 | 150 | " INPUT_PROMPT = pickle.load(file)"
|
93 | 151 | ]
|
94 | 152 | },
|
| 153 | + { |
| 154 | + "cell_type": "markdown", |
| 155 | + "id": "c4461926", |
| 156 | + "metadata": {}, |
| 157 | + "source": [ |
| 158 | + "We can again use the Hugging Face `pipeline` method to generate text from the model" |
| 159 | + ] |
| 160 | + }, |
95 | 161 | {
|
96 | 162 | "cell_type": "code",
|
97 | 163 | "execution_count": null,
|
|
140 | 206 | ")"
|
141 | 207 | ]
|
142 | 208 | },
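For reference, the generation cell above (largely elided in this diff) presumably follows the standard `transformers` chat-pipeline pattern; a minimal sketch, where the prompt placeholders and the sampling parameters are assumptions:

```python
# Placeholders standing in for the notebook's actual prompts
SYSTEM_PROMPT = "Re-write the podcast transcript below for an AI TTS pipeline."  # placeholder
INPUT_PROMPT = "Speaker 1: Welcome to the show ..."  # transcript loaded from the pickle file (placeholder)

# Chat-style messages: system instructions plus the transcript to re-write
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": INPUT_PROMPT},
]

# The actual call downloads the 8B model's weights, so it is shown commented out;
# max_new_tokens here is an assumption, not necessarily the notebook's value.
# from transformers import pipeline
# generator = pipeline(
#     "text-generation",
#     model="meta-llama/Llama-3.1-8B-Instruct",
#     device_map="auto",
# )
# outputs = generator(messages, max_new_tokens=8126)
# rewritten = outputs[0]["generated_text"][-1]["content"]
```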
|
| 209 | + { |
| 210 | + "cell_type": "markdown", |
| 211 | + "id": "612a27e0", |
| 212 | + "metadata": {}, |
| 213 | + "source": [ |
| 214 | + "We can verify the output from the model" |
| 215 | + ] |
| 216 | + }, |
143 | 217 | {
|
144 | 218 | "cell_type": "code",
|
145 | 219 | "execution_count": null,
|
|
160 | 234 | "save_string_pkl = outputs[0][\"generated_text\"][-1]['content']"
|
161 | 235 | ]
|
162 | 236 | },
|
| 237 | + { |
| 238 | + "cell_type": "markdown", |
| 239 | + "id": "d495a957", |
| 240 | + "metadata": {}, |
| 241 | + "source": [ |
| 242 | + "Let's save the output as a pickle file to be used in Notebook 4" |
| 243 | + ] |
| 244 | + }, |
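The save step in the cell below presumably uses the standard `pickle.dump` pattern; a minimal sketch, where the filename and the placeholder string are assumptions for illustration:

```python
import pickle

# Placeholder for the re-written transcript extracted from the model output above
save_string_pkl = "Speaker 1: Welcome back ..."

# Persist the re-written transcript so Notebook 4 can load it for TTS generation.
# The filename is an assumption, not necessarily the one the notebook uses.
with open("podready_data.pkl", "wb") as file:
    pickle.dump(save_string_pkl, file)
```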
163 | 245 | {
|
164 | 246 | "cell_type": "code",
|
165 | 247 | "execution_count": null,
|
|