Commit 5d430e3

Notebook 2 finalise
1 parent 75cd0f4 commit 5d430e3

2 files changed: +85 −2 lines changed

recipes/quickstart/NotebookLlama/README.md

0 additions & 1 deletion

@@ -77,5 +77,4 @@ The speakers and the prompt for parler model were decided based on experimentati
 - https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing#scrollTo=NyYQ--3YksJY
 - https://replicate.com/suno-ai/bark?prediction=zh8j6yddxxrge0cjp9asgzd534
 - https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c
--
 
recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb

85 additions & 1 deletion
@@ -1,5 +1,25 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "de42c49d",
+   "metadata": {},
+   "source": [
+    "## Notebook 2: Transcript Writer\n",
+    "\n",
+    "This notebook uses the `Llama-3.1-70B-Instruct` model to take the cleaned-up text from the previous notebook and convert it into a podcast transcript.\n",
+    "\n",
+    "`SYSTEM_PROMPT` sets the model context or profile for the task. Here we prompt it to act as a great podcast transcript writer to assist with our task."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e576ea9",
+   "metadata": {},
+   "source": [
+    "Experimentation with the `SYSTEM_PROMPT` below is encouraged; this version worked best for the few examples the flow was tested with:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
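The two-message chat structure that a system prompt like this feeds into can be sketched as follows. This is illustrative only: the prompt text and the `build_messages` helper are placeholders, not the notebook's actual code.

```python
# Placeholder system prompt; the commit's real SYSTEM_PROMPT is much longer.
SYSTEM_PROMPT = "You are a world-class podcast transcript writer."

def build_messages(input_text: str) -> list[dict]:
    """Pair the fixed system profile with the per-run user content."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": input_text},
    ]

messages = build_messages("Cleaned-up text from Notebook 1 goes here.")
```

The system message stays constant across runs; only the user message changes with each input document.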
@@ -35,6 +55,16 @@
     "\"\"\""
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "549aaccb",
+   "metadata": {},
+   "source": [
+    "For those readers who want to flex their money, please feel free to try the 405B model here.\n",
+    "\n",
+    "For our GPU-poor friends, you're encouraged to test with a smaller model as well; 8B should work well out of the box for this example:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -45,6 +75,14 @@
     "MODEL = \"meta-llama/Llama-3.1-70B-Instruct\""
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "fadc7eda",
+   "metadata": {},
+   "source": [
+    "Import the necessary frameworks:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 3,
@@ -64,6 +102,16 @@
     "warnings.filterwarnings('ignore')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7865ff7e",
+   "metadata": {},
+   "source": [
+    "Read in the file generated earlier.\n",
+    "\n",
+    "The encoding details are there to avoid issues with the generic PDFs that might be ingested."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
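The encoding-fallback pattern that markdown cell describes can be sketched like this. The specific encodings tried and the function body are assumptions; the notebook's own `read_file_to_string` may differ.

```python
def read_file_to_string(filename: str):
    """Try a few encodings so text extracted from arbitrary PDFs still loads.

    Returns the file contents as a string, or None if no encoding worked
    (or the file is missing). Encoding list is an assumption for this sketch.
    """
    for encoding in ("utf-8", "latin-1", "cp1252"):
        try:
            with open(filename, "r", encoding=encoding) as f:
                return f.read()
        except (UnicodeDecodeError, FileNotFoundError):
            continue
    return None
```

Returning `None` instead of raising lets the calling cell decide how to handle an unreadable input.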
@@ -99,6 +147,14 @@
     " return None"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "66093561",
+   "metadata": {},
+   "source": [
+    "Since we defined the system role earlier, we can now pass the entire file as `INPUT_PROMPT` to the model and have it generate the podcast transcript."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 5,
@@ -109,6 +165,16 @@
     "INPUT_PROMPT = read_file_to_string('./clean_extracted_text.txt')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "9be8dd2c",
+   "metadata": {},
+   "source": [
+    "Hugging Face's `pipeline()` method makes it easy to generate text from LLMs.\n",
+    "\n",
+    "We will set `temperature` to 1 to encourage creativity and `max_new_tokens` to 8126."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,
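A hedged sketch of the generation step, using the model name from the diff and the settings the markdown names (`temperature=1`, `max_new_tokens=8126`). The helper names are illustrative, and the pipeline is built lazily inside a function so the heavy download only happens when actually called; the notebook's real cell may wire this up differently.

```python
MODEL = "meta-llama/Llama-3.1-70B-Instruct"

GENERATION_KWARGS = {
    "temperature": 1.0,      # per the notebook: encourage creativity
    "max_new_tokens": 8126,  # per the notebook
}

def build_generator(model_name: str = MODEL):
    """Construct the text-generation pipeline (needs GPU and model access)."""
    from transformers import pipeline  # deferred import: heavy dependency
    return pipeline("text-generation", model=model_name, device_map="auto")

def write_transcript(generator, messages):
    """Run generation and pull out the assistant's reply, matching the
    outputs[0]["generated_text"][-1]["content"] access seen in the diff."""
    outputs = generator(messages, **GENERATION_KWARGS)
    return outputs[0]["generated_text"][-1]["content"]
```

With chat-style message input, recent Transformers versions return the full message list under `generated_text`, so the last element's `content` is the newly generated reply.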
@@ -158,6 +224,14 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "6349e7f3",
+   "metadata": {},
+   "source": [
+    "This is awesome; we can now save and verify the output generated by the model before moving to the next notebook."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 7,
@@ -209,6 +283,14 @@
     "print(outputs[0][\"generated_text\"][-1]['content'])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "1e1414fe",
+   "metadata": {},
+   "source": [
+    "Let's save the output as a pickle file and continue to Notebook 3."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 8,
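Handing the transcript to the next notebook is a small `pickle` round-trip; a sketch, with an assumed file name (the commit does not show the path the notebook actually uses):

```python
import pickle

def save_transcript(text: str, path: str = "transcript.pkl") -> None:
    """Persist the generated transcript so Notebook 3 can load it."""
    with open(path, "wb") as f:
        pickle.dump(text, f)

def load_transcript(path: str = "transcript.pkl") -> str:
    """Load the transcript saved by the previous step."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Pickle preserves the string exactly, including newlines and speaker markers, which matters for the downstream rewriting step.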
@@ -226,7 +308,9 @@
    "id": "d9bab2f2-f539-435a-ae6a-3c9028489628",
    "metadata": {},
    "outputs": [],
-   "source": []
+   "source": [
+    "#fin"
+   ]
   }
  ],
  "metadata": {
