
Commit aab327c

Update prompt guide for Llama 3
Update content for Llama 3
Switch to Groq
Use 70B by default as Groq provides fast enough inference
1 parent 14e4b05 commit aab327c

File tree

1 file changed: +75 −79 lines changed


recipes/quickstart/Prompt_Engineering_with_Llama_2.ipynb renamed to recipes/quickstart/Prompt_Engineering_with_Llama_3.ipynb

Lines changed: 75 additions & 79 deletions
@@ -5,11 +5,11 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "# Prompt Engineering with Llama 2\n",
+ "# Prompt Engineering with Llama 3\n",
  "\n",
  "Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n",
  "\n",
- "This interactive guide covers prompt engineering & best practices with Llama 2."
+ "This interactive guide covers prompt engineering & best practices with Llama 3."
  ]
 },
 {
@@ -41,7 +41,13 @@
  "\n",
  "In 2023, Meta introduced the [Llama language models](https://ai.meta.com/llama/) (Llama Chat, Code Llama, Llama Guard). These are general purpose, state-of-the-art LLMs.\n",
  "\n",
- "Llama 2 models come in 7 billion, 13 billion, and 70 billion parameter sizes. Smaller models are cheaper to deploy and run (see: deployment and performance); larger models are more capable.\n",
+ "Llama models come in varying parameter sizes. The smaller models are cheaper to deploy and run; the larger models are more capable.\n",
+ "\n",
+ "#### Llama 3\n",
+ "1. `llama-3-8b` - base pretrained 8 billion parameter model\n",
+ "1. `llama-3-70b` - base pretrained 70 billion parameter model\n",
+ "1. `llama-3-8b-instruct` - instruction fine-tuned 8 billion parameter model\n",
+ "1. `llama-3-70b-instruct` - instruction fine-tuned 70 billion parameter model (flagship)\n",
  "\n",
  "#### Llama 2\n",
  "1. `llama-2-7b` - base pretrained 7 billion parameter model\n",
@@ -86,11 +92,11 @@
  "\n",
  "Large language models are deployed and accessed in a variety of ways, including:\n",
  "\n",
- "1. **Self-hosting**: Using local hardware to run inference. Ex. running Llama 2 on your Macbook Pro using [llama.cpp](https://github.com/ggerganov/llama.cpp).\n",
+ "1. **Self-hosting**: Using local hardware to run inference. Ex. running Llama on your Macbook Pro using [llama.cpp](https://github.com/ggerganov/llama.cpp).\n",
  "    * Best for privacy/security or if you already have a GPU.\n",
- "1. **Cloud hosting**: Using a cloud provider to deploy an instance that hosts a specific model. Ex. running Llama 2 on cloud providers like AWS, Azure, GCP, and others.\n",
+ "1. **Cloud hosting**: Using a cloud provider to deploy an instance that hosts a specific model. Ex. running Llama on cloud providers like AWS, Azure, GCP, and others.\n",
  "    * Best for customizing models and their runtime (ex. fine-tuning a model for your use case).\n",
- "1. **Hosted API**: Call LLMs directly via an API. There are many companies that provide Llama 2 inference APIs including AWS Bedrock, Replicate, Anyscale, Together and others.\n",
+ "1. **Hosted API**: Call LLMs directly via an API. There are many companies that provide Llama inference APIs including AWS Bedrock, Replicate, Anyscale, Together and others.\n",
  "    * Easiest option overall."
  ]
 },
@@ -118,11 +124,11 @@
  "\n",
  "> Our destiny is written in the stars.\n",
  "\n",
- "...is tokenized into `[\"our\", \"dest\", \"iny\", \"is\", \"written\", \"in\", \"the\", \"stars\"]` for Llama 2.\n",
+ "...is tokenized into `[\"Our\", \"destiny\", \"is\", \"written\", \"in\", \"the\", \"stars\", \".\"]` for Llama 3.\n",
  "\n",
  "Tokens matter most when you consider API pricing and internal behavior (ex. hyperparameters).\n",
  "\n",
- "Each model has a maximum context length that your prompt cannot exceed. That's 4096 tokens for Llama 2 and 100K for Code Llama. \n"
+ "Each model has a maximum context length that your prompt cannot exceed. That's 8K tokens for Llama 3 and 100K for Code Llama. \n"
  ]
 },
 {
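The context-length note above can be made operational with a quick budget check before sending a prompt. A minimal sketch, assuming the rough ~4 characters-per-token heuristic for English text (the true count requires the model's tokenizer; `estimate_tokens` and `fits_context` are hypothetical helpers, not part of the notebook):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the exact count requires the model's tokenizer."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(prompt: str, max_tokens: int = 8192, reserve_for_output: int = 1000) -> bool:
    """Check whether a prompt plausibly fits Llama 3's 8K context window,
    leaving headroom for the generated reply."""
    return estimate_tokens(prompt) <= max_tokens - reserve_for_output

print(fits_context("Our destiny is written in the stars."))  # True
```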
@@ -132,7 +138,7 @@
  "source": [
  "## Notebook Setup\n",
  "\n",
- "The following APIs will be used to call LLMs throughout the guide. As an example, we'll call Llama 2 chat using [Replicate](https://replicate.com/meta/llama-2-70b-chat) and use LangChain to easily set up a chat completion API.\n",
+ "The following APIs will be used to call LLMs throughout the guide. As an example, we'll call Llama 3 chat using [Groq](https://console.groq.com/playground?model=llama3-70b-8192).\n",
  "\n",
  "To install prerequisites run:"
  ]
@@ -143,7 +149,8 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "pip install langchain replicate"
+ "import sys\n",
+ "!{sys.executable} -m pip install groq"
  ]
 },
 {
@@ -152,64 +159,54 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "from typing import Dict, List\n",
- "from langchain.llms import Replicate\n",
- "from langchain.memory import ChatMessageHistory\n",
- "from langchain.schema.messages import get_buffer_string\n",
  "import os\n",
+ "from typing import Dict, List\n",
+ "from groq import Groq\n",
  "\n",
- "# Get a free API key from https://replicate.com/account/api-tokens\n",
- "os.environ[\"REPLICATE_API_TOKEN\"] = \"YOUR_KEY_HERE\"\n",
+ "# Get a free API key from https://console.groq.com/keys\n",
+ "# os.environ[\"GROQ_API_KEY\"] = \"YOUR_KEY_HERE\"\n",
  "\n",
- "LLAMA2_70B_CHAT = \"meta/llama-2-70b-chat:2d19859030ff705a87c746f7e96eea03aefb71f166725aee39692f1476566d48\"\n",
- "LLAMA2_13B_CHAT = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n",
+ "LLAMA3_70B_INSTRUCT = \"llama3-70b-8192\"\n",
+ "LLAMA3_8B_INSTRUCT = \"llama3-8b-8192\"\n",
  "\n",
- "# We'll default to the smaller 13B model for speed; change to LLAMA2_70B_CHAT for more advanced (but slower) generations\n",
- "DEFAULT_MODEL = LLAMA2_13B_CHAT\n",
+ "DEFAULT_MODEL = LLAMA3_70B_INSTRUCT\n",
  "\n",
- "def completion(\n",
- "    prompt: str,\n",
- "    model: str = DEFAULT_MODEL,\n",
+ "client = Groq()\n",
+ "\n",
+ "def assistant(content: str):\n",
+ "    return { \"role\": \"assistant\", \"content\": content }\n",
+ "\n",
+ "def user(content: str):\n",
+ "    return { \"role\": \"user\", \"content\": content }\n",
+ "\n",
+ "def chat_completion(\n",
+ "    messages: List[Dict],\n",
+ "    model = DEFAULT_MODEL,\n",
  "    temperature: float = 0.6,\n",
  "    top_p: float = 0.9,\n",
  ") -> str:\n",
- "    llm = Replicate(\n",
+ "    response = client.chat.completions.create(\n",
+ "        messages=messages,\n",
  "        model=model,\n",
- "        model_kwargs={\"temperature\": temperature,\"top_p\": top_p, \"max_new_tokens\": 1000}\n",
+ "        temperature=temperature,\n",
+ "        top_p=top_p,\n",
  "    )\n",
- "    return llm(prompt)\n",
+ "    return response.choices[0].message.content\n",
  "\n",
- "def chat_completion(\n",
- "    messages: List[Dict],\n",
- "    model = DEFAULT_MODEL,\n",
+ "def completion(\n",
+ "    prompt: str,\n",
+ "    model: str = DEFAULT_MODEL,\n",
  "    temperature: float = 0.6,\n",
  "    top_p: float = 0.9,\n",
  ") -> str:\n",
- "    history = ChatMessageHistory()\n",
- "    for message in messages:\n",
- "        if message[\"role\"] == \"user\":\n",
- "            history.add_user_message(message[\"content\"])\n",
- "        elif message[\"role\"] == \"assistant\":\n",
- "            history.add_ai_message(message[\"content\"])\n",
- "        else:\n",
- "            raise Exception(\"Unknown role\")\n",
- "    return completion(\n",
- "        get_buffer_string(\n",
- "            history.messages,\n",
- "            human_prefix=\"USER\",\n",
- "            ai_prefix=\"ASSISTANT\",\n",
- "        ),\n",
- "        model,\n",
- "        temperature,\n",
- "        top_p,\n",
+ "    return chat_completion(\n",
+ "        [user(prompt)],\n",
+ "        model=model,\n",
+ "        temperature=temperature,\n",
+ "        top_p=top_p,\n",
  "    )\n",
  "\n",
- "def assistant(content: str):\n",
- "    return { \"role\": \"assistant\", \"content\": content }\n",
- "\n",
- "def user(content: str):\n",
- "    return { \"role\": \"user\", \"content\": content }\n",
- "\n",
  "def complete_and_print(prompt: str, model: str = DEFAULT_MODEL):\n",
  "    print(f'==============\\n{prompt}\\n==============')\n",
  "    response = completion(prompt, model)\n",
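The `user`/`assistant` helpers in this hunk make multi-turn and few-shot chat prompts easy to assemble. A self-contained sketch of the message construction alone, mirroring those helpers (no API call is made; the sentiment examples are illustrative):

```python
from typing import Dict, List

def user(content: str) -> Dict:
    return {"role": "user", "content": content}

def assistant(content: str) -> Dict:
    return {"role": "assistant", "content": content}

# A few-shot sentiment prompt: two worked examples, then the real query.
# The resulting list is exactly what chat_completion(messages) expects.
messages: List[Dict] = [
    user("I loved it! Sentiment?"),
    assistant("positive"),
    user("Total waste of time. Sentiment?"),
    assistant("negative"),
    user("It was fine, I guess. Sentiment?"),
]
print(len(messages))  # 5
```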
@@ -223,7 +220,7 @@
  "source": [
  "### Completion APIs\n",
  "\n",
- "Llama 2 models tend to be wordy and explain their rationale. Later we'll explore how to manage the response length."
+ "Let's try Llama 3!"
  ]
 },
 {
@@ -345,7 +342,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "You can think about giving explicit instructions as using rules and restrictions to how Llama 2 responds to your prompt.\n",
+ "You can think about giving explicit instructions as using rules and restrictions on how Llama 3 responds to your prompt.\n",
  "\n",
  "- Stylization\n",
  "    - `Explain this to me like a topic on a children's educational network show teaching elementary students.`\n",
@@ -387,9 +384,9 @@
  "\n",
  "#### Zero-Shot Prompting\n",
  "\n",
- "Large language models like Llama 2 are unique because they are capable of following instructions and producing responses without having previously seen an example of a task. Prompting without examples is called \"zero-shot prompting\".\n",
+ "Large language models like Llama 3 are unique because they are capable of following instructions and producing responses without having previously seen an example of a task. Prompting without examples is called \"zero-shot prompting\".\n",
  "\n",
- "Let's try using Llama 2 as a sentiment detector. You may notice that output format varies - we can improve this with better prompting."
+ "Let's try using Llama 3 as a sentiment detector. You may notice that output format varies - we can improve this with better prompting."
  ]
 },
 {
@@ -459,9 +456,9 @@
  "source": [
  "### Role Prompting\n",
  "\n",
- "Llama 2 will often give more consistent responses when given a role ([Kong et al. (2023)](https://browse.arxiv.org/pdf/2308.07702.pdf)). Roles give context to the LLM on what type of answers are desired.\n",
+ "Llama will often give more consistent responses when given a role ([Kong et al. (2023)](https://browse.arxiv.org/pdf/2308.07702.pdf)). Roles give context to the LLM on what type of answers are desired.\n",
  "\n",
- "Let's use Llama 2 to create a more focused, technical response for a question around the pros and cons of using PyTorch."
+ "Let's use Llama 3 to create a more focused, technical response for a question around the pros and cons of using PyTorch."
  ]
 },
 {
@@ -484,7 +481,9 @@
  "source": [
  "### Chain-of-Thought\n",
  "\n",
- "Simply adding a phrase encouraging step-by-step thinking \"significantly improves the ability of large language models to perform complex reasoning\" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called \"CoT\" or \"Chain-of-Thought\" prompting:"
+ "Simply adding a phrase encouraging step-by-step thinking \"significantly improves the ability of large language models to perform complex reasoning\" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called \"CoT\" or \"Chain-of-Thought\" prompting.\n",
+ "\n",
+ "Llama 3 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness."
  ]
 },
@@ -493,10 +492,12 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "complete_and_print(\"Who lived longer Elvis Presley or Mozart?\")\n",
- "# Often gives incorrect answer of \"Mozart\"\n",
+ "prompt = \"Who lived longer, Mozart or Elvis?\"\n",
+ "\n",
+ "complete_and_print(prompt)\n",
+ "# Llama 2 would often give the incorrect answer of \"Mozart\"\n",
  "\n",
- "complete_and_print(\"Who lived longer Elvis Presley or Mozart? Let's think through this carefully, step by step.\")\n",
+ "complete_and_print(f\"{prompt} Let's think through this carefully, step by step.\")\n",
  "# Gives the correct answer \"Elvis\""
  ]
 },
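The CoT trigger in this hunk is just a string suffix, so it is easy to factor into a reusable helper. A minimal sketch (`with_cot` is a hypothetical name, not defined in the notebook):

```python
def with_cot(prompt: str) -> str:
    """Append the chain-of-thought trigger phrase used in the cell above."""
    return f"{prompt} Let's think through this carefully, step by step."

print(with_cot("Who lived longer, Mozart or Elvis?"))
```

The wrapped prompt can then be passed to `complete_and_print` exactly like the f-string version above.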
@@ -523,10 +524,9 @@
  "    response = completion(\n",
  "        \"John found that the average of 15 numbers is 40.\"\n",
  "        \"If 10 is added to each number then the mean of the numbers is?\"\n",
- "        \"Report the answer surrounded by three backticks, for example: ```123```\",\n",
- "        model = LLAMA2_70B_CHAT\n",
+ "        \"Report the answer surrounded by backticks (example: `123`)\",\n",
  "    )\n",
- "    match = re.search(r'```(\\d+)```', response)\n",
+ "    match = re.search(r'`(\\d+)`', response)\n",
  "    if match is None:\n",
  "        return None\n",
  "    return match.group(1)\n",
@@ -538,10 +538,10 @@
  "        f\"Final answer: {mode(answers)}\",\n",
  "    )\n",
  "\n",
- "# Sample runs of Llama-2-70B (all correct):\n",
- "# [50, 50, 750, 50, 50] -> 50\n",
- "# [130, 10, 750, 50, 50] -> 50\n",
- "# [50, None, 10, 50, 50] -> 50"
+ "# Sample runs of Llama-3-70B (all correct):\n",
+ "# ['60', '50', '50', '50', '50'] -> 50\n",
+ "# ['50', '50', '50', '60', '50'] -> 50\n",
+ "# ['50', '50', '60', '50', '50'] -> 50"
  ]
 },
 {
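The self-consistency cell above samples several generations and keeps the most common extracted answer. The extraction-and-vote step can be exercised offline with canned responses standing in for model output (the response strings below are illustrative, not real generations):

```python
import re
from statistics import mode
from typing import List, Optional

def extract_answer(response: str) -> Optional[str]:
    """Pull the first backtick-wrapped number out of a model response."""
    match = re.search(r"`(\d+)`", response)
    return match.group(1) if match else None

# Canned responses standing in for five sampled generations.
responses = [
    "The new mean is `50`.",
    "Adding 10 to each number shifts the mean by 10: `50`.",
    "I compute `60`.",  # one wrong sample gets outvoted
    "Answer: `50`",
    "`50`",
]
answers: List[str] = [a for a in (extract_answer(r) for r in responses) if a is not None]
print(mode(answers))  # 50
```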
@@ -560,7 +560,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "complete_and_print(\"What is the capital of the California?\", model = LLAMA2_70B_CHAT)\n",
+ "complete_and_print(\"What is the capital of California?\")\n",
  "# Gives the correct answer \"Sacramento\""
  ]
@@ -677,7 +677,6 @@
  "    \"\"\"\n",
  "    # Python code to calculate: ((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))\n",
  "    \"\"\",\n",
- "    model=\"meta/codellama-34b:67942fd0f55b66da802218a19a8f0e1d73095473674061a6ea19f2dc8c053152\"\n",
  ")"
  ]
 },
@@ -687,12 +686,10 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "# The following code was generated by Code Llama 34B:\n",
+ "# The following code was generated by Llama 3 70B:\n",
  "\n",
- "num1 = (-5 + 93 * 4 - 0)\n",
- "num2 = (4**4 + -7 + 0 * 5)\n",
- "answer = num1 * num2\n",
- "print(answer)"
+ "result = ((-5 + 93 * 4 - 0) * (4**4 - 7 + 0 * 5))\n",
+ "print(result)"
  ]
 },
 {
@@ -715,7 +712,6 @@
  "source": [
  "complete_and_print(\n",
  "    \"Give me the zip code for Menlo Park in JSON format with the field 'zip_code'\",\n",
- "    model = LLAMA2_70B_CHAT,\n",
  ")\n",
  "# Likely returns the JSON and also \"Sure! Here's the JSON...\"\n",
  "\n",
@@ -726,7 +722,6 @@
  "    Example question: What is the zip code of the Empire State Building? Example answer: {'zip_code': 10118}\n",
  "    Now here is my question: What is the zip code of Menlo Park?\n",
  "    \"\"\",\n",
- "    model = LLAMA2_70B_CHAT,\n",
  ")\n",
  "# \"{'zip_code': 94025}\""
  ]
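As the comment in the hunk above notes, the model may still wrap the JSON in chatter ("Sure! Here's the JSON..."). A defensive parse can be sketched as follows; the `extract_json` helper and the sample response string are hypothetical, not part of the notebook:

```python
import json
import re
from typing import Optional

def extract_json(response: str) -> Optional[dict]:
    """Find the first {...} span in a response and parse it as JSON."""
    match = re.search(r"\{.*?\}", response, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

response = 'Sure! Here is the JSON you asked for:\n{"zip_code": 94025}'
print(extract_json(response))  # {'zip_code': 94025}
```

Note this only handles double-quoted JSON; single-quoted output like `{'zip_code': 94025}` would need a more forgiving parser.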
@@ -770,7 +765,8 @@
  "mimetype": "text/x-python",
  "name": "python",
  "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3"
+ "pygments_lexer": "ipython3",
+ "version": "3.12.3"
 },
 "last_base_url": "https://bento.edge.x2p.facebook.net/",
 "last_kernel_id": "161e2a7b-2d2b-4995-87f3-d1539860ecac",
