|
7 | 7 | "source": [
|
8 | 8 | "<a href=\"https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/Prompt_Engineering_with_Llama_3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
|
9 | 9 | "\n",
|
10 |
| - "# Prompt Engineering with Llama 3\n", |
| 10 | + "# Prompt Engineering with Llama 3.1\n", |
11 | 11 | "\n",
|
12 | 12 | "Prompt engineering is using natural language to produce a desired response from a large language model (LLM).\n",
|
13 | 13 | "\n",
|
14 |
| - "This interactive guide covers prompt engineering & best practices with Llama 3." |
| 14 | + "This interactive guide covers prompt engineering & best practices with Llama 3.1." |
15 | 15 | ]
|
16 | 16 | },
|
17 | 17 | {
|
|
45 | 45 | "\n",
|
46 | 46 | "Llama models come in varying parameter sizes. The smaller models are cheaper to deploy and run; the larger models are more capable.\n",
|
47 | 47 | "\n",
|
| 48 | + "#### Llama 3.1\n", |
| 49 | + "1. `llama-3.1-8b` - base pretrained 8 billion parameter model\n", |
| 50 | + "1. `llama-3.1-70b` - base pretrained 70 billion parameter model\n", |
| 51 | + "1. `llama-3.1-405b` - base pretrained 405 billion parameter model\n", |
| 52 | + "1. `llama-3.1-8b-instruct` - instruction fine-tuned 8 billion parameter model\n", |
| 53 | + "1. `llama-3.1-70b-instruct` - instruction fine-tuned 70 billion parameter model\n", |
| 54 | + "1. `llama-3.1-405b-instruct` - instruction fine-tuned 405 billion parameter model (flagship)\n", |
| 55 | + "\n", |
| 56 | + "\n", |
48 | 57 | "#### Llama 3\n",
|
49 | 58 | "1. `llama-3-8b` - base pretrained 8 billion parameter model\n",
|
50 | 59 | "1. `llama-3-70b` - base pretrained 70 billion parameter model\n",
|
|
133 | 142 | "\n",
|
134 | 143 | "Tokens matter most when you consider API pricing and internal behavior (ex. hyperparameters).\n",
|
135 | 144 | "\n",
|
136 |
| - "Each model has a maximum context length that your prompt cannot exceed. That's 8K tokens for Llama 3, 4K for Llama 2, and 100K for Code Llama. \n" |
| 145 | + "Each model has a maximum context length that your prompt cannot exceed. That's 128k tokens for Llama 3.1, 4K for Llama 2, and 100K for Code Llama.\n" |
137 | 146 | ]
|
138 | 147 | },
|
139 | 148 | {
|
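Because prompts are both billed and bounded in tokens, it can help to count tokens before sending a request. Below is a minimal sketch using the Hugging Face `transformers` tokenizer; the model ID and gated-access requirement are assumptions, and any Llama-compatible tokenizer would work the same way.

```python
# Sketch: counting tokens to check a prompt against the context window.
# Assumes `pip install transformers` and access to the gated Llama 3.1
# tokenizer on Hugging Face; substitute any compatible tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

MAX_CONTEXT_TOKENS = 128_000  # Llama 3.1 context length

prompt = "Explain prompt engineering in one paragraph."
n_tokens = len(tokenizer.encode(prompt))
print(f"{n_tokens} tokens; fits in context: {n_tokens <= MAX_CONTEXT_TOKENS}")
```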
|
143 | 152 | "source": [
|
144 | 153 | "## Notebook Setup\n",
|
145 | 154 | "\n",
|
146 |
| - "The following APIs will be used to call LLMs throughout the guide. As an example, we'll call Llama 3 chat using [Grok](https://console.groq.com/playground?model=llama3-70b-8192).\n", |
| 155 | + "The following APIs will be used to call LLMs throughout the guide. As an example, we'll call Llama 3.1 chat using [Grok](https://console.groq.com/playground?model=llama3-70b-8192).\n", |
147 | 156 | "\n",
|
148 | 157 | "To install prerequisites run:"
|
149 | 158 | ]
|
|
171 | 180 | "# Get a free API key from https://console.groq.com/keys\n",
|
172 | 181 | "os.environ[\"GROQ_API_KEY\"] = \"YOUR_GROQ_API_KEY\"\n",
|
173 | 182 | "\n",
|
174 |
| - "LLAMA3_70B_INSTRUCT = \"llama3-70b-8192\"\n", |
175 |
| - "LLAMA3_8B_INSTRUCT = \"llama3-8b-8192\"\n", |
| 183 | + "LLAMA3_405B_INSTRUCT = \"llama-3.1-405b-reasoning\" # Note: Groq currently only gives access here to paying customers for 405B model\n", |
| 184 | + "LLAMA3_70B_INSTRUCT = \"llama-3.1-70b-versatile\"\n", |
| 185 | + "LLAMA3_8B_INSTRUCT = \"llama3.1-8b-instant\"\n", |
176 | 186 | "\n",
|
177 | 187 | "DEFAULT_MODEL = LLAMA3_70B_INSTRUCT\n",
|
178 | 188 | "\n",
|
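With the key and model names set, a raw chat call through the Groq SDK looks roughly like this. A minimal sketch, assuming the `groq` package installed above and the `DEFAULT_MODEL` constant from the setup cell:

```python
# Sketch: one chat completion through the Groq SDK.
from groq import Groq

client = Groq()  # picks up GROQ_API_KEY from the environment

response = client.chat.completions.create(
    model=DEFAULT_MODEL,
    messages=[{"role": "user", "content": "Say hello in exactly one word."}],
)
print(response.choices[0].message.content)
```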
|
225 | 235 | "source": [
|
226 | 236 | "### Completion APIs\n",
|
227 | 237 | "\n",
|
228 |
| - "Let's try Llama 3!" |
| 238 | + "Let's try Llama 3.1!" |
229 | 239 | ]
|
230 | 240 | },
|
231 | 241 | {
|
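The guide's helper cells aren't included in this hunk; as a rough sketch, a completion helper over the Groq client might look like the following. The names `completion` and `complete_and_print` and their defaults are illustrative assumptions, not necessarily the notebook's actual definitions.

```python
# Sketch of completion helpers; names and defaults are illustrative.
from groq import Groq

client = Groq()  # assumes GROQ_API_KEY is set in the environment

def completion(prompt: str, model: str = DEFAULT_MODEL,
               temperature: float = 0.6, top_p: float = 0.9) -> str:
    # Wrap a single-turn chat call so the rest of the guide can
    # treat it like a plain text-completion API.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        top_p=top_p,
    )
    return response.choices[0].message.content

def complete_and_print(prompt: str, model: str = DEFAULT_MODEL) -> None:
    print(f"==============\n{prompt}\n==============")
    print(completion(prompt, model), "\n")

complete_and_print("The typical color of the sky is: ")
```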
|
488 | 498 | "\n",
|
489 | 499 | "Simply adding a phrase encouraging step-by-step thinking \"significantly improves the ability of large language models to perform complex reasoning\" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called \"CoT\" or \"Chain-of-Thought\" prompting.\n",
|
490 | 500 | "\n",
|
491 |
| - "Llama 3 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness." |
| 501 | + "Llama 3.1 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness." |
492 | 502 | ]
|
493 | 503 | },
|
494 | 504 | {
|
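For example, contrasting a bare question with one carrying an explicit step-by-step cue makes the effect easy to see (using the hypothetical `complete_and_print` helper sketched earlier):

```python
# Sketch: the same question with and without a chain-of-thought cue.
prompt = "Who lived longer, Mozart or Elvis?"

complete_and_print(prompt)
complete_and_print(f"{prompt} Let's think through this carefully, step by step.")
```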
|
704 | 714 | "source": [
|
705 | 715 | "### Limiting Extraneous Tokens\n",
|
706 | 716 | "\n",
|
707 |
| - "A common struggle with Llama 2 is getting output without extraneous tokens (ex. \"Sure! Here's more information on...\"), even if explicit instructions are given to Llama 2 to be concise and no preamble. Llama 3 can better follow instructions.\n", |
| 717 | + "A common struggle with Llama 2 is getting output without extraneous tokens (ex. \"Sure! Here's more information on...\"), even if explicit instructions are given to Llama 2 to be concise and no preamble. Llama 3.x can better follow instructions.\n", |
708 | 718 | "\n",
|
709 | 719 | "Check out this improvement that combines a role, rules and restrictions, explicit instructions, and an example:"
|
710 | 720 | ]
|
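The improved cell itself isn't included in this hunk; a sketch in the same spirit, again via the hypothetical `complete_and_print` helper, might look like:

```python
# Sketch: role + rules + explicit instructions + a worked example,
# to suppress preamble and force a strict output format.
complete_and_print(
    """
    You are a robot that only outputs JSON.
    You reply in JSON format with the field 'zip_code'.
    Example question: What is the zip code of the Empire State Building?
    Example answer: {'zip_code': 10118}
    Now here is my question: What is the zip code of Menlo Park?
    """
)
```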
|