|
6 | 6 | "id": "LERqQn5v8-ak"
|
7 | 7 | },
|
8 | 8 | "source": [
|
9 |
| - "# **Getting to know Llama 2: Everything you need to start building**\n", |
10 |
| - "Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects." |
| 9 | + "# **Getting to know Llama 3: Everything you need to start building**\n", |
| 10 | + "Our goal in this session is to provide a guided tour of Llama 3, including understanding different Llama 3 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 3 projects." |
| 11 | + ] |
| 12 | + }, |
| 13 | + { |
| 14 | + "cell_type": "markdown", |
| 15 | + "metadata": { |
| 16 | + "id": "h3YGMDJidHtH" |
| 17 | + }, |
| 18 | + "source": [ |
| 19 | + "### **Install dependencies**" |
| 20 | + ] |
| 21 | + }, |
| 22 | + { |
| 23 | + "cell_type": "code", |
| 24 | + "execution_count": null, |
| 25 | + "metadata": { |
| 26 | + "id": "VhN6hXwx7FCp" |
| 27 | + }, |
| 28 | + "outputs": [], |
| 29 | + "source": [ |
| 30 | + "# Install dependencies and initialize\n", |
| 31 | + "%pip install \\\n", |
| 32 | + " langchain==0.1.19 \\\n", |
| 33 | + " matplotlib \\\n", |
| 34 | + " octoai-sdk==0.10.1 \\\n", |
| 35 | + " openai \\\n", |
| 36 | + " sentence_transformers \\\n", |
| 37 | + " pdf2image \\\n", |
| 38 | + " pdfminer \\\n", |
| 39 | + " pdfminer.six \\\n", |
| 40 | + " unstructured \\\n", |
| 41 | + " faiss-cpu \\\n", |
| 42 | + " pillow-heif \\\n", |
| 43 | + " opencv-python \\\n", |
| 44 | + " unstructured-inference \\\n", |
| 45 | + " pikepdf" |
11 | 46 | ]
|
12 | 47 | },
|
13 | 48 | {
|
|
58 | 93 | " A[Users] --> B(Applications e.g. mobile, web)\n",
|
59 | 94 | " B --> |Hosted API|C(Platforms e.g. Custom, OctoAI, HuggingFace, Replicate)\n",
|
60 | 95 | " B -- optional --> E(Frameworks e.g. LangChain)\n",
|
61 |
| - " C-->|User Input|D[Llama 2]\n", |
| 96 | + " C-->|User Input|D[Llama 3]\n", |
62 | 97 | " D-->|Model Output|C\n",
|
63 | 98 | " E --> C\n",
|
64 | 99 | " classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
|
|
69 | 104 | " flowchart TD\n",
|
70 | 105 | " A[User Prompts] --> B(Frameworks e.g. LangChain)\n",
|
71 | 106 | " B <--> |Database, Docs, XLS|C[fa:fa-database External Data]\n",
|
72 |
| - " B -->|API|D[Llama 2]\n", |
| 107 | + " B -->|API|D[Llama 3]\n", |
73 | 108 | " classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
|
74 | 109 | " \"\"\")\n",
|
75 | 110 | "\n",
|
76 |
| - "def llama2_family():\n", |
| 111 | + "def llama3_family():\n", |
77 | 112 | " mm(\"\"\"\n",
|
78 | 113 | " graph LR;\n",
|
79 |
| - " llama-2 --> llama-2-7b\n", |
80 |
| - " llama-2 --> llama-2-13b\n", |
81 |
| - " llama-2 --> llama-2-70b\n", |
82 |
| - " llama-2-7b --> llama-2-7b-chat\n", |
83 |
| - " llama-2-13b --> llama-2-13b-chat\n", |
84 |
| - " llama-2-70b --> llama-2-70b-chat\n", |
| 114 | + " llama-3 --> llama-3-8b-instruct\n", |
| 115 | + " llama-3 --> llama-3-70b-instruct\n", |
85 | 116 | " classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
|
86 | 117 | " \"\"\")\n",
|
87 | 118 | "\n",
|
|
91 | 122 | " users --> apps\n",
|
92 | 123 | " apps --> frameworks\n",
|
93 | 124 | " frameworks --> platforms\n",
|
94 |
| - " platforms --> Llama 2\n", |
| 125 | + " platforms --> Llama 3\n", |
95 | 126 | " classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n",
|
96 | 127 | " \"\"\")\n",
|
97 | 128 | "\n",
|
|
115 | 146 | " user --> prompt\n",
|
116 | 147 | " prompt --> i_safety\n",
|
117 | 148 | " i_safety --> context\n",
|
118 |
| - " context --> Llama_2\n", |
119 |
| - " Llama_2 --> output\n", |
| 149 | + " context --> Llama_3\n", |
| 150 | + " Llama_3 --> output\n", |
120 | 151 | " output --> o_safety\n",
|
121 | 152 | " i_safety --> memory\n",
|
122 | 153 | " o_safety --> memory\n",
|
|
165 | 196 | "id": "i4Np_l_KtIno"
|
166 | 197 | },
|
167 | 198 | "source": [
|
168 |
| - "##**1 - Understanding Llama 2**" |
| 199 | + "##**1 - Understanding Llama 3**" |
169 | 200 | ]
|
170 | 201 | },
|
171 | 202 | {
|
|
174 | 205 | "id": "PGPSI3M5PGTi"
|
175 | 206 | },
|
176 | 207 | "source": [
|
177 |
| - "### **1.1 - What is Llama 2?**\n", |
| 208 | + "### **1.1 - What is Llama 3?**\n", |
178 | 209 | "\n",
|
179 | 210 | "* State of the art (SOTA), Open Source LLM\n",
|
180 |
| - "* 7B, 13B, 70B\n", |
| 211 | + "* Llama 3 8B, 70B\n", |
181 | 212 | "* Pretrained + Chat\n",
|
182 | 213 | "* Choosing model: Size, Quality, Cost, Speed\n",
|
183 |
| - "* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n", |
184 |
| - "\n", |
| 214 | + "* [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/)\n", |
185 | 215 | "* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)"
|
186 | 216 | ]
|
187 | 217 | },
|
|
208 | 238 | },
|
209 | 239 | "outputs": [],
|
210 | 240 | "source": [
|
211 |
| - "llama2_family()" |
| 241 | + "llama3_family()" |
212 | 242 | ]
|
213 | 243 | },
|
214 | 244 | {
|
|
217 | 247 | "id": "aYeHVVh45bdT"
|
218 | 248 | },
|
219 | 249 | "source": [
|
220 |
| - "###**1.2 - Accessing Llama 2**\n", |
| 250 | + "###**1.2 - Accessing Llama 3**\n", |
221 | 251 | "* Download + Self Host (on-premise)\n",
|
222 | 252 | "* Hosted API Platform (e.g. [OctoAI](https://octoai.cloud/), [Replicate](https://replicate.com/meta))\n",
|
223 |
| - "* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))\n", |
224 |
| - "\n" |
| 253 | + "* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))" |
225 | 254 | ]
|
226 | 255 | },
|
227 | 256 | {
|
|
230 | 259 | "id": "kBuSay8vtzL4"
|
231 | 260 | },
|
232 | 261 | "source": [
|
233 |
| - "### **1.3 - Use Cases of Llama 2**\n", |
| 262 | + "### **1.3 - Use Cases of Llama 3**\n", |
234 | 263 | "* Content Generation\n",
|
235 | 264 | "* Chatbots\n",
|
236 | 265 | "* Summarization\n",
|
|
245 | 274 | "id": "sd54g0OHuqBY"
|
246 | 275 | },
|
247 | 276 | "source": [
|
248 |
| - "##**2 - Using Llama 2**\n", |
| 277 | + "##**2 - Using Llama 3**\n", |
249 | 278 | "\n",
|
250 |
| - "In this notebook, we are going to access [Llama 13b chat model](https://octoai.cloud/tools/text/chat?mode=demo&model=llama-2-13b-chat-fp16) using hosted API from OctoAI." |
251 |
| - ] |
252 |
| - }, |
253 |
| - { |
254 |
| - "cell_type": "markdown", |
255 |
| - "metadata": { |
256 |
| - "id": "h3YGMDJidHtH" |
257 |
| - }, |
258 |
| - "source": [ |
259 |
| - "### **2.1 - Install dependencies**" |
260 |
| - ] |
261 |
| - }, |
262 |
| - { |
263 |
| - "cell_type": "code", |
264 |
| - "execution_count": null, |
265 |
| - "metadata": { |
266 |
| - "id": "VhN6hXwx7FCp" |
267 |
| - }, |
268 |
| - "outputs": [], |
269 |
| - "source": [ |
270 |
| - "# Install dependencies and initialize\n", |
271 |
| - "%pip install -qU \\\n", |
272 |
| - " octoai-sdk \\\n", |
273 |
| - " langchain \\\n", |
274 |
| - " sentence_transformers \\\n", |
275 |
| - " pdf2image \\\n", |
276 |
| - " pdfminer \\\n", |
277 |
| - " pdfminer.six \\\n", |
278 |
| - " unstructured \\\n", |
279 |
| - " faiss-cpu \\\n", |
280 |
| - " pillow-heif \\\n", |
281 |
| - " opencv-python \\\n", |
282 |
| - " unstructured-inference \\\n", |
283 |
| - " pikepdf" |
| 279 | + "In this notebook, we are going to access [Llama 3 8b instruct model](https://octoai.cloud/text/chat?model=meta-llama-3-8b-instruct&mode=api) using hosted API from OctoAI." |
284 | 280 | ]
|
285 | 281 | },
|
286 | 282 | {
|
|
292 | 288 | "outputs": [],
|
293 | 289 | "source": [
|
294 | 290 | "# model on OctoAI platform that we will use for inferencing\n",
|
295 |
| - "# We will use llama 13b chat model hosted on OctoAI server ()\n", |
| 291 | + "# We will use llama 3 8b instruct model hosted on OctoAI server\n", |
296 | 292 | "\n",
|
297 |
| - "llama2_13b = \"llama-2-13b-chat-fp16\"" |
| 293 | + "llama3_8b = \"meta-llama-3-8b-instruct\"" |
298 | 294 | ]
|
299 | 295 | },
|
300 | 296 | {
|
|
326 | 322 | },
|
327 | 323 | "outputs": [],
|
328 | 324 | "source": [
|
329 |
| - "# we will use OctoAI's hosted API\n", |
330 |
| - "from octoai.client import Client\n", |
| 325 | + "# We will use OpenAI's APIs to talk to OctoAI's hosted model endpoint\n", |
| 326 | + "from openai import OpenAI\n", |
331 | 327 | "\n",
|
332 |
| - "client = Client(OCTOAI_API_TOKEN)\n", |
| 328 | + "client = OpenAI(\n", |
| 329 | + " base_url = \"https://text.octoai.run/v1\",\n", |
| 330 | + " api_key = os.environ[\"OCTOAI_API_TOKEN\"]\n", |
| 331 | + ")\n", |
333 | 332 | "\n",
|
334 | 333 | "# text completion with input prompt\n",
|
335 | 334 | "def Completion(prompt):\n",
|
336 | 335 | " output = client.chat.completions.create(\n",
|
337 | 336 | " messages=[\n",
|
338 |
| - " {\n", |
339 |
| - " \"role\": \"user\",\n", |
340 |
| - " \"content\": prompt\n", |
341 |
| - " }\n", |
| 337 | + " {\"role\": \"user\", \"content\": prompt}\n", |
342 | 338 | " ],\n",
|
343 |
| - " model=\"llama-2-13b-chat-fp16\",\n", |
| 339 | + " model=llama3_8b,\n", |
344 | 340 | " max_tokens=1000\n",
|
345 | 341 | " )\n",
|
346 | 342 | " return output.choices[0].message.content\n",
|
|
349 | 345 | "def ChatCompletion(prompt, system_prompt=None):\n",
|
350 | 346 | " output = client.chat.completions.create(\n",
|
351 | 347 | " messages=[\n",
|
352 |
| - " {\n", |
353 |
| - " \"role\": \"system\",\n", |
354 |
| - " \"content\": system_prompt\n", |
355 |
| - " },\n", |
356 |
| - " {\n", |
357 |
| - " \"role\": \"user\",\n", |
358 |
| - " \"content\": prompt\n", |
359 |
| - " }\n", |
| 348 | + " {\"role\": \"system\", \"content\": system_prompt},\n", |
| 349 | + " {\"role\": \"user\", \"content\": prompt}\n", |
360 | 350 | " ],\n",
|
361 |
| - " model=\"llama-2-13b-chat-fp16\",\n", |
| 351 | + " model=llama3_8b,\n", |
362 | 352 | " max_tokens=1000\n",
|
363 | 353 | " )\n",
|
364 | 354 | " return output.choices[0].message.content"
|
|
370 | 360 | "id": "5Jxq0pmf6L73"
|
371 | 361 | },
|
372 | 362 | "source": [
|
373 |
| - "### **2.2 - Basic completion**" |
| 363 | + "# **2.1 - Basic completion**" |
374 | 364 | ]
|
375 | 365 | },
|
376 | 366 | {
|
|
391 | 381 | "id": "StccjUDh6W0Q"
|
392 | 382 | },
|
393 | 383 | "source": [
|
394 |
| - "### **2.3 - System prompts**\n" |
| 384 | + "## **2.2 - System prompts**\n" |
395 | 385 | ]
|
396 | 386 | },
|
397 | 387 | {
|
|
415 | 405 | "id": "Hp4GNa066pYy"
|
416 | 406 | },
|
417 | 407 | "source": [
|
418 |
| - "### **2.4 - Response formats**\n", |
| 408 | + "### **2.3 - Response formats**\n", |
419 | 409 | "* Can support different formatted outputs e.g. text, JSON, etc."
|
420 | 410 | ]
|
421 | 411 | },
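Since the cell above notes that responses can be steered toward formats such as JSON, here is a small sketch built on the ChatCompletion helper defined earlier (the prompt wording is an assumption, not the notebook's own):

```python
# Ask the model to answer strictly in JSON so the output is machine-readable.
response = ChatCompletion(
    "List three use cases of Llama 3 as a JSON array of strings. "
    "Return only valid JSON, with no surrounding text.",
    system_prompt="You are a helpful assistant that always replies in valid JSON."
)
print(response)
```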
|
|
483 | 473 | "\n",
|
484 | 474 | "* User Prompts\n",
|
485 | 475 | "* Input Safety\n",
|
486 |
| - "* Llama 2\n", |
| 476 | + "* Llama 3\n", |
487 | 477 | "* Output Safety\n",
|
488 | 478 | "\n",
|
489 | 479 | "* Memory & Context"
|
|
743 | 733 | "### **4.3 - Retrieval Augmented Generation (RAG)**\n",
|
744 | 734 | "* Prompt Eng Limitations - Knowledge cutoff & lack of specialized data\n",
|
745 | 735 | "\n",
|
746 |
| - "* Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.\n", |
747 |
| - "\n", |
748 |
| - "For our demo, we are going to download an external PDF file from a URL and query against the content in the pdf file to get contextually relevant information back with the help of Llama!\n", |
| 736 | + "* Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 3.\n", |
749 | 737 | "\n",
|
750 |
| - "\n", |
751 |
| - "\n" |
| 738 | + "For our demo, we are going to download an external PDF file from a URL and query against the content in the pdf file to get contextually relevant information back with the help of Llama!" |
752 | 739 | ]
|
753 | 740 | },
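As a rough outline of the RAG flow before the notebook's own cells set it up in detail, here is a sketch using the packages installed at the top (langchain 0.1.x, faiss-cpu, sentence_transformers, unstructured); the PDF URL, chunk sizes, and prompt template are illustrative assumptions:

```python
from langchain.document_loaders import OnlinePDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# 1. Download and parse the external PDF (placeholder URL).
docs = OnlinePDFLoader("https://example.com/sample.pdf").load()

# 2. Split the text into overlapping chunks that fit in the prompt.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# 3. Embed the chunks and index them in a FAISS vector store.
vectorstore = FAISS.from_documents(chunks, HuggingFaceEmbeddings())

# 4. Retrieve the most relevant chunks and prepend them to the prompt.
question = "What is this document about?"
context = "\n".join(
    d.page_content for d in vectorstore.similarity_search(question, k=3)
)
print(ChatCompletion(
    f"Context:\n{context}\n\nQuestion: {question}",
    system_prompt="Answer using only the provided context."
))
```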
|
754 | 741 | {
|
|
797 | 784 | "source": [
|
798 | 785 | "# langchain setup\n",
|
799 | 786 | "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
|
800 |
| - "# Use the Llama 2 model hosted on OctoAI\n", |
801 |
| - "# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n", |
| 787 | + "\n", |
| 788 | + "# Use the Llama 3 model hosted on OctoAI\n", |
| 789 | + "# max_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n", |
| 790 | + "# temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n", |
802 | 791 | "# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens\n",
|
803 |
| - "# max_new_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n", |
804 | 792 | "llama_model = OctoAIEndpoint(\n",
|
805 |
| - " endpoint_url=\"https://text.octoai.run/v1/chat/completions\",\n", |
806 |
| - " model_kwargs={\n", |
807 |
| - " \"model\": llama2_13b,\n", |
808 |
| - " \"messages\": [\n", |
809 |
| - " {\n", |
810 |
| - " \"role\": \"system\",\n", |
811 |
| - " \"content\": \"You are a helpful, respectful and honest assistant.\"\n", |
812 |
| - " }\n", |
813 |
| - " ],\n", |
814 |
| - " \"max_tokens\": 1000,\n", |
815 |
| - " \"top_p\": 1,\n", |
816 |
| - " \"temperature\": 0.75\n", |
817 |
| - " },\n", |
| 793 | + " model=llama3_8b,\n", |
| 794 | + " max_tokens=1000,\n", |
| 795 | + " temperature=0.75,\n", |
| 796 | + " top_p=1\n", |
818 | 797 | ")"
|
819 | 798 | ]
|
820 | 799 | },
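A quick way to sanity-check the wrapper configured above (invoke is the standard entry point for LangChain 0.1.x LLMs; the prompt is illustrative):

```python
# One-off generation through the LangChain wrapper.
print(llama_model.invoke("In one sentence, what is retrieval augmented generation?"))
```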
|
|
973 | 952 | },
|
974 | 953 | "source": [
|
975 | 954 | "#### **Resources**\n",
|
976 |
| - "- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n", |
977 |
| - "- [Github - LLama 2 Recipes](https://github.com/facebookresearch/llama-recipes)\n", |
978 |
| - "- [Llama 2](https://ai.meta.com/llama/)\n", |
979 |
| - "- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n", |
| 955 | + "- [GitHub - Llama](https://github.com/facebookresearch/llama)\n", |
| 956 | + "- [Github - LLama Recipes](https://github.com/facebookresearch/llama-recipes)\n", |
| 957 | + "- [Llama](https://ai.meta.com/llama/)\n", |
| 958 | + "- [Research Paper on Llama 2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n", |
| 959 | + "- [Llama 3 Page](https://ai.meta.com/blog/meta-llama-3/)\n", |
980 | 960 | "- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n",
|
981 | 961 | "- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n",
|
982 | 962 | "- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n",
|
|
992 | 972 | "source": [
|
993 | 973 | "#### **Authors & Contact**\n",
|
994 | 974 | " * [email protected], [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)\n",
|
995 |
| - " * [email protected], [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)\n", |
| 975 | + " * [email protected], [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/dr-thierry-moreau/)\n", |
996 | 976 | "\n",
|
997 |
| - "Adapted to run on OctoAI by Thierry Moreau - [email protected]" |
| 977 | + "Adapted to run on OctoAI and use Llama 3 by [email protected] [Thierry Moreay | LinkedIn]()" |
998 | 978 | ]
|
999 | 979 | }
|
1000 | 980 | ],
|
|