@@ -196,7 +196,7 @@
 "### **1.1 - What is Llama 3?**\n",
 "\n",
 "* State of the art (SOTA), Open Source LLM\n",
-"* 8B, 70B\n",
+"* 8B, 70B - base and instruct models\n",
 "* Choosing model: Size, Quality, Cost, Speed\n",
 "* Pretrained + Chat\n",
 "* [Meta Llama 3 Blog](https://ai.meta.com/blog/meta-llama-3/)\n",
@@ -275,9 +275,7 @@
 "source": [
 "## **2 - Using and Comparing Llama 3 and Llama 2**\n",
 "\n",
-"In this notebook, we will use the Llama 2 70b chat and Llama 3 8b and 70b instruct models hosted on [Groq](https://console.groq.com/). You'll need to first [sign in](https://console.groq.com/) with your github or gmail account, then get an [API token](https://console.groq.com/keys) to try Groq out for free. (Groq runs Llama models very fast and they only support one Llama 2 model: the Llama 2 70b chat).\n",
-"\n",
-"**Note: You can also use other Llama hosting providers such as [Replicate](https://replicate.com/blog/run-llama-3-with-an-api?input=python), [Togther](https://docs.together.ai/docs/quickstart). Simply click the links here to see how to run `pip install` and use their freel trial API key with example code to modify the following three cells in 2.1 and 2.2.**\n"
+"We will be using the Llama 2 7b & 70b chat and Llama 3 8b & 70b instruct models hosted on [Replicate](https://replicate.com/search?query=llama) to run the examples here. You will need to first sign in to Replicate with your GitHub account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. You can also use other Llama 3 cloud providers such as [Groq](https://console.groq.com/), [Together](https://api.together.xyz/playground/language/meta-llama/Llama-3-8b-hf), or [Anyscale](https://app.endpoints.anyscale.com/playground).\n"
 ]
 },
 {
@@ -297,15 +295,15 @@
 },
 "outputs": [],
 "source": [
-"!pip install groq"
+"!pip install replicate"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "### **2.2 - Create helpers for Llama 2 and Llama 3**\n",
-"First, set your Groq API token as environment variables.\n"
+"First, set your Replicate API token as an environment variable.\n"
 ]
 },
 {
@@ -319,16 +317,16 @@
 "import os\n",
 "from getpass import getpass\n",
 "\n",
-"GROQ_API_TOKEN = getpass()\n",
+"REPLICATE_API_TOKEN = getpass()\n",
 "\n",
-"os.environ[\"GROQ_API_KEY\"] = GROQ_API_TOKEN"
+"os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Create Llama 2 and Llama 3 helper functions - for chatbot type of apps, we'll use Llama 3 8b/70b instruct models, not the base models."
+"Create Llama 2 and Llama 3 helper functions - for chatbot-type apps, we'll use the Llama 3 instruct and Llama 2 chat models, not the base models."
 ]
 },
 {
@@ -339,53 +337,35 @@
 },
 "outputs": [],
 "source": [
-"from groq import Groq\n",
-"\n",
-"client = Groq(\n",
-"    api_key=os.environ.get(\"GROQ_API_KEY\"),\n",
-")\n",
+"import replicate\n",
 "\n",
-"def llama2(prompt, temperature=0.0, input_print=True):\n",
-"    chat_completion = client.chat.completions.create(\n",
-"        messages=[\n",
-"            {\n",
-"                \"role\": \"user\",\n",
-"                \"content\": prompt,\n",
-"            }\n",
-"        ],\n",
-"        model=\"llama2-70b-4096\",\n",
-"        temperature=temperature,\n",
-"    )\n",
+"def llama2_7b(prompt):\n",
+"    output = replicate.run(\n",
+"        \"meta/llama-2-7b-chat\",\n",
+"        input={\"prompt\": prompt}\n",
+"    )\n",
+"    return ''.join(output)\n",
 "\n",
-"    return (chat_completion.choices[0].message.content)\n",
+"def llama2_70b(prompt):\n",
+"    output = replicate.run(\n",
+"        \"meta/llama-2-70b-chat\",\n",
+"        input={\"prompt\": prompt}\n",
+"    )\n",
+"    return ''.join(output)\n",
 "\n",
-"def llama3_8b(prompt, temperature=0.0, input_print=True):\n",
-"    chat_completion = client.chat.completions.create(\n",
-"        messages=[\n",
-"            {\n",
-"                \"role\": \"user\",\n",
-"                \"content\": prompt,\n",
-"            }\n",
-"        ],\n",
-"        model=\"llama3-8b-8192\",\n",
-"        temperature=temperature,\n",
-"    )\n",
+"def llama3_8b(prompt):\n",
+"    output = replicate.run(\n",
+"        \"meta/meta-llama-3-8b-instruct\",\n",
+"        input={\"prompt\": prompt}\n",
+"    )\n",
+"    return ''.join(output)\n",
 "\n",
-"    return (chat_completion.choices[0].message.content)\n",
-"\n",
-"def llama3_70b(prompt, temperature=0.0, input_print=True):\n",
-"    chat_completion = client.chat.completions.create(\n",
-"        messages=[\n",
-"            {\n",
-"                \"role\": \"user\",\n",
-"                \"content\": prompt,\n",
-"            }\n",
-"        ],\n",
-"        model=\"llama3-70b-8192\",\n",
-"        temperature=temperature,\n",
-"    )\n",
-"\n",
-"    return (chat_completion.choices[0].message.content)"
+"def llama3_70b(prompt):\n",
+"    output = replicate.run(\n",
+"        \"meta/meta-llama-3-70b-instruct\",\n",
+"        input={\"prompt\": prompt}\n",
+"    )\n",
+"    return ''.join(output)"
 ]
 },
 {
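The four new helpers in the hunk above differ only in the model id. As an aside, they could be collapsed into one parameterized function; the sketch below is hypothetical (the `runner` parameter is an illustrative injection point added here for offline testing, not part of the notebook), assuming `replicate.run` keeps returning an iterator of text chunks that can be joined:

```python
# Hypothetical refactor of the four near-identical helpers above.
# Model ids are the ones used in the notebook; `runner` lets the
# function be exercised without a Replicate API token.
LLAMA_MODELS = {
    "llama2-7b": "meta/llama-2-7b-chat",
    "llama2-70b": "meta/llama-2-70b-chat",
    "llama3-8b": "meta/meta-llama-3-8b-instruct",
    "llama3-70b": "meta/meta-llama-3-70b-instruct",
}

def run_llama(model_key, prompt, runner=None):
    if runner is None:
        import replicate  # deferred so the package is only needed for real calls
        runner = replicate.run
    # replicate.run streams back chunks of generated text; join them into one string
    output = runner(LLAMA_MODELS[model_key], input={"prompt": prompt})
    return ''.join(output)
```

A call like `run_llama("llama3-8b", prompt)` would then replace `llama3_8b(prompt)` in the cells below.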
@@ -406,7 +386,7 @@
 "outputs": [],
 "source": [
 "prompt = \"The typical color of a llama is: \"\n",
-"output = llama2(prompt)\n",
+"output = llama2_7b(prompt)\n",
 "md(output)"
 ]
 },
@@ -420,6 +400,16 @@
 "md(output)"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"output = llama2_7b(\"The typical color of a llama is what? Answer in one word.\")\n",
+"md(output)"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -430,6 +420,13 @@
 "md(output)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Note: Llama 3 follows instructions better than Llama 2 in single-turn chat.**"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {
@@ -457,7 +454,7 @@
 "outputs": [],
 "source": [
 "prompt_chat = \"What is the average lifespan of a Llama? Answer the question in few words.\"\n",
-"output = llama2(prompt_chat)\n",
+"output = llama2_7b(prompt_chat)\n",
 "md(output)"
 ]
 },
|
484 | 481 | "# example without previous context. LLM's are stateless and cannot understand \"they\" without previous context\n",
|
485 | 482 | "prompt_chat = \"What animal family are they? Answer the question in few words.\"\n",
|
486 |
| - "output = llama2(prompt_chat)\n", |
| 483 | + "output = llama2_7b(prompt_chat)\n", |
487 | 484 | "md(output)"
|
488 | 485 | ]
|
489 | 486 | },
|
|
@@ -497,6 +494,16 @@
 "md(output)"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"output = llama2_70b(prompt_chat)\n",
+"md(output)"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -536,7 +543,7 @@
 "Assistant: 15-20 years.\n",
 "User: What animal family are they?\n",
 "\"\"\"\n",
-"output = llama2(prompt_chat)\n",
+"output = llama2_7b(prompt_chat)\n",
 "md(output)"
 ]
 },
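The prompt in the hunk above embeds prior turns directly as a `User:`/`Assistant:` transcript. A small helper along these lines (hypothetical, not in the notebook) could assemble that format from a list of prior turns:

```python
def build_chat_prompt(turns, question):
    """Format prior (user, assistant) turns plus a new question in the
    plain User:/Assistant: transcript style used in the notebook's prompts."""
    lines = []
    for user, assistant in turns:
        lines.append(f"User: {user}")
        lines.append(f"Assistant: {assistant}")
    lines.append(f"User: {question}")
    return "\n".join(lines) + "\n"
```

For example, `build_chat_prompt([("What is the average lifespan of a Llama?", "15-20 years.")], "What animal family are they?")` reproduces the multi-turn prompt shown above.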
@@ -579,7 +586,17 @@
 "\n",
 "Answer the question with one word.\n",
 "\"\"\"\n",
-"output = llama2(prompt_chat)\n",
+"output = llama2_7b(prompt_chat)\n",
+"md(output)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"output = llama2_70b(prompt_chat)\n",
 "md(output)"
 ]
 },
@@ -597,7 +614,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**Both Llama 3 8b and Llama 2 70b follows instructions (e.g. \"Answer the question with one word\") better than Llama 2 7b.**"
+"**Both Llama 3 8b and Llama 2 70b follow instructions (e.g. \"Answer the question with one word\") better than Llama 2 7b in multi-turn chat.**"
 ]
 },
 {
@@ -640,7 +657,7 @@
 "\n",
 "Give one word response.\n",
 "'''\n",
-"output = llama2(prompt)\n",
+"output = llama2_7b(prompt)\n",
 "md(output)"
 ]
 },
@@ -684,7 +701,7 @@
 "Give one word response.\n",
 "'''\n",
 "\n",
-"output = llama2(prompt)\n",
+"output = llama2_7b(prompt)\n",
 "md(output)"
 ]
 },
@@ -704,7 +721,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**Note: Llama 2, with few shots, has the same output \"Neutral\" as Llama 3.**"
+"**Note: With few-shot examples, Llama 2 produces the same output \"Neutral\" as Llama 3, but Llama 2 doesn't follow the instruction (\"Give one word response\") as well.**"
 ]
 },
 {
@@ -894,6 +911,7 @@
 "outputs": [],
 "source": [
 "!pip install langchain\n",
+"!pip install langchain-community\n",
 "!pip install sentence-transformers\n",
 "!pip install faiss-cpu\n",
 "!pip install bs4\n",
@@ -936,40 +954,53 @@
 "vectorstore = FAISS.from_documents(all_splits, HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-mpnet-base-v2\"))"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"You'll need to first sign in at [Groq](https://console.groq.com/login) with your GitHub or Gmail account, then get an API token to try Groq out for free."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
-"from langchain_groq import ChatGroq\n",
-"llm = ChatGroq(temperature=0, model_name=\"llama3-8b-8192\")\n",
+"import os\n",
+"from getpass import getpass\n",
 "\n",
-"from langchain.chains import ConversationalRetrievalChain\n",
-"chain = ConversationalRetrievalChain.from_llm(llm,\n",
-"                                              vectorstore.as_retriever(),\n",
-"                                              return_source_documents=True)\n",
+"GROQ_API_TOKEN = getpass()\n",
 "\n",
-"result = chain({\"question\": \"What’s new with Llama 3?\", \"chat_history\": []})\n",
-"md(result['answer'])\n"
+"os.environ[\"GROQ_API_KEY\"] = GROQ_API_TOKEN"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"id": "NmEhBe3Kiyre"
-},
+"metadata": {},
+"outputs": [],
+"source": [
+"from langchain_groq import ChatGroq\n",
+"llm = ChatGroq(temperature=0, model_name=\"llama3-8b-8192\")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
 "outputs": [],
 "source": [
-"# Query against your own data\n",
 "from langchain.chains import ConversationalRetrievalChain\n",
-"chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)\n",
 "\n",
-"chat_history = []\n",
-"query = \"What’s new with Llama 3?\"\n",
-"result = chain({\"question\": query, \"chat_history\": chat_history})\n",
-"md(result['answer'])"
+"# Query against your own data\n",
+"chain = ConversationalRetrievalChain.from_llm(llm,\n",
+"                                              vectorstore.as_retriever(),\n",
+"                                              return_source_documents=True)\n",
+"\n",
+"# no chat history passed\n",
+"result = chain({\"question\": \"What’s new with Llama 3?\", \"chat_history\": []})\n",
+"md(result['answer'])\n"
 ]
 },
 {
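The added cell above deliberately passes an empty `chat_history`. For follow-up questions, `ConversationalRetrievalChain` expects the prior turns as (question, answer) pairs; a small hypothetical wrapper (not part of the notebook) that threads the history through repeated calls might look like:

```python
def ask(chain, question, chat_history):
    # ConversationalRetrievalChain-style call: prior turns are passed in as
    # (question, answer) pairs, and each new exchange is appended afterwards.
    result = chain({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result["answer"]
```

Each call then reuses the accumulated history, so a pronoun like "they" in a follow-up question can be resolved against the earlier exchange.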
@@ -1083,7 +1114,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.7"
+"version": "3.10.14"
 }
 },
 "nbformat": 4,