diff --git a/meta/.gitignore b/meta/.gitignore new file mode 100644 index 0000000..c872bf1 --- /dev/null +++ b/meta/.gitignore @@ -0,0 +1,2 @@ +**/.ipynb_checkpoints/ + diff --git a/meta/README.md b/meta/README.md new file mode 100644 index 0000000..59a0bbd --- /dev/null +++ b/meta/README.md @@ -0,0 +1,31 @@ + +# Meta's Llama Models on AWS + +This project includes sample notebooks that demonstrate how to use Meta's LLaMA models via Amazon Bedrock for natural language tasks such as question answering and PDF interaction. It leverages Bedrock, S3 and SageMaker Studio for development and execution. + +--- + +## Sample Notebooks + +| Notebook | Description | +|----------------------|-------------| +| [`rag-chatbot.ipynb`](samples/rag-chatbot.ipynb) | Demonstrates how to build a simple RAG (Retrieval-Augmented Generation) chatbot using Meta's LLM via Amazon Bedrock, with documents stored in Amazon S3 and vector embeddings generated using Amazon Titan. Uses **LangChain** for document loading, embedding, and retrieval logic. | +| [`summarize-pdf.ipynb`](samples/summarize-pdf.ipynb) | Shows how to use Meta's Llama 3 model via Amazon Bedrock to summarize content extracted from a PDF document. 
| + +--- + +## Quick Start + +### Enable Model Access +Visit the Bedrock Console > Model Access: https://console.aws.amazon.com/bedrock/home and enable the following model: +- `meta.llama3-8b-instruct-v1:0` + +In addition, the [`rag-chatbot.ipynb`](samples/rag-chatbot.ipynb) sample requires access to the following model: +- `amazon.titan-embed-text-v1` + + +### Clone the Repository and Open the Notebook +In your SageMaker space, clone this repository: +```bash +git clone https://github.com/${GITHUB_ACTOR}/${GITHUB_REPOSITORY}.git +cd $(basename ${GITHUB_REPOSITORY})/meta/samples +``` diff --git a/meta/samples/media/sample.pdf b/meta/samples/media/sample.pdf new file mode 100644 index 0000000..8baa214 Binary files /dev/null and b/meta/samples/media/sample.pdf differ diff --git a/meta/samples/rag-chatbot.ipynb b/meta/samples/rag-chatbot.ipynb new file mode 100644 index 0000000..53ab243 --- /dev/null +++ b/meta/samples/rag-chatbot.ipynb @@ -0,0 +1,384 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d6ca1cf4-fb82-4a2c-bb7a-9c9f4f91a8d2", + "metadata": {}, + "source": [ + "# Retrieval-Augmented Generation (RAG) Chatbot using Meta LLMs and Amazon S3\n", + "\n", + "This notebook demonstrates how to build a simple Retrieval-Augmented Generation (RAG) chatbot using Meta's LLM via Amazon Bedrock. Source documents are stored in Amazon S3, and Amazon Titan is used to generate vector embeddings. The flow is orchestrated using LangChain, which handles document loading, embedding, and semantic retrieval." + ] + }, + { + "cell_type": "markdown", + "id": "da7f9dc5-a851-4f04-8e93-b10047ee95ee", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Install required libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "fb4371c2-02f3-4a15-9cd9-9767fd645dce", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:07:56.419295Z", + "iopub.status.busy": "2025-07-14T11:07:56.418567Z", + "iopub.status.idle": "2025-07-14T11:08:02.462672Z", + "shell.execute_reply": "2025-07-14T11:08:02.461212Z", + "shell.execute_reply.started": "2025-07-14T11:07:56.419267Z" + } + }, + "outputs": [], + "source": [ + "!pip install boto3 langchain langchain-aws faiss-cpu s3fs pymupdf --quiet" + ] + }, + { + "cell_type": "markdown", + "id": "16fdad72-8edf-4344-8151-77162fbc9c3d", + "metadata": {}, + "source": [ + "Import dependencies." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "f00e7847-6121-404e-90a5-b61d7cd084b3", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:02.467688Z", + "iopub.status.busy": "2025-07-14T11:08:02.467233Z", + "iopub.status.idle": "2025-07-14T11:08:03.954408Z", + "shell.execute_reply": "2025-07-14T11:08:03.953614Z", + "shell.execute_reply.started": "2025-07-14T11:08:02.467657Z" + } + }, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "from langchain.vectorstores import FAISS\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain.chains import RetrievalQA\n", + "from langchain_aws import BedrockEmbeddings\n", + "from langchain_aws import ChatBedrock" + ] + }, + { + "cell_type": "markdown", + "id": "5908bceb-d5f0-42e6-a3a0-223bd1186242", + "metadata": {}, + "source": [ + "Set up constants." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "f9590e13-e323-4aeb-bda1-69bac9f032c7", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:03.955793Z", + "iopub.status.busy": "2025-07-14T11:08:03.955261Z", + "iopub.status.idle": "2025-07-14T11:08:03.959598Z", + "shell.execute_reply": 
"2025-07-14T11:08:03.958848Z", + "shell.execute_reply.started": "2025-07-14T11:08:03.955762Z" + } + }, + "outputs": [], + "source": [ + "AWS_REGION = \"us-east-1\"\n", + "S3_BUCKET = \"lior-llama3-rag-chatbot-data-20255\"\n", + "EMBED_MODEL_ID = \"amazon.titan-embed-text-v1\"\n", + "LLM_MODEL_ID = \"meta.llama3-70b-instruct-v1:0\"\n", + "PDF_FILE = 'media/sample.pdf'\n", + "PDF_S3_KEY = 'media/sample.pdf'" + ] + }, + { + "cell_type": "markdown", + "id": "d8eab237-505e-46f6-bdb1-413593b7dafe", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-06T15:09:10.734843Z", + "iopub.status.busy": "2025-07-06T15:09:10.734455Z", + "iopub.status.idle": "2025-07-06T15:09:10.942716Z", + "shell.execute_reply": "2025-07-06T15:09:10.942003Z", + "shell.execute_reply.started": "2025-07-06T15:09:10.734814Z" + } + }, + "source": [ + "### Setup S3\n", + "\n", + "As a demonstration, we upload the local PDF to S3 in order to demonstrate full S3 integration." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "7c44efed-565d-4e9e-aef7-18418c8919ec", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:03.961843Z", + "iopub.status.busy": "2025-07-14T11:08:03.961496Z", + "iopub.status.idle": "2025-07-14T11:08:04.291281Z", + "shell.execute_reply": "2025-07-14T11:08:04.290440Z", + "shell.execute_reply.started": "2025-07-14T11:08:03.961812Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Bucket 'lior-llama3-rag-chatbot-data-20255' exists.\n", + "Uploaded media/sample.pdf to s3://lior-llama3-rag-chatbot-data-20255/media/sample.pdf\n" + ] + } + ], + "source": [ + "from s3_utils import upload_file_to_s3\n", + "\n", + "upload_file_to_s3(\n", + " pdf_file=PDF_FILE,\n", + " bucket=S3_BUCKET,\n", + " key=PDF_S3_KEY,\n", + " region=AWS_REGION\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "id": "d328c87d-d840-48f2-986e-43f49cd41b13", + "metadata": {}, + "source": [ + "## Load 
Documents from Amazon S3" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "3c85614a-ebf3-436f-a89c-1e02c1592383", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:04.293581Z", + "iopub.status.busy": "2025-07-14T11:08:04.293180Z", + "iopub.status.idle": "2025-07-14T11:08:04.514151Z", + "shell.execute_reply": "2025-07-14T11:08:04.513328Z", + "shell.execute_reply.started": "2025-07-14T11:08:04.293556Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Downloaded PDF from S3 to media/sample.pdf\n" + ] + } + ], + "source": [ + "from langchain.schema import Document\n", + "import fitz\n", + "\n", + "s3 = boto3.client(\"s3\", region_name=AWS_REGION)\n", + "s3.download_file(S3_BUCKET, PDF_S3_KEY, PDF_FILE)\n", + "print(f\"Downloaded PDF from S3 to {PDF_FILE}\")\n", + "\n", + "doc_text = \"\"\n", + "with fitz.open(PDF_FILE) as doc:\n", + " for page in doc:\n", + " doc_text += page.get_text()\n", + "\n", + "# --- Wrap into LangChain Document\n", + "documents = [Document(page_content=doc_text, metadata={\"source\": \"sample.pdf\"})]" + ] + }, + { + "cell_type": "markdown", + "id": "2466c257-9377-49d1-9596-5243d6f86657", + "metadata": {}, + "source": [ + "## Chunk and Embed Documents\n", + "\n", + "Split documents into manageable chunks and embed them using Amazon Titan Embeddings." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8d8ab954-0f69-41a7-b73c-78915c224a5e", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:04.515648Z", + "iopub.status.busy": "2025-07-14T11:08:04.515221Z", + "iopub.status.idle": "2025-07-14T11:08:09.632915Z", + "shell.execute_reply": "2025-07-14T11:08:09.631910Z", + "shell.execute_reply.started": "2025-07-14T11:08:04.515617Z" + } + }, + "outputs": [], + "source": [ + "splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)\n", + "texts = splitter.split_documents(documents)\n", + "\n", + "embedding = BedrockEmbeddings(model_id=EMBED_MODEL_ID)\n", + "vectordb = FAISS.from_documents(texts, embedding)\n", + "retriever = vectordb.as_retriever(search_kwargs={\"k\": 3})" + ] + }, + { + "cell_type": "markdown", + "id": "f0318ffa-0fa4-4ca8-863e-38e9f350ec6f", + "metadata": {}, + "source": [ + "## Build Meta-Powered RAG Chain\n", + "\n", + "We use Meta's Llama 3 model via Bedrock to power the chatbot with retrieval-augmented context." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "22662901-1a35-4af4-b77b-1b1bcc3623a4", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:09.634581Z", + "iopub.status.busy": "2025-07-14T11:08:09.634154Z", + "iopub.status.idle": "2025-07-14T11:08:09.644276Z", + "shell.execute_reply": "2025-07-14T11:08:09.643586Z", + "shell.execute_reply.started": "2025-07-14T11:08:09.634548Z" + } + }, + "outputs": [], + "source": [ + "llm = ChatBedrock(model_id=LLM_MODEL_ID)\n", + "qa_chain = RetrievalQA.from_chain_type(\n", + " llm=llm,\n", + " retriever=retriever,\n", + " return_source_documents=True,\n", + " chain_type=\"stuff\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "7f5d38dd-0e04-47d2-8871-c9b2bd683aa7", + "metadata": {}, + "source": [ + "## Chat\n", + "Ask a question about the embedded document, then inspect the answer and the chunks that were retrieved to generate it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "8b5448a7-ac05-41ac-9bb0-60c399d23014", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:13:44.223463Z", + "iopub.status.busy": "2025-07-14T11:13:44.222959Z", + "iopub.status.idle": "2025-07-14T11:13:45.767895Z", + "shell.execute_reply": "2025-07-14T11:13:45.767266Z", + "shell.execute_reply.started": "2025-07-14T11:13:44.223385Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Answer:\n", + "Alexander Adell is a character in a story, likely a science fiction story. He is one of the two main characters, along with Lupov, and appears to be an expert or technician involved with a giant computer called Multivac.\n", + "\n", + "Retrieved Chunks:\n", + "\n", + "--- Chunk 1 ---\n", + "of each other and the bottle. \n", + " \n", + "\"It's amazing when you think of it,\" said Adell. His broad face had lines of weariness in it, and he stirred \n", + "his drink slowly with a glass rod, watching the cubes of ice slur clumsily about. \"All the energy we can \n", + "possibly ever use for free. Enough energy, if we wanted to draw on it, to melt all Earth into a big drop of \n", + "impure liquid iron, and still never miss the energy so used. All the energy we could ever use, forever and \n", + "forever and forever.\"\n", + "\n", + "--- Chunk 2 ---\n", + "that giant computer. They had at least a vague notion of the general plan of relays and circuits that had \n", + "long since grown past the point where any single human could possibly have a firm grasp of the whole. \n", + " \n", + "Multivac was self-adjusting and self-correcting. It had to be, for nothing human could adjust and correct it \n", + "quickly enough or even adequately enough -- so Adell and Lupov attended the monstrous giant only\n", + "\n", + "--- Chunk 3 ---\n", + "to another sun.\" \n", + " \n", + "There was silence for a while. 
Adell put his glass to his lips only occasionally, and Lupov's eyes slowly \n", + "closed. They rested. \n", + " \n", + "Then Lupov's eyes snapped open. \"You're thinking we'll switch to another sun when ours is done, aren't \n", + "you?\" \n", + " \n", + "\"I'm not thinking.\" \n", + " \n", + "\"Sure you are. You're weak on logic, that's the trouble with you. You're like the guy in the story who was \n", + "caught in a sudden shower and Who ran to a grove of trees and got under one. He wasn't worried, you\n" + ] + } + ], + "source": [ + "query = \"Who is Alexander Adell?\"\n", + "result = qa_chain.invoke({\"query\": query})\n", + "\n", + "# Print the answer\n", + "print(\"\\nAnswer:\")\n", + "print(result[\"result\"])\n", + "\n", + "# Print the retrieved chunks used for the answer\n", + "print(\"\\nRetrieved Chunks:\")\n", + "for i, doc in enumerate(result[\"source_documents\"]):\n", + " print(f\"\\n--- Chunk {i+1} ---\")\n", + " print(doc.page_content.strip())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13fd63ba-2bd8-42c0-9b3b-654c7b4256d8", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/meta/samples/s3_utils.py b/meta/samples/s3_utils.py new file mode 100644 index 0000000..ee9fde7 --- /dev/null +++ b/meta/samples/s3_utils.py @@ -0,0 +1,43 @@ +# s3_utils.py + +import os +import boto3 +from botocore.exceptions import ClientError + +def upload_file_to_s3(pdf_file, bucket, key, region="us-east-1"): + """ + Upload a local file to S3. If the bucket doesn't exist, create it. 
+ + Args: + pdf_file (str): Local file path + bucket (str): S3 bucket name + key (str): S3 object key + region (str): AWS region + """ + s3 = boto3.client("s3", region_name=region) + + # Ensure bucket exists + try: + s3.head_bucket(Bucket=bucket) + print(f"Bucket '{bucket}' exists.") + except ClientError as e: + error_code = int(e.response["Error"]["Code"]) + if error_code == 404: + print(f"Bucket '{bucket}' not found. Creating it...") + if region == "us-east-1": + s3.create_bucket(Bucket=bucket) + else: + s3.create_bucket( + Bucket=bucket, + CreateBucketConfiguration={"LocationConstraint": region} + ) + print(f"Bucket '{bucket}' created.") + else: + raise + + # Upload file + if os.path.exists(pdf_file): + s3.upload_file(pdf_file, bucket, key) + print(f"Uploaded {pdf_file} to s3://{bucket}/{key}") + else: + raise FileNotFoundError(f"File not found: {pdf_file}") diff --git a/meta/samples/summarize-pdf.ipynb b/meta/samples/summarize-pdf.ipynb new file mode 100644 index 0000000..36c5ecf --- /dev/null +++ b/meta/samples/summarize-pdf.ipynb @@ -0,0 +1,388 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1ca5b45b", + "metadata": {}, + "source": [ + "# Summarize text from PDF using Meta Llama 3 on Amazon Bedrock" + ] + }, + { + "cell_type": "markdown", + "id": "7b523878", + "metadata": {}, + "source": [ + "This notebook demonstrates how to use Meta's Llama 3 model via Amazon Bedrock to chat with content extracted from a PDF document. The PDF is loaded from S3, and its content is then sent as part of a prompt to Llama 3." + ] + }, + { + "cell_type": "markdown", + "id": "8f41832d-2978-4a37-b37d-bda530984e40", + "metadata": {}, + "source": [ + "## Setup\n", + "To run this notebook, install the dependencies: boto3 and pymupdf." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "f64163a3", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:44.381896Z", + "iopub.status.busy": "2025-07-14T11:08:44.381409Z", + "iopub.status.idle": "2025-07-14T11:08:46.143254Z", + "shell.execute_reply": "2025-07-14T11:08:46.142157Z", + "shell.execute_reply.started": "2025-07-14T11:08:44.381846Z" + } + }, + "outputs": [], + "source": [ + "!pip install boto3 pymupdf --quiet" + ] + }, + { + "cell_type": "markdown", + "id": "c2a123c3-d140-4193-8f56-7dc21fb5789e", + "metadata": { + "execution": { + "iopub.execute_input": "2025-06-22T13:22:26.078922Z", + "iopub.status.busy": "2025-06-22T13:22:26.078543Z", + "iopub.status.idle": "2025-06-22T13:22:26.084905Z", + "shell.execute_reply": "2025-06-22T13:22:26.083742Z", + "shell.execute_reply.started": "2025-06-22T13:22:26.078887Z" + } + }, + "source": [ + "Import the necessary libraries." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "c557d86b", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:46.145010Z", + "iopub.status.busy": "2025-07-14T11:08:46.144697Z", + "iopub.status.idle": "2025-07-14T11:08:46.418010Z", + "shell.execute_reply": "2025-07-14T11:08:46.417301Z", + "shell.execute_reply.started": "2025-07-14T11:08:46.144985Z" + } + }, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import fitz\n", + "import os" + ] + }, + { + "cell_type": "markdown", + "id": "d8fe4dd0-c0e6-4e22-bf8f-c1fd2dd7fe11", + "metadata": {}, + "source": [ + "## Initialization\n", + "Set up constants." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "e4c684cb-8b75-4183-ade9-bb65ec18b302", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:46.419557Z", + "iopub.status.busy": "2025-07-14T11:08:46.419127Z", + "iopub.status.idle": "2025-07-14T11:08:46.423093Z", + "shell.execute_reply": "2025-07-14T11:08:46.422259Z", + 
"shell.execute_reply.started": "2025-07-14T11:08:46.419524Z" + } + }, + "outputs": [], + "source": [ + "AWS_REGION = \"us-east-1\"\n", + "BEDROCK_MODEL_ID = 'meta.llama3-8b-instruct-v1:0'\n", + "S3_BUCKET = 'llama3-chat-data'\n", + "PDF_FILE = 'media/sample.pdf'\n", + "PDF_S3_KEY = 'media/sample.pdf'" + ] + }, + { + "cell_type": "markdown", + "id": "f2364756-b3d7-41e9-8799-356410bfa6e0", + "metadata": {}, + "source": [ + "### Setup S3\n", + "As a demonstration, we upload the local PDF to S3 in order to demonstrate full S3 integration." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "720f8364-a3d8-4262-99c1-bfb2b8a81166", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:46.425302Z", + "iopub.status.busy": "2025-07-14T11:08:46.425065Z", + "iopub.status.idle": "2025-07-14T11:08:46.645616Z", + "shell.execute_reply": "2025-07-14T11:08:46.644770Z", + "shell.execute_reply.started": "2025-07-14T11:08:46.425280Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Bucket 'llama3-chat-data' exists.\n", + "Uploaded media/sample.pdf to s3://llama3-chat-data/media/sample.pdf\n" + ] + } + ], + "source": [ + "from s3_utils import upload_file_to_s3\n", + "\n", + "upload_file_to_s3(\n", + " pdf_file=PDF_FILE,\n", + " bucket=S3_BUCKET,\n", + " key=PDF_S3_KEY,\n", + " region=AWS_REGION\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a8c6ff0-05ee-4a77-ba55-beda35600a37", + "metadata": {}, + "source": [ + "### Initialize clients" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "c7fac8eb-694c-45b3-a4ca-4f6ab5f8a27e", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:46.647167Z", + "iopub.status.busy": "2025-07-14T11:08:46.646915Z", + "iopub.status.idle": "2025-07-14T11:08:46.664276Z", + "shell.execute_reply": "2025-07-14T11:08:46.663553Z", + "shell.execute_reply.started": "2025-07-14T11:08:46.647146Z" + } + }, + "outputs": [], + "source": [ + 
"bedrock = boto3.client('bedrock-runtime', region_name=AWS_REGION)\n", + "s3 = boto3.client('s3', region_name=AWS_REGION)" + ] + }, + { + "cell_type": "markdown", + "id": "4ce16b94-3f72-4d1a-bcf8-122374f229fe", + "metadata": {}, + "source": [ + "## Handle PDF\n", + "Upload PDF and extract text." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4040ceea", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:46.665665Z", + "iopub.status.busy": "2025-07-14T11:08:46.665244Z", + "iopub.status.idle": "2025-07-14T11:08:46.730335Z", + "shell.execute_reply": "2025-07-14T11:08:46.729622Z", + "shell.execute_reply.started": "2025-07-14T11:08:46.665636Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Uploaded media/sample.pdf to s3://llama3-chat-data/media/sample.pdf\n" + ] + } + ], + "source": [ + "if os.path.exists(PDF_FILE):\n", + " s3.upload_file(PDF_FILE, S3_BUCKET, PDF_S3_KEY)\n", + " print(f\"Uploaded {PDF_FILE} to s3://{S3_BUCKET}/{PDF_S3_KEY}\")\n", + "else:\n", + " print(f\"PDF file '{PDF_FILE}' not found. Please upload one.\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "f8fad7b9", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:46.731411Z", + "iopub.status.busy": "2025-07-14T11:08:46.731119Z", + "iopub.status.idle": "2025-07-14T11:08:46.766529Z", + "shell.execute_reply": "2025-07-14T11:08:46.765531Z", + "shell.execute_reply.started": "2025-07-14T11:08:46.731390Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The Last Question by Isaac Asimov © 1956 \n", + " \n", + "The last question was asked for the first time, half in jest, on May 21, 2061, at a time when humanity first \n", + "stepped into the light. 
The question came about as a result of a five dollar bet over highballs, and it \n", + "happened this way: \n", + " \n", + "Alexander Adell and Bertram Lupov were two of the faithful attendants of Multivac. As well as any human \n", + "beings could, they knew what lay behind the cold, clicking, flashing face -- miles and miles of face -- of \n", + "that giant computer. They had at least a vague notion of the general plan of relays and circuits that had \n", + "long since grown past the point where any single human could possibly have a firm grasp of the whole. \n", + " \n", + "Multivac was self-adjusting and self-correcting. It had to be, for nothing human could adjust and correct it \n", + "quickly enough or even adequately enough -- so Adell and Lupov attended the monstrous giant only \n", + "lightly and superficially, yet as well as any men could. They fed it data, adjusted qu\n" + ] + } + ], + "source": [ + "# 📄 Extract text from PDF\n", + "doc_text = \"\"\n", + "if os.path.exists(PDF_FILE):\n", + " with fitz.open(PDF_FILE) as doc:\n", + " for page in doc:\n", + " doc_text += page.get_text()\n", + "\n", + "print(doc_text[:1000]) # preview\n" + ] + }, + { + "cell_type": "markdown", + "id": "a19e17c3-c001-4e47-8fee-780c817bb843", + "metadata": {}, + "source": [ + "## Query\n", + "Now query the PDF." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fe6a5087", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:08:46.767749Z", + "iopub.status.busy": "2025-07-14T11:08:46.767531Z", + "iopub.status.idle": "2025-07-14T11:08:46.773647Z", + "shell.execute_reply": "2025-07-14T11:08:46.772752Z", + "shell.execute_reply.started": "2025-07-14T11:08:46.767731Z" + } + }, + "outputs": [], + "source": [ + "def query_llama3(prompt):\n", + " body = {\n", + " \"prompt\": prompt,\n", + " \"max_gen_len\": 231,\n", + " \"temperature\": 0.7,\n", + " \"top_p\": 0.9\n", + " }\n", + " response = bedrock.invoke_model(\n", + " modelId=BEDROCK_MODEL_ID,\n", + " body=json.dumps(body),\n", + " contentType=\"application/json\",\n", + " accept=\"application/json\"\n", + " )\n", + " result = json.loads(response['body'].read())\n", + " return result['generation']\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "19a6a031-2d81-4643-962b-5e5240cf2508", + "metadata": { + "execution": { + "iopub.execute_input": "2025-07-14T11:11:33.280126Z", + "iopub.status.busy": "2025-07-14T11:11:33.279855Z", + "iopub.status.idle": "2025-07-14T11:11:36.037676Z", + "shell.execute_reply": "2025-07-14T11:11:36.036832Z", + "shell.execute_reply.started": "2025-07-14T11:11:33.280105Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Llama 3 Summary:\n", + "\n", + "THE END. \n", + "```\n", + "\n", + "\n", + "\n", + "* * *\n", + "\n", + "\n", + "\n", + "The story is a thought-provoking exploration of the concept of entropy and the possibility of reversing it. The narrative spans billions of years, from the early days of humanity's first computer, Multivac, to the eventual demise of the Universe.\n", + "\n", + "Throughout the story, Asimov raises questions about the nature of existence, the relationship between humans and technology, and the potential for reversal of entropy. 
The character of Zee Prime, who is concerned about the fate of the stars and the eventual end of the Universe, serves as a foil to the more optimistic perspectives of other characters.\n", + "\n", + "The story's climax, in which AC (the advanced computer) learns how to reverse the direction of entropy, is both a satisfying resolution and a thought-provoking commentary on the potential consequences of such a discovery.\n", + "\n", + "Overall, \"The Last Question\" is a masterpiece of science fiction that explores the human condition, the relationship between humans and technology, and the mysteries of the Universe. It is a testament to Asimov's skill as a storyteller and his ability to craft thought-provoking, engaging narratives that continue\n" + ] + } + ], + "source": [ + "# Example\n", + "#print(\"Text:\" + doc_text)\n", + "response = query_llama3(\"Create a short summary of this story:\" + doc_text)\n", + "print(\"\\nLlama 3 Summary:\\n\")\n", + "print(response)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fab43940-d5a7-4d5d-a4fc-208cb865ce79", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}