diff --git a/README.md b/README.md index bdc2db72b0c..ea335a754de 100644 --- a/README.md +++ b/README.md @@ -228,6 +228,26 @@ Made with [`contrib.rocks`](https://contrib.rocks). * [What is the first CPU generation you support with OpenVINO?](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/system-requirements.html) * [Are there any success stories about deploying real-world solutions with OpenVINO?](https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/success-stories.html) +## ๐Ÿ” Training Data Transparency + +This repository focuses on demonstrating **inference and model optimization workflows** using the OpenVINOโ„ข Toolkit with pre-trained models. + +- The notebooks do **not perform model training, re-training, or fine-tuning**. +- Instead, they leverage pre-trained models and showcase **efficient inference and deployment techniques** across different hardware backends. +- In addition to inference, the notebooks demonstrate **model optimization workflows**, including: + - model conversion (e.g., FP32 โ†’ FP16 / INT8) + - quantization and compression techniques + - performance tuning for CPU, GPU, and other supported devices + +Training data details, dataset composition, and associated biases are defined by the **original model providers** and are not modified within this repository. + +Users are encouraged to: +- Review the original model documentation for dataset sources and training details +- Understand potential biases and limitations of pre-trained models +- Evaluate models appropriately for their specific use cases and deployment environments +- Follow responsible AI and data governance practices when integrating these models into applications + +This section aims to clarify the scope of the repository and improve transparency regarding how models are used within OpenVINO notebooks. --- \* Other names and brands may be claimed as the property of others. 
diff --git a/notebooks/local-agentic-rag/README.md b/notebooks/local-agentic-rag/README.md new file mode 100644 index 00000000000..394071e4b48 --- /dev/null +++ b/notebooks/local-agentic-rag/README.md @@ -0,0 +1,110 @@ +# ๐Ÿค– Local RAG Pipeline with Ollama and Optional Agentic Workflow + +This notebook demonstrates a **minimal, fully local Retrieval-Augmented Generation (RAG) pipeline** using Ollama, ChromaDB, and an optional agentic workflow with LangGraph. + +The implementation is designed to be **educational, modular, and CPU-friendly**, requiring no cloud APIs after initial setup. + +--- + +## ๐Ÿ“š Overview + +This notebook walks through building a complete local AI pipeline: + +- Local LLM inference using Ollama +- Document embedding and storage with ChromaDB +- Retrieval-Augmented Generation (RAG) +- Optional agentic workflow using LangGraph +- Optional OpenVINOโ„ข integration for optimized inference + +The goal is to provide a **clear and reproducible introduction** to local-first AI systems. + +--- + +## ๐Ÿ” What is RAG? + +**Retrieval-Augmented Generation (RAG)** enhances LLM responses by retrieving relevant context from a knowledge base before generating an answer. + +This helps: +- Reduce hallucinations +- Incorporate domain-specific knowledge +- Improve factual accuracy + +--- + +## ๐Ÿค– Optional Agentic Workflow + +This notebook includes an **optional agentic extension** using LangGraph. + +In this setup, the system can: +- Decide whether retrieval is needed +- Route queries dynamically +- Use simple tools such as a calculator + +> โš ๏ธ This section is optional and intended for learning purposes. +> The core RAG pipeline works independently without the agentic extension. + +--- + +## ๐Ÿ”ง Recent Changes & Fixes + +### Added: Dependency Check (Before Agent Section) + +**Issue:** Running the LangGraph agent section (Section 7) would fail with `NameError: name 'ask_llm' is not defined` if prerequisite cells were not executed first.
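Such a guard can be sketched in a few lines of plain Python. The helper name and the list of required functions below are illustrative, not the notebook's exact implementation:

```python
def check_prerequisites(required=("ask_llm", "retrieve_documents", "build_rag_prompt")):
    """Return the names of required notebook functions that are not yet defined."""
    missing = [name for name in required if name not in globals()]
    for name in missing:
        print(f"โš ๏ธ '{name}' is not defined yet. Run the earlier sections first.")
    return missing

# The agent-building cell can then bail out early instead of raising a NameError:
if check_prerequisites():
    print("Skipping agent construction until the prerequisite cells have been run.")
```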
+ +**Solution:** Added an automatic dependency check that: +- โœ… Verifies all required functions are available before building the agent +- โœ… Provides clear error messages if functions are missing +- โœ… Shows exactly which sections to run and in what order + +**Impact:** If cells are run out of order, the dependency check now reports the missing prerequisites and prints instructions, instead of failing with a cryptic `NameError`. + +### Execution Order Requirements + +To run the full pipeline successfully, execute sections in this order: + +1. **Section 1:** Environment Setup (install packages) +2. **Section 1 (continued):** Ollama configuration & model verification +3. **Section 2:** Basic LLM Inference (`ask_llm` function) +4. **Section 3:** Document Preparation (creates chunks) +5. **Section 4:** ChromaDB Setup (vector store) +6. **Section 5:** Retrieval (`retrieve_documents` function) +7. **Section 6:** RAG Pipeline (`build_rag_prompt` function) +8. **Section 7+:** Agentic workflow (now safe to run) + +> ๐Ÿ’ก The dependency check will remind you if you skip steps! + +--- + +## โšก OpenVINOโ„ข Integration + +OpenVINOโ„ข is Intelโ€™s toolkit for optimizing and deploying deep learning models. + +This notebook is designed to be **compatible with OpenVINO optimization workflows**, including: + +- Model conversion (FP32 โ†’ FP16 / INT8) +- Quantization and compression +- CPU, GPU, and NPU performance optimization + +> ๐Ÿ’ก OpenVINO integration is optional. The notebook can run without it. + +--- + +## ๐Ÿ’ป Requirements + +| Component | Requirement | +|----------|-------------| +| Python | 3.9+ | +| RAM | 8 GB minimum, 16 GB recommended | +| Storage | ~5 GB free | +| OS | Windows, Linux, or macOS | + +> โœ… No GPU is required. + +--- + +## ๐Ÿ› ๏ธ Setup Instructions + +### 1.
Install Python dependencies + +```bash +pip install ollama chromadb langgraph langchain langchain-community sentence-transformers tqdm jupyter +``` \ No newline at end of file diff --git a/notebooks/local-agentic-rag/local-agentic-rag-ollama-openvino.ipynb b/notebooks/local-agentic-rag/local-agentic-rag-ollama-openvino.ipynb new file mode 100644 index 00000000000..093a04772cf --- /dev/null +++ b/notebooks/local-agentic-rag/local-agentic-rag-ollama-openvino.ipynb @@ -0,0 +1,1272 @@ + "# ๐Ÿค– Local Agentic RAG with Ollama, ChromaDB, and LangGraph\n", + "\n", + "**A fully local, privacy-preserving Retrieval-Augmented Generation pipeline**\n", + "\n", + "---\n", + "\n", + "## What You'll Learn\n", + "\n", + "In this notebook, you will build a **minimal end-to-end Agentic RAG pipeline** that runs entirely on your local machine โ€” no internet connection or cloud API required after setup.\n", + "\n", + "### ๐Ÿ” What is RAG?\n", + "\n", + "**Retrieval-Augmented Generation (RAG)** is a technique that enhances a language model's responses by first *retrieving* relevant documents from a knowledge base, then *augmenting* the prompt with that context before generating an answer. This is especially useful when:\n", + "\n", + "- Your LLM doesn't know about domain-specific or up-to-date information\n", + "- You want answers grounded in specific documents (e.g., internal manuals, research papers)\n", + "- You want to reduce hallucinations\n", + "\n", + "### ๐Ÿค– What Makes It \"Agentic\"?\n", + "\n", + "A standard RAG pipeline is a fixed sequence: retrieve โ†’ generate.
An **Agentic RAG** pipeline uses a reasoning loop where the model can:\n", + "\n", + "- Decide *whether* retrieval is needed\n", + "- Call tools (e.g., a calculator, search function)\n", + "- Reflect on intermediate results before producing a final answer\n", + "\n", + "We implement this loop using **LangGraph**, a lightweight graph-based orchestration library.\n", + "\n", + "### ๐Ÿ”’ Why Local-First?\n", + "\n", + "Running AI entirely on your own hardware provides:\n", + "\n", + "- **Privacy** โ€” your data never leaves your machine\n", + "- **Offline capability** โ€” works without internet\n", + "- **Cost control** โ€” no per-token API charges\n", + "- **Reproducibility** โ€” same model version every run\n", + "\n", + "### โšก Where Does OpenVINO Fit?\n", + "\n", + "[OpenVINOโ„ข](https://github.com/openvinotoolkit/openvino) is Intel's open-source toolkit for optimizing and deploying deep learning models. It can accelerate LLM inference on Intel CPUs, iGPUs, and NPUs by:\n", + "\n", + "- Quantizing models (e.g., INT4/INT8) to reduce memory usage\n", + "- Compiling computation graphs for hardware-specific optimization\n", + "- Enabling faster token generation on CPU compared to vanilla PyTorch\n", + "\n", + "In this notebook, OpenVINO is shown as an **optional enhancement** โ€” the pipeline runs fine without it, and we show how to plug it in.\n", + "\n", + "---\n", + "\n", + "### ๐Ÿ—บ๏ธ Pipeline Overview\n", + "\n", + "```\n", + "User Query\n", + " โ”‚\n", + " โ–ผ\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ LangGraph Agent Loop โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", + "โ”‚ โ”‚ Decide โ”‚โ”€โ”€โ”€โ–ถโ”‚ Retrieve Docs โ”‚ โ”‚\n", + "โ”‚ โ”‚ (LLM) โ”‚ โ”‚ (ChromaDB) โ”‚ โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ 
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", + "โ”‚ โ–ฒ โ”‚ โ”‚\n", + "โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚ Generate Answer โ”‚ โ”‚\n", + "โ”‚ โ”‚ (Ollama LLM) โ”‚ โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + " โ”‚\n", + " โ–ผ\n", + "Final Answer\n", + "```\n", + "\n", + "---\n", + "\n", + "### ๐Ÿ“‹ Prerequisites\n", + "\n", + "| Requirement | Minimum | Recommended |\n", + "|---|---|---|\n", + "| RAM | 8 GB | 16 GB |\n", + "| CPU | Any x86-64 / ARM64 | Intel Core i5+ |\n", + "| Storage | 5 GB free | 10 GB free |\n", + "| OS | Linux / macOS / Windows | Ubuntu 22.04+ |\n", + "| Python | 3.9+ | 3.11 |\n", + "\n", + "> โœ… **No GPU required.** This notebook is designed to run on CPU only." + ] + }, + { + "cell_type": "markdown", + "id": "setup-header", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ“ฆ Section 1: Environment Setup\n", + "\n", + "We install the required Python packages. These are all lightweight and widely used:\n", + "\n", + "| Package | Purpose |\n", + "|---|---|\n", + "| `ollama` | Python client for local Ollama LLM server |\n", + "| `chromadb` | Local vector database for document embeddings |\n", + "| `langgraph` | Agent loop orchestration |\n", + "| `langchain-community` | Utility helpers (text splitters, etc.) 
|\n", + "| `sentence-transformers` | Lightweight local embedding model |\n", + "| `tqdm` | Progress bars |\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "install-deps", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n", "โœ… All packages installed successfully.\n" ] } ], "source": [ "# Install required packages\n", "# This may take 1-2 minutes on first run\n", "%pip install -q \\\n", " ollama \\\n", " chromadb \\\n", " langgraph \\\n", " langchain \\\n", " langchain-community \\\n", " sentence-transformers \\\n", " tqdm\n", "\n", "print(\"โœ… All packages installed successfully.\")" ] }, { "cell_type": "markdown", "id": "ollama-setup-md", "metadata": {}, "source": [ "### ๐Ÿฆ™ Installing and Starting Ollama\n", "\n", "**Ollama** is a tool for running large language models locally.
It handles model download and quantization, and serves an OpenAI-compatible REST API on `http://localhost:11434`.\n", + "\n", + "**Installation:**\n", + "\n", + "```bash\n", + "# Linux / macOS:\n", + "curl -fsSL https://ollama.com/install.sh | sh\n", + "\n", + "# Windows: Download installer from https://ollama.com/download\n", + "```\n", + "\n", + "**Start the Ollama server** (in a separate terminal or as a background service):\n", + "\n", + "```bash\n", + "ollama serve\n", + "```\n", + "\n", + "**Pull a lightweight model** suitable for CPU inference:\n", + "\n", + "```bash\n", + "# ~2.3 GB โ€” good balance of quality and speed on CPU\n", + "ollama pull qwen2.5:3b\n", + "\n", + "# Smaller alternative (~1.1 GB) if RAM is limited:\n", + "# ollama pull qwen2.5:1.5b\n", + "```\n", + "\n", + "> ๐Ÿ’ก **Why Qwen2.5?** It delivers strong instruction-following performance in small sizes (1.5Bโ€“7B), making it ideal for CPU-only environments. The 3B variant comfortably fits in 8 GB RAM.\n", + "\n", + "After pulling the model, verify Ollama is running by executing the next cell."
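Before moving on, you can also sanity-check the server with a raw HTTP request. Ollama exposes a REST endpoint, `GET /api/tags`, that lists the locally pulled models. The sketch below is optional (the next cell performs the same check through the `ollama` Python client) and falls back to an empty list when the server is not running:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def list_ollama_models(base_url: str = OLLAMA_URL) -> list:
    """Return the names of locally pulled models, or [] if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError as exc:  # connection refused, DNS failure, timeout, ...
        print(f"Ollama not reachable at {base_url}: {exc}")
        return []

print(list_ollama_models())
```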
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "check-ollama", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "๐Ÿฆ™ Ollama is running!\n", + " Available models: ['qwen2.5:3b', 'gemma3:4b', 'mxbai-embed-large:latest', 'gemma3:latest']\n", + "\n", + "โœ… Model 'qwen2.5:3b' is ready to use.\n" + ] + } + ], + "source": [ + "import ollama\n", + "\n", + "# โ”€โ”€ Configuration โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Change MODEL_NAME to match whichever model you pulled with `ollama pull`\n", + "MODEL_NAME = \"qwen2.5:3b\" # Recommended: good quality on CPU\n", + "# MODEL_NAME = \"qwen2.5:1.5b\" # Uncomment for lower RAM usage (~1.1 GB)\n", + "# MODEL_NAME = \"llama3.2:3b\" # Alternative: Meta LLaMA 3.2 3B\n", + "# โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "# Verify Ollama is running and the model is available\n", + "try:\n", + " available_models = [m.model for m in ollama.list().models]\n", + " print(\"๐Ÿฆ™ Ollama is running!\")\n", + " print(f\" Available models: {available_models}\")\n", + " \n", + " if MODEL_NAME not in available_models:\n", + " print(f\"\\nโš ๏ธ Model '{MODEL_NAME}' not found locally.\")\n", + " print(f\" Please run in a terminal: ollama pull {MODEL_NAME}\")\n", + " else:\n", + " print(f\"\\nโœ… Model '{MODEL_NAME}' is ready to use.\")\n", + " \n", + "except Exception as e:\n", + " print(f\"โŒ Could not connect to Ollama: {e}\")\n", + " print(\" Make sure Ollama is installed and running: `ollama serve`\")" + ] + }, + { + "cell_type": "markdown", + "id": "llm-inference-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ’ฌ 
Section 2: Basic LLM Inference with Ollama\n", + "\n", + "Before building the full pipeline, let's confirm the LLM works with a simple prompt-response test.\n", + "\n", + "We use the `ollama` Python client, which communicates with the locally running Ollama server. The interface mirrors the OpenAI Chat Completions API, so the concepts transfer directly." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "basic-llm", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Sending a test prompt to the LLM...\n", + "\n", + "LLM Response:\n", + "Intel OpenVINO is an open-source platform optimized for real-time inference of deep learning models on resource-constrained devices.\n" + ] + } + ], + "source": [ + "def ask_llm(prompt: str, model: str = MODEL_NAME, system: str = None) -> str:\n", + " \"\"\"\n", + " Send a prompt to the local Ollama LLM and return the response text.\n", + " \n", + " Args:\n", + " prompt: The user message / question.\n", + " model: Ollama model name to use.\n", + " system: Optional system prompt to set the assistant's behaviour.\n", + " \n", + " Returns:\n", + " The model's response as a plain string.\n", + " \"\"\"\n", + " messages = []\n", + " if system:\n", + " messages.append({\"role\": \"system\", \"content\": system})\n", + " messages.append({\"role\": \"user\", \"content\": prompt})\n", + " \n", + " response = ollama.chat(model=model, messages=messages)\n", + " return response[\"message\"][\"content\"]\n", + "\n", + "\n", + "# โ”€โ”€ Simple test โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "print(\"Sending a test prompt to the LLM...\\n\")\n", + "\n", + "test_response = ask_llm(\n", + " prompt=\"In one sentence, what is Intel OpenVINO?\",\n", + " system=\"You are a concise technical assistant.\"\n", + ")\n", + "\n", + 
"print(f\"LLM Response:\\n{test_response}\")" ] }, { "cell_type": "markdown", "id": "documents-md", "metadata": {}, "source": [ "---\n", "## ๐Ÿ“„ Section 3: Document Preparation\n", "\n", "For RAG to work, we need a **knowledge base** โ€” a set of documents the system can search through to answer questions.\n", "\n", "In this example, we use a small set of manually written paragraphs about Intel OpenVINO and related AI topics. In a real project, you would replace these with:\n", "\n", "- PDF or text files loaded from disk\n", "- Web pages fetched via scraping\n", "- Database records\n", "- Any structured or unstructured text\n", "\n", "We also split long documents into smaller **chunks**. This is important because:\n", "\n", "1. Embedding models have a maximum input length (typically 256โ€“512 tokens)\n", "2. Smaller chunks improve retrieval precision โ€” you return only the relevant paragraph, not an entire page\n", "3. The LLM context window is limited; smaller chunks fit more retrieved results" ] }, { "cell_type": "code", "execution_count": null, "id": "prepare-docs", "metadata": {}, "outputs": [], "source": [ "# NOTE: in langchain >= 0.2 the text splitters live in the langchain-text-splitters package\n", "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", "\n", "# โ”€โ”€ Sample knowledge base โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", "# These short documents form our local knowledge base.\n", "# Replace or extend with your own content.\n", "\n", "RAW_DOCUMENTS = [\n", " {\n", " \"id\": \"doc_openvino_overview\",\n", " \"text\": (\n", " \"Intel OpenVINO (Open Visual Inference and Neural network Optimization) is an open-source \"\n", " \"toolkit for optimizing and deploying AI inference. It supports models from frameworks \"\n", " \"like PyTorch, TensorFlow, and ONNX. OpenVINO converts models into its Intermediate \"\n", " \"Representation (IR) format for cross-hardware deployment. It targets Intel CPUs, \"\n", " \"integrated GPUs, and NPUs. Key features include model quantization (INT8/INT4), \"\n", " \"throughput optimization, and a Python API for easy integration.\"\n", " ),\n", " \"source\": \"openvino_overview\",\n", " },\n", " {\n", " \"id\": \"doc_rag_explanation\",\n", " \"text\": (\n", " \"Retrieval-Augmented Generation (RAG) is a technique that improves LLM responses by \"\n", " \"fetching relevant documents from an external knowledge base before generating an answer. 
\"\n", + " \"The workflow has two stages: retrieval, where a query is converted to an embedding and \"\n", + " \"matched against stored document embeddings in a vector database; and generation, where \"\n", + " \"the retrieved documents are concatenated with the original query as context for the LLM. \"\n", + " \"RAG reduces hallucinations and allows the model to answer questions about private or \"\n", + " \"domain-specific data without fine-tuning.\"\n", + " ),\n", + " \"source\": \"rag_explanation\",\n", + " },\n", + " {\n", + " \"id\": \"doc_chromadb\",\n", + " \"text\": (\n", + " \"ChromaDB is a lightweight, open-source vector database designed for AI applications. \"\n", + " \"It stores document embeddings (dense numerical vectors) and supports fast similarity \"\n", + " \"search using cosine or L2 distance. ChromaDB can run fully in-memory for prototyping \"\n", + " \"or persist data to disk for production use. It integrates natively with popular \"\n", + " \"embedding models from HuggingFace and OpenAI, and requires no separate database server.\"\n", + " ),\n", + " \"source\": \"chromadb_overview\",\n", + " },\n", + " {\n", + " \"id\": \"doc_langgraph\",\n", + " \"text\": (\n", + " \"LangGraph is a library for building stateful, multi-step agent workflows using a \"\n", + " \"graph-based computation model. Nodes represent individual actions (e.g., call LLM, \"\n", + " \"retrieve documents, execute tool), and edges define the control flow between them. \"\n", + " \"LangGraph supports conditional routing, loops, and human-in-the-loop checkpoints. \"\n", + " \"It is part of the LangChain ecosystem and can be used with any LLM provider, \"\n", + " \"including locally running models via Ollama.\"\n", + " ),\n", + " \"source\": \"langgraph_overview\",\n", + " },\n", + " {\n", + " \"id\": \"doc_ollama\",\n", + " \"text\": (\n", + " \"Ollama is a tool for running large language models locally on your machine. 
It \"\n", + " \"provides a simple CLI and REST API for downloading and serving quantized models \"\n", + " \"in GGUF format. Supported models include Llama 3, Qwen2.5, Mistral, Phi-3, and \"\n", + " \"many others. Ollama handles CPU and GPU inference automatically, making local LLM \"\n", + " \"deployment accessible without complex setup. Its API is compatible with the \"\n", + " \"OpenAI Chat Completions specification.\"\n", + " ),\n", + " \"source\": \"ollama_overview\",\n", + " },\n", + "]\n", + "\n", + "print(f\"๐Ÿ“š Knowledge base: {len(RAW_DOCUMENTS)} documents loaded.\")\n", + "\n", + "# โ”€โ”€ Text chunking โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# For longer documents you would chunk them; our samples are already short.\n", + "# We show the splitter setup for completeness โ€” it's a no-op on small texts.\n", + "\n", + "splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=400, # characters per chunk\n", + " chunk_overlap=60, # overlap to preserve context across chunk boundaries\n", + " separators=[\"\\n\\n\", \"\\n\", \". \", \" \", \"\"],\n", + ")\n", + "\n", + "chunks = []\n", + "for doc in RAW_DOCUMENTS:\n", + " for i, chunk_text in enumerate(splitter.split_text(doc[\"text\"])):\n", + " chunks.append({\n", + " \"id\": f\"{doc['id']}_chunk{i}\",\n", + " \"text\": chunk_text,\n", + " \"source\": doc[\"source\"],\n", + " })\n", + "\n", + "print(f\"โœ‚๏ธ After chunking: {len(chunks)} text chunks ready for embedding.\")\n", + "print(f\"\\n๐Ÿ“ Example chunk:\\n {chunks[0]['text'][:200]}...\")" + ] + }, + { + "cell_type": "markdown", + "id": "embedding-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿงฎ Section 4: Embeddings and Vector Storage (ChromaDB)\n", + "\n", + "**Embeddings** are dense numerical representations of text that capture semantic meaning. 
Texts with similar meanings have embeddings that are close together in vector space.\n", + "\n", + "We use **`sentence-transformers/all-MiniLM-L6-v2`** โ€” a small but effective embedding model:\n", + "- **Size:** ~22 MB (very lightweight)\n", + "- **Embedding dimension:** 384\n", + "- **Runs entirely on CPU** without any configuration\n", + "\n", + "All embeddings are stored in a **ChromaDB** collection, which provides:\n", + "- Persistent local storage (no server needed)\n", + "- Fast approximate nearest-neighbour search\n", + "- Metadata filtering" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "setup-embeddings", + "metadata": {}, + "outputs": [], + "source": [ + "import chromadb\n", + "from chromadb.utils import embedding_functions\n", + "from tqdm import tqdm\n", + "\n", + "# โ”€โ”€ Embedding model โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Using SentenceTransformers via ChromaDB's built-in embedding function.\n", + "# The model is downloaded once and cached in ~/.cache/huggingface/\n", + "\n", + "EMBEDDING_MODEL = \"all-MiniLM-L6-v2\" # ~22 MB, excellent for CPU\n", + "\n", + "print(f\"Loading embedding model: {EMBEDDING_MODEL}\")\n", + "print(\"(First run will download ~22 MB โ€” subsequent runs use the cache)\\n\")\n", + "\n", + "embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(\n", + " model_name=EMBEDDING_MODEL\n", + ")\n", + "\n", + "# โ”€โ”€ ChromaDB setup โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# PersistentClient stores the database on disk at ./chroma_db/\n", + "# Use chromadb.Client() for an in-memory-only version.\n", + "\n", + "DB_PATH = \"./chroma_db\" # Local directory for vector store persistence\n", + "\n", + 
"chroma_client = chromadb.PersistentClient(path=DB_PATH)\n", + "\n", + "# Create (or load existing) collection\n", + "# get_or_create_collection avoids errors if re-running the notebook\n", + "collection = chroma_client.get_or_create_collection(\n", + " name=\"openvino_rag_demo\",\n", + " embedding_function=embedding_fn,\n", + " metadata={\"hnsw:space\": \"cosine\"}, # Use cosine similarity\n", + ")\n", + "\n", + "print(f\"โœ… ChromaDB collection ready at: {DB_PATH}\")\n", + "print(f\" Collection name: 'openvino_rag_demo'\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "index-docs", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Index documents โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Check how many documents are already in the collection to avoid duplicates.\n", + "\n", + "existing_count = collection.count()\n", + "\n", + "if existing_count >= len(chunks):\n", + " print(f\"โ„น๏ธ Collection already contains {existing_count} documents. Skipping indexing.\")\n", + " print(\" Delete './chroma_db/' and re-run to re-index.\")\n", + "else:\n", + " print(f\"Indexing {len(chunks)} chunks into ChromaDB...\")\n", + " \n", + " # Add all chunks in a single batch call for efficiency\n", + " collection.add(\n", + " ids = [c[\"id\"] for c in chunks],\n", + " documents = [c[\"text\"] for c in chunks],\n", + " metadatas = [{\"source\": c[\"source\"]} for c in chunks],\n", + " )\n", + " \n", + " print(f\"\\nโœ… Indexed {collection.count()} chunks successfully.\")\n", + "\n", + "print(f\"\\n๐Ÿ“Š Vector store summary:\")\n", + "print(f\" Total documents in collection: {collection.count()}\")" + ] + }, + { + "cell_type": "markdown", + "id": "retrieval-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ” Section 5: The Retrieval Step\n", + "\n", + "Retrieval works by:\n", + "\n", + "1. 
Converting the user's query into an embedding using the *same* embedding model used during indexing\n", + "2. Finding the `k` most similar document embeddings in ChromaDB using cosine similarity\n", + "3. Returning those documents as context\n", + "\n", + "The key insight: **semantic similarity, not keyword matching**. A query about \"fast inference\" will retrieve documents about \"optimized deployment\" even if the exact words differ." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "retrieval-fn", + "metadata": {}, + "outputs": [], + "source": [ + "def retrieve_documents(query: str, k: int = 3) -> list[dict]:\n", + " \"\"\"\n", + " Retrieve the top-k most relevant document chunks for a given query.\n", + " \n", + " Args:\n", + " query: The user's search question.\n", + " k: Number of documents to retrieve.\n", + " \n", + " Returns:\n", + " List of dicts with keys: 'id', 'text', 'source', 'distance'\n", + " \"\"\"\n", + " results = collection.query(\n", + " query_texts=[query],\n", + " n_results=k,\n", + " include=[\"documents\", \"metadatas\", \"distances\"],\n", + " )\n", + " \n", + " retrieved = []\n", + " for i in range(len(results[\"ids\"][0])):\n", + " retrieved.append({\n", + " \"id\": results[\"ids\"][0][i],\n", + " \"text\": results[\"documents\"][0][i],\n", + " \"source\": results[\"metadatas\"][0][i][\"source\"],\n", + " \"distance\": results[\"distances\"][0][i], # Lower = more similar\n", + " })\n", + " \n", + " return retrieved\n", + "\n", + "\n", + "# โ”€โ”€ Test retrieval โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "test_query = \"How does OpenVINO speed up AI inference?\"\n", + "print(f\"๐Ÿ” Test query: \\\"{test_query}\\\"\\n\")\n", + "\n", + "retrieved = retrieve_documents(test_query, k=2)\n", + "\n", + "for i, doc in enumerate(retrieved, 1):\n", + " print(f\"--- Result {i} 
(source: {doc['source']}, distance: {doc['distance']:.4f}) ---\")\n", + " print(f\"{doc['text']}\\n\")" + ] + }, + { + "cell_type": "markdown", + "id": "rag-pipeline-md", + "metadata": {}, + "source": [ + "---\n", + "## โš™๏ธ Section 6: The RAG Pipeline\n", + "\n", + "Now we connect retrieval with generation. The pattern is:\n", + "\n", + "1. **Retrieve** relevant chunks for the user's question\n", + "2. **Format** a prompt that includes the retrieved context\n", + "3. **Generate** a response using the local LLM\n", + "\n", + "A good system prompt instructs the LLM to:\n", + "- Answer only from the provided context\n", + "- Admit when it doesn't know (avoiding hallucination)\n", + "- Be concise" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "rag-pipeline", + "metadata": {}, + "outputs": [], + "source": [ + "def build_rag_prompt(query: str, context_docs: list[dict]) -> str:\n", + " \"\"\"\n", + " Assemble the RAG prompt by combining retrieved context with the user query.\n", + " \n", + " Args:\n", + " query: The user's original question.\n", + " context_docs: List of retrieved document dicts (from retrieve_documents).\n", + " \n", + " Returns:\n", + " A formatted prompt string ready to send to the LLM.\n", + " \"\"\"\n", + " context_str = \"\\n\\n\".join(\n", + " f\"[Source: {doc['source']}]\\n{doc['text']}\"\n", + " for doc in context_docs\n", + " )\n", + " \n", + " prompt = (\n", + " f\"You are a helpful assistant. Answer the question using ONLY the context \"\n", + " f\"provided below. 
If the context does not contain enough information to answer, \"\n", + " f\"say \\\"I don't have enough information to answer this question.\\\"\\n\\n\"\n", + " f\"Context:\\n{context_str}\\n\\n\"\n", + " f\"Question: {query}\\n\\n\"\n", + " f\"Answer:\"\n", + " )\n", + " return prompt\n", + "\n", + "\n", + "def rag_query(query: str, k: int = 3) -> dict:\n", + " \"\"\"\n", + " Full RAG pipeline: retrieve โ†’ build prompt โ†’ generate.\n", + " \n", + " Args:\n", + " query: The user's question.\n", + " k: Number of documents to retrieve.\n", + " \n", + " Returns:\n", + " Dict with 'query', 'retrieved_docs', and 'answer'.\n", + " \"\"\"\n", + " # Step 1: Retrieve\n", + " docs = retrieve_documents(query, k=k)\n", + " \n", + " # Step 2: Build prompt\n", + " prompt = build_rag_prompt(query, docs)\n", + " \n", + " # Step 3: Generate\n", + " answer = ask_llm(prompt)\n", + " \n", + " return {\"query\": query, \"retrieved_docs\": docs, \"answer\": answer}\n", + "\n", + "\n", + "# โ”€โ”€ Run a test RAG query โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "print(\"Running RAG pipeline...\\n\")\n", + "\n", + "result = rag_query(\"What is ChromaDB and how is it used in AI applications?\")\n", + "\n", + "print(f\"โ“ Question: {result['query']}\\n\")\n", + "print(f\"๐Ÿ“„ Retrieved {len(result['retrieved_docs'])} documents:\")\n", + "for doc in result['retrieved_docs']:\n", + " print(f\" โ€ข {doc['source']} (similarity distance: {doc['distance']:.4f})\")\n", + "print(f\"\\n๐Ÿ’ฌ Answer:\\n{result['answer']}\")" + ] + }, + { + "cell_type": "markdown", + "id": "agentic-loop-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ”„ Section 7: Minimal Agentic Loop with LangGraph\n", + "\n", + "So far, our pipeline is a **static sequence**: always retrieve, always generate. 
An **agentic** pipeline adds reasoning and decision-making.\n", + "\n", + "### How LangGraph Works\n", + "\n", + "LangGraph models a workflow as a **state machine**:\n", + "\n", + "- **State** โ€” a typed dictionary that flows between nodes\n", + "- **Nodes** โ€” Python functions that read/write the state\n", + "- **Edges** โ€” connections between nodes (can be conditional)\n", + "\n", + "### Our Agent Graph\n", + "\n", + "```\n", + "START\n", + " โ”‚\n", + " โ–ผ\n", + "[classify_query] โ† LLM decides: needs retrieval or direct answer?\n", + " โ”‚ โ”‚\n", + " โ”‚ (rag) โ”‚ (direct)\n", + " โ–ผ โ–ผ\n", + "[retrieve] [generate_direct]\n", + " โ”‚ โ”‚\n", + " โ–ผ โ”‚\n", + "[generate_rag] โ”‚\n", + " โ”‚ โ”‚\n", + " โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + " โ–ผ\n", + " END\n", + "```\n", + "\n", + "The agent first classifies the query: if it's a factual question that benefits from document lookup, it uses RAG; otherwise it answers directly. This avoids unnecessary retrieval for simple conversational or computational queries." 
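Before wiring this up with LangGraph, it can help to see the routing idea stripped of any framework. The sketch below is a plain-Python approximation of the graph above; the `keyword_decide` stub stands in for the LLM classifier and is purely hypothetical:

```python
from typing import Callable, TypedDict

# Minimal state shared between nodes (mirrors the AgentState idea)
class State(TypedDict):
    query: str
    route: str
    answer: str

def classify(state: State, decide: Callable[[str], str]) -> State:
    # decide() stands in for the LLM routing call; returns 'rag' or 'direct'
    return {**state, "route": decide(state["query"])}

def generate_rag(state: State) -> State:
    # Placeholder for retrieve + grounded generation
    return {**state, "answer": f"[rag] {state['query']}"}

def generate_direct(state: State) -> State:
    # Placeholder for direct generation without retrieval
    return {**state, "answer": f"[direct] {state['query']}"}

def run_graph(query: str, decide: Callable[[str], str]) -> State:
    state: State = {"query": query, "route": "", "answer": ""}
    state = classify(state, decide)
    # Conditional edge: choose the next node based on the routing decision
    node = generate_rag if state["route"] == "rag" else generate_direct
    return node(state)

# Stub classifier: knowledge-base topics go to RAG, everything else is direct
def keyword_decide(q: str) -> str:
    topics = ("openvino", "chromadb", "ollama", "langgraph", "rag")
    return "rag" if any(t in q.lower() for t in topics) else "direct"

print(run_graph("What is OpenVINO?", keyword_decide)["route"])   # rag
print(run_graph("What is 25 * 4?", keyword_decide)["route"])     # direct
```

LangGraph adds typed state merging, conditional edges, and loop support on top of this pattern; the node functions themselves stay ordinary Python.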
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "langgraph-state", + "metadata": {}, + "outputs": [], + "source": [ + "from typing import TypedDict, Annotated, Literal\n", + "from langgraph.graph import StateGraph, START, END\n", + "\n", + "# โ”€โ”€ Agent State โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Every node receives and returns a subset of this state dict.\n", + "# Using TypedDict gives us type hints and IDE support.\n", + "\n", + "class AgentState(TypedDict):\n", + " query: str # Original user question\n", + " route: str # 'rag' or 'direct'\n", + " retrieved_docs: list[dict] # Documents from ChromaDB\n", + " answer: str # Final answer\n", + " tool_result: str # Optional: result of a tool call\n", + "\n", + "\n", + "print(\"โœ… AgentState defined.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "langgraph-nodes", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Node 1: Query Classifier โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Asks the LLM to decide if this query needs document retrieval.\n", + "\n", + "def classify_query(state: AgentState) -> AgentState:\n", + " \"\"\"\n", + " Route the query:\n", + " - 'rag' โ†’ requires searching the knowledge base\n", + " - 'direct' โ†’ can be answered without retrieval (math, simple facts, etc.)\n", + " \"\"\"\n", + " decision_prompt = (\n", + " f\"You are a routing assistant. 
Given the user query below, decide if it \"\n", + " f\"requires searching a knowledge base about AI tools (OpenVINO, ChromaDB, \"\n", + " f\"Ollama, LangGraph, RAG) to answer correctly, or if it can be answered \"\n", + " f\"directly.\\n\\n\"\n", + " f\"Reply with ONLY one word: 'rag' or 'direct'.\\n\\n\"\n", + " f\"Query: {state['query']}\"\n", + " )\n", + " \n", + " raw = ask_llm(decision_prompt).strip().lower()\n", + " \n", + " # Normalize the output โ€” the LLM might add punctuation\n", + " route = \"rag\" if \"rag\" in raw else \"direct\"\n", + " \n", + " print(f\" ๐Ÿ—บ๏ธ Classifier decision: '{route}' (raw: '{raw}')\")\n", + " return {**state, \"route\": route}\n", + "\n", + "\n", + "# โ”€โ”€ Node 2: Document Retrieval โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "def retrieve_node(state: AgentState) -> AgentState:\n", + " \"\"\"Retrieve relevant documents from ChromaDB for the query.\"\"\"\n", + " docs = retrieve_documents(state[\"query\"], k=3)\n", + " print(f\" ๐Ÿ“š Retrieved {len(docs)} documents.\")\n", + " return {**state, \"retrieved_docs\": docs}\n", + "\n", + "\n", + "# โ”€โ”€ Node 3a: RAG Generation โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "def generate_rag_node(state: AgentState) -> AgentState:\n", + " \"\"\"Generate an answer grounded in the retrieved documents.\"\"\"\n", + " prompt = build_rag_prompt(state[\"query\"], state[\"retrieved_docs\"])\n", + " answer = ask_llm(prompt)\n", + " return {**state, \"answer\": answer}\n", + "\n", + "\n", + "# โ”€โ”€ Node 3b: Direct Generation โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "def generate_direct_node(state: AgentState) -> AgentState:\n", + " 
\"\"\"Generate an answer directly, without retrieval.\"\"\"\n", + " answer = ask_llm(\n", + " prompt=state[\"query\"],\n", + " system=\"You are a helpful, concise assistant.\"\n", + " )\n", + " return {**state, \"answer\": answer, \"retrieved_docs\": []}\n", + "\n", + "\n", + "# โ”€โ”€ Conditional edge: decides which generation node to call โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "def route_decision(state: AgentState) -> Literal[\"retrieve\", \"generate_direct\"]:\n", + " \"\"\"Edge function: returns the name of the next node based on 'route'.\"\"\"\n", + " if state[\"route\"] == \"rag\":\n", + " return \"retrieve\"\n", + " return \"generate_direct\"\n", + "\n", + "\n", + "print(\"โœ… All graph nodes defined.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "langgraph-build", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Build the LangGraph state machine โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "builder = StateGraph(AgentState)\n", + "\n", + "# Add nodes\n", + "builder.add_node(\"classify\", classify_query)\n", + "builder.add_node(\"retrieve\", retrieve_node)\n", + "builder.add_node(\"generate_rag\", generate_rag_node)\n", + "builder.add_node(\"generate_direct\", generate_direct_node)\n", + "\n", + "# Entry point\n", + "builder.add_edge(START, \"classify\")\n", + "\n", + "# Conditional routing after classification\n", + "builder.add_conditional_edges(\n", + " \"classify\",\n", + " route_decision,\n", + " {\n", + " \"retrieve\": \"retrieve\",\n", + " \"generate_direct\": \"generate_direct\",\n", + " }\n", + ")\n", + "\n", + "# After retrieval, always go to RAG generation\n", + "builder.add_edge(\"retrieve\", \"generate_rag\")\n", + "\n", + "# Both generation nodes lead to END\n", + "builder.add_edge(\"generate_rag\", END)\n", + "builder.add_edge(\"generate_direct\", END)\n", + "\n", + "# Compile the 
graph into a runnable\n", + "agent = builder.compile()\n", + "\n", + "print(\"โœ… LangGraph agent compiled successfully.\")\n", + "print(\"\\nGraph structure:\")\n", + "print(\" START โ†’ classify โ†’ [retrieve โ†’ generate_rag | generate_direct] โ†’ END\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "run-agent", + "metadata": {}, + "outputs": [], + "source": [ + "def run_agent(query: str) -> str:\n", + " \"\"\"\n", + " Run the agentic RAG pipeline for a given query.\n", + " \n", + " Args:\n", + " query: Natural language question.\n", + " \n", + " Returns:\n", + " The agent's final answer as a string.\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"โ“ Query: {query}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Initialize state with defaults\n", + " initial_state: AgentState = {\n", + " \"query\": query,\n", + " \"route\": \"\",\n", + " \"retrieved_docs\": [],\n", + " \"answer\": \"\",\n", + " \"tool_result\": \"\",\n", + " }\n", + " \n", + " # Run the graph\n", + " final_state = agent.invoke(initial_state)\n", + " \n", + " print(f\"\\n๐Ÿ’ฌ Answer:\\n{final_state['answer']}\")\n", + " \n", + " if final_state[\"retrieved_docs\"]:\n", + " sources = list({d[\"source\"] for d in final_state[\"retrieved_docs\"]})\n", + " print(f\"\\n๐Ÿ“– Sources consulted: {', '.join(sources)}\")\n", + " \n", + " return final_state[\"answer\"]\n", + "\n", + "\n", + "# โ”€โ”€ Test 1: Knowledge-base question (should use RAG) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "_ = run_agent(\"How does LangGraph help build AI agents?\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "run-agent-2", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Test 2: Direct question (should skip retrieval) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "_ = run_agent(\"What is 25 multiplied by 4?\")" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "run-agent-3", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Test 3: Your own question โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Modify the query below to test with your own questions.\n", + "\n", + "your_query = \"What models does Ollama support and how does it compare to cloud APIs?\"\n", + "_ = run_agent(your_query)" + ] + }, + { + "cell_type": "markdown", + "id": "tool-use-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ› ๏ธ Section 8: Optional โ€” Adding a Simple Tool\n", + "\n", + "One of the key features of an agentic pipeline is **tool use**: the ability to call external functions for tasks the LLM can't do reliably on its own (e.g., arithmetic, live search, code execution).\n", + "\n", + "Here we add a minimal **calculator tool** as an example. The same pattern applies to any function: web search, database lookup, API calls, etc.\n", + "\n", + "> **This section is self-contained and optional.** The core pipeline works without it." 
+  ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "tool-use", + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "import math\n", + "\n", + "# ── Define a simple calculator tool ─────────────────────────────────────\n", + "\n", + "def calculator_tool(expression: str) -> str:\n", + "    \"\"\"\n", + "    Safely evaluate a mathematical expression.\n", + "    Supports: +, -, *, /, ** (and ^ as an alias), sqrt(), sin(), cos(), pi, e\n", + "    \n", + "    Args:\n", + "        expression: A mathematical expression string.\n", + "    \n", + "    Returns:\n", + "        String representation of the result, or an error message.\n", + "    \"\"\"\n", + "    # Whitelist safe names to prevent code injection\n", + "    safe_names = {\n", + "        \"sqrt\": math.sqrt, \"sin\": math.sin, \"cos\": math.cos,\n", + "        \"tan\": math.tan, \"log\": math.log, \"pi\": math.pi,\n", + "        \"e\": math.e, \"abs\": abs, \"round\": round,\n", + "    }\n", + "    try:\n", + "        # Only allow digits, operators, parentheses, dots, and safe function names\n", + "        cleaned = re.sub(r\"[^0-9+\\-*/().^ a-zA-Z]\", \"\", expression)\n", + "        # '^' is bitwise XOR in Python, so rewrite it as exponentiation\n", + "        cleaned = cleaned.replace(\"^\", \"**\")\n", + "        result = eval(cleaned, {\"__builtins__\": {}}, safe_names)  # noqa: S307\n", + "        return str(result)\n", + "    except Exception as exc:\n", + "        return f\"Error: {exc}\"\n", + "\n", + "\n", + "# ── Tool-augmented agent node ───────────────────────────────────────────\n", + "\n", + "def tool_agent_query(query: str) -> str:\n", + "    \"\"\"\n", + "    A simple tool-calling agent:\n", + "    1. Ask the LLM if this needs a calculator\n", + "    2. If yes, extract the expression and compute it\n", + "    3. Inject the result back into a final prompt\n", + "    \"\"\"\n", + "    # Step 1: Detect if calculation is needed\n", + "    detection_prompt = (\n", + "        f\"Does the following query require a mathematical calculation? 
\"\n", + " f\"Reply with ONLY 'yes' or 'no'.\\n\\nQuery: {query}\"\n", + " )\n", + " needs_calc = \"yes\" in ask_llm(detection_prompt).lower()\n", + " \n", + " tool_context = \"\"\n", + " \n", + " if needs_calc:\n", + " # Step 2: Extract the math expression\n", + " extract_prompt = (\n", + " f\"Extract ONLY the mathematical expression from this query as plain text. \"\n", + " f\"Do not explain. Examples: '25 * 4', 'sqrt(144)', '2**10'\\n\\nQuery: {query}\"\n", + " )\n", + " expression = ask_llm(extract_prompt).strip()\n", + " calc_result = calculator_tool(expression)\n", + " tool_context = f\"Calculator result for '{expression}': {calc_result}\\n\\n\"\n", + " print(f\" ๐Ÿ”ง Tool called: calculator('{expression}') โ†’ {calc_result}\")\n", + " \n", + " # Step 3: Generate final answer using tool result if available\n", + " final_prompt = (\n", + " f\"{tool_context}\"\n", + " f\"Answer the following query concisely.\\n\\nQuery: {query}\"\n", + " )\n", + " return ask_llm(final_prompt)\n", + "\n", + "\n", + "# โ”€โ”€ Test the tool-augmented agent โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "print(\"Testing tool-augmented agent:\\n\")\n", + "\n", + "queries = [\n", + " \"What is the square root of 1764?\",\n", + " \"If I have 256 tokens at 0.002 dollars each, what is the total cost?\",\n", + "]\n", + "\n", + "for q in queries:\n", + " print(f\"\\nโ“ {q}\")\n", + " answer = tool_agent_query(q)\n", + " print(f\"๐Ÿ’ฌ {answer}\")" + ] + }, + { + "cell_type": "markdown", + "id": "openvino-md", + "metadata": {}, + "source": [ + "---\n", + "## โšก Section 9: Optional โ€” OpenVINO Integration\n", + "\n", + "> **This section is informational and optional.** The notebook runs fully without OpenVINO installed. 
OpenVINO is an enhancement, not a requirement.\n", + "\n", + "### Why Use OpenVINO for Local LLM Inference?\n", + "\n", + "When running LLMs on Intel hardware (CPU, iGPU, NPU), OpenVINO can provide significant speedups through:\n", + "\n", + "| Optimization | Description | Typical Benefit |\n", + "|---|---|---|\n", + "| **INT4 Quantization** | Reduce model weight precision 16-bit → 4-bit | 2–4× memory reduction |\n", + "| **INT8 Quantization** | Quantize activations during inference | 1.5–2× speedup |\n", + "| **KV-Cache Optimization** | Efficient attention cache memory layout | Faster long-context generation |\n", + "| **Graph Compilation** | Hardware-specific kernel fusion | Lower latency per token |\n", + "\n", + "### Integration Approaches\n", + "\n", + "**Option A: OpenVINO Model Server (OVMS)** — Drop-in replacement for Ollama/OpenAI API\n", + "\n", + "```bash\n", + "# Convert a Hugging Face model with INT4 quantization (serve the result with OVMS)\n", + "pip install \"optimum[openvino]\"\n", + "\n", + "# The output directory is a positional argument (there is no --output flag)\n", + "optimum-cli export openvino \\\n", + "  --model Qwen/Qwen2.5-3B-Instruct \\\n", + "  --weight-format int4 \\\n", + "  ./qwen2.5-3b-int4-ov\n", + "```\n", + "\n", + "**Option B: `openvino-genai` Python API** — Direct inference without Ollama\n", + "\n", + "```python\n", + "# pip install openvino-genai\n", + "import openvino_genai as ov_genai\n", + "\n", + "pipe = ov_genai.LLMPipeline(\"./qwen2.5-3b-int4-ov\", device=\"CPU\")\n", + "result = pipe.generate(\"What is OpenVINO?\", max_new_tokens=200)\n", + "print(result)\n", + "```\n", + "\n", + "**Option C: LangChain + OpenVINO** — Plug into the existing RAG pipeline\n", + "\n", + "```python\n", + "# pip install langchain-community openvino\n", + "from langchain_community.llms import HuggingFacePipeline\n", + "from optimum.intel import OVModelForCausalLM\n", + "from transformers import AutoTokenizer, pipeline\n", + "\n", + "model_id = \"./qwen2.5-3b-int4-ov\"\n", + "tokenizer = 
AutoTokenizer.from_pretrained(model_id)\n", + "ov_model = OVModelForCausalLM.from_pretrained(model_id)\n", + "\n", + "ov_pipeline = pipeline(\"text-generation\", model=ov_model, tokenizer=tokenizer)\n", + "llm = HuggingFacePipeline(pipeline=ov_pipeline)\n", + "\n", + "# Drop-in replacement: use `llm` anywhere ask_llm() is called\n", + "```\n", + "\n", + "### Hardware Support Matrix\n", + "\n", + "| Hardware | OpenVINO Device | Notes |\n", + "|---|---|---|\n", + "| Intel CPU (Core / Xeon) | `CPU` | Fully supported, recommended for CPU-only |\n", + "| Intel iGPU (Iris Xe, Arc) | `GPU` | Requires OpenCL drivers |\n", + "| Intel NPU (Meteor Lake+) | `NPU` | Best for sustained generation tasks |\n", + "| ARM64 (e.g., Oracle A1) | `CPU` | Supported but no hardware-specific tuning |\n", + "\n", + "### Checking OpenVINO Availability" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "openvino-check", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Optional: Check if OpenVINO is available โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# This cell gracefully handles the case where OpenVINO is not installed.\n", + "\n", + "try:\n", + " import openvino as ov\n", + " \n", + " core = ov.Core()\n", + " available_devices = core.available_devices\n", + " \n", + " print(\"โœ… OpenVINO is installed!\")\n", + " print(f\" Version: {ov.__version__}\")\n", + " print(f\" Available devices: {available_devices}\")\n", + " print()\n", + " \n", + " for device in available_devices:\n", + " try:\n", + " full_name = core.get_property(device, \"FULL_DEVICE_NAME\")\n", + " print(f\" {device}: {full_name}\")\n", + " except Exception:\n", + " print(f\" {device}: (details unavailable)\")\n", + " \n", + " print()\n", + " print(\"๐Ÿ’ก To use OpenVINO for LLM inference, see the integration examples above.\")\n", + " print(\" Recommended: openvino-genai with INT4-quantized Qwen2.5-3B\")\n", + "\n", + "except 
ImportError:\n", + " print(\"โ„น๏ธ OpenVINO is not installed โ€” the pipeline above works without it.\")\n", + " print()\n", + " print(\" To install OpenVINO for accelerated Intel CPU/GPU inference:\")\n", + " print(\" pip install openvino openvino-genai optimum[openvino]\")\n", + " print()\n", + " print(\" See: https://docs.openvino.ai/latest/get_started.html\")" + ] + }, + { + "cell_type": "markdown", + "id": "conclusion-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐ŸŽ‰ Section 10: Conclusion\n", + "\n", + "Congratulations! You have built a complete **local Agentic RAG pipeline** from scratch. Here's what each component contributed:\n", + "\n", + "| Component | Role | Key Benefit |\n", + "|---|---|---|\n", + "| **Ollama** | Local LLM inference server | Privacy, offline, no API costs |\n", + "| **ChromaDB** | Vector database | Fast semantic document retrieval |\n", + "| **`all-MiniLM-L6-v2`** | Embedding model | Lightweight, CPU-friendly |\n", + "| **LangGraph** | Agent orchestration | Flexible, stateful, loop-capable |\n", + "| **OpenVINO** *(optional)* | Inference optimization | Faster tokens on Intel hardware |\n", + "\n", + "### ๐Ÿงฉ Pipeline Summary\n", + "\n", + "```\n", + "User Query\n", + " โ†“\n", + "Classify: needs retrieval?\n", + " โ”œโ”€โ”€ YES โ†’ ChromaDB similarity search โ†’ RAG prompt โ†’ Ollama LLM\n", + " โ””โ”€โ”€ NO โ†’ Direct prompt โ†’ Ollama LLM\n", + " โ†“\n", + "Answer (+ optional tool results)\n", + "```\n", + "\n", + "### ๐Ÿš€ Suggested Extensions\n", + "\n", + "Here are practical ways to extend this notebook into a production-grade system:\n", + "\n", + "1. **Load real documents** โ€” Use `langchain.document_loaders` to ingest PDFs, web pages, or entire directories\n", + "\n", + "2. **Add conversation memory** โ€” Store chat history in LangGraph state to support multi-turn dialogue\n", + "\n", + "3. 
**Upgrade the embedding model** โ€” Try `BAAI/bge-small-en-v1.5` or `nomic-ai/nomic-embed-text-v1.5` for better retrieval quality\n", + "\n", + "4. **Add more tools** โ€” Web search (via DuckDuckGo API), code execution, calendar lookup\n", + "\n", + "5. **Plug in OpenVINO** โ€” Follow Section 9 to convert your Ollama model to OpenVINO IR format for faster CPU inference\n", + "\n", + "6. **Add a Gradio UI** โ€” Wrap the `run_agent()` function in a simple web interface with `gr.ChatInterface`\n", + "\n", + "7. **Evaluate retrieval quality** โ€” Use `ragas` library to measure faithfulness, answer relevancy, and context precision\n", + "\n", + "### ๐Ÿ“š Further Reading\n", + "\n", + "- [OpenVINO Documentation](https://docs.openvino.ai)\n", + "- [OpenVINO Notebooks Repository](https://github.com/openvinotoolkit/openvino_notebooks)\n", + "- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)\n", + "- [ChromaDB Documentation](https://docs.trychroma.com)\n", + "- [Ollama Model Library](https://ollama.com/library)\n", + "- [Qwen2.5 Model Card](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)\n", + "\n", + "---\n", + "*This notebook was designed to follow [OpenVINO Notebooks](https://github.com/openvinotoolkit/openvino_notebooks) contribution standards: CPU-first, beginner-friendly, and fully reproducible on consumer hardware.*" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.14.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}
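As a starting point for extension 7 (evaluating retrieval quality), even before adopting `ragas`, a tiny hit-rate check over a handful of test queries can catch retrieval regressions. A minimal sketch, with hypothetical source filenames:

```python
def hit_rate(retrieved_sources: list[list[str]], expected_sources: list[str]) -> float:
    """Fraction of queries whose expected source appears in the retrieved set."""
    hits = sum(exp in docs for docs, exp in zip(retrieved_sources, expected_sources))
    return hits / len(expected_sources)

# Hypothetical retrieval results for three test queries
retrieved = [
    ["openvino.md", "ollama.md"],   # query 1: expected source found
    ["chromadb.md"],                # query 2: expected source missed
    ["langgraph.md", "rag.md"],     # query 3: expected source found
]
expected = ["openvino.md", "ollama.md", "langgraph.md"]

print(f"hit rate: {hit_rate(retrieved, expected):.2f}")  # hit rate: 0.67
```

In this notebook, `retrieved_sources` would come from calling `retrieve_documents()` on each test query and collecting the `source` fields; `ragas` then adds faithfulness and answer-relevancy metrics on top.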