diff --git a/README.md b/README.md index bdc2db72b0c..ea335a754de 100644 --- a/README.md +++ b/README.md @@ -228,6 +228,26 @@ Made with [`contrib.rocks`](https://contrib.rocks). * [What is the first CPU generation you support with OpenVINO?](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/system-requirements.html) * [Are there any success stories about deploying real-world solutions with OpenVINO?](https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/success-stories.html) +## ๐Ÿ” Training Data Transparency + +This repository focuses on demonstrating **inference and model optimization workflows** using the OpenVINOโ„ข Toolkit with pre-trained models. + +- The notebooks do **not perform model training, re-training, or fine-tuning**. +- Instead, they leverage pre-trained models and showcase **efficient inference and deployment techniques** across different hardware backends. +- In addition to inference, the notebooks demonstrate **model optimization workflows**, including: + - model conversion (e.g., FP32 โ†’ FP16 / INT8) + - quantization and compression techniques + - performance tuning for CPU, GPU, and other supported devices + +Training data details, dataset composition, and associated biases are defined by the **original model providers** and are not modified within this repository. + +Users are encouraged to: +- Review the original model documentation for dataset sources and training details +- Understand potential biases and limitations of pre-trained models +- Evaluate models appropriately for their specific use cases and deployment environments +- Follow responsible AI and data governance practices when integrating these models into applications + +This section aims to clarify the scope of the repository and improve transparency regarding how models are used within OpenVINO notebooks. --- \* Other names and brands may be claimed as the property of others. 
diff --git a/notebooks/local-agentic-rag/README.md b/notebooks/local-agentic-rag/README.md new file mode 100644 index 00000000000..394071e4b48 --- /dev/null +++ b/notebooks/local-agentic-rag/README.md @@ -0,0 +1,110 @@ +# ๐Ÿค– Local RAG Pipeline with Ollama and Optional Agentic Workflow + +This notebook demonstrates a **minimal, fully local Retrieval-Augmented Generation (RAG) pipeline** using Ollama, ChromaDB, and an optional agentic workflow with LangGraph. + +The implementation is designed to be **educational, modular, and CPU-friendly**, requiring no cloud APIs after initial setup. + +--- + +## ๐Ÿ“š Overview + +This notebook walks through building a complete local AI pipeline: + +- Local LLM inference using Ollama +- Document embedding and storage with ChromaDB +- Retrieval-Augmented Generation (RAG) +- Optional agentic workflow using LangGraph +- Optional OpenVINOโ„ข integration for optimized inference + +The goal is to provide a **clear and reproducible introduction** to local-first AI systems. + +--- + +## ๐Ÿ” What is RAG? + +**Retrieval-Augmented Generation (RAG)** enhances LLM responses by retrieving relevant context from a knowledge base before generating an answer. + +This helps: +- Reduce hallucinations +- Incorporate domain-specific knowledge +- Improve factual accuracy + +--- + +## ๐Ÿค– Optional Agentic Workflow + +This notebook includes an **optional agentic extension** using LangGraph. + +In this setup, the system can: +- Decide whether retrieval is needed +- Route queries dynamically +- Use simple tools such as a calculator + +> โš ๏ธ This section is optional and intended for learning purposes. +> The core RAG pipeline works independently without the agentic extension. + +--- + +## ๐Ÿ”ง Recent Changes & Fixes + +### Added: Dependency Check (Before Agent Section) + +**Issue:** Running the LangGraph agent section (Section 7) would fail with `NameError: name 'ask_llm' is not defined` if prerequisite cells were not executed first.
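Such a guard can be sketched in a few lines of plain Python. The helper name and the list of required functions below are illustrative, not the notebook's exact implementation:

```python
def check_prerequisites(required=("ask_llm", "retrieve_documents", "build_rag_prompt")):
    """Return the names of required notebook functions that are not yet defined."""
    missing = [name for name in required if name not in globals()]
    for name in missing:
        print(f"โš ๏ธ '{name}' is not defined yet. Run the earlier sections first.")
    return missing

# The agent-building cell can then bail out early instead of raising a NameError:
if check_prerequisites():
    print("Skipping agent construction until the prerequisite cells have been run.")
```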
+ +**Solution:** Added an automatic dependency check that: +- โœ… Verifies all required functions are available before building the agent +- โœ… Provides clear error messages if functions are missing +- โœ… Shows exactly which sections to run and in what order + +**Impact:** If cells are run out of order, the dependency check now reports the missing prerequisites and prints instructions, instead of failing with a cryptic `NameError`. + +### Execution Order Requirements + +To run the full pipeline successfully, execute sections in this order: + +1. **Section 1:** Environment Setup (install packages) +2. **Section 1 (continued):** Ollama configuration & model verification +3. **Section 2:** Basic LLM Inference (`ask_llm` function) +4. **Section 3:** Document Preparation (creates chunks) +5. **Section 4:** ChromaDB Setup (vector store) +6. **Section 5:** Retrieval (`retrieve_documents` function) +7. **Section 6:** RAG Pipeline (`build_rag_prompt` function) +8. **Section 7+:** Agentic workflow (now safe to run) + +> ๐Ÿ’ก The dependency check will remind you if you skip steps! + +--- + +## โšก OpenVINOโ„ข Integration + +OpenVINOโ„ข is Intelโ€™s toolkit for optimizing and deploying deep learning models. + +This notebook is designed to be **compatible with OpenVINO optimization workflows**, including: + +- Model conversion (FP32 โ†’ FP16 / INT8) +- Quantization and compression +- CPU, GPU, and NPU performance optimization + +> ๐Ÿ’ก OpenVINO integration is optional. The notebook can run without it. + +--- + +## ๐Ÿ’ป Requirements + +| Component | Requirement | +|----------|-------------| +| Python | 3.9+ | +| RAM | 8 GB minimum, 16 GB recommended | +| Storage | ~5 GB free | +| OS | Windows, Linux, or macOS | + +> โœ… No GPU is required. + +--- + +## ๐Ÿ› ๏ธ Setup Instructions + +### 1.
Install Python dependencies + +```bash +pip install ollama chromadb langgraph langchain langchain-community sentence-transformers tqdm jupyter +``` \ No newline at end of file diff --git a/notebooks/local-agentic-rag/local-agentic-rag-ollama-openvino.ipynb b/notebooks/local-agentic-rag/local-agentic-rag-ollama-openvino.ipynb new file mode 100644 index 00000000000..093a04772cf --- /dev/null +++ b/notebooks/local-agentic-rag/local-agentic-rag-ollama-openvino.ipynb @@ -0,0 +1,1272 @@ + "# ๐Ÿค– Local Agentic RAG with Ollama, ChromaDB, and LangGraph\n", + "\n", + "**A fully local, privacy-preserving Retrieval-Augmented Generation pipeline**\n", + "\n", + "---\n", + "\n", + "## What You'll Learn\n", + "\n", + "In this notebook, you will build a **minimal end-to-end Agentic RAG pipeline** that runs entirely on your local machine โ€” no internet connection or cloud API required after setup.\n", + "\n", + "### ๐Ÿ” What is RAG?\n", + "\n", + "**Retrieval-Augmented Generation (RAG)** is a technique that enhances a language model's responses by first *retrieving* relevant documents from a knowledge base, then *augmenting* the prompt with that context before generating an answer. This is especially useful when:\n", + "\n", + "- Your LLM doesn't know about domain-specific or up-to-date information\n", + "- You want answers grounded in specific documents (e.g., internal manuals, research papers)\n", + "- You want to reduce hallucinations\n", + "\n", + "### ๐Ÿค– What Makes It \"Agentic\"?\n", + "\n", + "A standard RAG pipeline is a fixed sequence: retrieve โ†’ generate.
An **Agentic RAG** pipeline uses a reasoning loop where the model can:\n", + "\n", + "- Decide *whether* retrieval is needed\n", + "- Call tools (e.g., a calculator, search function)\n", + "- Reflect on intermediate results before producing a final answer\n", + "\n", + "We implement this loop using **LangGraph**, a lightweight graph-based orchestration library.\n", + "\n", + "### ๐Ÿ”’ Why Local-First?\n", + "\n", + "Running AI entirely on your own hardware provides:\n", + "\n", + "- **Privacy** โ€” your data never leaves your machine\n", + "- **Offline capability** โ€” works without internet\n", + "- **Cost control** โ€” no per-token API charges\n", + "- **Reproducibility** โ€” same model version every run\n", + "\n", + "### โšก Where Does OpenVINO Fit?\n", + "\n", + "[OpenVINOโ„ข](https://github.com/openvinotoolkit/openvino) is Intel's open-source toolkit for optimizing and deploying deep learning models. It can accelerate LLM inference on Intel CPUs, iGPUs, and NPUs by:\n", + "\n", + "- Quantizing models (e.g., INT4/INT8) to reduce memory usage\n", + "- Compiling computation graphs for hardware-specific optimization\n", + "- Enabling faster token generation on CPU compared to vanilla PyTorch\n", + "\n", + "In this notebook, OpenVINO is shown as an **optional enhancement** โ€” the pipeline runs fine without it, and we show how to plug it in.\n", + "\n", + "---\n", + "\n", + "### ๐Ÿ—บ๏ธ Pipeline Overview\n", + "\n", + "```\n", + "User Query\n", + " โ”‚\n", + " โ–ผ\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ LangGraph Agent Loop โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", + "โ”‚ โ”‚ Decide โ”‚โ”€โ”€โ”€โ–ถโ”‚ Retrieve Docs โ”‚ โ”‚\n", + "โ”‚ โ”‚ (LLM) โ”‚ โ”‚ (ChromaDB) โ”‚ โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ 
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", + "โ”‚ โ–ฒ โ”‚ โ”‚\n", + "โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚ Generate Answer โ”‚ โ”‚\n", + "โ”‚ โ”‚ (Ollama LLM) โ”‚ โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + " โ”‚\n", + " โ–ผ\n", + "Final Answer\n", + "```\n", + "\n", + "---\n", + "\n", + "### ๐Ÿ“‹ Prerequisites\n", + "\n", + "| Requirement | Minimum | Recommended |\n", + "|---|---|---|\n", + "| RAM | 8 GB | 16 GB |\n", + "| CPU | Any x86-64 / ARM64 | Intel Core i5+ |\n", + "| Storage | 5 GB free | 10 GB free |\n", + "| OS | Linux / macOS / Windows | Ubuntu 22.04+ |\n", + "| Python | 3.9+ | 3.11 |\n", + "\n", + "> โœ… **No GPU required.** This notebook is designed to run on CPU only." + ] + }, + { + "cell_type": "markdown", + "id": "setup-header", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ“ฆ Section 1: Environment Setup\n", + "\n", + "We install the required Python packages. These are all lightweight and widely used:\n", + "\n", + "| Package | Purpose |\n", + "|---|---|\n", + "| `ollama` | Python client for local Ollama LLM server |\n", + "| `chromadb` | Local vector database for document embeddings |\n", + "| `langgraph` | Agent loop orchestration |\n", + "| `langchain-community` | Utility helpers (text splitters, etc.) 
|\n", + "| `sentence-transformers` | Lightweight local embedding model |\n", + "| `tqdm` | Progress bars |\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "install-deps", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n", "โœ… All packages installed successfully.\n" ] } ], "source": [ "# Install required packages\n", "# This may take 1-2 minutes on first run\n", "%pip install -q \\\n", " ollama \\\n", " chromadb \\\n", " langgraph \\\n", " langchain \\\n", " langchain-community \\\n", " sentence-transformers \\\n", " tqdm\n", "\n", "print(\"โœ… All packages installed successfully.\")" ] }, { "cell_type": "markdown", "id": "ollama-setup-md", "metadata": {}, "source": [ "### ๐Ÿฆ™ Installing and Starting Ollama\n", "\n", "**Ollama** is a tool for running large language models locally.
It handles model download and quantization, and serves an OpenAI-compatible REST API on `http://localhost:11434`.\n", + "\n", + "**Installation:**\n", + "\n", + "```bash\n", + "# Linux / macOS:\n", + "curl -fsSL https://ollama.com/install.sh | sh\n", + "\n", + "# Windows: Download installer from https://ollama.com/download\n", + "```\n", + "\n", + "**Start the Ollama server** (in a separate terminal or as a background service):\n", + "\n", + "```bash\n", + "ollama serve\n", + "```\n", + "\n", + "**Pull a lightweight model** suitable for CPU inference:\n", + "\n", + "```bash\n", + "# ~2.3 GB โ€” good balance of quality and speed on CPU\n", + "ollama pull qwen2.5:3b\n", + "\n", + "# Smaller alternative (~1.1 GB) if RAM is limited:\n", + "# ollama pull qwen2.5:1.5b\n", + "```\n", + "\n", + "> ๐Ÿ’ก **Why Qwen2.5?** It delivers strong instruction-following performance in small sizes (1.5Bโ€“7B), making it ideal for CPU-only environments. The 3B variant comfortably fits in 8 GB RAM.\n", + "\n", + "After pulling the model, verify Ollama is running by executing the next cell."
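Before moving on, you can also sanity-check the server with a raw HTTP request. Ollama exposes a REST endpoint, `GET /api/tags`, that lists the locally pulled models. The sketch below is optional (the next cell performs the same check through the `ollama` Python client) and falls back to an empty list when the server is not running:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def list_ollama_models(base_url: str = OLLAMA_URL) -> list:
    """Return the names of locally pulled models, or [] if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError as exc:  # connection refused, DNS failure, timeout, ...
        print(f"Ollama not reachable at {base_url}: {exc}")
        return []

print(list_ollama_models())
```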
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "check-ollama", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "๐Ÿฆ™ Ollama is running!\n", + " Available models: ['qwen2.5:3b', 'gemma3:4b', 'mxbai-embed-large:latest', 'gemma3:latest']\n", + "\n", + "โœ… Model 'qwen2.5:3b' is ready to use.\n" + ] + } + ], + "source": [ + "import ollama\n", + "\n", + "# โ”€โ”€ Configuration โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Change MODEL_NAME to match whichever model you pulled with `ollama pull`\n", + "MODEL_NAME = \"qwen2.5:3b\" # Recommended: good quality on CPU\n", + "# MODEL_NAME = \"qwen2.5:1.5b\" # Uncomment for lower RAM usage (~1.1 GB)\n", + "# MODEL_NAME = \"llama3.2:3b\" # Alternative: Meta LLaMA 3.2 3B\n", + "# โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "# Verify Ollama is running and the model is available\n", + "try:\n", + " available_models = [m.model for m in ollama.list().models]\n", + " print(\"๐Ÿฆ™ Ollama is running!\")\n", + " print(f\" Available models: {available_models}\")\n", + " \n", + " if MODEL_NAME not in available_models:\n", + " print(f\"\\nโš ๏ธ Model '{MODEL_NAME}' not found locally.\")\n", + " print(f\" Please run in a terminal: ollama pull {MODEL_NAME}\")\n", + " else:\n", + " print(f\"\\nโœ… Model '{MODEL_NAME}' is ready to use.\")\n", + " \n", + "except Exception as e:\n", + " print(f\"โŒ Could not connect to Ollama: {e}\")\n", + " print(\" Make sure Ollama is installed and running: `ollama serve`\")" + ] + }, + { + "cell_type": "markdown", + "id": "llm-inference-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ’ฌ 
Section 2: Basic LLM Inference with Ollama\n", + "\n", + "Before building the full pipeline, let's confirm the LLM works with a simple prompt-response test.\n", + "\n", + "We use the `ollama` Python client, which communicates with the locally running Ollama server. The interface mirrors the OpenAI Chat Completions API, so the concepts transfer directly." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "basic-llm", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Sending a test prompt to the LLM...\n", + "\n", + "LLM Response:\n", + "Intel OpenVINO is an open-source platform optimized for real-time inference of deep learning models on resource-constrained devices.\n" + ] + } + ], + "source": [ + "def ask_llm(prompt: str, model: str = MODEL_NAME, system: str = None) -> str:\n", + " \"\"\"\n", + " Send a prompt to the local Ollama LLM and return the response text.\n", + " \n", + " Args:\n", + " prompt: The user message / question.\n", + " model: Ollama model name to use.\n", + " system: Optional system prompt to set the assistant's behaviour.\n", + " \n", + " Returns:\n", + " The model's response as a plain string.\n", + " \"\"\"\n", + " messages = []\n", + " if system:\n", + " messages.append({\"role\": \"system\", \"content\": system})\n", + " messages.append({\"role\": \"user\", \"content\": prompt})\n", + " \n", + " response = ollama.chat(model=model, messages=messages)\n", + " return response[\"message\"][\"content\"]\n", + "\n", + "\n", + "# โ”€โ”€ Simple test โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "print(\"Sending a test prompt to the LLM...\\n\")\n", + "\n", + "test_response = ask_llm(\n", + " prompt=\"In one sentence, what is Intel OpenVINO?\",\n", + " system=\"You are a concise technical assistant.\"\n", + ")\n", + "\n", + 
"print(f\"LLM Response:\\n{test_response}\")" ] }, { "cell_type": "markdown", "id": "documents-md", "metadata": {}, "source": [ "---\n", "## ๐Ÿ“„ Section 3: Document Preparation\n", "\n", "For RAG to work, we need a **knowledge base** โ€” a set of documents the system can search through to answer questions.\n", "\n", "In this example, we use a small set of manually written paragraphs about Intel OpenVINO and related AI topics. In a real project, you would replace these with:\n", "\n", "- PDF or text files loaded from disk\n", "- Web pages fetched via scraping\n", "- Database records\n", "- Any structured or unstructured text\n", "\n", "We also split long documents into smaller **chunks**. This is important because:\n", "\n", "1. Embedding models have a maximum input length (typically 256โ€“512 tokens)\n", "2. Smaller chunks improve retrieval precision โ€” you return only the relevant paragraph, not an entire page\n", "3. The LLM context window is limited; smaller chunks fit more retrieved results" ] }, { "cell_type": "code", "execution_count": null, "id": "prepare-docs", "metadata": {}, "outputs": [], "source": [ "# NOTE: in langchain >= 0.2 the text splitters live in the langchain-text-splitters package\n", "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", "\n", "# โ”€โ”€ Sample knowledge base โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", "# These short documents form our local knowledge base.\n", "# Replace or extend with your own content.\n", "\n", "RAW_DOCUMENTS = [\n", " {\n", " \"id\": \"doc_openvino_overview\",\n", " \"text\": (\n", " \"Intel OpenVINO (Open Visual Inference and Neural network Optimization) is an open-source \"\n", " \"toolkit for optimizing and deploying AI inference. It supports models from frameworks \"\n", " \"like PyTorch, TensorFlow, and ONNX. OpenVINO converts models into its Intermediate \"\n", " \"Representation (IR) format for cross-hardware deployment. It targets Intel CPUs, \"\n", " \"integrated GPUs, and NPUs. Key features include model quantization (INT8/INT4), \"\n", " \"throughput optimization, and a Python API for easy integration.\"\n", " ),\n", " \"source\": \"openvino_overview\",\n", " },\n", " {\n", " \"id\": \"doc_rag_explanation\",\n", " \"text\": (\n", " \"Retrieval-Augmented Generation (RAG) is a technique that improves LLM responses by \"\n", " \"fetching relevant documents from an external knowledge base before generating an answer. 
\"\n", + " \"The workflow has two stages: retrieval, where a query is converted to an embedding and \"\n", + " \"matched against stored document embeddings in a vector database; and generation, where \"\n", + " \"the retrieved documents are concatenated with the original query as context for the LLM. \"\n", + " \"RAG reduces hallucinations and allows the model to answer questions about private or \"\n", + " \"domain-specific data without fine-tuning.\"\n", + " ),\n", + " \"source\": \"rag_explanation\",\n", + " },\n", + " {\n", + " \"id\": \"doc_chromadb\",\n", + " \"text\": (\n", + " \"ChromaDB is a lightweight, open-source vector database designed for AI applications. \"\n", + " \"It stores document embeddings (dense numerical vectors) and supports fast similarity \"\n", + " \"search using cosine or L2 distance. ChromaDB can run fully in-memory for prototyping \"\n", + " \"or persist data to disk for production use. It integrates natively with popular \"\n", + " \"embedding models from HuggingFace and OpenAI, and requires no separate database server.\"\n", + " ),\n", + " \"source\": \"chromadb_overview\",\n", + " },\n", + " {\n", + " \"id\": \"doc_langgraph\",\n", + " \"text\": (\n", + " \"LangGraph is a library for building stateful, multi-step agent workflows using a \"\n", + " \"graph-based computation model. Nodes represent individual actions (e.g., call LLM, \"\n", + " \"retrieve documents, execute tool), and edges define the control flow between them. \"\n", + " \"LangGraph supports conditional routing, loops, and human-in-the-loop checkpoints. \"\n", + " \"It is part of the LangChain ecosystem and can be used with any LLM provider, \"\n", + " \"including locally running models via Ollama.\"\n", + " ),\n", + " \"source\": \"langgraph_overview\",\n", + " },\n", + " {\n", + " \"id\": \"doc_ollama\",\n", + " \"text\": (\n", + " \"Ollama is a tool for running large language models locally on your machine. 
It \"\n", + " \"provides a simple CLI and REST API for downloading and serving quantized models \"\n", + " \"in GGUF format. Supported models include Llama 3, Qwen2.5, Mistral, Phi-3, and \"\n", + " \"many others. Ollama handles CPU and GPU inference automatically, making local LLM \"\n", + " \"deployment accessible without complex setup. Its API is compatible with the \"\n", + " \"OpenAI Chat Completions specification.\"\n", + " ),\n", + " \"source\": \"ollama_overview\",\n", + " },\n", + "]\n", + "\n", + "print(f\"๐Ÿ“š Knowledge base: {len(RAW_DOCUMENTS)} documents loaded.\")\n", + "\n", + "# โ”€โ”€ Text chunking โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# For longer documents you would chunk them; our samples are already short.\n", + "# We show the splitter setup for completeness โ€” it's a no-op on small texts.\n", + "\n", + "splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=400, # characters per chunk\n", + " chunk_overlap=60, # overlap to preserve context across chunk boundaries\n", + " separators=[\"\\n\\n\", \"\\n\", \". \", \" \", \"\"],\n", + ")\n", + "\n", + "chunks = []\n", + "for doc in RAW_DOCUMENTS:\n", + " for i, chunk_text in enumerate(splitter.split_text(doc[\"text\"])):\n", + " chunks.append({\n", + " \"id\": f\"{doc['id']}_chunk{i}\",\n", + " \"text\": chunk_text,\n", + " \"source\": doc[\"source\"],\n", + " })\n", + "\n", + "print(f\"โœ‚๏ธ After chunking: {len(chunks)} text chunks ready for embedding.\")\n", + "print(f\"\\n๐Ÿ“ Example chunk:\\n {chunks[0]['text'][:200]}...\")" + ] + }, + { + "cell_type": "markdown", + "id": "embedding-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿงฎ Section 4: Embeddings and Vector Storage (ChromaDB)\n", + "\n", + "**Embeddings** are dense numerical representations of text that capture semantic meaning. 
Texts with similar meanings have embeddings that are close together in vector space.\n", + "\n", + "We use **`sentence-transformers/all-MiniLM-L6-v2`** โ€” a small but effective embedding model:\n", + "- **Size:** ~22 MB (very lightweight)\n", + "- **Embedding dimension:** 384\n", + "- **Runs entirely on CPU** without any configuration\n", + "\n", + "All embeddings are stored in a **ChromaDB** collection, which provides:\n", + "- Persistent local storage (no server needed)\n", + "- Fast approximate nearest-neighbour search\n", + "- Metadata filtering" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "setup-embeddings", + "metadata": {}, + "outputs": [], + "source": [ + "import chromadb\n", + "from chromadb.utils import embedding_functions\n", + "from tqdm import tqdm\n", + "\n", + "# โ”€โ”€ Embedding model โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Using SentenceTransformers via ChromaDB's built-in embedding function.\n", + "# The model is downloaded once and cached in ~/.cache/huggingface/\n", + "\n", + "EMBEDDING_MODEL = \"all-MiniLM-L6-v2\" # ~22 MB, excellent for CPU\n", + "\n", + "print(f\"Loading embedding model: {EMBEDDING_MODEL}\")\n", + "print(\"(First run will download ~22 MB โ€” subsequent runs use the cache)\\n\")\n", + "\n", + "embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(\n", + " model_name=EMBEDDING_MODEL\n", + ")\n", + "\n", + "# โ”€โ”€ ChromaDB setup โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# PersistentClient stores the database on disk at ./chroma_db/\n", + "# Use chromadb.Client() for an in-memory-only version.\n", + "\n", + "DB_PATH = \"./chroma_db\" # Local directory for vector store persistence\n", + "\n", + 
"chroma_client = chromadb.PersistentClient(path=DB_PATH)\n", + "\n", + "# Create (or load existing) collection\n", + "# get_or_create_collection avoids errors if re-running the notebook\n", + "collection = chroma_client.get_or_create_collection(\n", + " name=\"openvino_rag_demo\",\n", + " embedding_function=embedding_fn,\n", + " metadata={\"hnsw:space\": \"cosine\"}, # Use cosine similarity\n", + ")\n", + "\n", + "print(f\"โœ… ChromaDB collection ready at: {DB_PATH}\")\n", + "print(f\" Collection name: 'openvino_rag_demo'\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "index-docs", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Index documents โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Check how many documents are already in the collection to avoid duplicates.\n", + "\n", + "existing_count = collection.count()\n", + "\n", + "if existing_count >= len(chunks):\n", + " print(f\"โ„น๏ธ Collection already contains {existing_count} documents. Skipping indexing.\")\n", + " print(\" Delete './chroma_db/' and re-run to re-index.\")\n", + "else:\n", + " print(f\"Indexing {len(chunks)} chunks into ChromaDB...\")\n", + " \n", + " # Add all chunks in a single batch call for efficiency\n", + " collection.add(\n", + " ids = [c[\"id\"] for c in chunks],\n", + " documents = [c[\"text\"] for c in chunks],\n", + " metadatas = [{\"source\": c[\"source\"]} for c in chunks],\n", + " )\n", + " \n", + " print(f\"\\nโœ… Indexed {collection.count()} chunks successfully.\")\n", + "\n", + "print(f\"\\n๐Ÿ“Š Vector store summary:\")\n", + "print(f\" Total documents in collection: {collection.count()}\")" + ] + }, + { + "cell_type": "markdown", + "id": "retrieval-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ” Section 5: The Retrieval Step\n", + "\n", + "Retrieval works by:\n", + "\n", + "1. 
Converting the user's query into an embedding using the *same* embedding model used during indexing\n", + "2. Finding the `k` most similar document embeddings in ChromaDB using cosine similarity\n", + "3. Returning those documents as context\n", + "\n", + "The key insight: **semantic similarity, not keyword matching**. A query about \"fast inference\" will retrieve documents about \"optimized deployment\" even if the exact words differ." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "retrieval-fn", + "metadata": {}, + "outputs": [], + "source": [ + "def retrieve_documents(query: str, k: int = 3) -> list[dict]:\n", + " \"\"\"\n", + " Retrieve the top-k most relevant document chunks for a given query.\n", + " \n", + " Args:\n", + " query: The user's search question.\n", + " k: Number of documents to retrieve.\n", + " \n", + " Returns:\n", + " List of dicts with keys: 'id', 'text', 'source', 'distance'\n", + " \"\"\"\n", + " results = collection.query(\n", + " query_texts=[query],\n", + " n_results=k,\n", + " include=[\"documents\", \"metadatas\", \"distances\"],\n", + " )\n", + " \n", + " retrieved = []\n", + " for i in range(len(results[\"ids\"][0])):\n", + " retrieved.append({\n", + " \"id\": results[\"ids\"][0][i],\n", + " \"text\": results[\"documents\"][0][i],\n", + " \"source\": results[\"metadatas\"][0][i][\"source\"],\n", + " \"distance\": results[\"distances\"][0][i], # Lower = more similar\n", + " })\n", + " \n", + " return retrieved\n", + "\n", + "\n", + "# โ”€โ”€ Test retrieval โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "test_query = \"How does OpenVINO speed up AI inference?\"\n", + "print(f\"๐Ÿ” Test query: \\\"{test_query}\\\"\\n\")\n", + "\n", + "retrieved = retrieve_documents(test_query, k=2)\n", + "\n", + "for i, doc in enumerate(retrieved, 1):\n", + " print(f\"--- Result {i} 
(source: {doc['source']}, distance: {doc['distance']:.4f}) ---\")\n", + " print(f\"{doc['text']}\\n\")" + ] + }, + { + "cell_type": "markdown", + "id": "rag-pipeline-md", + "metadata": {}, + "source": [ + "---\n", + "## โš™๏ธ Section 6: The RAG Pipeline\n", + "\n", + "Now we connect retrieval with generation. The pattern is:\n", + "\n", + "1. **Retrieve** relevant chunks for the user's question\n", + "2. **Format** a prompt that includes the retrieved context\n", + "3. **Generate** a response using the local LLM\n", + "\n", + "A good system prompt instructs the LLM to:\n", + "- Answer only from the provided context\n", + "- Admit when it doesn't know (avoiding hallucination)\n", + "- Be concise" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "rag-pipeline", + "metadata": {}, + "outputs": [], + "source": [ + "def build_rag_prompt(query: str, context_docs: list[dict]) -> str:\n", + " \"\"\"\n", + " Assemble the RAG prompt by combining retrieved context with the user query.\n", + " \n", + " Args:\n", + " query: The user's original question.\n", + " context_docs: List of retrieved document dicts (from retrieve_documents).\n", + " \n", + " Returns:\n", + " A formatted prompt string ready to send to the LLM.\n", + " \"\"\"\n", + " context_str = \"\\n\\n\".join(\n", + " f\"[Source: {doc['source']}]\\n{doc['text']}\"\n", + " for doc in context_docs\n", + " )\n", + " \n", + " prompt = (\n", + " f\"You are a helpful assistant. Answer the question using ONLY the context \"\n", + " f\"provided below. 
If the context does not contain enough information to answer, \"\n", + " f\"say \\\"I don't have enough information to answer this question.\\\"\\n\\n\"\n", + " f\"Context:\\n{context_str}\\n\\n\"\n", + " f\"Question: {query}\\n\\n\"\n", + " f\"Answer:\"\n", + " )\n", + " return prompt\n", + "\n", + "\n", + "def rag_query(query: str, k: int = 3) -> dict:\n", + " \"\"\"\n", + " Full RAG pipeline: retrieve โ†’ build prompt โ†’ generate.\n", + " \n", + " Args:\n", + " query: The user's question.\n", + " k: Number of documents to retrieve.\n", + " \n", + " Returns:\n", + " Dict with 'query', 'retrieved_docs', and 'answer'.\n", + " \"\"\"\n", + " # Step 1: Retrieve\n", + " docs = retrieve_documents(query, k=k)\n", + " \n", + " # Step 2: Build prompt\n", + " prompt = build_rag_prompt(query, docs)\n", + " \n", + " # Step 3: Generate\n", + " answer = ask_llm(prompt)\n", + " \n", + " return {\"query\": query, \"retrieved_docs\": docs, \"answer\": answer}\n", + "\n", + "\n", + "# โ”€โ”€ Run a test RAG query โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "print(\"Running RAG pipeline...\\n\")\n", + "\n", + "result = rag_query(\"What is ChromaDB and how is it used in AI applications?\")\n", + "\n", + "print(f\"โ“ Question: {result['query']}\\n\")\n", + "print(f\"๐Ÿ“„ Retrieved {len(result['retrieved_docs'])} documents:\")\n", + "for doc in result['retrieved_docs']:\n", + " print(f\" โ€ข {doc['source']} (similarity distance: {doc['distance']:.4f})\")\n", + "print(f\"\\n๐Ÿ’ฌ Answer:\\n{result['answer']}\")" + ] + }, + { + "cell_type": "markdown", + "id": "agentic-loop-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ”„ Section 7: Minimal Agentic Loop with LangGraph\n", + "\n", + "So far, our pipeline is a **static sequence**: always retrieve, always generate. 
An **agentic** pipeline adds reasoning and decision-making.\n", + "\n", + "### How LangGraph Works\n", + "\n", + "LangGraph models a workflow as a **state machine**:\n", + "\n", + "- **State** โ€” a typed dictionary that flows between nodes\n", + "- **Nodes** โ€” Python functions that read/write the state\n", + "- **Edges** โ€” connections between nodes (can be conditional)\n", + "\n", + "### Our Agent Graph\n", + "\n", + "```\n", + "START\n", + " โ”‚\n", + " โ–ผ\n", + "[classify_query] โ† LLM decides: needs retrieval or direct answer?\n", + " โ”‚ โ”‚\n", + " โ”‚ (rag) โ”‚ (direct)\n", + " โ–ผ โ–ผ\n", + "[retrieve] [generate_direct]\n", + " โ”‚ โ”‚\n", + " โ–ผ โ”‚\n", + "[generate_rag] โ”‚\n", + " โ”‚ โ”‚\n", + " โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + " โ–ผ\n", + " END\n", + "```\n", + "\n", + "The agent first classifies the query: if it's a factual question that benefits from document lookup, it uses RAG; otherwise it answers directly. This avoids unnecessary retrieval for simple conversational or computational queries." 
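Before wiring this up with LangGraph, it can help to see the routing idea stripped of any framework. The sketch below is a plain-Python approximation of the graph above; the `keyword_decide` stub stands in for the LLM classifier and is purely hypothetical:

```python
from typing import Callable, TypedDict

# Minimal state shared between nodes (mirrors the AgentState idea)
class State(TypedDict):
    query: str
    route: str
    answer: str

def classify(state: State, decide: Callable[[str], str]) -> State:
    # decide() stands in for the LLM routing call; returns 'rag' or 'direct'
    return {**state, "route": decide(state["query"])}

def generate_rag(state: State) -> State:
    # Placeholder for retrieve + grounded generation
    return {**state, "answer": f"[rag] {state['query']}"}

def generate_direct(state: State) -> State:
    # Placeholder for direct generation without retrieval
    return {**state, "answer": f"[direct] {state['query']}"}

def run_graph(query: str, decide: Callable[[str], str]) -> State:
    state: State = {"query": query, "route": "", "answer": ""}
    state = classify(state, decide)
    # Conditional edge: choose the next node based on the routing decision
    node = generate_rag if state["route"] == "rag" else generate_direct
    return node(state)

# Stub classifier: knowledge-base topics go to RAG, everything else is direct
def keyword_decide(q: str) -> str:
    topics = ("openvino", "chromadb", "ollama", "langgraph", "rag")
    return "rag" if any(t in q.lower() for t in topics) else "direct"

print(run_graph("What is OpenVINO?", keyword_decide)["route"])   # rag
print(run_graph("What is 25 * 4?", keyword_decide)["route"])     # direct
```

LangGraph adds typed state merging, conditional edges, and loop support on top of this pattern; the node functions themselves stay ordinary Python.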
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "langgraph-state", + "metadata": {}, + "outputs": [], + "source": [ + "from typing import TypedDict, Annotated, Literal\n", + "from langgraph.graph import StateGraph, START, END\n", + "\n", + "# โ”€โ”€ Agent State โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Every node receives and returns a subset of this state dict.\n", + "# Using TypedDict gives us type hints and IDE support.\n", + "\n", + "class AgentState(TypedDict):\n", + " query: str # Original user question\n", + " route: str # 'rag' or 'direct'\n", + " retrieved_docs: list[dict] # Documents from ChromaDB\n", + " answer: str # Final answer\n", + " tool_result: str # Optional: result of a tool call\n", + "\n", + "\n", + "print(\"โœ… AgentState defined.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "langgraph-nodes", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Node 1: Query Classifier โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Asks the LLM to decide if this query needs document retrieval.\n", + "\n", + "def classify_query(state: AgentState) -> AgentState:\n", + " \"\"\"\n", + " Route the query:\n", + " - 'rag' โ†’ requires searching the knowledge base\n", + " - 'direct' โ†’ can be answered without retrieval (math, simple facts, etc.)\n", + " \"\"\"\n", + " decision_prompt = (\n", + " f\"You are a routing assistant. 
Given the user query below, decide if it \"\n", + " f\"requires searching a knowledge base about AI tools (OpenVINO, ChromaDB, \"\n", + " f\"Ollama, LangGraph, RAG) to answer correctly, or if it can be answered \"\n", + " f\"directly.\\n\\n\"\n", + " f\"Reply with ONLY one word: 'rag' or 'direct'.\\n\\n\"\n", + " f\"Query: {state['query']}\"\n", + " )\n", + " \n", + " raw = ask_llm(decision_prompt).strip().lower()\n", + " \n", + " # Normalize the output โ€” the LLM might add punctuation\n", + " route = \"rag\" if \"rag\" in raw else \"direct\"\n", + " \n", + " print(f\" ๐Ÿ—บ๏ธ Classifier decision: '{route}' (raw: '{raw}')\")\n", + " return {**state, \"route\": route}\n", + "\n", + "\n", + "# โ”€โ”€ Node 2: Document Retrieval โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "def retrieve_node(state: AgentState) -> AgentState:\n", + " \"\"\"Retrieve relevant documents from ChromaDB for the query.\"\"\"\n", + " docs = retrieve_documents(state[\"query\"], k=3)\n", + " print(f\" ๐Ÿ“š Retrieved {len(docs)} documents.\")\n", + " return {**state, \"retrieved_docs\": docs}\n", + "\n", + "\n", + "# โ”€โ”€ Node 3a: RAG Generation โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "def generate_rag_node(state: AgentState) -> AgentState:\n", + " \"\"\"Generate an answer grounded in the retrieved documents.\"\"\"\n", + " prompt = build_rag_prompt(state[\"query\"], state[\"retrieved_docs\"])\n", + " answer = ask_llm(prompt)\n", + " return {**state, \"answer\": answer}\n", + "\n", + "\n", + "# โ”€โ”€ Node 3b: Direct Generation โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "def generate_direct_node(state: AgentState) -> AgentState:\n", + " 
\"\"\"Generate an answer directly, without retrieval.\"\"\"\n", + " answer = ask_llm(\n", + " prompt=state[\"query\"],\n", + " system=\"You are a helpful, concise assistant.\"\n", + " )\n", + " return {**state, \"answer\": answer, \"retrieved_docs\": []}\n", + "\n", + "\n", + "# โ”€โ”€ Conditional edge: decides which generation node to call โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "def route_decision(state: AgentState) -> Literal[\"retrieve\", \"generate_direct\"]:\n", + " \"\"\"Edge function: returns the name of the next node based on 'route'.\"\"\"\n", + " if state[\"route\"] == \"rag\":\n", + " return \"retrieve\"\n", + " return \"generate_direct\"\n", + "\n", + "\n", + "print(\"โœ… All graph nodes defined.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "langgraph-build", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Build the LangGraph state machine โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "\n", + "builder = StateGraph(AgentState)\n", + "\n", + "# Add nodes\n", + "builder.add_node(\"classify\", classify_query)\n", + "builder.add_node(\"retrieve\", retrieve_node)\n", + "builder.add_node(\"generate_rag\", generate_rag_node)\n", + "builder.add_node(\"generate_direct\", generate_direct_node)\n", + "\n", + "# Entry point\n", + "builder.add_edge(START, \"classify\")\n", + "\n", + "# Conditional routing after classification\n", + "builder.add_conditional_edges(\n", + " \"classify\",\n", + " route_decision,\n", + " {\n", + " \"retrieve\": \"retrieve\",\n", + " \"generate_direct\": \"generate_direct\",\n", + " }\n", + ")\n", + "\n", + "# After retrieval, always go to RAG generation\n", + "builder.add_edge(\"retrieve\", \"generate_rag\")\n", + "\n", + "# Both generation nodes lead to END\n", + "builder.add_edge(\"generate_rag\", END)\n", + "builder.add_edge(\"generate_direct\", END)\n", + "\n", + "# Compile the 
graph into a runnable\n", + "agent = builder.compile()\n", + "\n", + "print(\"โœ… LangGraph agent compiled successfully.\")\n", + "print(\"\\nGraph structure:\")\n", + "print(\" START โ†’ classify โ†’ [retrieve โ†’ generate_rag | generate_direct] โ†’ END\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "run-agent", + "metadata": {}, + "outputs": [], + "source": [ + "def run_agent(query: str) -> str:\n", + " \"\"\"\n", + " Run the agentic RAG pipeline for a given query.\n", + " \n", + " Args:\n", + " query: Natural language question.\n", + " \n", + " Returns:\n", + " The agent's final answer as a string.\n", + " \"\"\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\"โ“ Query: {query}\")\n", + " print(f\"{'='*60}\")\n", + " \n", + " # Initialize state with defaults\n", + " initial_state: AgentState = {\n", + " \"query\": query,\n", + " \"route\": \"\",\n", + " \"retrieved_docs\": [],\n", + " \"answer\": \"\",\n", + " \"tool_result\": \"\",\n", + " }\n", + " \n", + " # Run the graph\n", + " final_state = agent.invoke(initial_state)\n", + " \n", + " print(f\"\\n๐Ÿ’ฌ Answer:\\n{final_state['answer']}\")\n", + " \n", + " if final_state[\"retrieved_docs\"]:\n", + " sources = list({d[\"source\"] for d in final_state[\"retrieved_docs\"]})\n", + " print(f\"\\n๐Ÿ“– Sources consulted: {', '.join(sources)}\")\n", + " \n", + " return final_state[\"answer\"]\n", + "\n", + "\n", + "# โ”€โ”€ Test 1: Knowledge-base question (should use RAG) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "_ = run_agent(\"How does LangGraph help build AI agents?\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "run-agent-2", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Test 2: Direct question (should skip retrieval) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "_ = run_agent(\"What is 25 multiplied by 4?\")" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "run-agent-3", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Test 3: Your own question โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# Modify the query below to test with your own questions.\n", + "\n", + "your_query = \"What models does Ollama support and how does it compare to cloud APIs?\"\n", + "_ = run_agent(your_query)" + ] + }, + { + "cell_type": "markdown", + "id": "tool-use-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐Ÿ› ๏ธ Section 8: Optional โ€” Adding a Simple Tool\n", + "\n", + "One of the key features of an agentic pipeline is **tool use**: the ability to call external functions for tasks the LLM can't do reliably on its own (e.g., arithmetic, live search, code execution).\n", + "\n", + "Here we add a minimal **calculator tool** as an example. The same pattern applies to any function: web search, database lookup, API calls, etc.\n", + "\n", + "> **This section is self-contained and optional.** The core pipeline works without it." 
+  ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "tool-use", + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "import math\n", + "\n", + "# ── Define a simple calculator tool ─────────────────────────────────────\n", + "\n", + "def calculator_tool(expression: str) -> str:\n", + "    \"\"\"\n", + "    Safely evaluate a mathematical expression.\n", + "    Supports: +, -, *, /, ** (and ^ as an alias), sqrt(), sin(), cos(), pi, e\n", + "    \n", + "    Args:\n", + "        expression: A mathematical expression string.\n", + "    \n", + "    Returns:\n", + "        String representation of the result, or an error message.\n", + "    \"\"\"\n", + "    # Whitelist safe names to prevent code injection\n", + "    safe_names = {\n", + "        \"sqrt\": math.sqrt, \"sin\": math.sin, \"cos\": math.cos,\n", + "        \"tan\": math.tan, \"log\": math.log, \"pi\": math.pi,\n", + "        \"e\": math.e, \"abs\": abs, \"round\": round,\n", + "    }\n", + "    try:\n", + "        # Only allow digits, operators, parentheses, dots, and safe function names\n", + "        cleaned = re.sub(r\"[^0-9+\\-*/().^ a-zA-Z]\", \"\", expression)\n", + "        # '^' is bitwise XOR in Python, so rewrite it as exponentiation\n", + "        cleaned = cleaned.replace(\"^\", \"**\")\n", + "        result = eval(cleaned, {\"__builtins__\": {}}, safe_names)  # noqa: S307\n", + "        return str(result)\n", + "    except Exception as exc:\n", + "        return f\"Error: {exc}\"\n", + "\n", + "\n", + "# ── Tool-augmented agent node ───────────────────────────────────────────\n", + "\n", + "def tool_agent_query(query: str) -> str:\n", + "    \"\"\"\n", + "    A simple tool-calling agent:\n", + "    1. Ask the LLM if this needs a calculator\n", + "    2. If yes, extract the expression and compute it\n", + "    3. Inject the result back into a final prompt\n", + "    \"\"\"\n", + "    # Step 1: Detect if calculation is needed\n", + "    detection_prompt = (\n", + "        f\"Does the following query require a mathematical calculation? 
\"\n", + " f\"Reply with ONLY 'yes' or 'no'.\\n\\nQuery: {query}\"\n", + " )\n", + " needs_calc = \"yes\" in ask_llm(detection_prompt).lower()\n", + " \n", + " tool_context = \"\"\n", + " \n", + " if needs_calc:\n", + " # Step 2: Extract the math expression\n", + " extract_prompt = (\n", + " f\"Extract ONLY the mathematical expression from this query as plain text. \"\n", + " f\"Do not explain. Examples: '25 * 4', 'sqrt(144)', '2**10'\\n\\nQuery: {query}\"\n", + " )\n", + " expression = ask_llm(extract_prompt).strip()\n", + " calc_result = calculator_tool(expression)\n", + " tool_context = f\"Calculator result for '{expression}': {calc_result}\\n\\n\"\n", + " print(f\" ๐Ÿ”ง Tool called: calculator('{expression}') โ†’ {calc_result}\")\n", + " \n", + " # Step 3: Generate final answer using tool result if available\n", + " final_prompt = (\n", + " f\"{tool_context}\"\n", + " f\"Answer the following query concisely.\\n\\nQuery: {query}\"\n", + " )\n", + " return ask_llm(final_prompt)\n", + "\n", + "\n", + "# โ”€โ”€ Test the tool-augmented agent โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "print(\"Testing tool-augmented agent:\\n\")\n", + "\n", + "queries = [\n", + " \"What is the square root of 1764?\",\n", + " \"If I have 256 tokens at 0.002 dollars each, what is the total cost?\",\n", + "]\n", + "\n", + "for q in queries:\n", + " print(f\"\\nโ“ {q}\")\n", + " answer = tool_agent_query(q)\n", + " print(f\"๐Ÿ’ฌ {answer}\")" + ] + }, + { + "cell_type": "markdown", + "id": "openvino-md", + "metadata": {}, + "source": [ + "---\n", + "## โšก Section 9: Optional โ€” OpenVINO Integration\n", + "\n", + "> **This section is informational and optional.** The notebook runs fully without OpenVINO installed. 
OpenVINO is an enhancement, not a requirement.\n", + "\n", + "### Why Use OpenVINO for Local LLM Inference?\n", + "\n", + "When running LLMs on Intel hardware (CPU, iGPU, NPU), OpenVINO can provide significant speedups through:\n", + "\n", + "| Optimization | Description | Typical Benefit |\n", + "|---|---|---|\n", + "| **INT4 Quantization** | Reduce model weight precision 16-bit → 4-bit | 2–4× memory reduction |\n", + "| **INT8 Quantization** | Quantize activations during inference | 1.5–2× speedup |\n", + "| **KV-Cache Optimization** | Efficient attention cache memory layout | Faster long-context generation |\n", + "| **Graph Compilation** | Hardware-specific kernel fusion | Lower latency per token |\n", + "\n", + "### Integration Approaches\n", + "\n", + "**Option A: OpenVINO Model Server (OVMS)** — Drop-in replacement for Ollama/OpenAI API\n", + "\n", + "```bash\n", + "# Convert a Hugging Face model with INT4 quantization (serve the result with OVMS)\n", + "pip install \"optimum[openvino]\"\n", + "\n", + "# The output directory is a positional argument (there is no --output flag)\n", + "optimum-cli export openvino \\\n", + "  --model Qwen/Qwen2.5-3B-Instruct \\\n", + "  --weight-format int4 \\\n", + "  ./qwen2.5-3b-int4-ov\n", + "```\n", + "\n", + "**Option B: `openvino-genai` Python API** — Direct inference without Ollama\n", + "\n", + "```python\n", + "# pip install openvino-genai\n", + "import openvino_genai as ov_genai\n", + "\n", + "pipe = ov_genai.LLMPipeline(\"./qwen2.5-3b-int4-ov\", device=\"CPU\")\n", + "result = pipe.generate(\"What is OpenVINO?\", max_new_tokens=200)\n", + "print(result)\n", + "```\n", + "\n", + "**Option C: LangChain + OpenVINO** — Plug into the existing RAG pipeline\n", + "\n", + "```python\n", + "# pip install langchain-community openvino\n", + "from langchain_community.llms import HuggingFacePipeline\n", + "from optimum.intel import OVModelForCausalLM\n", + "from transformers import AutoTokenizer, pipeline\n", + "\n", + "model_id = \"./qwen2.5-3b-int4-ov\"\n", + "tokenizer = 
AutoTokenizer.from_pretrained(model_id)\n", + "ov_model = OVModelForCausalLM.from_pretrained(model_id)\n", + "\n", + "ov_pipeline = pipeline(\"text-generation\", model=ov_model, tokenizer=tokenizer)\n", + "llm = HuggingFacePipeline(pipeline=ov_pipeline)\n", + "\n", + "# Drop-in replacement: use `llm` anywhere ask_llm() is called\n", + "```\n", + "\n", + "### Hardware Support Matrix\n", + "\n", + "| Hardware | OpenVINO Device | Notes |\n", + "|---|---|---|\n", + "| Intel CPU (Core / Xeon) | `CPU` | Fully supported, recommended for CPU-only |\n", + "| Intel iGPU (Iris Xe, Arc) | `GPU` | Requires OpenCL drivers |\n", + "| Intel NPU (Meteor Lake+) | `NPU` | Best for sustained generation tasks |\n", + "| ARM64 (e.g., Oracle A1) | `CPU` | Supported but no hardware-specific tuning |\n", + "\n", + "### Checking OpenVINO Availability" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "openvino-check", + "metadata": {}, + "outputs": [], + "source": [ + "# โ”€โ”€ Optional: Check if OpenVINO is available โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\n", + "# This cell gracefully handles the case where OpenVINO is not installed.\n", + "\n", + "try:\n", + " import openvino as ov\n", + " \n", + " core = ov.Core()\n", + " available_devices = core.available_devices\n", + " \n", + " print(\"โœ… OpenVINO is installed!\")\n", + " print(f\" Version: {ov.__version__}\")\n", + " print(f\" Available devices: {available_devices}\")\n", + " print()\n", + " \n", + " for device in available_devices:\n", + " try:\n", + " full_name = core.get_property(device, \"FULL_DEVICE_NAME\")\n", + " print(f\" {device}: {full_name}\")\n", + " except Exception:\n", + " print(f\" {device}: (details unavailable)\")\n", + " \n", + " print()\n", + " print(\"๐Ÿ’ก To use OpenVINO for LLM inference, see the integration examples above.\")\n", + " print(\" Recommended: openvino-genai with INT4-quantized Qwen2.5-3B\")\n", + "\n", + "except 
ImportError:\n", + " print(\"โ„น๏ธ OpenVINO is not installed โ€” the pipeline above works without it.\")\n", + " print()\n", + " print(\" To install OpenVINO for accelerated Intel CPU/GPU inference:\")\n", + " print(\" pip install openvino openvino-genai optimum[openvino]\")\n", + " print()\n", + " print(\" See: https://docs.openvino.ai/latest/get_started.html\")" + ] + }, + { + "cell_type": "markdown", + "id": "conclusion-md", + "metadata": {}, + "source": [ + "---\n", + "## ๐ŸŽ‰ Section 10: Conclusion\n", + "\n", + "Congratulations! You have built a complete **local Agentic RAG pipeline** from scratch. Here's what each component contributed:\n", + "\n", + "| Component | Role | Key Benefit |\n", + "|---|---|---|\n", + "| **Ollama** | Local LLM inference server | Privacy, offline, no API costs |\n", + "| **ChromaDB** | Vector database | Fast semantic document retrieval |\n", + "| **`all-MiniLM-L6-v2`** | Embedding model | Lightweight, CPU-friendly |\n", + "| **LangGraph** | Agent orchestration | Flexible, stateful, loop-capable |\n", + "| **OpenVINO** *(optional)* | Inference optimization | Faster tokens on Intel hardware |\n", + "\n", + "### ๐Ÿงฉ Pipeline Summary\n", + "\n", + "```\n", + "User Query\n", + " โ†“\n", + "Classify: needs retrieval?\n", + " โ”œโ”€โ”€ YES โ†’ ChromaDB similarity search โ†’ RAG prompt โ†’ Ollama LLM\n", + " โ””โ”€โ”€ NO โ†’ Direct prompt โ†’ Ollama LLM\n", + " โ†“\n", + "Answer (+ optional tool results)\n", + "```\n", + "\n", + "### ๐Ÿš€ Suggested Extensions\n", + "\n", + "Here are practical ways to extend this notebook into a production-grade system:\n", + "\n", + "1. **Load real documents** โ€” Use `langchain.document_loaders` to ingest PDFs, web pages, or entire directories\n", + "\n", + "2. **Add conversation memory** โ€” Store chat history in LangGraph state to support multi-turn dialogue\n", + "\n", + "3. 
**Upgrade the embedding model** โ€” Try `BAAI/bge-small-en-v1.5` or `nomic-ai/nomic-embed-text-v1.5` for better retrieval quality\n", + "\n", + "4. **Add more tools** โ€” Web search (via DuckDuckGo API), code execution, calendar lookup\n", + "\n", + "5. **Plug in OpenVINO** โ€” Follow Section 9 to convert your Ollama model to OpenVINO IR format for faster CPU inference\n", + "\n", + "6. **Add a Gradio UI** โ€” Wrap the `run_agent()` function in a simple web interface with `gr.ChatInterface`\n", + "\n", + "7. **Evaluate retrieval quality** โ€” Use `ragas` library to measure faithfulness, answer relevancy, and context precision\n", + "\n", + "### ๐Ÿ“š Further Reading\n", + "\n", + "- [OpenVINO Documentation](https://docs.openvino.ai)\n", + "- [OpenVINO Notebooks Repository](https://github.com/openvinotoolkit/openvino_notebooks)\n", + "- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)\n", + "- [ChromaDB Documentation](https://docs.trychroma.com)\n", + "- [Ollama Model Library](https://ollama.com/library)\n", + "- [Qwen2.5 Model Card](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)\n", + "\n", + "---\n", + "*This notebook was designed to follow [OpenVINO Notebooks](https://github.com/openvinotoolkit/openvino_notebooks) contribution standards: CPU-first, beginner-friendly, and fully reproducible on consumer hardware.*" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.14.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}
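As a starting point for extension 7 (evaluating retrieval quality), even before adopting `ragas`, a tiny hit-rate check over a handful of test queries can catch retrieval regressions. A minimal sketch, with hypothetical source filenames:

```python
def hit_rate(retrieved_sources: list[list[str]], expected_sources: list[str]) -> float:
    """Fraction of queries whose expected source appears in the retrieved set."""
    hits = sum(exp in docs for docs, exp in zip(retrieved_sources, expected_sources))
    return hits / len(expected_sources)

# Hypothetical retrieval results for three test queries
retrieved = [
    ["openvino.md", "ollama.md"],   # query 1: expected source found
    ["chromadb.md"],                # query 2: expected source missed
    ["langgraph.md", "rag.md"],     # query 3: expected source found
]
expected = ["openvino.md", "ollama.md", "langgraph.md"]

print(f"hit rate: {hit_rate(retrieved, expected):.2f}")  # hit rate: 0.67
```

In this notebook, `retrieved_sources` would come from calling `retrieve_documents()` on each test query and collecting the `source` fields; `ragas` then adds faithfulness and answer-relevancy metrics on top.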