diff --git a/notebooks/llm-agent-langgraph/README.md b/notebooks/llm-agent-langgraph/README.md new file mode 100644 index 00000000000..28d356507cf --- /dev/null +++ b/notebooks/llm-agent-langgraph/README.md @@ -0,0 +1,31 @@ +# Self-Correcting RAG Agent with LangGraph and OpenVINO + +Standard Retrieval-Augmented Generation (RAG) pipelines follow a linear path: retrieve documents, then generate an answer. If retrieval returns irrelevant documents, the model often hallucinates an answer instead of acknowledging uncertainty. + +This notebook demonstrates a **self-correcting RAG agent** powered by [LangGraph](https://langchain-ai.github.io/langgraph/) and [OpenVINO](https://docs.openvino.ai/). The agent implements a stateful workflow with conditional branching: + +1. **Retrieve** documents from a ChromaDB vector store +2. **Grade** each document for relevance using an OpenVINO-optimized LLM +3. **Generate** an answer if relevant documents are found +4. **Rewrite** the query and re-retrieve if documents are irrelevant + +All LLM inference is accelerated with OpenVINO on Intel CPUs and GPUs (the notebook's device selector excludes NPU). + +## Notebook Contents + +The tutorial consists of the following steps: + +- Install prerequisites +- Select and convert a model (Phi-3-mini, Llama-3.2-1B, or Qwen2.5-1.5B) to OpenVINO IR with INT4 compression +- Build a knowledge base with document chunking and ChromaDB +- Define the LangGraph agent with four nodes: retrieve, grade, generate, rewrite +- Run the self-correcting agent on sample queries +- Launch an interactive Gradio demo that shows the agent's reasoning + +## Installation Instructions + +This is a self-contained example that relies solely on its own code.
+We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. +For details, please refer to [Installation Guide](../../README.md). + + diff --git a/notebooks/llm-agent-langgraph/llm-agent-langgraph.ipynb b/notebooks/llm-agent-langgraph/llm-agent-langgraph.ipynb new file mode 100644 index 00000000000..e1b1d24d423 --- /dev/null +++ b/notebooks/llm-agent-langgraph/llm-agent-langgraph.ipynb @@ -0,0 +1,835 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "top", + "metadata": {}, + "source": [ + "# Self-Correcting RAG Agent with LangGraph and OpenVINO\n", + "\n", + "Standard Retrieval-Augmented Generation (RAG) pipelines follow a linear path: retrieve documents, then generate an answer. If the retrieval step returns irrelevant documents, the model often hallucinates.\n", + "\n", + "This notebook demonstrates a **self-correcting RAG agent** that uses [LangGraph](https://langchain-ai.github.io/langgraph/) to orchestrate a stateful workflow with conditional branching. The agent:\n", + "\n", + "1. **Retrieves** documents from a vector store\n", + "2. **Grades** each document for relevance using an LLM\n", + "3. **Generates** an answer if documents are relevant\n", + "4. 
**Rewrites** the query and re-retrieves if documents are irrelevant\n", + "\n", + "All LLM inference runs on [OpenVINO](https://docs.openvino.ai/) for optimized performance on Intel hardware.\n", + "\n", + "![self-correcting-rag](https://raw.githubusercontent.com/langchain-ai/langgraph/main/docs/docs/tutorials/rag/img/langgraph_self_rag.png)\n", + "\n", + "\n", + "#### Table of contents:\n", + "\n", + "- [Prerequisites](#Prerequisites)\n", + "- [Prepare Model and Tokenizer](#Prepare-Model-and-Tokenizer)\n", + " - [Select Model](#Select-Model)\n", + " - [Convert and Compress Model](#Convert-and-Compress-Model)\n", + " - [Select Inference Device](#Select-Inference-Device)\n", + " - [Load Model with OpenVINO](#Load-Model-with-OpenVINO)\n", + "- [Build the Knowledge Base](#Build-the-Knowledge-Base)\n", + " - [Load and Chunk Documents](#Load-and-Chunk-Documents)\n", + " - [Create Embeddings and Vector Store](#Create-Embeddings-and-Vector-Store)\n", + "- [Define the RAG Agent Graph](#Define-the-RAG-Agent-Graph)\n", + " - [Define Agent State](#Define-Agent-State)\n", + " - [Retrieval Node](#Retrieval-Node)\n", + " - [Document Grader Node](#Document-Grader-Node)\n", + " - [Answer Generator Node](#Answer-Generator-Node)\n", + " - [Query Rewriter Node](#Query-Rewriter-Node)\n", + " - [Routing Logic](#Routing-Logic)\n", + " - [Assemble the Graph](#Assemble-the-Graph)\n", + "- [Run the Agent](#Run-the-Agent)\n", + "- [Interactive Demo with Gradio](#Interactive-Demo-with-Gradio)\n", + "\n", + "### Installation Instructions\n", + "\n", + "This is a self-contained example that relies solely on its own code.\n", + "\n", + "We recommend running the notebook in a virtual environment. 
You only need a Jupyter server to start.\n", + "For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "prereqs-header", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "Install required dependencies." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fetch-utils", + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "from pathlib import Path\n", + "\n", + "if not Path(\"notebook_utils.py\").exists():\n", + " r = requests.get(\n", + " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n", + " )\n", + " open(\"notebook_utils.py\", \"w\", encoding=\"utf-8\").write(r.text)\n", + "\n", + "if not Path(\"cmd_helper.py\").exists():\n", + " r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/cmd_helper.py\")\n", + " open(\"cmd_helper.py\", \"w\", encoding=\"utf-8\").write(r.text)\n", + "\n", + "# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n", + "from notebook_utils import collect_telemetry\n", + "\n", + "collect_telemetry(\"llm-agent-langgraph.ipynb\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "pip-install", + "metadata": {}, + "outputs": [], + "source": "import os\n\nos.environ[\"GIT_CLONE_PROTECTION_ACTIVE\"] = \"false\"\n\n%pip install -Uq pip\n%pip uninstall -q -y optimum optimum-intel optimum-onnx\n%pip install --pre -Uq \"openvino>=2025.3.0\" openvino-tokenizers[transformers] --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly\n%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu \"transformers==4.53.3\" \\\n \"langchain>=0.3.0,<1.0\" \\\n
\"langchain-community>=0.3.0,<1.0\" \\\n \"langchain-huggingface>=0.1.2\" \\\n \"langgraph>=0.2.0,<1.0\" \\\n \"chromadb>=0.5.0,<1.0\" \\\n \"sentence-transformers>=3.0.0\" \\\n \"nncf>=2.18.0\" \\\n \"torch==2.8\" \\\n \"torchvision==0.23.0\" \\\n \"datasets<4.0.0\" \\\n \"accelerate\" \\\n \"bs4\" \\\n \"gradio>=4.19,<6\" \\\n \"ipywidgets\" \\\n \"huggingface-hub>=0.26.5\"\n%pip install -q \"git+https://github.com/huggingface/optimum-intel.git\" --extra-index-url https://download.pytorch.org/whl/cpu" + }, + { + "cell_type": "markdown", + "id": "model-header", + "metadata": {}, + "source": [ + "## Prepare Model and Tokenizer\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "We use [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) as our LLM. It is small enough for fast inference yet capable enough for document grading and answer generation. The model is converted to OpenVINO IR format and compressed to INT4 for efficient inference." + ] + }, + { + "cell_type": "markdown", + "id": "select-model-header", + "metadata": {}, + "source": [ + "### Select Model\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "select-model", + "metadata": {}, + "outputs": [], + "source": "import ipywidgets as widgets\n\nmodel_ids = [\n \"microsoft/Phi-3-mini-4k-instruct\",\n \"Qwen/Qwen2.5-1.5B-Instruct\",\n \"meta-llama/Llama-3.2-1B-Instruct\",\n]\n\nllm_model_id = widgets.Dropdown(\n options=model_ids,\n value=model_ids[0],\n description=\"Model:\",\n disabled=False,\n)\n\nllm_model_id" + }, + { + "cell_type": "markdown", + "id": "convert-header", + "metadata": {}, + "source": [ + "### Convert and Compress Model\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "Export the model to OpenVINO IR format with INT4 weight compression using the Optimum CLI. This reduces memory footprint and speeds up inference significantly." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "convert-model", + "metadata": {}, + "outputs": [], + "source": "import gc\nfrom cmd_helper import optimum_cli\n\nllm_model_path = llm_model_id.value.split(\"/\")[-1]\n\nmodel_path = Path(llm_model_path) / \"INT4\"\n\nif not model_path.exists():\n optimum_cli(\n llm_model_id.value,\n model_path,\n additional_args={\n \"task\": \"text-generation-with-past\",\n \"weight-format\": \"int4\",\n \"group-size\": \"128\",\n \"ratio\": \"1.0\",\n \"trust-remote-code\": \"\",\n },\n )\n gc.collect()\nelse:\n print(f\"Model already converted: {model_path}\")" + }, + { + "cell_type": "markdown", + "id": "device-header", + "metadata": {}, + "source": [ + "### Select Inference Device\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "select-device", + "metadata": {}, + "outputs": [], + "source": [ + "from notebook_utils import device_widget\n", + "\n", + "device = device_widget(\"CPU\", exclude=[\"NPU\"])\n", + "\n", + "device" + ] + }, + { + "cell_type": "markdown", + "id": "load-model-header", + "metadata": {}, + "source": [ + "### Load Model with OpenVINO\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "OpenVINO models can be run locally through the `HuggingFacePipeline` class in LangChain. To deploy a model with OpenVINO, you can specify the `backend=\"openvino\"` parameter to trigger OpenVINO as backend inference framework. For [more information](https://python.langchain.com/docs/integrations/llms/openvino/)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "load-model", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace\n", + "\n", + "import openvino.properties as props\n", + "import openvino.properties.hint as hints\n", + "import openvino.properties.streams as streams\n", + "\n", + "ov_config = {hints.performance_mode(): hints.PerformanceMode.LATENCY, streams.num(): \"1\", props.cache_dir(): \"\"}\n", + "\n", + "ov_llm = HuggingFacePipeline.from_model_id(\n", + " model_id=str(model_path),\n", + " task=\"text-generation\",\n", + " backend=\"openvino\",\n", + " model_kwargs={\n", + " \"device\": device.value,\n", + " \"ov_config\": ov_config,\n", + " \"trust_remote_code\": True,\n", + " },\n", + " pipeline_kwargs={\"max_new_tokens\": 2048},\n", + ")\n", + "\n", + "chat_model = ChatHuggingFace(llm=ov_llm, verbose=True)" + ] + }, + { + "cell_type": "markdown", + "id": "kb-header", + "metadata": {}, + "source": [ + "## Build the Knowledge Base\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "We create a small knowledge base from a web page to demonstrate the RAG pipeline. The documents are split into chunks and embedded into a ChromaDB vector store." 
+ ] + }, + { + "cell_type": "markdown", + "id": "load-docs-header", + "metadata": {}, + "source": [ + "### Load and Chunk Documents\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "load-docs", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.document_loaders import WebBaseLoader\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "\n", + "# Load a sample document about OpenVINO\n", + "loader = WebBaseLoader(\n", + " web_paths=[\"https://docs.openvino.ai/latest/about-openvino.html\"],\n", + ")\n", + "docs = loader.load()\n", + "\n", + "# Split into chunks\n", + "text_splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=500,\n", + " chunk_overlap=100,\n", + ")\n", + "splits = text_splitter.split_documents(docs)\n", + "\n", + "print(f\"Loaded {len(splits)} document chunks\")" + ] + }, + { + "cell_type": "markdown", + "id": "embeddings-header", + "metadata": {}, + "source": [ + "### Create Embeddings and Vector Store\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "We use `bge-small-en-v1.5` as the embedding model and store vectors in ChromaDB." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "create-vectorstore", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_huggingface import HuggingFaceEmbeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "\n", + "embedding_model_id = \"BAAI/bge-small-en-v1.5\"\n", + "\n", + "embeddings = HuggingFaceEmbeddings(\n", + " model_name=embedding_model_id,\n", + " model_kwargs={\"device\": \"cpu\"},\n", + " encode_kwargs={\"normalize_embeddings\": True},\n", + ")\n", + "\n", + "vectorstore = Chroma.from_documents(\n", + " documents=splits,\n", + " embedding=embeddings,\n", + " collection_name=\"openvino-docs\",\n", + ")\n", + "\n", + "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 4})\n", + "\n", + "print(f\"Vector store created with {len(splits)} documents\")" + ] + }, + { + "cell_type": "markdown", + "id": "agent-header", + "metadata": {}, + "source": [ + "## Define the RAG Agent Graph\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "Now we define the core of the self-correcting RAG agent using LangGraph. The agent is a state machine with four nodes:\n", + "\n", + "- **retrieve**: fetch documents from the vector store\n", + "- **grade_documents**: use the LLM to judge document relevance\n", + "- **generate**: produce an answer from relevant documents\n", + "- **rewrite_query**: reformulate the query if documents are irrelevant\n", + "\n", + "The graph has a conditional edge after grading: if documents are relevant, proceed to generation; otherwise, rewrite the query and re-retrieve." 
+ ] + }, + { + "cell_type": "markdown", + "id": "state-header", + "metadata": {}, + "source": [ + "### Define Agent State\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "define-state", + "metadata": {}, + "outputs": [], + "source": [ + "from typing import TypedDict\n", + "from langchain_core.documents import Document\n", + "\n", + "\n", + "class AgentState(TypedDict):\n", + " \"\"\"State of the self-correcting RAG agent.\"\"\"\n", + "\n", + " question: str\n", + " documents: list[Document]\n", + " generation: str\n", + " rewrite_count: int\n", + " max_rewrites: int" + ] + }, + { + "cell_type": "markdown", + "id": "retrieve-header", + "metadata": {}, + "source": [ + "### Retrieval Node\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "retrieve-node", + "metadata": {}, + "outputs": [], + "source": [ + "def retrieve(state: AgentState) -> AgentState:\n", + " \"\"\"Retrieve documents from the vector store.\"\"\"\n", + " question = state[\"question\"]\n", + " display_query = question[:80] + \"...\" if len(question) > 80 else question\n", + " print(f\"--- RETRIEVE (query: {display_query}) ---\")\n", + " documents = retriever.invoke(question)\n", + " return {\"documents\": documents}" + ] + }, + { + "cell_type": "markdown", + "id": "grader-header", + "metadata": {}, + "source": [ + "### Document Grader Node\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "The grader uses the LLM to determine whether each retrieved document is relevant to the question. If at least one document is relevant, the agent proceeds to generation. Otherwise, it rewrites the query." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "grader-node", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.prompts import ChatPromptTemplate\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "\n", + "GRADER_PROMPT = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\n", + " \"system\",\n", + " \"You are a document relevance grader. Given a user question and a retrieved document, \"\n", + " \"determine if the document contains information relevant to answering the question. \"\n", + " \"Respond with only 'yes' or 'no'.\",\n", + " ),\n", + " (\n", + " \"human\",\n", + " \"Question: {question}\\n\\nDocument:\\n{document}\\n\\nIs this document relevant? (yes/no)\",\n", + " ),\n", + " ]\n", + ")\n", + "\n", + "grader_chain = GRADER_PROMPT | chat_model | StrOutputParser()\n", + "\n", + "\n", + "def grade_documents(state: AgentState) -> AgentState:\n", + " \"\"\"Grade retrieved documents for relevance.\"\"\"\n", + " print(\"--- GRADE DOCUMENTS ---\")\n", + " question = state[\"question\"]\n", + " documents = state[\"documents\"]\n", + "\n", + " relevant_docs = []\n", + " for doc in documents:\n", + " score = grader_chain.invoke(\n", + " {\"question\": question, \"document\": doc.page_content[:300]}\n", + " )\n", + " grade = \"yes\" if \"yes\" in score.lower() else \"no\"\n", + " if grade == \"yes\":\n", + " print(f\" + Relevant: {doc.page_content[:60]}\")\n", + " relevant_docs.append(doc)\n", + " else:\n", + " print(f\" - Not relevant: {doc.page_content[:60]}\")\n", + "\n", + " return {\"documents\": relevant_docs}" + ] + }, + { + "cell_type": "markdown", + "id": "generate-header", + "metadata": {}, + "source": [ + "### Answer Generator Node\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "generate-node", + "metadata": {}, + "outputs": [], + "source": [ + "GENERATE_PROMPT = ChatPromptTemplate.from_messages(\n", + " [\n", + 
" (\n", + " \"system\",\n", + " \"You are an assistant for question-answering tasks. Use the following pieces of \"\n", + " \"retrieved context to answer the question. If you don't know the answer, just say \"\n", + " \"that you don't know. Use three sentences maximum and keep the answer concise.\",\n", + " ),\n", + " (\n", + " \"human\",\n", + " \"Context:\\n{context}\\n\\nQuestion: {question}\",\n", + " ),\n", + " ]\n", + ")\n", + "\n", + "generate_chain = GENERATE_PROMPT | chat_model | StrOutputParser()\n", + "\n", + "\n", + "def generate(state: AgentState) -> AgentState:\n", + " \"\"\"Generate an answer using relevant documents.\"\"\"\n", + " print(\"--- GENERATE ---\")\n", + " context = \"\\n\\n\".join(doc.page_content for doc in state[\"documents\"])\n", + " generation = generate_chain.invoke(\n", + " {\"context\": context, \"question\": state[\"question\"]}\n", + " )\n", + " return {\"generation\": generation}" + ] + }, + { + "cell_type": "markdown", + "id": "rewrite-header", + "metadata": {}, + "source": [ + "### Query Rewriter Node\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "rewrite-node", + "metadata": {}, + "outputs": [], + "source": [ + "REWRITE_PROMPT = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\n", + " \"system\",\n", + " \"You are a query rewriter. Given a user question that did not retrieve \"\n", + " \"relevant documents, rewrite it to be more specific and likely to match \"\n", + " \"relevant content. 
Return only the rewritten question.\",\n", + " ),\n", + " (\n", + " \"human\",\n", + " \"Original question: {question}\\n\\nRewritten question:\",\n", + " ),\n", + " ]\n", + ")\n", + "\n", + "rewrite_chain = REWRITE_PROMPT | chat_model | StrOutputParser()\n", + "\n", + "\n", + "def rewrite_query(state: AgentState) -> AgentState:\n", + " \"\"\"Rewrite the query to improve retrieval.\"\"\"\n", + " print(\"--- REWRITE QUERY ---\")\n", + " new_question = rewrite_chain.invoke({\"question\": state[\"question\"]})\n", + " # Take only the first line to avoid prompt leakage\n", + " new_question = new_question.strip().split(\"\\n\")[0]\n", + " print(f\" Rewritten: {new_question}\")\n", + " rewrite_count = state.get(\"rewrite_count\", 0) + 1\n", + " return {\"question\": new_question, \"rewrite_count\": rewrite_count}" + ] + }, + { + "cell_type": "markdown", + "id": "routing-header", + "metadata": {}, + "source": [ + "### Routing Logic\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "After grading, the agent decides whether to generate an answer or rewrite the query. If no relevant documents were found and we have not exceeded the maximum number of rewrites, the query is rewritten." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "routing", + "metadata": {}, + "outputs": [], + "source": [ + "def decide_next_step(state: AgentState) -> str:\n", + " \"\"\"Route to generation or query rewriting based on grading results.\"\"\"\n", + " if state[\"documents\"]:\n", + " print(\"--- DECISION: Relevant documents found -> GENERATE ---\")\n", + " return \"generate\"\n", + "\n", + " max_rewrites = state.get(\"max_rewrites\", 2)\n", + " rewrite_count = state.get(\"rewrite_count\", 0)\n", + "\n", + " if rewrite_count >= max_rewrites:\n", + " print(f\"--- DECISION: Max rewrites ({max_rewrites}) reached -> GENERATE with available context ---\")\n", + " return \"generate\"\n", + "\n", + " print(\"--- DECISION: No relevant documents -> REWRITE QUERY ---\")\n", + " return \"rewrite\"" + ] + }, + { + "cell_type": "markdown", + "id": "assemble-header", + "metadata": {}, + "source": [ + "### Assemble the Graph\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "assemble-graph", + "metadata": {}, + "outputs": [], + "source": [ + "from langgraph.graph import StateGraph, END\n", + "\n", + "workflow = StateGraph(AgentState)\n", + "\n", + "# Add nodes\n", + "workflow.add_node(\"retrieve\", retrieve)\n", + "workflow.add_node(\"grade_documents\", grade_documents)\n", + "workflow.add_node(\"generate\", generate)\n", + "workflow.add_node(\"rewrite_query\", rewrite_query)\n", + "\n", + "# Set entry point\n", + "workflow.set_entry_point(\"retrieve\")\n", + "\n", + "# Add edges\n", + "workflow.add_edge(\"retrieve\", \"grade_documents\")\n", + "workflow.add_conditional_edges(\n", + " \"grade_documents\",\n", + " decide_next_step,\n", + " {\n", + " \"generate\": \"generate\",\n", + " \"rewrite\": \"rewrite_query\",\n", + " },\n", + ")\n", + "workflow.add_edge(\"rewrite_query\", \"retrieve\")\n", + "workflow.add_edge(\"generate\", END)\n", + "\n", + "# Compile\n", + "app = 
workflow.compile()\n", + "\n", + "print(\"Agent graph compiled successfully!\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "visualize-graph", + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize the agent graph (optional, requires pygraphviz or mermaid)\n", + "try:\n", + " from IPython.display import Image, display\n", + "\n", + " display(Image(app.get_graph().draw_mermaid_png()))\n", + "except Exception:\n", + " # Fall back to text representation if visualization deps are not available\n", + " print(\"Graph nodes:\", list(app.get_graph().nodes))\n", + " print(\"Graph edges:\", list(app.get_graph().edges))" + ] + }, + { + "cell_type": "markdown", + "id": "run-header", + "metadata": {}, + "source": [ + "## Run the Agent\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "Let's test the self-correcting RAG agent with a question about OpenVINO." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "run-agent", + "metadata": {}, + "outputs": [], + "source": [ + "# Test with a relevant question\n", + "result = app.invoke(\n", + " {\n", + " \"question\": \"What hardware does OpenVINO support for inference?\",\n", + " \"documents\": [],\n", + " \"generation\": \"\",\n", + " \"rewrite_count\": 0,\n", + " \"max_rewrites\": 2,\n", + " }\n", + ")\n", + "\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"FINAL ANSWER:\")\n", + "print(\"=\" * 60)\n", + "print(result[\"generation\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "run-agent-rewrite", + "metadata": {}, + "outputs": [], + "source": [ + "# Test with a vague question that may trigger query rewriting\n", + "result = app.invoke(\n", + " {\n", + " \"question\": \"How do I speed up my model?\",\n", + " \"documents\": [],\n", + " \"generation\": \"\",\n", + " \"rewrite_count\": 0,\n", + " \"max_rewrites\": 2,\n", + " }\n", + ")\n", + "\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"FINAL ANSWER:\")\n", + "print(\"=\" * 
60)\n", + "print(result[\"generation\"])" + ] + }, + { + "cell_type": "markdown", + "id": "gradio-header", + "metadata": {}, + "source": [ + "## Interactive Demo with Gradio\n", + "\n", + "[back to top ⬆️](#Table-of-contents:)\n", + "\n", + "Launch an interactive interface to chat with the self-correcting RAG agent. The interface shows each step the agent takes: retrieval, grading, optional query rewriting, and final generation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gradio-demo", + "metadata": {}, + "outputs": [], + "source": [ + "import gradio as gr\n", + "\n", + "\n", + "def run_agent(question: str) -> str:\n", + " \"\"\"Run the self-correcting RAG agent and return a formatted response.\"\"\"\n", + " import io\n", + " import contextlib\n", + "\n", + " # Capture print output to show agent reasoning\n", + " log_buffer = io.StringIO()\n", + " with contextlib.redirect_stdout(log_buffer):\n", + " result = app.invoke(\n", + " {\n", + " \"question\": question,\n", + " \"documents\": [],\n", + " \"generation\": \"\",\n", + " \"rewrite_count\": 0,\n", + " \"max_rewrites\": 2,\n", + " }\n", + " )\n", + "\n", + " reasoning = log_buffer.getvalue()\n", + " answer = result.get(\"generation\", \"No answer generated.\")\n", + "\n", + " output = f\"**Agent Reasoning:**\\n```\\n{reasoning}```\\n\\n**Answer:**\\n{answer}\"\n", + " return output\n", + "\n", + "\n", + "demo = gr.Interface(\n", + " fn=run_agent,\n", + " inputs=gr.Textbox(\n", + " label=\"Question\",\n", + " placeholder=\"Ask a question about OpenVINO...\",\n", + " lines=2,\n", + " ),\n", + " outputs=gr.Markdown(label=\"Response\"),\n", + " title=\"Self-Correcting RAG Agent with OpenVINO\",\n", + " description=\"Ask questions about OpenVINO. 
The agent retrieves documents, grades their relevance, and rewrites the query if needed.\",\n", + " examples=[\n", + " [\"What hardware does OpenVINO support?\"],\n", + " [\"How do I optimize a model for Intel GPUs?\"],\n", + " [\"What is the difference between FP16 and INT8 quantization?\"],\n", + " ],\n", + " allow_flagging=\"never\",\n", + ")\n", + "\n", + "try:\n", + " demo.launch(debug=True)\n", + "except Exception:\n", + " demo.launch(share=True, debug=True)\n", + "# If you are launching remotely, specify server_name and server_port\n", + "# EXAMPLE: `demo.launch(server_name='your server name', server_port='server port in int')`\n", + "# To learn more please refer to the Gradio docs: https://gradio.app/docs/" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cleanup", + "metadata": {}, + "outputs": [], + "source": [ + "# # Cleanup: uncomment to remove downloaded model files\n", + "# import shutil\n", + "# model_dir = Path(llm_model_id.value.split(\"/\")[-1])\n", + "# if model_dir.exists():\n", + "# shutil.rmtree(model_dir)\n", + "# print(f\"Removed {model_dir}\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.0" + }, + "openvino_notebooks": { + "imageUrl": "https://raw.githubusercontent.com/langchain-ai/langgraph/main/docs/docs/tutorials/rag/img/langgraph_self_rag.png", + "tags": { + "categories": [ + "Model Demos", + "AI Trends" + ], + "libraries": [ + "LangChain", + "LangGraph" + ], + "other": [ + "LLM" + ], + "tasks": [ + "Text Generation" + ] + } + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, 
+ "nbformat_minor": 5 +} \ No newline at end of file