.🍿 Prefer a video introduction?
***********************
Check out https://www.youtube.com/watch?v=OS4ZefUPAks[this short video] from the Elastic Snackable Series.
***********************
Retrieval augmented generation (RAG) is a technique where additional context is retrieved from an external datastore before prompting a language model to generate a response using the retrieved context.
This grounds the model with in-context learning.
Compared to finetuning or continuous pretraining, RAG can be implemented faster and cheaper, and it has several advantages.

image::images/search/rag-venn-diagram.svg[RAG sits at the intersection of information retrieval and generative AI, align=center, width=500]

RAG sits at the intersection of https://www.elastic.co/what-is/information-retrieval[information retrieval] and generative AI.
{es} is an excellent tool for implementing RAG because it offers various retrieval capabilities, such as full-text search, vector search, and hybrid search.

[discrete]
[[rag-elasticsearch-advantages]]
=== Advantages

RAG has several advantages:

* *Improved context:* Enables grounding the LLM with additional, up-to-date, and/or private data.
* *Reduced hallucination:* Helps minimize factual errors by enabling models to cite authoritative sources.
* *Cost efficiency:* Requires less maintenance compared to finetuning or continuously pretraining models.
* *Enhanced security:* Controls data access by leveraging {es}'s <<authorization,user authorization>> features, such as role-based access control and field/document-level security.
* *Simplified response parsing:* Eliminates the need for custom parsing logic by letting the language model handle parsing {es} responses and formatting the retrieved context.
* *Flexible implementation:* Works with basic
// TODO: uncomment when page is live <<full-text-search,full-text search>>
full-text search and can be gradually updated to use advanced <<semantic-search,semantic search>> and hybrid search capabilities.
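The retrieval options named above (full-text, vector, and hybrid search) correspond to different {es} query bodies. Below is a minimal sketch of what each might look like; the index and field names (`content`, `content_vector`) and the RRF-based hybrid form are illustrative assumptions, not something this page prescribes.

```python
# Sketch: query bodies for the retrieval step of a RAG system.
# Field names ("content", "content_vector") are illustrative assumptions.

def full_text_query(text: str) -> dict:
    """BM25 full-text search on a text field."""
    return {"query": {"match": {"content": text}}}

def vector_query(embedding: list[float], k: int = 10) -> dict:
    """kNN vector search against a dense_vector field."""
    return {"knn": {"field": "content_vector", "query_vector": embedding,
                    "k": k, "num_candidates": 5 * k}}

def hybrid_query(text: str, embedding: list[float], k: int = 10) -> dict:
    """Hybrid search: combine BM25 and kNN via reciprocal rank fusion."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    {"standard": {"query": {"match": {"content": text}}}},
                    {"knn": {"field": "content_vector",
                             "query_vector": embedding,
                             "k": k, "num_candidates": 5 * k}},
                ]
            }
        }
    }
```

Any of these bodies can be passed to the search API of an index; starting with `full_text_query` and later switching to `hybrid_query` is the gradual upgrade path described above.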

[discrete]
[[rag-elasticsearch-components]]
=== RAG system overview

The following diagram illustrates a simple RAG system using {es}.

image::images/search/rag-schema.svg[Components of a simple RAG system using Elasticsearch, align=center, width=800]

The system consists of the following components:

. User submits a query.
. Elasticsearch retrieves relevant documents using full-text search, vector search, or hybrid search.
. The language model processes the retrieved context and generates a response, following custom instructions such as "Cite a source" or "Provide a concise summary of the `content` field in markdown format".
. The model returns the final response to the user.

[TIP]
====
A more advanced setup might include query rewriting between steps 1 and 2. This intermediate step could use one or more additional language models with different instructions to reformulate queries for more specific and detailed responses.
====
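The four steps above can be sketched end to end. In this toy version the retrieval and generation calls are stubbed with plain Python so the sketch runs standalone; in a real system `retrieve` would issue one of the {es} searches described earlier and `generate` would call a language model provider. The document set and all names are illustrative assumptions.

```python
# Toy end-to-end RAG loop: retrieve context, build a prompt, generate.
# `retrieve` and `generate` are stand-ins for Elasticsearch and an LLM.

DOCS = [
    {"title": "Travel policy", "content": "Employees may book economy flights."},
    {"title": "Expense policy", "content": "Receipts are required for expenses."},
]

def retrieve(query: str, k: int = 1) -> list[dict]:
    # Stand-in for full-text/vector/hybrid search: rank by term overlap.
    terms = set(query.lower().split())
    def score(d: dict) -> int:
        words = set(d["content"].lower().split()) | set(d["title"].lower().split())
        return len(terms & words)
    return sorted(DOCS, key=score, reverse=True)[:k]

def generate(prompt: str) -> str:
    # Stand-in for the language model call.
    return f"[model response to {len(prompt)} chars of prompt]"

def rag_answer(query: str) -> str:
    context = "\n".join(d["content"] for d in retrieve(query))
    prompt = (
        "Answer using only the context below. Cite a source.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)

print(rag_answer("What is the travel policy?"))
```

The query-rewriting refinement from the tip above would slot in between the user's query and the `retrieve` call.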

[discrete]
[[rag-elasticsearch-getting-started]]
=== Getting started

Start building RAG applications quickly with Playground, which seamlessly integrates {es} with language model providers.
The Playground UI enables you to build, test, and deploy RAG interfaces on top of your {es} indices.

Playground automatically selects the best retrieval methods for your data, while providing full control over the final {es} queries and language model instructions.
You can also download the underlying Python code to integrate with your existing applications.

Learn more in the {kibana-ref}/playground.html[documentation] and
try the https://www.elastic.co/demo-gallery/ai-playground[interactive lab] for hands-on experience.

[discrete]
[[rag-elasticsearch-learn-more]]
=== Learn more

Learn more about building RAG systems using {es} in these blog posts:

* https://www.elastic.co/blog/beyond-rag-basics-semantic-search-with-elasticsearch[Beyond RAG Basics: Advanced strategies for AI applications]
* https://www.elastic.co/search-labs/blog/building-a-rag-system-with-gemma-hugging-face-elasticsearch[Building a RAG system with Gemma, Hugging Face, and Elasticsearch]
* https://www.elastic.co/search-labs/blog/rag-agent-tool-elasticsearch-langchain[Building an agentic RAG tool with Elasticsearch and LangChain]