.🍿 Prefer a video introduction?
***********************
Check out https://www.youtube.com/watch?v=OS4ZefUPAks[this short video] from the Elastic Snackable Series.
***********************
Retrieval augmented generation (RAG) is a technique where additional context is retrieved from an external datastore before prompting a language model to generate a response using the retrieved context.
This grounds the model with in-context learning.
Compared to finetuning or continuous pretraining, RAG can be implemented faster and cheaper, and it has several advantages.

image::images/search/rag-venn-diagram.svg[RAG sits at the intersection of information retrieval and generative AI, align=center, width=500]

RAG sits at the intersection of https://www.elastic.co/what-is/information-retrieval[information retrieval] and generative AI.
{es} is an excellent tool for implementing RAG because it offers various retrieval capabilities, such as full-text search, vector search, and hybrid search.

[discrete]
[[rag-elasticsearch-advantages]]
=== Advantages

RAG has several advantages:

* *Improved context:* Enables grounding the LLM with additional, up-to-date, and/or private data.
* *Reduced hallucination:* Helps minimize factual errors by enabling models to cite authoritative sources.
* *Cost efficiency:* Requires less maintenance compared to finetuning or continuously pretraining models.
* *Enhanced security:* Controls data access by leveraging {es}'s <<authorization,user authorization>> features, such as role-based access control and field/document-level security.
* *Simplified response parsing:* Eliminates the need for custom parsing logic by letting the language model handle parsing {es} responses and formatting the retrieved context.
* *Flexible implementation:* Works with basic
// TODO: uncomment when page is live <<full-text-search,full-text search>>
full-text search and can be gradually updated to use advanced <<semantic-search,semantic search>> and hybrid search capabilities.
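The retrieval options named above (full-text, vector, and hybrid search) correspond to different {es} query bodies. Below is a minimal sketch of what each might look like; the index and field names (`content`, `content_vector`) and the RRF-based hybrid form are illustrative assumptions, not something this page prescribes.

```python
# Sketch: query bodies for the retrieval step of a RAG system.
# Field names ("content", "content_vector") are illustrative assumptions.

def full_text_query(text: str) -> dict:
    """BM25 full-text search on a text field."""
    return {"query": {"match": {"content": text}}}

def vector_query(embedding: list[float], k: int = 10) -> dict:
    """kNN vector search against a dense_vector field."""
    return {"knn": {"field": "content_vector", "query_vector": embedding,
                    "k": k, "num_candidates": 5 * k}}

def hybrid_query(text: str, embedding: list[float], k: int = 10) -> dict:
    """Hybrid search: combine BM25 and kNN via reciprocal rank fusion."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    {"standard": {"query": {"match": {"content": text}}}},
                    {"knn": {"field": "content_vector",
                             "query_vector": embedding,
                             "k": k, "num_candidates": 5 * k}},
                ]
            }
        }
    }
```

Any of these bodies can be passed to the search API of an index; starting with `full_text_query` and later switching to `hybrid_query` is the gradual upgrade path described above.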

[discrete]
[[rag-elasticsearch-components]]
=== RAG system overview

The following diagram illustrates a simple RAG system using {es}.

image::images/search/rag-schema.svg[Components of a simple RAG system using Elasticsearch, align=center, width=800]

The system consists of the following components:

. User submits a query.
. Elasticsearch retrieves relevant documents using full-text search, vector search, or hybrid search.
. The language model processes the retrieved context and generates a response, following custom instructions such as "Cite a source" or "Provide a concise summary of the `content` field in markdown format".
. The model returns the final response to the user.

[TIP]
====
A more advanced setup might include query rewriting between steps 1 and 2. This intermediate step could use one or more additional language models with different instructions to reformulate queries for more specific and detailed responses.
====
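The four steps above can be sketched end to end. In this toy version the retrieval and generation calls are stubbed with plain Python so the sketch runs standalone; in a real system `retrieve` would issue one of the {es} searches described earlier and `generate` would call a language model provider. The document set and all names are illustrative assumptions.

```python
# Toy end-to-end RAG loop: retrieve context, build a prompt, generate.
# `retrieve` and `generate` are stand-ins for Elasticsearch and an LLM.

DOCS = [
    {"title": "Travel policy", "content": "Employees may book economy flights."},
    {"title": "Expense policy", "content": "Receipts are required for expenses."},
]

def retrieve(query: str, k: int = 1) -> list[dict]:
    # Stand-in for full-text/vector/hybrid search: rank by term overlap.
    terms = set(query.lower().split())
    def score(d: dict) -> int:
        words = set(d["content"].lower().split()) | set(d["title"].lower().split())
        return len(terms & words)
    return sorted(DOCS, key=score, reverse=True)[:k]

def generate(prompt: str) -> str:
    # Stand-in for the language model call.
    return f"[model response to {len(prompt)} chars of prompt]"

def rag_answer(query: str) -> str:
    context = "\n".join(d["content"] for d in retrieve(query))
    prompt = (
        "Answer using only the context below. Cite a source.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)

print(rag_answer("What is the travel policy?"))
```

The query-rewriting refinement from the tip above would slot in between the user's query and the `retrieve` call.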

[discrete]
[[rag-elasticsearch-getting-started]]
=== Getting started

Start building RAG applications quickly with Playground, which seamlessly integrates {es} with language model providers.
The Playground UI enables you to build, test, and deploy RAG interfaces on top of your {es} indices.

Playground automatically selects the best retrieval methods for your data, while providing full control over the final {es} queries and language model instructions.
You can also download the underlying Python code to integrate with your existing applications.

Learn more in the {kibana-ref}/playground.html[documentation] and
try the https://www.elastic.co/demo-gallery/ai-playground[interactive lab] for hands-on experience.

[discrete]
[[rag-elasticsearch-learn-more]]
=== Learn more

Learn more about building RAG systems using {es} in these blog posts:

* https://www.elastic.co/blog/beyond-rag-basics-semantic-search-with-elasticsearch[Beyond RAG Basics: Advanced strategies for AI applications]
* https://www.elastic.co/search-labs/blog/building-a-rag-system-with-gemma-hugging-face-elasticsearch[Building a RAG system with Gemma, Hugging Face, and Elasticsearch]
* https://www.elastic.co/search-labs/blog/rag-agent-tool-elasticsearch-langchain[Building an agentic RAG tool with Elasticsearch and LangChain]