|
| 1 | +[rag-elasticsearch] |
| 2 | +== Retrieval augmented generation |
| 3 | + |
| 4 | +Retrieval augmented generation (RAG) is a technique that retrieves additional context from an external datastore before prompting an LLM. |
| 5 | +The retrieved context is added to the prompt, which grounds the LLM's response through in-context learning. |
| 6 | +Compared to finetuning or continuous pretraining, RAG is faster and cheaper to implement. |
| 7 | + |
| 8 | +image::images/search/rag-venn-diagram.svg[RAG sits at the intersection of information retrieval and generative AI, align=center, width=500] |
| 9 | + |
| 10 | +RAG sits at the intersection of information retrieval and generative AI. |
| 11 | +{es} is an excellent tool for implementing RAG, because it offers various retrieval capabilities, such as full-text search, vector search, and hybrid search. |
| 12 | + |
| 13 | +[discrete] |
| 14 | +[[rag-elasticsearch-advantages]] |
| 15 | +=== Advantages of RAG |
| 16 | + |
| 17 | +RAG has several advantages: |
| 18 | + |
| 19 | +* It grounds the LLM in additional data that can be up to date, private, or both. |
| 20 | +* It is much cheaper and easier to maintain than finetuning or continuously pretraining a model. |
| 21 | +* It helps protect data privacy and security because you control what data the model sees. Different indices can have different access controls. |
| 22 | +* You can rely on the language model to parse and format the retrieved context in a style or format of your choice. |
| 23 | +* You can start with a simple BM25-based full-text search system and gradually improve it by adding more advanced semantic and hybrid search capabilities. |
| 24 | + |
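The progression from BM25 to hybrid search in the last bullet can be sketched as plain request bodies. This is a minimal illustration rather than a recommended configuration: the field names `content` and `content_embedding` are hypothetical, and the body shapes follow the search API's `match` query and `knn` option as commonly documented, so check them against the reference for your {es} version.

```python
from typing import Any, Dict, List


def bm25_query(text: str, field: str = "content", size: int = 5) -> Dict[str, Any]:
    """Full-text search body: a `match` query, scored with BM25 by default."""
    return {"size": size, "query": {"match": {field: text}}}


def knn_query(
    vector: List[float],
    field: str = "content_embedding",  # hypothetical dense_vector field
    k: int = 5,
    num_candidates: int = 50,
) -> Dict[str, Any]:
    """Vector search body: approximate kNN over an embedding field."""
    return {
        "knn": {
            "field": field,
            "query_vector": vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }


def hybrid_query(text: str, vector: List[float]) -> Dict[str, Any]:
    """Hybrid search body: the same request carries both `query` and `knn`."""
    body = bm25_query(text)
    body.update(knn_query(vector))
    return body
```

Because each helper only builds a dictionary, an application can start by sending `bm25_query(...)` and later switch to `hybrid_query(...)` once embeddings are indexed, without changing the rest of the pipeline.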
| 25 | +[discrete] |
| 26 | +[[rag-elasticsearch-example]] |
| 27 | +=== Example |
| 28 | + |
| 29 | +Here's a simple example of a RAG system using {es}, where a user has a question about the company travel policy: |
| 30 | + |
| 31 | +1. The user asks a natural language question about the company travel policy. |
| 32 | +2. The system retrieves relevant documents from {es}. |
| 33 | +3. The LLM generates a response using the retrieved context. |
| 34 | + |
| 35 | +The result is accurate, up-to-date answers based on company documents. |
| 36 | + |
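The three steps above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not a reference implementation: `toy_retrieve` is a hypothetical in-memory stand-in for an {es} query, the sample policy sentences are invented for the example, and `generate` is any callable that sends a prompt to an LLM and returns its text.

```python
from typing import Callable, List

# Invented sample documents; a real system would store these in an index.
POLICY_DOCS = [
    "Economy class is required for flights shorter than six hours.",
    "Hotel stays are reimbursed up to a fixed nightly rate.",
    "Expense reports must be submitted within 30 days of travel.",
]


def toy_retrieve(question: str, top_k: int) -> List[str]:
    """Hypothetical retriever: ranks documents by shared lowercase words."""
    terms = set(question.lower().split())
    scored = [
        (len(terms & set(doc.lower().rstrip(".").split())), doc)
        for doc in POLICY_DOCS
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0][:top_k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Places the retrieved passages in the prompt ahead of the question."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_question(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Step 2: retrieve context. Step 3: generate a response from it."""
    passages = retrieve(question, top_k)
    return generate(build_prompt(question, passages))
```

Passing the retriever and the LLM call in as functions keeps the flow independent of any particular client library, so `toy_retrieve` can later be replaced by a function that queries {es} without touching `answer_question`.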