docs/guides/python/llama-rag.mdx (4 additions & 4 deletions)
@@ -11,7 +11,7 @@ updated_at: 2024-11-16
 # Using Retrieval Augmented Generation to enhance your LLMs

-This guide shows how to use Retrieval Augmented Generation (RAG) to enhance a large language model (LLM). RAG is the process of enabling an LLM to reference context outside of its intial training data before generating its response. It can be extremely expensive in both time and computing power to train a model that is useful for your own domain-specific purposes. Therefore, using RAG is a cost-effective alternative to extending the capabilities of an existing LLM.
+This guide shows how to use Retrieval Augmented Generation (RAG) to enhance a large language model (LLM). RAG is the process of enabling an LLM to reference context outside of its initial training data before generating its response. It can be extremely expensive in both time and computing power to train a model that is useful for your own domain-specific purposes. Therefore, using RAG is a cost-effective alternative to extending the capabilities of an existing LLM.

 ## Prerequisites
@@ -64,7 +64,7 @@ We'll organize our project structure like so:
 ## Setting up our LLM

-Before we even start writing code for our LLM we'll want to download the model into our project. For this project we'll be using Llama 3.2 with a Q4_K_M quant.
+Before we even start writing code for our LLM we'll want to download the model into our project. For this project we'll be using Llama 3.2 with the Q4_K_M quantization.
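The diff only touches the wording here, so for context, below is a minimal sketch of how a Q4_K_M GGUF build of Llama 3.2 could be pulled into the project with `huggingface_hub`. The repo id, filename, and `./models` directory are illustrative assumptions, not taken from the guide.

```python
# Minimal sketch: download a Q4_K_M GGUF build of Llama 3.2 from Hugging Face.
# The repo_id and filename are assumptions for illustration; substitute the
# quantized build the guide actually links to.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="bartowski/Llama-3.2-1B-Instruct-GGUF",  # hypothetical GGUF repo
    filename="Llama-3.2-1B-Instruct-Q4_K_M.gguf",    # the Q4_K_M quantization
    local_dir="./models",
)
print(model_path)  # local path to the downloaded .gguf file
```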
-Now that we have our model we can load it into our code. We'll also define our [embed model](https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings/)- for vectorising our documentation - using a recommend [embed model](https://huggingface.co/BAAI/bge-large-en-v1.5) from Hugging Face. At this point we can also create a prompt template for prompts with our query engine. It will just sanitise some of the hallucinations so that if the model does not know an answer it won't pretend like it does.
+Now that we have our model we can load it into our code. We'll also define our [embed model](https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings/) using a recommended [model](https://huggingface.co/BAAI/bge-large-en-v1.5) from Hugging Face. At this point we can also create a prompt template for prompts with our query engine. It will just sanitize some of the hallucinations so that if the model does not know an answer it won't pretend like it does.
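A rough sketch of what that setup can look like with LlamaIndex and llama-cpp-python follows. The module paths reflect current LlamaIndex packaging; the model path, parameters, and prompt wording are assumptions rather than the guide's exact code.

```python
# Sketch: load the local GGUF model and the BAAI/bge-large-en-v1.5 embed model,
# then define a prompt template that tells the model to admit when it doesn't know.
from llama_index.core import PromptTemplate, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.llama_cpp import LlamaCPP

Settings.llm = LlamaCPP(
    model_path="./models/Llama-3.2-1B-Instruct-Q4_K_M.gguf",  # assumed download path
    temperature=0.1,
    context_window=4096,
)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")

qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using only the context above. "
    "If the answer is not in the context, say that you don't know.\n"
    "Query: {query_str}\n"
    "Answer: "
)
```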
-The next step is where we embed our context into the LLM. For this example we can embed the Nitric documentation to allow searchability using the LLM. It's open-source on [GitHub](https://github.com/nitrictech/docs), so we can clone it into our project.
+The next step is where we embed our context into the LLM. For this example we will embed the Nitric documentation. It's open-source on [GitHub](https://github.com/nitrictech/docs), so we can clone it into our project.
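The embedding step itself isn't shown in the diff; below is a minimal sketch using LlamaIndex's `SimpleDirectoryReader` and `VectorStoreIndex`, with the clone location, file extensions, and example query all assumed for illustration (`qa_prompt` carries over from the setup sketch above).

```python
# Sketch: read the cloned Nitric docs and build a searchable vector index over them.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader(
    "./docs",                       # assumed clone location of nitrictech/docs
    required_exts=[".md", ".mdx"],
    recursive=True,
).load_data()

index = VectorStoreIndex.from_documents(documents)  # embeds chunks with Settings.embed_model

# Query the indexed docs through the LLM, using the prompt template from the setup sketch.
query_engine = index.as_query_engine(text_qa_template=qa_prompt)
print(query_engine.query("How do I create an API with Nitric?"))
```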