
Commit 41a2a3b

fix(genapi): Splitting tutorial in two files separating document loading and rag prompting
1 parent 12d4144 commit 41a2a3b

File tree

1 file changed: +69 −28 lines

  • tutorials/how-to-implement-rag-generativeapis

tutorials/how-to-implement-rag-generativeapis/index.mdx

Lines changed: 69 additions & 28 deletions
@@ -207,49 +207,91 @@ Then, we will embed them as vectors and store these vectors in your PostgreSQL d
 ```python
 #rag.py

+import os
+from dotenv import load_dotenv
+
+from langchain_openai import OpenAIEmbeddings
+from langchain_postgres import PGVector
 from langchain import hub
 from langchain_core.output_parsers import StrOutputParser
 from langchain_core.runnables import RunnablePassthrough
 from langchain_openai import ChatOpenAI
 ```
+Note that we need to import the LangChain components `StrOutputParser`, `RunnablePassthrough`, and `ChatOpenAI` to implement a RAG pipeline.

-### Set up LLM for querying
+### Configure vector store

-Now, set up the RAG system to handle queries
+2. Edit `rag.py` to load the `.env` file and configure the embeddings format and vector store:

-```python
-#rag.py
-
-llm = ChatOpenAI(
-    base_url=os.getenv("SCW_GENERATIVE_APIs_ENDPOINT"),
-    api_key=os.getenv("SCW_SECRET_KEY"),
-    model="llama-3.1-8b-instruct",
-)
+```python
+load_dotenv()
+
+embeddings = OpenAIEmbeddings(
+    openai_api_key=os.getenv("SCW_SECRET_KEY"),
+    openai_api_base=os.getenv("SCW_GENERATIVE_APIs_ENDPOINT"),
+    model="bge-multilingual-gemma2",
+    check_embedding_ctx_length=False
+)
+
+connection_string = f'postgresql+psycopg2://{os.getenv("SCW_DB_USER")}:{os.getenv("SCW_DB_PASSWORD")}@{os.getenv("SCW_DB_HOST")}:{os.getenv("SCW_DB_PORT")}/{os.getenv("SCW_DB_NAME")}'
+vector_store = PGVector(connection=connection_string, embeddings=embeddings)
+```
+<Message type="tip">
+  Note that this configuration should match the one used in `embed.py`, to ensure vectors are read in the same format as the one used to create and store them.
+</Message>

-prompt = hub.pull("rlm/rag-prompt")
-retriever = vector_store.as_retriever()

+### Configure LLM client and create a basic RAG pipeline

-rag_chain = (
-    {"context": retriever, "question": RunnablePassthrough()}
-    | prompt
-    | llm
-    | StrOutputParser()
-)
+3. Edit `rag.py` to configure the LLM client using `ChatOpenAI` and create a simple RAG pipeline:

-for r in rag_chain.stream("Your question"):
-    print(r, end="", flush=True)
-    time.sleep(0.1)
-```
-- LLM initialization: we initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name.
+```python
+#rag.py
+
+llm = ChatOpenAI(
+    base_url=os.getenv("SCW_GENERATIVE_APIs_ENDPOINT"),
+    api_key=os.getenv("SCW_SECRET_KEY"),
+    model="llama-3.1-8b-instruct",
+)
+
+prompt = hub.pull("rlm/rag-prompt")
+retriever = vector_store.as_retriever()
+
+rag_chain = (
+    {"context": retriever, "question": RunnablePassthrough()}
+    | prompt
+    | llm
+    | StrOutputParser()
+)
+
+for r in rag_chain.stream("Provide the CLI command to shut down a scaleway instance. Its instance-uuid is example-28f3-4e91-b2af-4c3502562d72"):
+    print(r, end="", flush=True)
+```
+- `hub.pull("rlm/rag-prompt")` uses a standard RAG template, ensuring the retrieved document content is passed as proper context along with your prompt to the LLM.
+- `vector_store.as_retriever()` configures your vector store as a retriever, so that relevant chunks can be collected as additional context based on your prompt.
+- `rag_chain` defines a workflow performing context retrieval, LLM prompting, and output parsing in a streamlined way.
+- `for r in rag_chain.stream("Prompt question")` runs this workflow and streams the LLM answer, printing each chunk as soon as it is generated.

-- Prompt setup: the prompt is pulled from the hub using a predefined template, ensuring consistent query formatting.
+4. You can now execute your RAG pipeline with:

-- Retriever configuration: we set up the retriever to access the vector store, allowing the RAG system to retrieve relevant information based on the query.
+```sh
+python rag.py
+```

-- RAG chain construction: we create the RAG chain, which connects the retriever, prompt, LLM, and output parser in a streamlined workflow.
+If you used the Scaleway cheatsheet provided as an example, and asked for a CLI command to power off an instance, you should see the following answer:
+```sh
+scw instance server stop example-28f3-4e91-b2af-4c3502562d72

-- Query execution: finally, we stream the output of the RAG chain for a specified question, printing each response with a slight delay for better readability.
+This will shut down the instance with the specified instance-uuid.
+Please note that this command only stops the instance, it doesn't shut it down completely
+```
+This command is fully correct and can be used with the Scaleway CLI. Note especially that vector embeddings enabled the system to retrieve the proper document chunks even though the Scaleway cheatsheet doesn't mention `shutdown` but only `power off`.
+You can compare this with the result obtained without RAG (for instance using the [Generative APIs Playground](https://console.scaleway.com/generative-api/models/fr-par/playground?modelName=llama-3.1-8b-instruct)):
+```sh
+scaleway instance shutdown --instance-uuid example-28f3-4e91-b2af-4c3502562d72
+```
+This command is not correct at all, and hallucinates in several ways to fit the question prompt: `scaleway` instead of `scw`, `instance` instead of `instance server`, `shutdown` instead of `stop`, and the `--instance-uuid` parameter doesn't exist.

 ### Query the RAG system with your own prompt template

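As an aside on the retrieval step configured in the hunk above: a minimal sketch like the following (assuming the `vector_store` defined in `rag.py`; the `k=3` value and the sample question are illustrative choices, not taken from the tutorial) can help verify which document chunks the retriever would pass to the LLM:

```python
# Minimal sketch: inspect what the retriever returns before prompting the LLM.
# Assumes `vector_store` is configured as in rag.py above; k=3 is an arbitrary choice.
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

docs = retriever.invoke("Provide the CLI command to shut down a scaleway instance.")
for doc in docs:
    print(doc.page_content[:200])  # print the beginning of each retrieved chunk
```

If the expected cheatsheet chunks do not appear here, the embeddings or connection settings in `embed.py` and `rag.py` likely diverge, as warned in the tip above.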
@@ -277,7 +319,6 @@ custom_rag_chain = create_stuff_documents_chain(llm, custom_rag_prompt)
 context = retriever.invoke("your question")
 for r in custom_rag_chain.stream({"question":"your question", "context": context}):
     print(r, end="", flush=True)
-    time.sleep(0.1)
 ```

 - Prompt template: the prompt template is meticulously crafted to direct the model's responses. It clearly instructs the model on how to leverage the provided context and emphasizes the importance of honesty in cases where it lacks information.
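The `custom_rag_prompt` referenced in this hunk is defined earlier in the tutorial and does not appear in this diff; a minimal sketch of such a template (illustrative wording only, assuming the `llm` client configured in `rag.py` above) could look like this:

```python
# Minimal sketch of a custom RAG prompt; the actual template lives earlier in the
# tutorial and is not part of this diff. Assumes `llm` is the ChatOpenAI client above.
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import PromptTemplate

custom_rag_prompt = PromptTemplate.from_template(
    """Use the following context to answer the question.
If the context does not contain the answer, say that you don't know.

Context: {context}

Question: {question}

Answer:"""
)

# create_stuff_documents_chain passes the retrieved documents into {context}.
custom_rag_chain = create_stuff_documents_chain(llm, custom_rag_prompt)
```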

0 commit comments
