tutorials/how-to-implement-rag-generativeapis/index.mdx (13 additions & 12 deletions)
@@ -96,7 +96,7 @@ Create an `embed.py` file and add the following code to it:

### Configure embeddings client

-4. Edit `embed.py` to configure [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) class from LangChain to use your API Secret Key, Generative APIs Endpoint URL and a supported model (`bge-multilingual-gemma2` in our example).
+Edit `embed.py` to configure the [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) class from LangChain to use your API Secret Key, Generative APIs Endpoint URL, and a supported model (`bge-multilingual-gemma2`, in our example).
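
For reference, a minimal sketch of what this configuration could look like is shown below; the environment variable names and endpoint value are assumptions, not part of this diff:

```python
# embed.py - hypothetical sketch of the embeddings client configuration
import os
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    openai_api_key=os.getenv("SCW_SECRET_KEY"),                 # assumed env var holding your API secret key
    openai_api_base=os.getenv("SCW_GENERATIVE_APIS_ENDPOINT"),  # assumed env var for the endpoint URL
    model="bge-multilingual-gemma2",                            # supported model named in the tutorial
    tiktoken_enabled=False,                                     # assumed: skip tiktoken for a non-OpenAI model
)
```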
@@ -125,7 +125,7 @@
-6. At this stage, you need to have data (e.g. PDF files) stored in your Scaleway Object storage bucket. As examples, you can download our [Instance CLI cheatsheet](https://www-uploads.scaleway.com/Instances_CLI_Cheatsheet_7ae4ed5564.pdf) and our [Kubernetes cheatsheets](https://www.scaleway.com/en/docs/static/be9a6e5821a4e8e268c7c5bd3624e256/scaleway-kubernetes-cheatsheet.pdf) and store them into your [Object Storage bucket](https://console.scaleway.com/object-storage/buckets).
+At this stage, you need to have data (e.g. PDF files) stored in your Scaleway Object Storage bucket. As examples, you can download our [Instance CLI cheatsheet](https://www-uploads.scaleway.com/Instances_CLI_Cheatsheet_7ae4ed5564.pdf) and [Kubernetes cheatsheet](https://www.scaleway.com/en/docs/static/be9a6e5821a4e8e268c7c5bd3624e256/scaleway-kubernetes-cheatsheet.pdf) and store them in your [Object Storage bucket](https://console.scaleway.com/object-storage/buckets).

Below we will use LangChain's [`S3DirectoryLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html) to load documents, and split them into chunks.
Then, we will embed them as vectors and store these vectors in your PostgreSQL database.

### Import required modules

-7. Edit the beginning of `embed.py` to import `S3DirectoryLoader` and `RecursiveCharacterTextSplitter`:
+Edit the beginning of `embed.py` to import `S3DirectoryLoader` and `RecursiveCharacterTextSplitter`:
```python
from langchain_community.document_loaders import S3DirectoryLoader
@@ -138,7 +138,7 @@ Then, we will embed them as vectors and store these vectors in your PostgreSQL d
### Iterate through objects

-8. Edit `embed.py` to load all files in your bucket using `S3DirectoryLoader`, split them into chunks of 500 characters using `RecursiveCharacterTextSplitter` and embed them and store them into your PostgreSQL database using `PGVector`.
+Edit `embed.py` to load all files in your bucket using `S3DirectoryLoader`, split them into chunks of 500 characters using `RecursiveCharacterTextSplitter`, then embed them and store them in your PostgreSQL database using `PGVector`.
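
A hedged sketch of this step follows; the bucket name, endpoint, connection string, and collection name are placeholders, and the exact parameters may differ from the tutorial's final code:

```python
# embed.py - hypothetical sketch: load, split, embed, and store documents
from langchain_community.document_loaders import S3DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_postgres import PGVector

# Load every object from the bucket (placeholder bucket name and endpoint)
loader = S3DirectoryLoader(
    bucket="my-rag-bucket",
    endpoint_url="https://s3.fr-par.scw.cloud",
)

# Split documents into 500-character chunks (see the note on chunk size below)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

# Store embedded chunks in PostgreSQL (placeholder connection string)
vector_store = PGVector(
    embeddings=embeddings,  # the embeddings client configured in the previous step
    connection="postgresql+psycopg://user:password@host:5432/dbname",
    collection_name="rag_documents",  # placeholder collection name
)

for document in loader.load():
    chunks = text_splitter.split_documents([document])
    vector_store.add_documents(chunks)
```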
@@ -158,7 +158,7 @@ Then, we will embed them as vectors and store these vectors in your PostgreSQL d
The chunk size of 500 characters is chosen to fit within the context size limit of the embedding model used in this tutorial, but it could be raised up to 4096 characters for the `bge-multilingual-gemma2` model (or slightly more, as context size is counted in tokens). Keeping chunks small also optimizes performance during inference.

-9. You can now run you vector embedding script with:
+You can now run your vector embedding script with:
```sh
python embed.py
@@ -176,7 +176,7 @@ Then, we will embed them as vectors and store these vectors in your PostgreSQL d
### Create a new file and import required modules

-1. Create a new file called `rag.py` and add the following content to it:
+Create a new file called `rag.py` and add the following content to it:
```python
#rag.py
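
The diff truncates this code block; a plausible set of imports for the steps that follow, offered as an assumption rather than the file's exact content, is:

```python
# rag.py - hypothetical imports; the actual file may differ
import os
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_postgres import PGVector
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
```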
@@ -195,7 +195,7 @@ Then, we will embed them as vectors and store these vectors in your PostgreSQL d
### Configure vector store

-2. Edit `rag.py` to load `.env` file, and configure Embeddings format and Vector store:
+Edit `rag.py` to load the `.env` file, and configure the embeddings format and vector store:
```python
load_dotenv()
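
The hunk cuts this block off; a minimal sketch of how the embeddings and vector store configuration might continue, with placeholder environment variable names and continuing the imports above, is:

```python
# rag.py - hypothetical continuation of the configuration
load_dotenv()

embeddings = OpenAIEmbeddings(
    openai_api_key=os.getenv("SCW_SECRET_KEY"),                 # assumed env var
    openai_api_base=os.getenv("SCW_GENERATIVE_APIS_ENDPOINT"),  # assumed env var
    model="bge-multilingual-gemma2",
    tiktoken_enabled=False,
)

vector_store = PGVector(
    embeddings=embeddings,
    connection=os.getenv("SCW_DB_URL"),  # placeholder connection string env var
    collection_name="rag_documents",     # placeholder collection name
)
```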
@@ -216,7 +216,7 @@ Then, we will embed them as vectors and store these vectors in your PostgreSQL d
### Configure the LLM client and create a basic RAG pipeline

-3. Edit `rag.py` to configure LLM client using `ChatOpenAI` and create a simple RAG pipeline:
+Edit `rag.py` to configure the LLM client using `ChatOpenAI` and create a simple RAG pipeline:
```python
llm = ChatOpenAI(
@@ -244,7 +244,7 @@ Then, we will embed them as vectors and store these vectors in your PostgreSQL d
- `rag_chain` defines a workflow performing the following steps in order: retrieve relevant documents, prompt the LLM with the documents as context, and parse the final output.
- `for r in rag_chain.stream("Prompt question")` starts the RAG workflow with `Prompt question` as input.

-4. You can now execute your RAG pipeline with:
+You can now execute your RAG pipeline with the following command:
```sh
250
250
python rag.py
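
Putting the pieces above together, a hedged sketch of the basic pipeline these hunks describe (the chat model name, prompt wording, and environment variables are assumptions, continuing the imports above) could look like:

```python
# rag.py - hypothetical sketch of the basic RAG pipeline
llm = ChatOpenAI(
    base_url=os.getenv("SCW_GENERATIVE_APIS_ENDPOINT"),  # assumed env var
    api_key=os.getenv("SCW_SECRET_KEY"),                 # assumed env var
    model="llama-3.1-8b-instruct",                       # assumed chat model name
)

retriever = vector_store.as_retriever()

def format_docs(docs):
    # Join retrieved document chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

prompt = PromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)

# Retrieve documents, prompt the LLM with them as context, then parse the output
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for r in rag_chain.stream("Prompt question"):  # replace with your own question
    print(r, end="", flush=True)
```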
@@ -270,7 +270,7 @@ Then, we will embed them as vectors and store these vectors in your PostgreSQL d
Personalizing your prompt template allows you to tailor the responses from your RAG (Retrieval-Augmented Generation) system to better fit your specific needs. This can significantly improve the relevance and tone of the answers you receive. Below is a detailed guide on how to create a custom prompt for querying the system.

-5. Replace `rag.py` content with the following:
+Replace the `rag.py` content with the following:
```python
#rag.py
@@ -320,7 +320,8 @@ Personalizing your prompt template allows you to tailor the responses from your
- `retriever.invoke` lets you customize which part of the LLM input is used to retrieve documents.
- `create_stuff_documents_chain` provides the prompt template to the LLM.

-6. You can now execute your custom RAG pipeline with:
+You can now execute your custom RAG pipeline with the following command:
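
Based on the bullets above, a hedged sketch of the custom-prompt pipeline (the prompt wording and example question are assumptions, reusing the `llm` and `vector_store` from the earlier sketches) might be:

```python
# rag.py - hypothetical sketch of the custom prompt template pipeline
from langchain.chains.combine_documents import create_stuff_documents_chain

custom_prompt = PromptTemplate.from_template(
    "Use only the following context to answer concisely.\n"
    "Context: {context}\n\nQuestion: {input}\n\nAnswer:"
)

# create_stuff_documents_chain provides the prompt template to the LLM
combine_docs_chain = create_stuff_documents_chain(llm, custom_prompt)

# retriever.invoke lets you customize which part of the input retrieves documents
retriever = vector_store.as_retriever()
query = "How do I list my Instances with the Scaleway CLI?"  # example question
context_docs = retriever.invoke(query)

for chunk in combine_docs_chain.stream({"context": context_docs, "input": query}):
    print(chunk, end="", flush=True)
```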