
Commit e2b6543

Update README.md
spell checking
1 parent 34d3d97 commit e2b6543

File tree: 1 file changed, +16 −16 lines
  • cloud-infrastructure/ai-infra-gpu/AI Infrastructure/rag-langchain-vllm-mistral

cloud-infrastructure/ai-infra-gpu/AI Infrastructure/rag-langchain-vllm-mistral/README.md

Lines changed: 16 additions & 16 deletions
@@ -1,4 +1,4 @@
- # RAG with OCI, LangChain and VLLMs
+ # RAG with OCI, LangChain, and VLLMs
[![License: UPL](https://img.shields.io/badge/license-UPL-green)](https://img.shields.io/badge/license-UPL-green) [![Quality gate](https://sonarcloud.io/api/project_badges/quality_gate?project=oracle-devrel_technology-engineering)](https://sonarcloud.io/dashboard?id=oracle-devrel_technology-engineering)

@@ -8,22 +8,22 @@ This repository is a variant of the Retrieval Augmented Generation (RAG) tutoria
The following libraries and modules are used in this solution:

- * **LlamaIndex**: a data framework for LLM-based applications which benefit from context augmentation.
+ * **LlamaIndex**: a data framework for LLM-based applications that benefit from context augmentation.
* **LangChain**: a framework for developing applications powered by large language models.
* **vLLM**: a fast and easy-to-use library for LLM inference and serving.
* **Qdrant**: a vector similarity search engine.

- As we're using a Mistral model, [Mistral.ai](https://mistral.ai/) also deserves proper introduction: Mistral AI is a French AI startup that develops Large Language Models (LLMs), and one of the few companies with uncensored versions for their models (interesting to look into as a developer) Mistral 7B Instruct is a small yet powerful open model that supports English and code. The instruct version -the one we're using here- is optimized for chat.
+ As we're using a Mistral model, [Mistral.ai](https://mistral.ai/) also deserves a proper introduction: Mistral AI is a French AI startup that develops Large Language Models (LLMs), and one of the few companies with uncensored versions of their models (interesting to look into as a developer). Mistral 7B Instruct is a small yet powerful open model that supports English and code. The instruct version, the one we're using here, is optimized for chat.

In this example, inference performance is increased using the [FlashAttention](https://huggingface.co/docs/text-generation-inference/conceptual/flash_attention) backend.

These are the components of the Python solution being used here:

- * **SitemapReader**: Asynchronous sitemap reader for web (based on beautifulsoup). Reads pages from the web based on their sitemap.xml. Other data connectors are available (Snowflake, Twitter, Wikipedia, etc.). In this example the site mapxml file is stored in an OCI bucket.
+ * **SitemapReader**: Asynchronous sitemap reader for the web (based on beautifulsoup). Reads pages from the web based on their sitemap.xml. Other data connectors are available (Snowflake, Twitter, Wikipedia, etc.). In this example, the sitemap.xml file is stored in an OCI bucket.
* **QdrantClient**: Python client for the Qdrant vector search engine.
* **SentenceTransformerEmbeddings**: Sentence embeddings model object (from HuggingFace). Other options include Aleph Alpha, Cohere, MistralAI, SpaCy, etc.
* **VLLM**: Fast and easy-to-use LLM inference server.
- * **Settings**: Bundle of commonly used resources used during the indexing and querying stage in a LlamaIndex pipeline/application. In this example we use global configuration.
+ * **Settings**: Bundle of commonly used resources used during the indexing and querying stage in a LlamaIndex pipeline/application. In this example, we use global configuration.
* **QdrantVectorStore**: Vector store where embeddings and docs are stored within a Qdrant collection.
* **StorageContext**: Utility container for storing nodes, indices, and vectors.
* **VectorStoreIndex**: Index built from the documents loaded in the Vector Store.
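
For orientation, here is a minimal sketch of how these components typically fit together in a LlamaIndex pipeline. It is not the repository's exact script; the sitemap URL, collection name, and model names are illustrative assumptions:

```python
# Minimal sketch of the component wiring -- names and URLs are illustrative.
from qdrant_client import QdrantClient
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.llms import VLLM
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.llms.langchain import LangChainLLM
from llama_index.readers.web import SitemapReader
from llama_index.vector_stores.qdrant import QdrantVectorStore

# SitemapReader: read pages listed in a sitemap.xml (URL is a placeholder).
documents = SitemapReader().load_data(sitemap_url="https://example.com/sitemap.xml")

# Settings: global configuration of the embedding model and the vLLM-backed LLM.
Settings.embed_model = LangchainEmbedding(
    SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
)
Settings.llm = LangChainLLM(
    VLLM(model="mistralai/Mistral-7B-Instruct-v0.1", max_new_tokens=512)
)

# QdrantClient + QdrantVectorStore + StorageContext: where embeddings are stored.
client = QdrantClient(location=":memory:")  # or the address of a remote Qdrant
vector_store = QdrantVectorStore(client=client, collection_name="rag_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# VectorStoreIndex: build the index from the documents, then query it.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(index.as_query_engine().query("What is this site about?"))
```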
@@ -46,13 +46,13 @@ These are the components of the Python solution being used here:

## 1. Instance Creation

- There are two approaches here: either install everything from scratch using an Ubuntu 22 LTS OS Image, or use a marketplace image from NVIDIA, which will significantly reduce the overhead from installing all NVIDIA drivers and dependencies manually. However, these steps are also provided for those of you who want to know exactly what goes into your machine.
+ There are two approaches here: either install everything from scratch using an Ubuntu 22 LTS OS Image or use a marketplace image from NVIDIA, which will significantly reduce the overhead from installing all NVIDIA drivers and dependencies manually. However, these steps are also provided for those of you who want to know exactly what goes into your machine.

A boot volume of 200-250 GB is also recommended.

- In this example a single A10 GPU VM shape, codename `VM.GPU.A10.1`, is used. This is currently the smallest GPU shape available on OCI. With this configuration, it is recommended to limit the context length of the VLLM Model to **16384MB**, especially for larger models. To use the full context length, a dual A10 GPU, codename `VM.GPU.A10.2`, will be necessary.
+ In this example, a single A10 GPU VM shape, codename `VM.GPU.A10.1`, is used. This is currently the smallest GPU shape available on OCI. With this configuration, it is recommended to limit the context length of the vLLM model to **16384** tokens, especially for larger models. To use the full context length, a dual A10 GPU, codename `VM.GPU.A10.2`, will be necessary.
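
As a sketch of applying that limit with vLLM's offline Python API (the model name is an illustrative assumption):

```python
from vllm import LLM

# Cap the context window at 16384 tokens so the KV cache fits in the
# single A10's 24 GB of GPU memory.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1", max_model_len=16384)
```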

- > **Important**: If you've chosen to follow the guide with your NVIDIA GPU Cloud Machine Image (instead of a fresh Ubuntu image), you won't need to execute the steps found below in chapter 2: Setup. These steps will be marked with an asterisk (*) at the beginning so you know which ones to **skip** and which ones to execute.
+ > **Important**: If you've chosen to follow the guide with your NVIDIA GPU Cloud Machine Image (instead of a fresh Ubuntu image), you won't need to execute the steps found below in Chapter 2: Setup. These steps will be marked with an asterisk (*) at the beginning so you know which ones to **skip** and which ones to execute.

1. Create a GPU instance on OCI if you haven't already:
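
   One way to script this step is with the OCI CLI; the following is a hypothetical sketch, not a command from this guide (every `<placeholder>` must be replaced with values from your own tenancy):

   ```bash
   # Hypothetical sketch -- all OCIDs are placeholders from your own tenancy.
   oci compute instance launch \
     --availability-domain <availability-domain> \
     --compartment-id <compartment-ocid> \
     --shape VM.GPU.A10.1 \
     --image-id <ubuntu-22.04-image-ocid> \
     --subnet-id <subnet-ocid> \
     --boot-volume-size-in-gbs 250 \
     --ssh-authorized-keys-file ~/.ssh/id_rsa.pub
   ```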

@@ -64,21 +64,21 @@ In this example a single A10 GPU VM shape, codename `VM.GPU.A10.1`, is used. Thi
   ```bash
   ssh -i <private.key> ubuntu@<public-ip>
   ```

- where `private.key` is the ssh private key provided in the instance creation phase and `public-ip` is the instance Public IP address that can be found in the OCI Console.
+ where `private.key` is the SSH private key provided in the instance creation phase and `public-ip` is the instance Public IP address that can be found in the OCI Console.

Once we have SSH access to our instance, we can proceed with the setup.

## 2. Setup

- For the sake of libraries and packages compatibility, is highly recommended to update the image packages, NVIDIA drivers and CUDA versions.
+ For the sake of libraries and package compatibility, it is highly recommended to update the image packages, NVIDIA drivers, and CUDA versions.

- 1. Fetch, download and install the packages of the distribution:
+ 1. Fetch, download, and install the packages of the distribution:
   ```bash
   sudo apt-get update && sudo apt-get upgrade -y
   ```

- 2. (*) Remove the current NVIDIA packages and replace it with the following versions.
+ 2. (*) Remove the current NVIDIA packages and replace them with the following versions.
   ```bash
   sudo apt purge nvidia* libnvidia* -y
@@ -160,7 +160,7 @@ For the sake of libraries and packages compatibility, is highly recommended to u
   ```bash
   python rag-langchain-vllm-mistral.py
   ```
- 2. If you want to run a batch of queries against Mistral with the vLLM engine, execute the following script (containst an editable list of queries):
+ 2. If you want to run a batch of queries against Mistral with the vLLM engine, execute the following script (containing an editable list of queries):
   ```bash
   python invoke_api.py
   ```
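
   For reference, a batch script along these lines might look as follows. This is a hedged sketch, not the repository's actual `invoke_api.py`; the endpoint, model name, and queries are assumptions (it presumes a vLLM OpenAI-compatible server on port 8000):

   ```python
   import requests

   # Editable list of queries -- illustrative examples only.
   QUERIES = [
       "What is Retrieval Augmented Generation?",
       "Which vector store does this solution use?",
   ]

   for query in QUERIES:
       response = requests.post(
           "http://localhost:8000/v1/completions",
           json={
               "model": "mistralai/Mistral-7B-Instruct-v0.1",
               "prompt": query,
               "max_tokens": 256,
           },
           timeout=120,
       )
       print(response.json()["choices"][0]["text"])
   ```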
@@ -170,7 +170,7 @@ The script will return the answer to the questions asked in the query.

## 4. Alternative deployment
- Alternatively it is possible to deploy both components (qdrant client and vLLM server) remotely using Docker containers. This option can be useful in two situations:
+ Alternatively, it is possible to deploy both components (qdrant client and vLLM server) remotely using Docker containers. This option can be useful in two situations:
* The engines are shared by multiple solutions for which data must be segregated.
* The engines are deployed on instances with optimized configurations (GPU, RAM, CPU cores, etc.).
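
As a sketch of that remote deployment, using the public `qdrant/qdrant` and `vllm/vllm-openai` images (ports and the model name are assumptions, and `--gpus all` requires the NVIDIA Container Toolkit):

```bash
# Qdrant vector search engine, REST API published on port 6333.
docker run -d -p 6333:6333 qdrant/qdrant

# vLLM OpenAI-compatible server for Mistral 7B Instruct on port 8000.
docker run -d --gpus all -p 8000:8000 vllm/vllm-openai \
    --model mistralai/Mistral-7B-Instruct-v0.1 \
    --max-model-len 16384
```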
@@ -231,11 +231,11 @@ To deploy the container, refer to this [tutorial](https://github.com/oracle-devr

## Notes
- The libraries used in this example are evolving quite fast. The python script provided here might have to be updated in a near future to avoid Warnings and Errors.
+ The libraries used in this example are evolving quite fast. The Python script provided here might have to be updated in the near future to avoid warnings and errors.

## Contributing
- This project is open source. Please submit your contributions by forking this repository and submitting a pull request! Oracle appreciates any contributions that are made by the open source community.
+ This project is open source. Please submit your contributions by forking this repository and submitting a pull request! Oracle appreciates any contributions that are made by the open-source community.

## License
