
Commit e2b6543

Update README.md
spell checking
1 parent 34d3d97 commit e2b6543

File tree: 1 file changed, +16 −16 lines
  • cloud-infrastructure/ai-infra-gpu/AI Infrastructure/rag-langchain-vllm-mistral

cloud-infrastructure/ai-infra-gpu/AI Infrastructure/rag-langchain-vllm-mistral/README.md

Lines changed: 16 additions & 16 deletions
@@ -1,4 +1,4 @@
- # RAG with OCI, LangChain and VLLMs
+ # RAG with OCI, LangChain, and VLLMs
[![License: UPL](https://img.shields.io/badge/license-UPL-green)](https://img.shields.io/badge/license-UPL-green) [![Quality gate](https://sonarcloud.io/api/project_badges/quality_gate?project=oracle-devrel_technology-engineering)](https://sonarcloud.io/dashboard?id=oracle-devrel_technology-engineering)

@@ -8,22 +8,22 @@ This repository is a variant of the Retrieval Augmented Generation (RAG) tutoria
The following libraries and modules are used in this solution:

- * **LlamaIndex**: a data framework for LLM-based applications which benefit from context augmentation.
+ * **LlamaIndex**: a data framework for LLM-based applications that benefit from context augmentation.
* **LangChain**: a framework for developing applications powered by large language models.
* **vLLM**: a fast and easy-to-use library for LLM inference and serving.
* **Qdrant**: a vector similarity search engine.

- As we're using a Mistral model, [Mistral.ai](https://mistral.ai/) also deserves proper introduction: Mistral AI is a French AI startup that develops Large Language Models (LLMs), and one of the few companies with uncensored versions for their models (interesting to look into as a developer) Mistral 7B Instruct is a small yet powerful open model that supports English and code. The instruct version -the one we're using here- is optimized for chat.
+ As we're using a Mistral model, [Mistral.ai](https://mistral.ai/) also deserves a proper introduction: Mistral AI is a French AI startup that develops Large Language Models (LLMs), and one of the few companies with uncensored versions of their models (interesting to look into as a developer). Mistral 7B Instruct is a small yet powerful open model that supports English and code. The instruct version, the one we're using here, is optimized for chat.

In this example, inference performance is increased using the [FlashAttention](https://huggingface.co/docs/text-generation-inference/conceptual/flash_attention) backend.

These are the components of the Python solution being used here:

- * **SitemapReader**: Asynchronous sitemap reader for web (based on beautifulsoup). Reads pages from the web based on their sitemap.xml. Other data connectors are available (Snowflake, Twitter, Wikipedia, etc.). In this example the site mapxml file is stored in an OCI bucket.
+ * **SitemapReader**: Asynchronous sitemap reader for the web (based on beautifulsoup). Reads pages from the web based on their sitemap.xml. Other data connectors are available (Snowflake, Twitter, Wikipedia, etc.). In this example, the sitemap.xml file is stored in an OCI bucket.
* **QdrantClient**: Python client for the Qdrant vector search engine.
* **SentenceTransformerEmbeddings**: Sentence embeddings model object (from HuggingFace). Other options include Aleph Alpha, Cohere, MistralAI, SpaCy, etc.
* **VLLM**: Fast and easy-to-use LLM inference server.
- * **Settings**: Bundle of commonly used resources used during the indexing and querying stage in a LlamaIndex pipeline/application. In this example we use global configuration.
+ * **Settings**: Bundle of commonly used resources used during the indexing and querying stage in a LlamaIndex pipeline/application. In this example, we use global configuration.
* **QdrantVectorStore**: Vector store where embeddings and docs are stored within a Qdrant collection.
* **StorageContext**: Utility container for storing nodes, indices, and vectors.
* **VectorStoreIndex**: Index built from the documents loaded in the Vector Store.
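
For orientation, here is a minimal sketch of how these components typically fit together in a LlamaIndex pipeline. It is not the repository's exact script; the sitemap URL, collection name, and model names are illustrative assumptions:

```python
# Minimal sketch of the component wiring -- names and URLs are illustrative.
from qdrant_client import QdrantClient
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.llms import VLLM
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.llms.langchain import LangChainLLM
from llama_index.readers.web import SitemapReader
from llama_index.vector_stores.qdrant import QdrantVectorStore

# SitemapReader: read pages listed in a sitemap.xml (URL is a placeholder).
documents = SitemapReader().load_data(sitemap_url="https://example.com/sitemap.xml")

# Settings: global configuration of the embedding model and the vLLM-backed LLM.
Settings.embed_model = LangchainEmbedding(
    SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
)
Settings.llm = LangChainLLM(
    VLLM(model="mistralai/Mistral-7B-Instruct-v0.1", max_new_tokens=512)
)

# QdrantClient + QdrantVectorStore + StorageContext: where embeddings are stored.
client = QdrantClient(location=":memory:")  # or the address of a remote Qdrant
vector_store = QdrantVectorStore(client=client, collection_name="rag_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# VectorStoreIndex: build the index from the documents, then query it.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(index.as_query_engine().query("What is this site about?"))
```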
@@ -46,13 +46,13 @@ These are the components of the Python solution being used here:

## 1. Instance Creation

- There are two approaches here: either install everything from scratch using an Ubuntu 22 LTS OS Image, or use a marketplace image from NVIDIA, which will significantly reduce the overhead from installing all NVIDIA drivers and dependencies manually. However, these steps are also provided for those of you who want to know exactly what goes into your machine.
+ There are two approaches here: either install everything from scratch using an Ubuntu 22 LTS OS Image or use a marketplace image from NVIDIA, which will significantly reduce the overhead from installing all NVIDIA drivers and dependencies manually. However, these steps are also provided for those of you who want to know exactly what goes into your machine.

A boot volume of 200-250 GB is also recommended.

- In this example a single A10 GPU VM shape, codename `VM.GPU.A10.1`, is used. This is currently the smallest GPU shape available on OCI. With this configuration, it is recommended to limit the context length of the VLLM Model to **16384MB**, especially for larger models. To use the full context length, a dual A10 GPU, codename `VM.GPU.A10.2`, will be necessary.
+ In this example, a single A10 GPU VM shape, codename `VM.GPU.A10.1`, is used. This is currently the smallest GPU shape available on OCI. With this configuration, it is recommended to limit the context length of the vLLM model to **16384** tokens, especially for larger models. To use the full context length, a dual A10 GPU, codename `VM.GPU.A10.2`, will be necessary.
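
As a sketch of applying that limit with vLLM's offline Python API (the model name is an illustrative assumption):

```python
from vllm import LLM

# Cap the context window at 16384 tokens so the KV cache fits in the
# single A10's 24 GB of GPU memory.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1", max_model_len=16384)
```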

- > **Important**: If you've chosen to follow the guide with your NVIDIA GPU Cloud Machine Image (instead of a fresh Ubuntu image), you won't need to execute the steps found below in chapter 2: Setup. These steps will be marked with an asterisk (*) at the beginning so you know which ones to **skip** and which ones to execute.
+ > **Important**: If you've chosen to follow the guide with your NVIDIA GPU Cloud Machine Image (instead of a fresh Ubuntu image), you won't need to execute the steps found below in Chapter 2: Setup. These steps will be marked with an asterisk (*) at the beginning so you know which ones to **skip** and which ones to execute.

1. Create a GPU instance on OCI if you haven't already:
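
   One way to script this step is with the OCI CLI; the following is a hypothetical sketch, not a command from this guide (every `<placeholder>` must be replaced with values from your own tenancy):

   ```bash
   # Hypothetical sketch -- all OCIDs are placeholders from your own tenancy.
   oci compute instance launch \
     --availability-domain <availability-domain> \
     --compartment-id <compartment-ocid> \
     --shape VM.GPU.A10.1 \
     --image-id <ubuntu-22.04-image-ocid> \
     --subnet-id <subnet-ocid> \
     --boot-volume-size-in-gbs 250 \
     --ssh-authorized-keys-file ~/.ssh/id_rsa.pub
   ```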

@@ -64,21 +64,21 @@ In this example a single A10 GPU VM shape, codename `VM.GPU.A10.1`, is used. Thi
   ```bash
   ssh -i <private.key> ubuntu@<public-ip>
   ```

- where `private.key` is the ssh private key provided in the instance creation phase and `public-ip` is the instance Public IP address that can be found in the OCI Console.
+ where `private.key` is the SSH private key provided in the instance creation phase and `public-ip` is the instance Public IP address that can be found in the OCI Console.

Once we have SSH access to our instance, we can proceed with the setup.

## 2. Setup

- For the sake of libraries and packages compatibility, is highly recommended to update the image packages, NVIDIA drivers and CUDA versions.
+ For the sake of libraries and package compatibility, it is highly recommended to update the image packages, NVIDIA drivers, and CUDA versions.

- 1. Fetch, download and install the packages of the distribution:
+ 1. Fetch, download, and install the packages of the distribution:
   ```bash
   sudo apt-get update && sudo apt-get upgrade -y
   ```

- 2. (*) Remove the current NVIDIA packages and replace it with the following versions.
+ 2. (*) Remove the current NVIDIA packages and replace them with the following versions.
   ```bash
   sudo apt purge nvidia* libnvidia* -y
@@ -160,7 +160,7 @@ For the sake of libraries and packages compatibility, is highly recommended to u
   ```bash
   python rag-langchain-vllm-mistral.py
   ```
- 2. If you want to run a batch of queries against Mistral with the vLLM engine, execute the following script (containst an editable list of queries):
+ 2. If you want to run a batch of queries against Mistral with the vLLM engine, execute the following script (containing an editable list of queries):
   ```bash
   python invoke_api.py
   ```
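
   For reference, a batch script along these lines might look as follows. This is a hedged sketch, not the repository's actual `invoke_api.py`; the endpoint, model name, and queries are assumptions (it presumes a vLLM OpenAI-compatible server on port 8000):

   ```python
   import requests

   # Editable list of queries -- illustrative examples only.
   QUERIES = [
       "What is Retrieval Augmented Generation?",
       "Which vector store does this solution use?",
   ]

   for query in QUERIES:
       response = requests.post(
           "http://localhost:8000/v1/completions",
           json={
               "model": "mistralai/Mistral-7B-Instruct-v0.1",
               "prompt": query,
               "max_tokens": 256,
           },
           timeout=120,
       )
       print(response.json()["choices"][0]["text"])
   ```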
@@ -170,7 +170,7 @@ The script will return the answer to the questions asked in the query.

## 4. Alternative deployment
- Alternatively it is possible to deploy both components (qdrant client and vLLM server) remotely using Docker containers. This option can be useful in two situations:
+ Alternatively, it is possible to deploy both components (qdrant client and vLLM server) remotely using Docker containers. This option can be useful in two situations:
* The engines are shared by multiple solutions for which data must be segregated.
* The engines are deployed on instances with optimized configurations (GPU, RAM, CPU cores, etc.).
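
As a sketch of that remote deployment, using the public `qdrant/qdrant` and `vllm/vllm-openai` images (ports and the model name are assumptions, and `--gpus all` requires the NVIDIA Container Toolkit):

```bash
# Qdrant vector search engine, REST API published on port 6333.
docker run -d -p 6333:6333 qdrant/qdrant

# vLLM OpenAI-compatible server for Mistral 7B Instruct on port 8000.
docker run -d --gpus all -p 8000:8000 vllm/vllm-openai \
    --model mistralai/Mistral-7B-Instruct-v0.1 \
    --max-model-len 16384
```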
@@ -231,11 +231,11 @@ To deploy the container, refer to this [tutorial](https://github.com/oracle-devr

## Notes
- The libraries used in this example are evolving quite fast. The python script provided here might have to be updated in a near future to avoid Warnings and Errors.
+ The libraries used in this example are evolving quite fast. The Python script provided here might have to be updated in the near future to avoid warnings and errors.

## Contributing
- This project is open source. Please submit your contributions by forking this repository and submitting a pull request! Oracle appreciates any contributions that are made by the open source community.
+ This project is open source. Please submit your contributions by forking this repository and submitting a pull request! Oracle appreciates any contributions that are made by the open-source community.

## License
