cloud-infrastructure/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/README.md
+24 −18 (24 additions, 18 deletions)
@@ -1,12 +1,12 @@
 # RAG with OCI, LangChain, and VLLMs
 
-This repository is a variant of the Retrieval Augmented Generation (RAG) tutorial available [here](https://github.com/oracle-devrel/technology-engineering/tree/main/ai-and-app-modernisation/ai-services/generative-ai-service/rag-genai). Instead of the OCI GenAI Service, it uses a local deployment of Mistral 7B Instruct v0.2 using a vLLM inference server powered by an NVIDIA A10 GPU.
+This repository is a variant of the Retrieval Augmented Generation (RAG) tutorial available [here](https://github.com/oracle-devrel/technology-engineering/tree/main/ai-and-app-modernisation/ai-services/generative-ai-service/rag-genai). Instead of the OCI GenAI Service, it uses a local deployment of Mistral 7B Instruct v0.3 using a vLLM inference server powered by an NVIDIA A10 GPU.
 
 Reviewed: 23.05.2024
 
 # When to use this asset?
 
-To run the RAG tutorial with a local deployment of Mistral 7B Instruct v0.2 using a vLLM inference server powered by an NVIDIA A10 GPU.
+To run the RAG tutorial with a local deployment of Mistral 7B Instruct v0.3 using a vLLM inference server powered by an NVIDIA A10 GPU.
 
 # How to use this asset?
@@ -25,7 +25,7 @@ These are the components of the Python solution being used here:
 * **SitemapReader**: Asynchronous sitemap reader for the web (based on beautifulsoup). Reads pages from the web based on their sitemap.xml. Other data connectors are available (Snowflake, Twitter, Wikipedia, etc.). In this example, the sitemap.xml file is stored in an OCI bucket.
 * **QdrantClient**: Python client for the Qdrant vector search engine.
-* **SentenceTransformerEmbeddings**: Sentence embeddings model object (from HuggingFace). Other options include Aleph Alpha, Cohere, MistralAI, SpaCy, etc.
+* **HuggingFaceEmbeddings**: Sentence embeddings model object (from HuggingFace). Other options include Aleph Alpha, Cohere, MistralAI, SpaCy, etc.
 * **VLLM**: Fast and easy-to-use LLM inference server.
 * **Settings**: Bundle of commonly used resources used during the indexing and querying stages of a LlamaIndex pipeline/application. In this example, we use the global configuration.
 * **QdrantVectorStore**: Vector store where embeddings and docs are stored within a Qdrant collection.
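
For orientation, here is a minimal sketch of how these components can be wired together. It assumes llama-index 0.10-style module paths (which may differ across versions) and the packages in requirements.txt; the sitemap URL and collection name are placeholders, and the LangChain wrappers are one of several ways to plug VLLM and HuggingFaceEmbeddings into LlamaIndex.

```python
import qdrant_client
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import VLLM
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.llms.langchain import LangChainLLM
from llama_index.readers.web import SitemapReader
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Read every page listed in the sitemap.xml (served here from an OCI bucket).
documents = SitemapReader(html_to_text=True).load_data(
    sitemap_url="https://<bucket-url>/sitemap.xml"  # placeholder URL
)

# Global Settings: embedding model and LLM shared by indexing and querying.
Settings.embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
Settings.llm = LangChainLLM(
    llm=VLLM(model="mistralai/Mistral-7B-Instruct-v0.3", max_new_tokens=256)
)

# Store embeddings and documents in a Qdrant collection (in-memory for the sketch).
client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="rag")
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
)

print(index.as_query_engine().query("What is this site about?"))
```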
@@ -82,23 +82,20 @@ For the sake of libraries and package compatibility, it is highly recommended to up
 sudo apt-get update && sudo apt-get upgrade -y
 ```
 
-2. (*) Remove the current NVIDIA packages and replace them with the following versions.
 4. (*) After installation, we need to add the CUDA path to the PATH environment variable so that NVCC (the NVIDIA CUDA Compiler) can find the right CUDA executable for parallelizing and running code:
@@ -146,10 +143,16 @@ For the sake of libraries and package compatibility, it is highly recommended to up
 conda activate rag
 pip install packaging
 pip install -r requirements.txt
-# requirements.txt can be found in `technology-engineering/cloud-infrastructure/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/`
+# requirements.txt can be found in `technology-engineering/cloud-infrastructure/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/files`
+```
+
+9. Install the `gcc` compiler to be able to build PyTorch (in vllm):
+
+```bash
+sudo apt install -y gcc
 ```
 
-9. Finally, reboot the instance and reconnect via SSH.
+10. Finally, reboot the instance and reconnect via SSH.
 
 ```bash
 ssh -i <private.key> ubuntu@<public-ip>
@@ -158,10 +161,13 @@ For the sake of libraries and package compatibility, it is highly recommended to up
 ## Running the solution
 
-1. You can run an editable file with parameters to test one query by running:
+1. You can run an editable file with parameters to test one query, but first set up a few prerequisites, namely the `VLLM_WORKER_MULTIPROC_METHOD` environment variable and the `ipython` interactive terminal:
 
 ```bash
-python rag-langchain-vllm-mistral.py
+export VLLM_WORKER_MULTIPROC_METHOD="spawn"
+conda install ipython
+ipython
+run rag-langchain-vllm-mistral.py
 ```
 
 2. If you want to run a batch of queries against Mistral with the vLLM engine, execute the following script (containing an editable list of queries):
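
The batch script itself is not shown in this diff; below is a minimal sketch of what such a batch run can look like with LangChain's VLLM wrapper. The model name, sampling parameters, and queries are illustrative; the repository's script defines its own.

```python
from langchain_community.llms import VLLM

# Illustrative parameters; adjust to match the repository's script.
llm = VLLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    max_new_tokens=256,
    temperature=0.7,
)

# Editable list of queries; vLLM processes the batch of prompts efficiently.
queries = [
    "What is Retrieval Augmented Generation?",
    "Why pair a vector store with an LLM?",
]
for query, answer in zip(queries, llm.batch(queries)):
    print(f"Q: {query}\nA: {answer}\n")
```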
@@ -210,7 +216,7 @@ Instead of:
 from langchain_community.llms import VLLM
 
 llm = VLLM(
-    model="mistralai/Mistral-7B-v0.1",
+    model="mistralai/Mistral-7B-v0.3",
     ...
     vllm_kwargs={
         ...
@@ -226,7 +232,7 @@ from langchain_community.llms import VLLMOpenAI
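
The body of this hunk is truncated above; for context, here is a hedged sketch of the alternative it refers to, querying a vLLM server through its OpenAI-compatible API. The server address and model name are assumptions, and a vLLM server must already be serving the model.

```python
from langchain_community.llms import VLLMOpenAI

# Assumes a vLLM OpenAI-compatible server is already running, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3
llm = VLLMOpenAI(
    openai_api_key="EMPTY",                      # vLLM does not check the key
    openai_api_base="http://localhost:8000/v1",  # assumed server address
    model_name="mistralai/Mistral-7B-Instruct-v0.3",
)
print(llm.invoke("What is Retrieval Augmented Generation?"))
```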
cloud-infrastructure/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/files/rag-langchain-vllm-mistral.py