Commit ad9c4b4

Marketing updates
Marketing minor updates and typos. Moving from pip to conda env.
1 parent 67eb6cf commit ad9c4b4

4 files changed: +266 additions, −214 deletions

cloud-infrastructure/ai-infra-gpu/AI Infrastructure/rag-langchain-vllm-mistral/README.md

Lines changed: 59 additions & 22 deletions
@@ -5,57 +5,76 @@ This repository is a variant of the Retrieval Augmented Generation (RAG) tutoria
# Requirements

* An OCI tenancy with A10 GPU quota.
+ * A HuggingFace account with a valid Access Token.

# Libraries

* **LlamaIndex**: a data framework for LLM-based applications which benefit from context augmentation.
- * **LangChai**: a framework for developing applications powered by large language models.
+ * **LangChain**: a framework for developing applications powered by large language models.
* **vLLM**: a fast and easy-to-use library for LLM inference and serving.
* **Qdrant**: a vector similarity search engine.

# Mistral LLM

[Mistral.ai](https://mistral.ai/) is a French AI startup that develops Large Language Models (LLMs). Mistral 7B Instruct is a small yet powerful open model that supports English and code. The Instruct version is optimized for chat. In this example, inference performance is increased using the [FlashAttention](https://huggingface.co/docs/text-generation-inference/conceptual/flash_attention) backend.

- # Instance Configuration
+ # Instance Creation

- In this example a single A10 GPU VM shape, codename VM.GPU.A10.1, is used. This is currently the smallest GPU shape available on OCI. With this configuration, it is necessary to limit the VLLM Model context length option to 16384 because the memory is unsufficient. To use the full context length, a dual A10 GPU, codename VM.GPU.A10.2, will be necessary.
- The image is the NVIDIA GPU Cloud Machine image from the OCI marketplace.
- A boot volume of 200 GB is also recommended.
+ In this example, a single A10 GPU VM shape (codename VM.GPU.A10.1) is used. This is currently the smallest GPU shape available on OCI. With this configuration, it is necessary to limit the vLLM model context length to 16384 because the GPU memory is insufficient for the full context. To use the full context length, a dual A10 GPU shape (codename VM.GPU.A10.2) is necessary.\
+ The image is the NVIDIA GPU Cloud Machine image from the OCI marketplace.\
+ A boot volume of 200 GB is also recommended.\
+ Create the instance and connect to it once it is running:
+ ```
+ ssh -i public.key ubuntu@public.ip
+ ```
+ where `public.key` is the ssh public key provided in the instance creation phase and `public.ip` is the instance's Public IP address, which can be found in the OCI Console.
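If the connection hangs, the usual cause is a security list that does not allow TCP port 22 from your network. A quick reachability check that can be run from the workstation before retrying ssh (a hypothetical helper, not part of the repository):

```python
import socket

def ssh_port_open(host, port=22, timeout=3.0):
    # Return True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace the placeholder with the instance's actual Public IP before running.
# ssh_port_open("203.0.113.10")
```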
+ 
+ # Walkthrough
+ 
+ Throughout this walkthrough you will be guided through the different steps of the deployment, from configuring the environment to running the different components of the RAG solution.

- # Image Update
+ ## Update packages and drivers

- For the sake of libraries age support, is highly recommended to update NVIDIA drivers and CUDA by running:
+ For the sake of library and package compatibility, it is highly recommended to update the image packages, NVIDIA drivers, and CUDA versions. First fetch, download, and install the distribution packages (optional):
+ ```
+ sudo apt-get update && sudo apt-get upgrade -y
+ ```
+ Then remove the current NVIDIA packages and replace them with the following versions:

```
- sudo apt purge nvidia* libnvidia*
+ sudo apt purge nvidia* libnvidia* -y
sudo apt-get install -y cuda-drivers-545
sudo apt-get install -y nvidia-kernel-open-545
sudo apt-get install -y cuda-toolkit-12-3
+ ```
+ Finally, reboot the instance:
+ ```
sudo reboot
```
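Once the instance is back up, `nvidia-smi` should report the new driver. A small sketch that extracts the version from the command's output (it assumes the usual `Driver Version: …` header format; the helper is illustrative, not part of the repository):

```python
import re
import subprocess

def parse_driver_version(smi_output):
    # Extract the driver version from the nvidia-smi header text, e.g.
    # "| NVIDIA-SMI 545.23.08    Driver Version: 545.23.08    CUDA Version: 12.3 |"
    match = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    return match.group(1) if match else None

def current_driver_version():
    # Run nvidia-smi on the instance; return None if the tool is unavailable.
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        return None
    return parse_driver_version(out)

print(parse_driver_version("| NVIDIA-SMI 545.23.08    Driver Version: 545.23.08    CUDA Version: 12.3 |"))
# → 545.23.08
```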

- # Framework deployment
- 
- ## Install packages
- 
- First setup a virtual environment with all the required packages:
+ ## Configure environment

+ Once the instance is running again, clone the repository and go to the right folder:
+ ```
+ git clone https://github.com/oracle-devrel/technology-engineering.git
+ cd technology-engineering/cloud-infrastructure/ai-infra-gpu/AI\ Infrastructure/rag-langchain-vllm-mistral/
+ ```
+ Then update conda and create a virtual environment with all the required packages:
+ ```
+ conda update -n base -c conda-forge conda
+ conda env create -f environment.yml
```
- python -m venv rag
- pip install -r requirements.txt
- source rag/bin/activate
+ Then activate the environment:
+ ```
+ conda activate rag
```
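A quick sanity check that the environment resolved correctly is to confirm the key packages are importable. A minimal sketch (module names follow the pins in `environment.yml`; the helper itself is not part of the repository):

```python
from importlib.util import find_spec

def check_packages(names):
    # Map each top-level module name to whether it can be imported
    # from the currently active environment.
    return {name: find_spec(name) is not None for name in names}

# Module names corresponding to the pinned llama-index, langchain,
# vllm and qdrant-client packages.
for name, ok in check_packages(["llama_index", "langchain", "vllm", "qdrant_client"]).items():
    print(f"{name}: {'OK' if ok else 'missing'}")
```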

## Deploy the framework

- The python script creates an all-in-one framework with local instances of the Qdrant vector similarity search engine and the vLLM inference server. Alternatively it is possible to deploy these two components remotely using Docker containers. This option can be useful in two situations:
- * The engines are shared by multiple solutions for which data must segregated.
- * The engines are deployed on instances with optimized configurations (GPU, RAM, CPU cores, etc.).
- 
### Framework components

- * **SitemapReader**: Asynchronous sitemap reader for web. Reads pages from the web based on their sitemap.xml. Other data connectors are available (Snowflake, Twitter, Wikipedia, etc.). In this example the site mapxml file is stored in an OCI bucket.
+ * **SitemapReader**: Asynchronous sitemap reader for the web (based on BeautifulSoup). Reads pages from the web based on their sitemap.xml. Other data connectors are available (Snowflake, Twitter, Wikipedia, etc.). In this example the sitemap.xml file is stored in an OCI bucket.
* **QdrantClient**: Python client for the Qdrant vector search engine.
* **SentenceTransformerEmbeddings**: Sentence embeddings model object (from HuggingFace). Other options include Aleph Alpha, Cohere, MistralAI, SpaCy, etc.
* **VLLM**: Fast and easy-to-use LLM inference server.
@@ -64,6 +83,24 @@ The python script creates an all-in-one framework with local instances of the Qd
* **StorageContext**: Utility container for storing nodes, indices, and vectors.
* **VectorStoreIndex**: Index built from the documents loaded in the Vector Store.

+ ### Running the solution
+ 
+ The python script creates an all-in-one framework with local instances of the Qdrant vector similarity search engine and the vLLM inference server. First set your HuggingFace Access Token as an environment variable:
+ ```
+ export HF_TOKEN=your-hf-token
+ ```
+ where `your-hf-token` is your personal Access Token. It might also be necessary to accept the Mistral model access conditions on the HuggingFace website. Then run the python script:
+ ```
+ python rag-langchain-vllm-mistral.py
+ ```
+ The script will return the answer to the question asked in the query.
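Conceptually, the script embeds the sitemap pages, stores the vectors in Qdrant, retrieves the chunks closest to the query, and lets Mistral answer from that context. The retrieval step can be illustrated with a dependency-free sketch, where a toy bag-of-words similarity stands in for the SentenceTransformer embeddings (all names and documents here are illustrative):

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding": a sparse word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_k=1):
    # Rank documents by similarity to the query, as the vector store
    # does with real embeddings before the LLM generates an answer.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "GPU shapes available on OCI include the A10 family.",
    "Qdrant is a vector similarity search engine.",
]
print(retrieve("which vector search engine is used", docs))
# → ['Qdrant is a vector similarity search engine.']
```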
+ 
+ ## Alternative deployment
+ 
+ Alternatively, it is possible to deploy these two components remotely using Docker containers. This option can be useful in two situations:
+ * The engines are shared by multiple solutions for which data must be segregated.
+ * The engines are deployed on instances with optimized configurations (GPU, RAM, CPU cores, etc.).
+ 
### Remote Qdrant client

Instead of:
@@ -108,7 +145,7 @@ llm = VLLMOpenAI(
},
)
```
- To deploy the container, refer to this [tutorial](https://github.com/oracle-devrel/technology-engineering/tree/main/cloud-infrastructure/ai-infra-gpu/GPU/vllm-mistral).
+ To deploy the container, refer to this [tutorial](https://github.com/oracle-devrel/technology-engineering/tree/main/cloud-infrastructure/ai-infra-gpu/AI%20Infrastructure/vllm-mistral).
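The containerized vLLM server exposes an OpenAI-compatible REST API, so the `VLLMOpenAI` client ultimately issues a completion request against it. A sketch of that request body (the model identifier and parameter values are assumptions for illustration, not taken from the repository):

```python
import json

def completion_payload(model, prompt, max_tokens=256, temperature=0.0):
    # Body of a POST to http://<host>:8000/v1/completions on the vLLM server.
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = completion_payload("mistralai/Mistral-7B-Instruct-v0.2", "What is RAG?")
print(json.dumps(payload, indent=2))
```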

# Notes

cloud-infrastructure/ai-infra-gpu/AI Infrastructure/rag-langchain-vllm-mistral/environment.yml

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
name: rag
channels:
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - bzip2=1.0.8=hd590300_5
  - ca-certificates=2024.2.2=hbcca054_0
  - ld_impl_linux-64=2.40=h55db66e_0
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=13.2.0=hc881cc4_6
  - libgomp=13.2.0=hc881cc4_6
  - libnsl=2.0.1=hd590300_0
  - libsqlite=3.45.3=h2797004_0
  - libuuid=2.38.1=h0b41bf4_0
  - libxcrypt=4.4.36=hd590300_1
  - libzlib=1.2.13=hd590300_5
  - ncurses=6.4.20240210=h59595ed_0
  - openssl=3.2.1=hd590300_1
  - pip=24.0=pyhd8ed1ab_0
  - python=3.10.14=hd12c33a_0_cpython
  - readline=8.2=h8228510_1
  - setuptools=69.5.1=pyhd8ed1ab_0
  - tk=8.6.13=noxft_h4845f30_101
  - wheel=0.43.0=pyhd8ed1ab_1
  - xz=5.2.6=h166bdaf_0
  - pip:
      - aiohttp==3.9.5
      - aiosignal==1.3.1
      - annotated-types==0.6.0
      - anyio==4.3.0
      - async-timeout==4.0.3
      - attrs==23.2.0
      - beautifulsoup4==4.12.3
      - certifi==2024.2.2
      - charset-normalizer==3.3.2
      - chromedriver-autoinstaller==0.6.4
      - click==8.1.7
      - cloudpickle==3.0.0
      - cmake==3.29.2
      - cssselect==1.2.0
      - dataclasses-json==0.6.4
      - deprecated==1.2.14
      - dirtyjson==1.0.8
      - diskcache==5.6.3
      - distro==1.9.0
      - einops==0.7.0
      - exceptiongroup==1.2.1
      - fastapi==0.110.2
      - feedfinder2==0.0.4
      - feedparser==6.0.11
      - filelock==3.13.4
      - flash-attn==2.5.7
      - frozenlist==1.4.1
      - fsspec==2024.3.1
      - greenlet==3.0.3
      - grpcio==1.62.2
      - grpcio-tools==1.62.2
      - h11==0.14.0
      - h2==4.1.0
      - hpack==4.0.0
      - html2text==2020.1.16
      - httpcore==1.0.5
      - httptools==0.6.1
      - httpx==0.27.0
      - huggingface-hub==0.22.2
      - hyperframe==6.0.1
      - idna==3.7
      - interegular==0.3.3
      - jieba3k==0.35.1
      - jinja2==3.1.3
      - joblib==1.4.0
      - jsonpatch==1.33
      - jsonpointer==2.4
      - jsonschema==4.21.1
      - jsonschema-specifications==2023.12.1
      - langchain==0.1.16
      - langchain-community==0.0.34
      - langchain-core==0.1.46
      - langchain-text-splitters==0.0.1
      - langsmith==0.1.51
      - lark==1.1.9
      - llama-hub==0.0.79.post1
      - llama-index==0.10.32
      - llama-index-agent-openai==0.2.3
      - llama-index-cli==0.1.12
      - llama-index-core==0.10.32
      - llama-index-embeddings-langchain==0.1.2
      - llama-index-embeddings-openai==0.1.9
      - llama-index-indices-managed-llama-cloud==0.1.5
      - llama-index-legacy==0.9.48
      - llama-index-llms-anyscale==0.1.3
      - llama-index-llms-langchain==0.1.3
      - llama-index-llms-openai==0.1.16
      - llama-index-multi-modal-llms-openai==0.1.5
      - llama-index-program-openai==0.1.6
      - llama-index-question-gen-openai==0.1.3
      - llama-index-readers-file==0.1.19
      - llama-index-readers-llama-parse==0.1.4
      - llama-index-readers-web==0.1.10
      - llama-index-vector-stores-qdrant==0.2.8
      - llama-parse==0.4.2
      - llamaindex-py-client==0.1.18
      - llvmlite==0.42.0
      - lm-format-enforcer==0.9.8
      - lxml==5.2.1
      - markupsafe==2.1.5
      - marshmallow==3.21.1
      - mpmath==1.3.0
      - msgpack==1.0.8
      - multidict==6.0.5
      - mypy-extensions==1.0.0
      - nest-asyncio==1.6.0
      - networkx==3.3
      - newspaper3k==0.2.8
      - ninja==1.11.1.1
      - nltk==3.8.1
      - numba==0.59.1
      - numpy==1.26.4
      - nvidia-cublas-cu12==12.1.3.1
      - nvidia-cuda-cupti-cu12==12.1.105
      - nvidia-cuda-nvrtc-cu12==12.1.105
      - nvidia-cuda-runtime-cu12==12.1.105
      - nvidia-cudnn-cu12==8.9.2.26
      - nvidia-cufft-cu12==11.0.2.54
      - nvidia-curand-cu12==10.3.2.106
      - nvidia-cusolver-cu12==11.4.5.107
      - nvidia-cusparse-cu12==12.1.0.106
      - nvidia-ml-py==12.550.52
      - nvidia-nccl-cu12==2.19.3
      - nvidia-nvjitlink-cu12==12.4.127
      - nvidia-nvtx-cu12==12.1.105
      - openai==1.23.6
      - orjson==3.10.1
      - outcome==1.3.0.post0
      - outlines==0.0.34
      - packaging==23.2
      - pandas==2.2.2
      - pillow==10.3.0
      - playwright==1.43.0
      - portalocker==2.8.2
      - prometheus-client==0.20.0
      - protobuf==4.25.3
      - psutil==5.9.8
      - py-cpuinfo==9.0.0
      - pyaml==23.12.0
      - pydantic==2.7.1
      - pydantic-core==2.18.2
      - pyee==11.1.0
      - pypdf==4.2.0
      - pysocks==1.7.1
      - python-dateutil==2.9.0.post0
      - python-dotenv==1.0.1
      - pytz==2024.1
      - pyyaml==6.0.1
      - qdrant-client==1.9.0
      - ray==2.12.0
      - referencing==0.35.0
      - regex==2024.4.16
      - requests==2.31.0
      - requests-file==2.0.0
      - retrying==1.3.4
      - rpds-py==0.18.0
      - safetensors==0.4.3
      - scikit-learn==1.4.2
      - scipy==1.13.0
      - selenium==4.20.0
      - sentence-transformers==2.7.0
      - sentencepiece==0.2.0
      - sgmllib3k==1.0.0
      - six==1.16.0
      - sniffio==1.3.1
      - sortedcontainers==2.4.0
      - soupsieve==2.5
      - sqlalchemy==2.0.29
      - starlette==0.37.2
      - striprtf==0.0.26
      - sympy==1.12
      - tenacity==8.2.3
      - threadpoolctl==3.4.0
      - tiktoken==0.6.0
      - tinysegmenter==0.3
      - tldextract==5.1.2
      - tokenizers==0.19.1
      - torch==2.2.1
      - tqdm==4.66.2
      - transformers==4.40.1
      - trio==0.25.0
      - trio-websocket==0.11.1
      - triton==2.2.0
      - typing-extensions==4.11.0
      - typing-inspect==0.9.0
      - tzdata==2024.1
      - urllib3==2.2.1
      - uvicorn==0.29.0
      - uvloop==0.19.0
      - vllm==0.4.1
      - vllm-nccl-cu12==2.18.1.0.4.0
      - watchfiles==0.21.0
      - websockets==12.0
      - wrapt==1.16.0
      - wsproto==1.2.0
      - xformers==0.0.25
      - yarl==1.9.4
prefix: /home/ubuntu/miniforge3/envs/rag
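If the solver fails or runtime behavior differs from the walkthrough, comparing the pins above against what is actually installed can help. A small stdlib-only sketch for the `name==version` pins in the `pip:` section (the helper is illustrative, not part of the repository):

```python
from importlib.metadata import PackageNotFoundError, version

def parse_pin(spec):
    # Split a "name==version" pin into its parts.
    name, _, pinned = spec.partition("==")
    return name, pinned

def check_pin(spec):
    # Compare a pin against the installed distribution, if any.
    name, pinned = parse_pin(spec)
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"{name}: not installed"
    status = "matches" if installed == pinned else f"installed {installed}"
    return f"{name}=={pinned}: {status}"

print(check_pin("langchain==0.1.16"))
```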

cloud-infrastructure/ai-infra-gpu/AI Infrastructure/rag-langchain-vllm-mistral/rag-langchain-vllm-mistral.py

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@
documents = loader.load_data(
    sitemap_url='https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/frpj5kvxryk1/b/thisIsThePlace/o/latest.xml'
)
- for document in documents:
-     print(document.metadata['Source'])
+ # for document in documents:
+ #     print(document.metadata['Source'])
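The commit silences the per-document source printout by commenting it out. An alternative sketch would keep the output available behind an opt-in environment variable (the `RAG_DEBUG` name is hypothetical, not part of the repository):

```python
import os

def debug_enabled(env=os.environ):
    # Opt-in debug output, e.g. RAG_DEBUG=1 python rag-langchain-vllm-mistral.py
    return env.get("RAG_DEBUG", "").lower() in {"1", "true", "yes"}

# if debug_enabled():
#     for document in documents:
#         print(document.metadata['Source'])
```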

# local Docker-based instance of Qdrant
client = QdrantClient(
