This repository is a variant of the Retrieval Augmented Generation (RAG) tutorial.
# Requirements
* An OCI tenancy with A10 GPU quota.
* A HuggingFace account with a valid Access Token.
# Libraries
* **LlamaIndex**: a data framework for LLM-based applications that benefit from context augmentation.
* **LangChain**: a framework for developing applications powered by large language models.
* **vLLM**: a fast and easy-to-use library for LLM inference and serving.
* **Qdrant**: a vector similarity search engine.
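At its core, the similarity search that Qdrant provides can be pictured with a tiny pure-Python sketch. This is illustrative only: Qdrant itself uses optimized approximate nearest-neighbour indexes, and the vectors and names below are made up for the example.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors):
    # Return the id of the stored vector most similar to the query.
    return max(vectors, key=lambda vid: cosine_similarity(query, vectors[vid]))

# Toy "collection" of 3-dimensional embeddings.
collection = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.2],
    "doc_c": [0.5, 0.5, 0.5],
}

print(nearest([1.0, 0.0, 0.1], collection))  # doc_a points in the closest direction
```

In the RAG pipeline below, the query embedding plays the role of `query` and the indexed document chunks play the role of `collection`, just at much higher dimensionality.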
# Mistral LLM
[Mistral.ai](https://mistral.ai/) is a French AI startup that develops Large Language Models (LLMs). Mistral 7B Instruct is a small yet powerful open model that supports English and code. The Instruct version is optimized for chat. In this example, inference performance is increased using the [FlashAttention](https://huggingface.co/docs/text-generation-inference/conceptual/flash_attention) backend.
# Instance Creation
In this example a single A10 GPU VM shape, codename VM.GPU.A10.1, is used. This is currently the smallest GPU shape available on OCI. With this configuration, the vLLM model context length option must be limited to 16384 because GPU memory is insufficient for the full context. To use the full context length, a dual A10 GPU shape, codename VM.GPU.A10.2, is necessary.\
The image is the NVIDIA GPU Cloud Machine image from the OCI marketplace.\
A boot volume of 200 GB is also recommended.\
Create the instance and connect to it via ssh once it is running, where `public.key` is the ssh public key provided in the instance creation phase and `public.ip` is the instance Public IP address that can be found in the OCI Console.
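The 16384-token limit mentioned above can be sanity-checked with a back-of-the-envelope KV-cache estimate. The figures below are assumptions taken from Mistral 7B's published configuration (32 layers, 8 grouped-query KV heads, head dimension 128, ~7.2 B parameters) at fp16 precision; real vLLM usage adds activation and allocator overhead on top.

```python
# Back-of-the-envelope KV-cache sizing for Mistral 7B at fp16.
# Architecture figures are assumed from the published model config.
layers = 32        # transformer blocks
kv_heads = 8       # grouped-query attention KV heads
head_dim = 128     # dimension per head
bytes_fp16 = 2

# K and V are each cached per layer, per KV head, per token.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_fp16  # 131072 B

def kv_cache_gib(context_len):
    return kv_bytes_per_token * context_len / 1024**3

weights_gib = 7.2e9 * bytes_fp16 / 1024**3  # roughly 13.4 GiB of fp16 weights

print(f"KV cache @16384 tokens: {kv_cache_gib(16384):.1f} GiB")  # ~2 GiB
print(f"KV cache @32768 tokens: {kv_cache_gib(32768):.1f} GiB")  # ~4 GiB
# A single A10 has 24 GiB of VRAM; weights + full-context cache + overhead
# is tight, which is why the context length is capped on VM.GPU.A10.1.
```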
# Walkthrough
This walkthrough guides you through the different steps of the deployment, from configuring the environment to running the different components of the RAG solution.
## Update packages and drivers
For the sake of library and package compatibility, it is highly recommended to update the image packages, NVIDIA drivers, and CUDA version. First, fetch, download, and install the distribution packages (optional):
```
sudo apt-get update && sudo apt-get upgrade -y
```
Then remove the current NVIDIA packages and replace them with the following versions:
```
sudo apt purge nvidia* libnvidia* -y
sudo apt-get install -y cuda-drivers-545
sudo apt-get install -y nvidia-kernel-open-545
sudo apt-get install -y cuda-toolkit-12-3
```
Finally, reboot the instance:
```
sudo reboot
```
## Configure environment
Once the instance is running again, clone the repository and go to the right folder:
```
git clone https://github.com/oracle-devrel/technology-engineering.git
cd technology-engineering/cloud-infrastructure/ai-infra-gpu/AI\ Infrastructure/rag-langchain-vllm-mistral/
```
Then update conda and create a virtual environment with all the required packages:
```
conda update -n base -c conda-forge conda
conda env create -f environment.yml
```
Then activate the environment:
```
conda activate rag
```
## Deploy the framework
### Framework components
* **SitemapReader**: Asynchronous sitemap reader for the web (based on BeautifulSoup). Reads pages from the web based on their sitemap.xml. Other data connectors are available (Snowflake, Twitter, Wikipedia, etc.). In this example the sitemap.xml file is stored in an OCI bucket.
* **QdrantClient**: Python client for the Qdrant vector search engine.
* **SentenceTransformerEmbeddings**: Sentence embeddings model object (from HuggingFace). Other options include Aleph Alpha, Cohere, MistralAI, SpaCy, etc.
* **VLLM**: Fast and easy-to-use LLM inference server.
* **StorageContext**: Utility container for storing nodes, indices, and vectors.
* **VectorStoreIndex**: Index built from the documents loaded in the Vector Store.
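Conceptually, these components chain into a load → embed → index → query pipeline. The sketch below is purely schematic: every name in it is an illustrative stand-in, not the actual LlamaIndex/LangChain API the script uses.

```python
# Schematic RAG flow; all names here are illustrative stand-ins.
def load_documents():
    # Stands in for SitemapReader: returns page texts to index.
    return ["Qdrant stores vectors", "vLLM serves Mistral", "LlamaIndex loads data"]

def embed(text):
    # Stands in for SentenceTransformerEmbeddings: maps text to a vector.
    return [float(len(text)), float(text.count("e")), float(text.count("s"))]

class ToyVectorStore:
    # Stands in for QdrantClient + StorageContext + VectorStoreIndex.
    def __init__(self):
        self.rows = []  # list of (vector, document) pairs

    def add(self, doc):
        self.rows.append((embed(doc), doc))

    def top_match(self, query):
        qv = embed(query)
        dist = lambda v: sum((a - b) ** 2 for a, b in zip(v, qv))
        return min(self.rows, key=lambda row: dist(row[0]))[1]

store = ToyVectorStore()
for doc in load_documents():
    store.add(doc)  # index build step

# Retrieval step: the closest stored chunk becomes the LLM's context.
context = store.top_match("vLLM serves Mistral")
prompt = f"Answer using this context: {context}"  # handed to vLLM for generation
```

The real script swaps each stand-in for the corresponding component above, with high-dimensional sentence embeddings instead of this toy length-based encoding.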
### Running the solution
The Python script creates an all-in-one framework with local instances of the Qdrant vector similarity search engine and the vLLM inference server. First, set your HuggingFace Access Token as an environment variable:
```
export HF_TOKEN=your-hf-token
```
where `your-hf-token` is your personal Access Token. It might also be necessary to validate the Mistral model access on the HuggingFace website. Then run the Python script:
```
python rag-langchain-vllm-mistral.py
```
The script will return the answer to the question asked in the query.
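If `HF_TOKEN` was never exported, the model download fails with an opaque authentication error. A fail-fast guard like the following makes the problem obvious; this is an illustration, not necessarily code present in the repository's script.

```python
import os

# Read the token exported earlier; empty string if it was never set.
hf_token = os.environ.get("HF_TOKEN", "")
token_is_set = bool(hf_token.strip())

if not token_is_set:
    print("HF_TOKEN is not set; run `export HF_TOKEN=your-hf-token` first.")
```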
## Alternative deployment
Alternatively, it is possible to deploy these two components remotely using Docker containers. This option can be useful in two situations:
* The engines are shared by multiple solutions for which data must be segregated.
* The engines are deployed on instances with optimized configurations (GPU, RAM, CPU cores, etc.).
### Remote Qdrant client
Similarly, the vLLM engine can be addressed remotely through its OpenAI-compatible interface instead of a local in-process instance:
```
llm = VLLMOpenAI(
    ...
    },
)
```
To deploy the container, refer to this [tutorial](https://github.com/oracle-devrel/technology-engineering/tree/main/cloud-infrastructure/ai-infra-gpu/AI%20Infrastructure/vllm-mistral).