
Commit b433776

Readme, requirements.txt Updated

1 parent 9f653cf commit b433776
5 files changed: +219 -44 lines changed

model-deployment/containers/rag_llama2/Inference MD/main.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -10,7 +10,7 @@


 fast_app = FastAPI()
-model_path = "/opt/ds/model/deployed_model/7B/ggml-model-q4_0.bin"
+model_path = "<MODEL_PATH>"

 def load_model(model_folder_directory):
     embedding = LlamaCppEmbeddings(model_path=model_folder_directory)
@@ -22,8 +22,8 @@ def load_model(model_folder_directory):
     except Exception as e:
         print("Error: %s", e)

-url = "https://0ad84320-52a6-407d-9c82-375bf60e1fc6.us-east4-0.gcp.cloud.qdrant.io"
-api_key= "a675QyMVF8SxqY9wNAssu4dwuIpbHGuXj8aZVDPBKX22AJeBGCOhqw"
+url = "<QDRANT_URL>"
+api_key= "<API_KEY>"


qdrant = None
```
Lines changed: 5 additions & 4 deletions
```diff
@@ -1,6 +1,7 @@
-langchain
-llama-cpp-python
-requests
+langchain==0.0.333
+llama-cpp-python==0.2.15
+requests==2.25.1
 uvicorn
 fastapi
-qdrant-client
+qdrant-client==1.6.9
+oci==2.47.1
```
Lines changed: 201 additions & 1 deletion
```diff
@@ -1 +1,201 @@
-## TO ADD
```

The placeholder `## TO ADD` is replaced with the following new README content:
# Overview

This repo provides the setup for RAG (Retrieval-Augmented Generation) using Llama2 and the Qdrant vector DB.

# Prerequisites

## Object Storage Bucket

An Object Storage bucket is required to save the documents that are provided at ingestion time into the vector DB. [Refer](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/distributed_training#2-object-storage)
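
For illustration, once the bucket exists, an ingested document can be persisted with the OCI Python SDK. This is a minimal sketch (not part of this repo) assuming resource principal auth; the bucket and file names are placeholders:

```python
import oci

# Minimal sketch: upload a source document to the bucket using
# resource principals; "<BUCKET_NAME>" is a placeholder.
signer = oci.auth.signers.get_resource_principals_signer()
client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

namespace = client.get_namespace().data
with open("my_document.txt", "rb") as f:
    client.put_object(namespace, "<BUCKET_NAME>", "my_document.txt", f)
```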

## Access to Hugging Face Llama2

An access token from Hugging Face is required to download the Llama2 model. To deploy the model, you will first need access to the pre-trained model. The pre-trained model can be obtained from Meta or Hugging Face. In this example, we use the Hugging Face access token to download the pre-trained model from Hugging Face (by setting the HUGGING_FACE_HUB_TOKEN environment variable). [Refer](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2)
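
As an illustration, the download step might look like the sketch below; it assumes the `huggingface_hub` package is available (it is not in this repo's requirements.txt) and uses placeholder values:

```python
import os
from huggingface_hub import snapshot_download  # assumed to be installed

# Hypothetical sketch: huggingface_hub reads the token from this variable
os.environ["HUGGING_FACE_HUB_TOKEN"] = "<YOUR_HF_TOKEN>"
snapshot_download(repo_id="meta-llama/Llama-2-7b", local_dir="<MODEL_DIR>")
```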

## OCI Logging

When experimenting with new frameworks and models, it is highly advisable to attach log groups to the model deployment in order to enable self-assisted debugging. Follow the steps below to create log groups (a scripted alternative is sketched after this list).

* Create logging for the model deployment (if you have already created one, you can skip this step)
* Go to the [OCI Logging Service](https://cloud.oracle.com/logging/log-groups) and select `Log Groups`
* Either select one of the existing Log Groups or create a new one
* In the log group, create ***two*** `Log`s, one predict log and one access log, like:
    * Click on `Create custom log`
    * Specify a name (predict|access) and select the log group you want to use
    * Under `Create agent configuration`, select `Add configuration later`
    * Then click `Create agent configuration`
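
If you prefer to script these steps rather than use the Console, a minimal sketch with the OCI Python SDK might look like this; the compartment OCID, log group OCID, and display names are placeholders:

```python
import oci

# Minimal sketch (assumes a valid ~/.oci/config); all OCIDs are placeholders.
config = oci.config.from_file()
logging_client = oci.logging.LoggingManagementClient(config)

# Create a log group (asynchronous; returns a work request).
logging_client.create_log_group(
    oci.logging.models.CreateLogGroupDetails(
        compartment_id="<COMPARTMENT_OCID>",
        display_name="model-deployment-log-group",
    )
)

# Create the two custom logs inside the log group.
for name in ("predict", "access"):
    logging_client.create_log(
        log_group_id="<LOG_GROUP_OCID>",
        create_log_details=oci.logging.models.CreateLogDetails(
            display_name=name,
            log_type="CUSTOM",
            is_enabled=True,
        ),
    )
```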

## Required IAM Policies

Public [documentation](https://docs.oracle.com/en-us/iaas/data-science/using/policies.htm).

### Generic Model Deployment policies

`allow group <group-name> to manage data-science-model-deployments in compartment <compartment-name>`

`allow dynamic-group <dynamic-group-name> to manage data-science-model-deployments in compartment <compartment-name>`

### Logging policies

Allows a model deployment to emit logs to the Logging service. You need this policy if you are using Logging in a model deployment.

`allow any-user to use log-content in tenancy where ALL {request.principal.type = 'datasciencemodeldeployment'}`

### Bring your own container [policies](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm#model_dep_policies_auth__access-custom-container)

Dynamic group matching rule: `ALL { resource.type = 'datasciencemodeldeployment' }`

`allow dynamic-group <dynamic-group-name> to read repos in compartment <compartment-name> where ANY {request.operation='ReadDockerRepositoryMetadata',request.operation='ReadDockerRepositoryManifest',request.operation='PullDockerLayer' }`

#### If the repository is in the root compartment, allow read for the tenancy

`allow dynamic-group <dynamic-group-name> to read repos in tenancy where ANY {request.operation='ReadDockerRepositoryMetadata', request.operation='ReadDockerRepositoryManifest', request.operation='PullDockerLayer'}`

#### For user level policies

`allow any-user to read repos in tenancy where ALL { request.principal.type = 'datasciencemodeldeployment' }`

`allow any-user to read repos in compartment <compartment-name> where ALL { request.principal.type = 'datasciencemodeldeployment'}`

### Model Store [export API](https://docs.oracle.com/en-us/iaas/data-science/using/large-model-artifact-export.htm#large-model-artifact-export) for creating model artifacts greater than 6 GB in size

`allow service datascience to manage object-family in compartment <compartment> where ALL {target.bucket.name='<bucket_name>'}`

`allow service objectstorage-<region> to manage object-family in compartment <compartment> where ALL {target.bucket.name='<bucket_name>'}`

### Policy to check Data Science work requests

`allow group <group_name> to manage data-science-work-requests in compartment <compartment_name>`

For all other Data Science policies, please refer to these [details](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/distributed_training/README.md#3-oci-policies).

## Notebook session

[Notebook session](https://docs.oracle.com/en-us/iaas/data-science/using/manage-notebook-sessions.htm) - used to initiate the distributed training and to access the fine-tuned model.

## Compute Instance as basic server

* Set up a Compute Instance. [Refer](https://docs.oracle.com/iaas/Content/Compute/Tasks/launchinginstance.htm)
* Create a compute instance in a public subnet with an internet gateway. [Refer](https://docs.oracle.com/en/solutions/wls-on-prem-to-oci/use-wizard-create-vcn.html)
* Create a dynamic group and add the compute instance OCID to it. [Refer](https://docs.oracle.com/en-us/iaas/Content/Identity/dynamicgroups/To_create_a_dynamic_group.htm)

Provide the following policy for the dynamic group:

`allow dynamic-group data-science-model-deployments to manage data-science-projects in compartment <datascience_hol>`

# Deploying the Llama2 Model

Please refer to the following [GitHub link](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2) to download and deploy the Llama2 model.

# Setting up Qdrant

Qdrant integrates smoothly with LangChain, and you can use Qdrant within LangChain via the VectorDBQA class. The initial step is to compile all the documents that will act as the foundational knowledge for our LLM. Imagine we place these in a list titled `docs`; each item in this list is a string containing segments of paragraphs. Please install the required Python libraries as defined in the `ingestion MD` directory's requirements.txt file.
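
For example, a hypothetical way to build such a list from local text files (the directory path and chunk sizes below are illustrative, not part of this repo):

```python
from pathlib import Path
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Hypothetical ingestion sketch: read local .txt files and split them
# into paragraph-sized chunks; "<DOCS_DIR>" is a placeholder.
raw_docs = [p.read_text() for p in Path("<DOCS_DIR>").glob("*.txt")]
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = [chunk for d in raw_docs for chunk in splitter.split_text(d)]
```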

## Qdrant Initialisation

Subsequently, the task is to produce embeddings from these documents. Here the embeddings are produced with the `LlamaCppEmbeddings` wrapper around the downloaded Llama2 model:

```python
from langchain.vectorstores import Qdrant
from langchain.embeddings import LlamaCppEmbeddings
import qdrant_client

# Load the embeddings model
embeddings = LlamaCppEmbeddings(model_path="<MODEL_PATH>", n_gpu_layers=1000)

# Get your Qdrant URL and API key
url = "<QDRANT-URL-HERE>"
api_key = "<QDRANT-API-KEY-HERE>"

# Set up the Qdrant client and vector store
client = qdrant_client.QdrantClient(url, api_key=api_key)

qdrant = Qdrant(
    client=client,
    collection_name="my_documents",
    embeddings=embeddings,
)
```

## Qdrant Upload to Vector DB

```python
# If adding documents for the first time, this method recreates the collection
qdrant = Qdrant.from_texts(
    texts,  # texts is a list of documents to convert into embeddings and store in the vector DB
    embeddings,
    url=url,
    api_key=api_key,
    collection_name="my_documents",
)

# Add further texts to the vector DB by calling the same object
qdrant.add_texts(texts)
```

## Qdrant retrieval from vector DB

Qdrant provides several retrieval options, such as similarity search, batch search, range search, and geospatial search, with multiple distance metrics. Here we leverage similarity search based on the prompt question.

```python
qdrant = Qdrant(
    client=client,
    collection_name="my_documents",
    embeddings=embeddings,
)

# Similarity search
docs = qdrant.similarity_search(prompt)
```
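
If relevance scores are useful, the LangChain Qdrant wrapper also exposes a scored variant. A small sketch reusing the `qdrant` object above (`k` limits the number of hits):

```python
# Retrieve documents together with their similarity scores
docs_and_scores = qdrant.similarity_search_with_score(prompt, k=4)
for doc, score in docs_and_scores:
    print(f"{score:.3f}", doc.page_content[:80])
```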

# RAG Basic setup

We use the prompt template and QA chain provided by LangChain to make the chatbot; this helps in passing the context and the question directly to the Llama2-based model.

```python
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import LlamaCpp
from langchain.prompts.prompt import PromptTemplate

template = """You are an assistant to the user, you are given some context below, please answer the query of the user with as detail as possible

Context:\"""
{context}
\"""

Question:\"
{question}
\"""

Answer:"""

qa_prompt = PromptTemplate.from_template(template)

# The same Llama2 model that produced the embeddings also answers the question
llm = LlamaCpp(model_path="<MODEL_PATH>", n_gpu_layers=1000, n_ctx=2048)

chain = load_qa_chain(llm, chain_type="stuff", prompt=qa_prompt)

# Retrieve docs from the Qdrant vector DB based upon the user prompt
docs = qdrant.similarity_search(user_prompt)

answer = chain(
    {"input_documents": docs, "question": user_prompt, "context": docs},
    return_only_outputs=True,
)["output_text"]
```

# Hosting the Streamlit application

Go to the `streamlit` folder in the directory, install the requirements.txt using pip3, and use app.py as the template of the Streamlit application.
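
For example:

```bash
cd streamlit
pip3 install -r requirements.txt
```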

The code below defines the authentication mechanism, once you have set up the dynamic group with the compute instance and added the required policies as mentioned in the Prerequisites section.

```python
from oci.auth import signers
import requests

config = {"region": "<YOUR_REGION>"}
signer = signers.InstancePrincipalsSecurityTokenSigner()

endpoint = "<MD_ENDPOINT>"
prompt = "<USER_PROMPT>"

headers = {"content-type": "application/text"}

response = requests.post(endpoint, data=prompt, auth=signer, headers=headers, timeout=200)
```
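
A minimal sketch of wiring this request into the chat UI, following the pattern in `streamlit/app.py` and reusing `endpoint`, `signer`, and `headers` from the block above (the widget labels are illustrative):

```python
import streamlit as st
from streamlit_chat import message

st.set_page_config(page_title="SQuAD Chatbot")

user_input = st.text_input("You:", key="input")
if user_input:
    message(user_input, is_user=True)
    # Forward the prompt to the model deployment and render the answer
    reply = requests.post(endpoint, data=user_input, auth=signer, headers=headers, timeout=200)
    message(reply.text)
```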

Use the command below to run the application on the server:

```bash
streamlit run app.py
```
201+

model-deployment/containers/rag_llama2/ingestion MD/main.py

Lines changed: 3 additions & 15 deletions
```diff
@@ -14,7 +14,7 @@

 fast_app = FastAPI()

-model_path = "/opt/ds/model/deployed_model/7B/ggml-model-q4_0.bin"
+model_path = "<MODEL_PATH>"

 def load_model(model_folder_directory):
     embedding = LlamaCppEmbeddings(model_path=model_folder_directory,n_gpu_layers=15000)
@@ -26,8 +26,8 @@ def load_model(model_folder_directory):
     except Exception as e:
         print("Error: %s", e)

-url = "QDRANT_URL"
-api_key= "API_KEY"
+url = "<QDRANT_URL>"
+api_key= "<API_KEY>"

 template = """You are an assistant to the user, you are given some context below, please answer the query of the user with as detail as possible
@@ -56,7 +56,6 @@
 qa_prompt = PromptTemplate.from_template(template)

 llm = LlamaCpp(model_path=model_path,n_gpu_layers=15000, n_ctx=2048)
-# llm = LlamaCpp(model_path=model_path, n_ctx=2048)

 @fast_app.get("/", response_class=HTMLResponse)
 def read_root():
@@ -68,23 +67,12 @@ def read_root():
 @fast_app.post("/predict")
 def model_predict(request: Request, response: Response, data=Body(None)):
     global llm, embeddings, qa_prompt, qdrant
-    print(data)
     question = data.decode("utf-8")
-    print(question)
     chain = load_qa_chain(llm, chain_type="stuff", prompt=qa_prompt)
-    print("OK")
-    if question =="Hi":
-        return "I am able to load the embedding"
-    if question == "Hello":
-        docs = qdrant.similarity_search(question)
-        return docs
     try:
         docs = qdrant.similarity_search(question)
-        print(docs)
     except Exception as e:
-        print(e)
         return e
-    print(question)
     answer = chain({"input_documents": docs, "question": question,"context": docs}, return_only_outputs=True)['output_text']
     return answer
```

model-deployment/containers/rag_llama2/streamlit/app.py

Lines changed: 7 additions & 21 deletions
```diff
@@ -1,33 +1,19 @@
 import streamlit as st
 from streamlit_chat import message
-import oci
-import time
-import os
-import oci
+from oci.auth import signers
 import requests
-from oci.signer import Signer

-# token_file = os.path.expanduser("/Users/gagachau/.oci/sessions/OC1/token")
-# with open(token_file, 'r') as f:
-#     token = f.read()
-# private_key = oci.signer.load_private_key_from_file("/Users/gagachau/.oci/sessions/OC1/oci_api_key.pem")
-# signer = oci.auth.signers.SecurityTokenSigner(token, private_key)
+config = {"region": <YOUR_REGION>}
+signer = signers.InstancePrincipalsSecurityTokenSigner()

+endpoint = "<MD_ENDPOINT>"
+prompt = "<USER_PROMPT>"

 def generate_response(prompt):
-    # global signer
-    endpoint = "http://localhost:8080/predict"
+    global signer, endpoint
     headers = {"content-type": "application/text"} # header goes here
-    # response = requests.post(endpoint, data=prompt, auth=signer, headers=headers)
     response = requests.post(endpoint, data=prompt, headers=headers)
-    res = response.text
-    print(res)
-    res = res.replace('\n', '')
-    res = res.replace("\n", "")
-    res = res.replace('"', "")
-    res = res.replace("'", "")
-    res = res.replace('\\', "")
-    return res
+    return response.text

 # Create the title and
 st.set_page_config(page_title="SQuAD Chatbot")
```
