Commit bb0bce5

Merge pull request #35 from vmleon/agentic_rag_automation
Agentic rag automation
2 parents 16465ce + e911675 commit bb0bce5

39 files changed, +3187 -1 lines changed

agentic_rag/.gitignore

Lines changed: 22 additions & 0 deletions
@@ -7,6 +7,7 @@ __pycache__/
 venv/
 env/
 .env
+kubeconfig
 
 # IDE
 .vscode/
@@ -19,6 +20,27 @@ env/
 embeddings/
 chroma_db/
 docs/*.json
+**/.certs
+**/node_modules
+k8s/kustom/demo/config.yaml
+k8s/kustom/demo/wallet/
+**/generated/
+
+# Terraform
+**/.terraform/*
+*.plan
+*.tfstate
+*.tfstate.*
+crash.log
+crash.*.log
+*.tfvars
+*.tfvars.json
+override.tf
+override.tf.json
+*_override.tf
+*_override.tf.json
+.terraformrc
+terraform.rc
 
 # Distribution / packaging
 dist/

agentic_rag/DEPLOY.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+# Deploy with Terraform and Kustomize
+
+## TODOS
+
+- Multiple containers for different functions
+  - Gradio
+  - Agents / Inference
+  - Database Access
+- Hugging face token should be a secret
+- Liveness and Readiness
+- Use Load balancer instead of Gradio Live
+- Autoscaling
+
+## Deploy Infrastructure
+
+Install scripts dependencies
+
+```bash
+cd scripts/ && npm install && cd ..
+```
+
+Set environment (answer questions) and generate Terraform `tfvars` file.
+
+```bash
+zx scripts/setenv.mjs
+```
+
+> Alternative: One liner for the yellow commands (for easy copy paste)
+>
+> ```bash
+> cd tf && terraform init && terraform apply -auto-approve
+> ```
+
+Come back to root folder
+
+```bash
+cd ..
+```
+
+Prepare Kubeconfig and namespace:
+
+```bash
+zx scripts/kustom.mjs
+```
+
+## Deploy Application
+
+Export kubeconfig to get access to the Kubernetes Cluster
+
+```bash
+export KUBECONFIG="$(pwd)/tf/generated/kubeconfig"
+```
+
+Check everything works
+
+```bash
+kubectl cluster-info
+```
+
+Deploy the production overlay
+
+```bash
+kubectl apply -k k8s/kustom/overlays/prod
+```
+
+Check all pods are Ready:
+
+```bash
+kubectl get po --namespace=agentic-rag
+```
+
+Get Gradio Live URL:
+
+```bash
+kubectl logs $(kubectl get po -n agentic-rag -l app=agentic-rag -o name) -n agentic-rag | grep "Running on public URL"
+```
+
+Open the URL from the command before in your browser.
+
+Also, you could get the Load Balancer Public IP address:
+
+```bash
+echo "http://$(kubectl get service \
+  -n agentic-rag \
+  -o jsonpath='{.items[?(@.spec.type=="LoadBalancer")].status.loadBalancer.ingress[0].ip}')"
+```
+
+To troubleshoot connect to the container
+
+```bash
+kubectl exec -it $(kubectl get po -n agentic-rag -l app=agentic-rag -o name) -n agentic-rag -- sh
+```
+
+## Clean up
+
+Delete the production overlay
+
+```bash
+kubectl delete -k k8s/kustom/overlays/prod
+```
+
+Destroy infrastructure with Terraform.
+
+```bash
+cd tf
+```
+
+```bash
+terraform destroy -auto-approve
+```
+
+```bash
+cd ..
+```
+
+Clean up the artifacts and config files
+
+```bash
+zx scripts/clean.mjs
+```
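
Review note on the "Load Balancer Public IP address" step in DEPLOY.md: the jsonpath filter picks the first ingress IP of each Service whose type is `LoadBalancer`. The same selection can be sketched in Python, assuming the input is the parsed output of `kubectl get service -n agentic-rag -o json`. The function name `load_balancer_urls` and the sample data are hypothetical and not part of this commit.

```python
from typing import Any


def load_balancer_urls(service_list: dict[str, Any]) -> list[str]:
    """Return an http:// URL for every LoadBalancer Service that has an ingress IP."""
    urls: list[str] = []
    for item in service_list.get("items", []):
        if item.get("spec", {}).get("type") != "LoadBalancer":
            continue
        ingress = item.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        # A newly created Service has no ingress entry until the cloud provider
        # finishes provisioning, so skip it instead of raising.
        if ingress and ingress[0].get("ip"):
            urls.append(f"http://{ingress[0]['ip']}")
    return urls


# Hypothetical sample shaped like `kubectl get service -o json` output.
sample = {
    "items": [
        {
            "spec": {"type": "LoadBalancer"},
            "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}},
        },
        {"spec": {"type": "ClusterIP"}, "status": {"loadBalancer": {}}},
        {"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}},
    ]
}

print(load_balancer_urls(sample))
```

Unlike the shell one-liner, which prints a bare `http://` while the address is still pending, this version returns an empty list, which is easier to retry on.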

agentic_rag/OraDBVectorStore.py

Lines changed: 9 additions & 1 deletion
@@ -22,13 +22,21 @@ def __init__(self, persist_directory: str = "embeddings"):
         username = credentials.get("ORACLE_DB_USERNAME", "ADMIN")
         password = credentials.get("ORACLE_DB_PASSWORD", "")
         dsn = credentials.get("ORACLE_DB_DSN", "")
+        wallet_path = credentials.get("ORACLE_DB_WALLET_LOCATION")
+        wallet_password = credentials.get("ORACLE_DB_WALLET_PASSWORD")
 
         if not password or not dsn:
             raise ValueError("Oracle DB credentials not found in config.yaml. Please set ORACLE_DB_USERNAME, ORACLE_DB_PASSWORD, and ORACLE_DB_DSN.")
 
         # Connect to the database
         try:
-            conn23c = oracledb.connect(user=username, password=password, dsn=dsn)
+            if not wallet_path:
+                print(f'Connecting (no wallet) to dsn {dsn} and user {username}')
+                conn23c = oracledb.connect(user=username, password=password, dsn=dsn)
+            else:
+                print(f'Connecting (with wallet) to dsn {dsn} and user {username}')
+                conn23c = oracledb.connect(user=username, password=password, dsn=dsn,
+                                           config_dir=wallet_path, wallet_location=wallet_path, wallet_password=wallet_password)
             print("Oracle DB Connection successful!")
         except Exception as e:
             print("Oracle DB Connection failed!", e)
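
Review note: the two `oracledb.connect(...)` branches differ only in the wallet arguments, so the choice can be reduced to building one keyword dictionary. The sketch below assumes `credentials` is the mapping loaded from `config.yaml`. The helper name `build_connect_kwargs` is hypothetical and not part of this commit, and the shortened error message is illustrative.

```python
from typing import Any, Mapping


def build_connect_kwargs(credentials: Mapping[str, Any]) -> dict[str, Any]:
    """Build keyword arguments for oracledb.connect(), adding wallet settings only when configured."""
    username = credentials.get("ORACLE_DB_USERNAME", "ADMIN")
    password = credentials.get("ORACLE_DB_PASSWORD", "")
    dsn = credentials.get("ORACLE_DB_DSN", "")
    wallet_path = credentials.get("ORACLE_DB_WALLET_LOCATION")
    wallet_password = credentials.get("ORACLE_DB_WALLET_PASSWORD")

    if not password or not dsn:
        raise ValueError("ORACLE_DB_PASSWORD and ORACLE_DB_DSN must be set in config.yaml")

    kwargs: dict[str, Any] = {"user": username, "password": password, "dsn": dsn}
    # An empty string counts as "no wallet", matching `if not wallet_path` in the diff.
    if wallet_path:
        kwargs.update(
            config_dir=wallet_path,
            wallet_location=wallet_path,
            wallet_password=wallet_password,
        )
    return kwargs


# Usage (requires the python-oracledb package and a reachable database):
#   conn = oracledb.connect(**build_connect_kwargs(credentials))
```

Keeping the decision in a pure function makes the wallet and no-wallet paths testable without a database.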
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+HUGGING_FACE_HUB_TOKEN: "{{{ hugging_face_token }}}"
+ORACLE_DB_USERNAME: "{{{ adb_username }}}"
+ORACLE_DB_PASSWORD: "{{{ adb_admin_password }}}"
+ORACLE_DB_DSN: "{{{ adb_service_name }}}"
+ORACLE_DB_WALLET_LOCATION: "{{{ adb_wallet_location }}}"
+ORACLE_DB_WALLET_PASSWORD: "{{{ adb_admin_password }}}"
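
Review note: the template above uses triple-brace placeholders, which in Mustache insert a value without HTML escaping. How the project renders it is not shown in this part of the diff, so the following is only a minimal sketch of the substitution, with a hypothetical `render` helper and made-up values.

```python
import re

# Matches {{{ name }}} with optional whitespace around the name.
PLACEHOLDER = re.compile(r"\{\{\{\s*(\w+)\s*\}\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace every {{{ name }}} placeholder with values[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Assumption of this sketch: fail loudly. A real Mustache engine
            # renders a missing variable as an empty string instead.
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


template = (
    'ORACLE_DB_USERNAME: "{{{ adb_username }}}"\n'
    'ORACLE_DB_DSN: "{{{ adb_service_name }}}"\n'
)
print(render(template, {"adb_username": "ADMIN", "adb_service_name": "mydb_high"}))
```

Because each value is placed inside a double-quoted YAML string, a value containing `"` or `\` would make the rendered `config.yaml` invalid unless it is escaped first. Note also that `ORACLE_DB_WALLET_PASSWORD` is filled from the same `adb_admin_password` variable as the database password.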
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: agentic-rag
+  labels:
+    app: agentic-rag
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: agentic-rag
+  template:
+    metadata:
+      labels:
+        app: agentic-rag
+    spec:
+      tolerations:
+      - key: "nvidia.com/gpu"
+        operator: "Equal"
+        value: "present"
+        effect: "NoSchedule"
+      initContainers:
+      - name: unzip
+        image: busybox
+        command: ["unzip", "/app/walletzip/wallet.zip", "-d", "/app/wallet"]
+        volumeMounts:
+        - name: wallet-config
+          mountPath: /app/walletzip
+        - name: wallet-volume
+          mountPath: /app/wallet
+      containers:
+      - name: agentic-rag
+        image: python:3.10-slim
+        resources:
+          requests:
+            memory: "8Gi"
+            cpu: "2"
+            ephemeral-storage: "50Gi" # Add this
+          limits:
+            memory: "16Gi"
+            cpu: "4"
+            ephemeral-storage: "100Gi" # Add this
+        ports:
+        - containerPort: 7860
+          name: gradio
+        - containerPort: 11434
+          name: ollama-api
+        volumeMounts:
+        - name: config-volume
+          mountPath: /app/config.yaml
+          subPath: config.yaml
+        - name: wallet-config
+          mountPath: /app/walletzip
+        - name: wallet-volume
+          mountPath: /app/wallet
+        - name: data-volume
+          mountPath: /app/embeddings
+        - name: chroma-volume
+          mountPath: /app/chroma_db
+        - name: ollama-models
+          mountPath: /root/.ollama
+        command: ["/bin/bash", "-c"]
+        args:
+        - |
+          apt-get update && apt-get install -y git curl gnupg
+
+          # Install NVIDIA drivers and CUDA
+          echo "Installing NVIDIA drivers and CUDA..."
+          curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+          curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+            tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+          apt-get update && apt-get install -y nvidia-container-toolkit
+
+          # Verify GPU is available
+          echo "Verifying GPU availability..."
+          nvidia-smi || echo "WARNING: nvidia-smi command failed. GPU might not be properly configured."
+
+          # Install Ollama
+          echo "Installing Ollama..."
+          curl -fsSL https://ollama.com/install.sh | sh
+
+          # Configure Ollama to use GPU
+          echo "Configuring Ollama for GPU usage..."
+          mkdir -p /root/.ollama
+          echo '{"gpu": {"enable": true}}' > /root/.ollama/config.json
+
+          # Start Ollama in the background with GPU support
+          echo "Starting Ollama service with GPU support..."
+          ollama serve &
+
+          # Wait for Ollama to be ready
+          echo "Waiting for Ollama to be ready..."
+          until curl -s http://localhost:11434/api/tags >/dev/null; do
+            sleep 5
+          done
+
+          # Verify models are using GPU
+          echo "Verifying models are using GPU..."
+          curl -s http://localhost:11434/api/tags | grep -q "llama3" && echo "llama3 model is available"
+
+          # Clone and set up the application
+          cd /app
+          git clone -b agentic_rag_automation https://github.com/vmleon/devrel-labs.git
+          cd devrel-labs/agentic_rag
+          pip install -r requirements.txt
+
+          # Move config.yaml file to agentic-rag folder
+          echo "Copying config.yaml to /app/devrel-labs/agentic_rag/config.yaml"
+          cp /app/config.yaml /app/devrel-labs/agentic_rag/config.yaml
+
+          # Start the Gradio app
+          echo "Starting Gradio application..."
+          python gradio_app.py
+        env:
+        - name: PYTHONUNBUFFERED
+          value: "1"
+        - name: OLLAMA_HOST
+          value: "http://localhost:11434"
+        - name: NVIDIA_VISIBLE_DEVICES
+          value: "all"
+        - name: NVIDIA_DRIVER_CAPABILITIES
+          value: "compute,utility"
+        - name: TORCH_CUDA_ARCH_LIST
+          value: "7.0;7.5;8.0;8.6"
+      volumes:
+      - name: config-volume
+        configMap:
+          name: agentic-rag-config
+      - name: wallet-config
+        configMap:
+          name: wallet-zip
+      - name: wallet-volume
+        emptyDir:
+          sizeLimit: 50Mi
+      - name: data-volume
+        persistentVolumeClaim:
+          claimName: agentic-rag-data-pvc
+      - name: chroma-volume
+        persistentVolumeClaim:
+          claimName: agentic-rag-chroma-pvc
+      - name: ollama-models
+        persistentVolumeClaim:
+          claimName: ollama-models-pvc
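
Review note: the startup script waits for Ollama with an `until curl ...; do sleep 5; done` loop that has no upper bound, so a failed install would leave the container waiting indefinitely. A bounded version of the same polling pattern might look like the sketch below. The function names, the 300-second default, and the injectable `sleep` and `clock` parameters are assumptions of this example, not part of the commit.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def ollama_is_up(url: str = "http://localhost:11434/api/tags") -> bool:
    """Probe the endpoint once; True only when it answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() every interval_s seconds until it succeeds or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage inside the container (performs real HTTP requests):
#   if not wait_until_ready(ollama_is_up):
#       raise SystemExit("Ollama did not become ready in time")
```

One difference from the shell loop: `curl -s` without `-f` exits successfully on any HTTP response, including error statuses, whereas this probe requires a 200.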
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+resources:
+- pvcs.yaml
+- deployment.yaml
+- service.yaml
+configMapGenerator:
+- name: agentic-rag-config
+  files:
+  - config.yaml
+- name: wallet-zip
+  files:
+  - wallet/wallet.zip
+namespace: agentic-rag

agentic_rag/k8s/kustom/demo/pvcs.yaml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: agentic-rag-data-pvc
+spec:
+  accessModes:
+  - ReadWriteOnce
+  resources:
+    requests:
+      storage: 50Gi
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: agentic-rag-chroma-pvc
+spec:
+  accessModes:
+  - ReadWriteOnce
+  resources:
+    requests:
+      storage: 50Gi
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: ollama-models-pvc
+spec:
+  accessModes:
+  - ReadWriteOnce
+  resources:
+    requests:
+      storage: 50Gi # Larger storage for model files
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: agentic-rag
+  labels:
+    app: agentic-rag
+spec:
+  type: LoadBalancer # Use NodePort if LoadBalancer is not available
+  ports:
+  - port: 80
+    targetPort: 7860
+    protocol: TCP
+    name: http
+  selector:
+    app: agentic-rag
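
Review note: the Service forwards port 80 to `targetPort: 7860` on pods labelled `app: agentic-rag`, which lines up with the `gradio` container port and the pod labels in the Deployment. A small lint for that relationship can be sketched as follows, assuming both manifests have already been parsed into dictionaries. The function name `check_service_targets` and the trimmed sample data are hypothetical. Since `containerPort` declarations are informational in Kubernetes, a reported port mismatch is a hint to look closer rather than a definite error.

```python
from typing import Any


def check_service_targets(service: dict[str, Any], deployment: dict[str, Any]) -> list[str]:
    """Return human-readable mismatches between a Service and a Deployment."""
    problems: list[str] = []
    pod_template = deployment["spec"]["template"]
    pod_labels = pod_template["metadata"].get("labels", {})

    # Every selector key/value must appear in the pod template labels.
    for key, value in service["spec"].get("selector", {}).items():
        if pod_labels.get(key) != value:
            problems.append(f"selector {key}={value} not found in pod labels")

    # Collect declared container ports, both by number and by name.
    declared: set[Any] = set()
    for container in pod_template["spec"]["containers"]:
        for port in container.get("ports", []):
            declared.add(port["containerPort"])
            if "name" in port:
                declared.add(port["name"])

    # targetPort defaults to the Service port when omitted.
    for port in service["spec"].get("ports", []):
        target = port.get("targetPort", port["port"])
        if target not in declared:
            problems.append(f"targetPort {target} is not a declared containerPort")
    return problems


# Trimmed-down copies of the manifests in this commit.
service = {
    "spec": {
        "type": "LoadBalancer",
        "ports": [{"port": 80, "targetPort": 7860, "protocol": "TCP", "name": "http"}],
        "selector": {"app": "agentic-rag"},
    }
}
deployment = {
    "spec": {
        "template": {
            "metadata": {"labels": {"app": "agentic-rag"}},
            "spec": {
                "containers": [
                    {
                        "name": "agentic-rag",
                        "ports": [
                            {"containerPort": 7860, "name": "gradio"},
                            {"containerPort": 11434, "name": "ollama-api"},
                        ],
                    }
                ]
            },
        }
    }
}

print(check_service_targets(service, deployment))
```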
