Commit 6d0e4bc

add ADB and OKE automation
1 parent 65785a6 commit 6d0e4bc

36 files changed, 3131 additions and 1 deletion

agentic_rag/.gitignore

Lines changed: 22 additions & 0 deletions
@@ -7,6 +7,7 @@ __pycache__/
 venv/
 env/
 .env
+kubeconfig
 
 # IDE
 .vscode/
@@ -19,6 +20,27 @@ env/
 embeddings/
 chroma_db/
 docs/*.json
+**/.certs
+**/node_modules
+k8s/kustom/demo/config.yaml
+k8s/kustom/demo/wallet/
+**/generated/
+
+# Terraform
+**/.terraform/*
+*.plan
+*.tfstate
+*.tfstate.*
+crash.log
+crash.*.log
+*.tfvars
+*.tfvars.json
+override.tf
+override.tf.json
+*_override.tf
+*_override.tf.json
+.terraformrc
+terraform.rc
 
 # Distribution / packaging
 dist/

agentic_rag/DEPLOY.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+# Deploy with Terraform and Kustomize
+
+## TODOs
+
+- Hugging Face token should be a secret
+- PVCs and deployments in separate files
+- Multiple deployments/pods for different functions
+- Consider including installation of the NVIDIA device plugin in Kustomize: https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
+- Make the Hugging Face token optional
+- Autonomous Database for Vector Search
+
+## Deploy Infrastructure
+
+Install the script dependencies:
+
+```bash
+cd scripts/ && npm install && cd ..
+```
+
+Set up the environment (answer the questions) and generate the Terraform `tfvars` file:
+
+```bash
+zx scripts/setenv.mjs
+```
+
+> Alternative: a one-liner for the yellow commands (for easy copy and paste)
+>
+> ```bash
+> cd tf && terraform init && terraform apply -auto-approve
+> ```
+
+Return to the root folder:
+
+```bash
+cd ..
+```
+
+Prepare the kubeconfig and namespace:
+
+```bash
+zx scripts/kustom.mjs
+```
+
+## Deploy Application
+
+Export the kubeconfig to get access to the Kubernetes cluster:
+
+```bash
+export KUBECONFIG="$(pwd)/tf/generated/kubeconfig"
+```
+
+Check that everything works:
+
+```bash
+kubectl cluster-info
+```
+
+Deploy the production overlay:
+
+```bash
+kubectl apply -k k8s/kustom/overlays/prod
+```
+
+Check that all pods are Ready:
+
+```bash
+kubectl wait pod --all --for=condition=Ready --namespace=agentic-rag
+```
+
+Get the Gradio Live URL:
+
+```bash
+kubectl logs $(kubectl get po -n agentic-rag -l app=agentic-rag -o name) -n agentic-rag | grep "Running on public URL"
+```
+
+Open the URL printed by the previous command in your browser.
+
+Alternatively, get the Load Balancer public IP address:
+
+```bash
+echo "http://$(kubectl get service \
+  -n agentic-rag \
+  -o jsonpath='{.items[?(@.spec.type=="LoadBalancer")].status.loadBalancer.ingress[0].ip}')"
+```
+
+## Clean up
+
+Delete the production overlay:
+
+```bash
+kubectl delete -k k8s/kustom/overlays/prod
+```
+
+Destroy the infrastructure with Terraform:
+
+```bash
+cd tf
+```
+
+```bash
+terraform destroy -auto-approve
+```
+
+```bash
+cd ..
+```
+
+Clean up the artifacts and config files:
+
+```bash
+zx scripts/clean.mjs
+```
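
Review note: the `jsonpath` query in DEPLOY.md prints an empty host while the load balancer is still being provisioned, because the `ingress` list is not populated yet. As a rough sketch only (not part of the commit), the same lookup can be done in Python over the output of `kubectl get service -n agentic-rag -o json`. The function name `load_balancer_urls` and the sample data are hypothetical.

```python
import json


def load_balancer_urls(service_list: dict) -> list[str]:
    """Return an http:// URL for every LoadBalancer service that already has an IP."""
    urls = []
    for item in service_list.get("items", []):
        if item.get("spec", {}).get("type") != "LoadBalancer":
            continue
        # `ingress` is absent or empty while the load balancer is still
        # being provisioned, so treat both cases as "no address yet".
        ingress = item.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        if ingress and ingress[0].get("ip"):
            urls.append(f"http://{ingress[0]['ip']}")
    return urls


# Hypothetical sample shaped like `kubectl get service -n agentic-rag -o json`.
sample = json.loads("""
{"items": [
  {"spec": {"type": "ClusterIP"}, "status": {"loadBalancer": {}}},
  {"spec": {"type": "LoadBalancer"},
   "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}},
  {"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}
]}
""")
print(load_balancer_urls(sample))  # ['http://203.0.113.10']
```

An empty result means the address is not assigned yet, which a caller can distinguish from a malformed URL.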

agentic_rag/OraDBVectorStore.py

Lines changed: 7 additions & 1 deletion
@@ -22,13 +22,19 @@ def __init__(self, persist_directory: str = "embeddings"):
         username = credentials.get("ORACLE_DB_USERNAME", "ADMIN")
         password = credentials.get("ORACLE_DB_PASSWORD", "")
         dsn = credentials.get("ORACLE_DB_DSN", "")
+        wallet_path = credentials.get("ORACLE_DB_WALLET_LOCATION")
+        wallet_password = credentials.get("ORACLE_DB_WALLET_PASSWORD")
 
         if not password or not dsn:
             raise ValueError("Oracle DB credentials not found in config.yaml. Please set ORACLE_DB_USERNAME, ORACLE_DB_PASSWORD, and ORACLE_DB_DSN.")
 
         # Connect to the database
         try:
-            conn23c = oracledb.connect(user=username, password=password, dsn=dsn)
+            if not wallet_path:
+                conn23c = oracledb.connect(user=username, password=password, dsn=dsn)
+            else:
+                conn23c = oracledb.connect(user=username, password=password, dsn=dsn,
+                                           wallet_location=wallet_path, wallet_password=wallet_password)
             print("Oracle DB Connection successful!")
         except Exception as e:
             print("Oracle DB Connection failed!", e)
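
Review note: the two `oracledb.connect(...)` calls above differ only in the wallet arguments, so one possible refactor is to build the keyword arguments once and make a single call. The sketch below is an illustration rather than the commit's code. `build_connect_kwargs` and `use_wallet_as_config_dir` are hypothetical names. The optional `config_dir` entry assumes the DSN is a TNS alias resolved from a `tnsnames.ora` inside the wallet directory, which the diff does not confirm.

```python
def build_connect_kwargs(credentials: dict, use_wallet_as_config_dir: bool = False) -> dict:
    """Build keyword arguments for oracledb.connect() from config.yaml values."""
    password = credentials.get("ORACLE_DB_PASSWORD", "")
    dsn = credentials.get("ORACLE_DB_DSN", "")
    if not password or not dsn:
        raise ValueError("ORACLE_DB_PASSWORD and ORACLE_DB_DSN must be set")

    kwargs = {
        "user": credentials.get("ORACLE_DB_USERNAME", "ADMIN"),
        "password": password,
        "dsn": dsn,
    }

    wallet_path = credentials.get("ORACLE_DB_WALLET_LOCATION")
    if wallet_path:
        # Wallet arguments are added only when a wallet location is configured,
        # mirroring the if/else in the diff.
        kwargs["wallet_location"] = wallet_path
        kwargs["wallet_password"] = credentials.get("ORACLE_DB_WALLET_PASSWORD")
        if use_wallet_as_config_dir:
            # Assumption: tnsnames.ora lives next to the wallet files.
            kwargs["config_dir"] = wallet_path
    return kwargs


# Usage (requires the python-oracledb package and a reachable database):
#   conn23c = oracledb.connect(**build_connect_kwargs(credentials))
```

Keeping the argument assembly separate from the network call also makes the branching testable without a database.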
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+HUGGING_FACE_HUB_TOKEN: "{{{ hugging_face_token }}}"
+ORACLE_DB_USERNAME: "{{{ adb_username }}}"
+ORACLE_DB_PASSWORD: "{{{ adb_admin_password }}}"
+ORACLE_DB_DSN: "{{{ adb_service_name }}}"
+ORACLE_DB_WALLET_LOCATION: "{{{ adb_wallet_location }}}"
+ORACLE_DB_WALLET_PASSWORD: "{{{ adb_admin_password }}}"
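
Review note: the template above uses triple-brace placeholders, which in Mustache-style templating insert values without HTML escaping. How the commit's scripts actually render it is not shown in this diff. Purely as an illustration of the substitution, here is a minimal Python sketch; the `render` helper, the regular expression, and the sample values are all hypothetical.

```python
import re

# Matches {{{ name }}} with optional whitespace around the name.
PLACEHOLDER = re.compile(r"\{\{\{\s*(\w+)\s*\}\}\}")


def render(template: str, values: dict) -> str:
    """Replace every {{{ name }}} with values[name]; fail loudly on unknown names."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    'ORACLE_DB_USERNAME: "{{{ adb_username }}}"\n'
    'ORACLE_DB_DSN: "{{{ adb_service_name }}}"\n'
)
print(render(template, {"adb_username": "ADMIN", "adb_service_name": "exampledb_high"}))
```

Failing on a missing value is a deliberate choice here: a silently empty password or DSN would only surface later as a connection error inside the pod.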
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: agentic-rag
+  labels:
+    app: agentic-rag
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: agentic-rag
+  template:
+    metadata:
+      labels:
+        app: agentic-rag
+    spec:
+      tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Equal"
+          value: "present"
+          effect: "NoSchedule"
+      initContainers:
+        - name: unzip
+          image: busybox
+          command: ["unzip", "/walletzip/wallet.zip", "-d", "/wallet"] # paths match the volumeMounts below
+          volumeMounts:
+            - name: wallet-config
+              mountPath: /walletzip
+            - name: wallet-volume
+              mountPath: /wallet
+      containers:
+        - name: agentic-rag
+          image: python:3.10-slim
+          resources:
+            requests:
+              memory: "8Gi"
+              cpu: "2"
+              ephemeral-storage: "50Gi" # Add this
+            limits:
+              memory: "16Gi"
+              cpu: "4"
+              ephemeral-storage: "100Gi" # Add this
+          ports:
+            - containerPort: 7860
+              name: gradio
+            - containerPort: 11434
+              name: ollama-api
+          volumeMounts:
+            - name: config-volume
+              mountPath: /app/config.yaml
+              subPath: config.yaml
+            - name: wallet-config
+              mountPath: /app/walletzip
+            - name: wallet-volume
+              mountPath: /app/wallet
+            - name: data-volume
+              mountPath: /app/embeddings
+            - name: chroma-volume
+              mountPath: /app/chroma_db
+            - name: ollama-models
+              mountPath: /root/.ollama
+          command: ["/bin/bash", "-c"]
+          args:
+            - |
+              apt-get update && apt-get install -y git curl gnupg
+
+              # Install NVIDIA drivers and CUDA
+              echo "Installing NVIDIA drivers and CUDA..."
+              curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+              curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+                sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+                tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+              apt-get update && apt-get install -y nvidia-container-toolkit
+
+              # Verify GPU is available
+              echo "Verifying GPU availability..."
+              nvidia-smi || echo "WARNING: nvidia-smi command failed. GPU might not be properly configured."
+
+              # Install Ollama
+              echo "Installing Ollama..."
+              curl -fsSL https://ollama.com/install.sh | sh
+
+              # Configure Ollama to use GPU
+              echo "Configuring Ollama for GPU usage..."
+              mkdir -p /root/.ollama
+              echo '{"gpu": {"enable": true}}' > /root/.ollama/config.json
+
+              # Start Ollama in the background with GPU support
+              echo "Starting Ollama service with GPU support..."
+              ollama serve &
+
+              # Wait for Ollama to be ready
+              echo "Waiting for Ollama to be ready..."
+              until curl -s http://localhost:11434/api/tags >/dev/null; do
+                sleep 5
+              done
+
+              # Verify models are using GPU
+              echo "Verifying models are using GPU..."
+              curl -s http://localhost:11434/api/tags | grep -q "llama3" && echo "llama3 model is available"
+
+              # Clone and set up the application
+              cd /app
+              git clone https://github.com/vmleon/devrel-labs.git
+              cd devrel-labs/agentic_rag
+              pip install -r requirements.txt
+
+              # Start the Gradio app
+              echo "Starting Gradio application..."
+              python gradio_app.py
+          env:
+            - name: PYTHONUNBUFFERED
+              value: "1"
+            - name: OLLAMA_HOST
+              value: "http://localhost:11434"
+            - name: NVIDIA_VISIBLE_DEVICES
+              value: "all"
+            - name: NVIDIA_DRIVER_CAPABILITIES
+              value: "compute,utility"
+            - name: TORCH_CUDA_ARCH_LIST
+              value: "7.0;7.5;8.0;8.6"
+      volumes:
+        - name: config-volume
+          configMap:
+            name: agentic-rag-config
+        - name: wallet-config
+          configMap:
+            name: wallet-zip
+        - name: wallet-volume
+          emptyDir:
+            sizeLimit: 50Mi
+        - name: data-volume
+          persistentVolumeClaim:
+            claimName: agentic-rag-data-pvc
+        - name: chroma-volume
+          persistentVolumeClaim:
+            claimName: agentic-rag-chroma-pvc
+        - name: ollama-models
+          persistentVolumeClaim:
+            claimName: ollama-models-pvc
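
Review note: the "Wait for Ollama to be ready" loop in the startup script above polls `/api/tags` every 5 seconds with no upper bound, so a failed `ollama serve` would leave the container waiting indefinitely. A bounded variant could look like the Python sketch below. This is an assumption-laden illustration, not the commit's approach: `wait_until_ready`, its parameters, and the 300-second default are hypothetical. The probe shown in the trailing comment treats only HTTP 200 as ready, which is slightly stricter than the `curl -s` check.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False  # give up instead of looping forever
        sleep(interval)


# Example probe for the Ollama endpoint used in the deployment (not run here):
#
#   import urllib.request, urllib.error
#
#   def ollama_is_up(url="http://localhost:11434/api/tags") -> bool:
#       try:
#           with urllib.request.urlopen(url, timeout=2) as response:
#               return response.status == 200
#       except (urllib.error.URLError, OSError):
#           return False
#
#   if not wait_until_ready(ollama_is_up):
#       raise SystemExit("Ollama did not become ready in time")
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.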
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+resources:
+  - pvcs.yaml
+  - deployment.yaml
+  - service.yaml
+configMapGenerator:
+  - name: agentic-rag-config
+    files:
+      - config.yaml
+  - name: wallet-zip
+    files:
+      - wallet/wallet.zip
+namespace: agentic-rag

agentic_rag/k8s/kustom/demo/pvcs.yaml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: agentic-rag-data-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 50Gi
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: agentic-rag-chroma-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 50Gi
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: ollama-models-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 50Gi # Storage for model files
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: agentic-rag
+  labels:
+    app: agentic-rag
+spec:
+  type: LoadBalancer # Use NodePort if LoadBalancer is not available
+  ports:
+    - port: 80
+      targetPort: 7860
+      protocol: TCP
+      name: http
+  selector:
+    app: agentic-rag
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+resources:
+  - "../../demo"
