Commit 2c60a92

Merge pull request #7 from Bobbins228/llama-stack-rhoai
Guide for deploying llama stack with eval provider on openshift && AWS credentials on the LLS CR
2 parents 94dcc4a + 98e88bc commit 2c60a92

File tree

5 files changed: +185 -2 lines changed
Lines changed: 55 additions & 0 deletions
# Deploying Llama Stack on OpenShift AI with the remote Ragas eval provider

## Prerequisites

* OpenShift AI or Open Data Hub installed on your OpenShift cluster
* A Data Science Pipelines server configured
* The Llama Stack Operator installed
* A vLLM-hosted model, served either through KServe or MaaS. You can follow these [docs](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_cloud_service/1/html/working_with_rag/deploying-a-rag-stack-in-a-data-science-project_rag#Deploying-a-llama-model-with-kserve_rag) up to step 3.4.

## Setup

Create a secret for storing your model's information.
```bash
export INFERENCE_MODEL="llama-3-2-3b"
export VLLM_URL="https://llama-32-3b-instruct-predictor:8443/v1"
export VLLM_TLS_VERIFY="false" # Use "true" in production!
export VLLM_API_TOKEN="<token identifier>"

oc create secret generic llama-stack-inference-model-secret \
  --from-literal INFERENCE_MODEL="$INFERENCE_MODEL" \
  --from-literal VLLM_URL="$VLLM_URL" \
  --from-literal VLLM_TLS_VERIFY="$VLLM_TLS_VERIFY" \
  --from-literal VLLM_API_TOKEN="$VLLM_API_TOKEN"
```
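These secret keys surface as environment variables inside the Llama Stack container (via the `secretKeyRef` entries in the distribution CR). As a rough illustration only, a Python consumer could read them like this; the helper name and its normalization logic are hypothetical, not part of the provider:

```python
import os

# Hypothetical helper, not part of llama-stack-provider-ragas: reads the
# env vars that the secret above injects into the server container.
def load_vllm_settings(env=None):
    env = os.environ if env is None else env
    return {
        "model": env.get("INFERENCE_MODEL"),
        "url": env.get("VLLM_URL"),
        # The secret stores booleans as strings; normalize to a real bool.
        "tls_verify": env.get("VLLM_TLS_VERIFY", "true").lower() == "true",
        "api_token": env.get("VLLM_API_TOKEN"),
    }

# Example with the values exported above:
settings = load_vllm_settings({
    "INFERENCE_MODEL": "llama-3-2-3b",
    "VLLM_URL": "https://llama-32-3b-instruct-predictor:8443/v1",
    "VLLM_TLS_VERIFY": "false",
})
print(settings["tls_verify"])  # → False
```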
## Set up the deployment files

### Configuring the `kubeflow-ragas-config` ConfigMap

Update the [kubeflow-ragas-config](deployment/kubeflow-ragas-config.yaml) ConfigMap with the following data:

```bash
# See the project README for more details
EMBEDDING_MODEL=all-MiniLM-L6-v2
KUBEFLOW_LLAMA_STACK_URL=<your-llama-stack-url>
KUBEFLOW_PIPELINES_ENDPOINT=<your-kfp-endpoint>
KUBEFLOW_NAMESPACE=<your-namespace>
KUBEFLOW_BASE_IMAGE=quay.io/diegosquayorg/my-ragas-provider-image:latest
KUBEFLOW_RESULTS_S3_PREFIX=s3://my-bucket/ragas-results
KUBEFLOW_S3_CREDENTIALS_SECRET_NAME=<secret-name>
```

> [!NOTE]
> The `KUBEFLOW_LLAMA_STACK_URL` must be an external route.
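A common mistake is pointing `KUBEFLOW_LLAMA_STACK_URL` at the in-cluster service address instead of an external route. A quick sanity check could look like the sketch below; the helper and its hostname heuristic are hypothetical, not part of the provider:

```python
from urllib.parse import urlparse

def looks_like_external_route(url: str) -> bool:
    """Heuristic: in-cluster service addresses end in .svc or
    .svc.cluster.local, while OpenShift routes are regular external
    hostnames."""
    host = urlparse(url).hostname or ""
    return bool(host) and not (
        host == "localhost"
        or host.endswith(".svc")
        or host.endswith(".svc.cluster.local")
    )

print(looks_like_external_route("https://lsd-ragas.apps.example.com"))  # → True
print(looks_like_external_route("http://llama-stack.my-ns.svc.cluster.local:8321"))  # → False
```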
### Configuring the `pipelines_token` Secret

Unfortunately, the Llama Stack distribution service account does not have the privileges to create pipeline runs. To work around this, we must provide a user token as a secret to the Llama Stack distribution.

Create the secret with:

```bash
# Gather your token with `oc whoami -t`
oc create secret generic kubeflow-pipelines-token \
  --from-literal=KUBEFLOW_PIPELINES_TOKEN=<your-pipelines-token>
```
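The provider can then forward this token when authenticating against the pipelines API. As a rough sketch, a KFP client might be assembled from these env vars like this; the helper name is hypothetical, though `kfp.Client` does accept `host` and `existing_token` arguments:

```python
import os

# Hypothetical helper: assemble kfp.Client kwargs from the env vars
# populated by the secret and ConfigMap above.
def kfp_client_kwargs(env=None):
    env = os.environ if env is None else env
    return {
        "host": env["KUBEFLOW_PIPELINES_ENDPOINT"],
        "existing_token": env["KUBEFLOW_PIPELINES_TOKEN"],
    }

# Usage (requires the kfp package and a reachable endpoint):
# import kfp
# client = kfp.Client(**kfp_client_kwargs())
```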
## Deploy Llama Stack on OpenShift

You can now deploy the configuration files and the Llama Stack distribution with `oc apply -f deployment/kubeflow-ragas-config.yaml` and `oc apply -f deployment/llama-stack-distribution.yaml`.

You should now have a Llama Stack server on OpenShift with the remote Ragas eval provider configured.
You can now follow the [remote_demo.ipynb](../../demos/remote_demo.ipynb) demo, but ensure you are running it in a Data Science workbench and use the `LLAMA_STACK_URL` defined earlier. Alternatively, you can run it locally if you create a Route.
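Before running the notebook, it can help to confirm the server is reachable. The sketch below assumes Llama Stack's health endpoint lives at `/v1/health`; the URL shown is a placeholder, not a real route:

```python
import urllib.request

def health_url(base_url: str) -> str:
    # Assumed health endpoint path for a Llama Stack server: /v1/health
    return base_url.rstrip("/") + "/v1/health"

def check_server(base_url: str, timeout: int = 10) -> int:
    """Return the HTTP status of the health endpoint (raises if unreachable)."""
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return resp.status

print(health_url("https://lsd-ragas-example.apps.example.com/"))
# → https://lsd-ragas-example.apps.example.com/v1/health
```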
Lines changed: 12 additions & 0 deletions

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeflow-ragas-config
data:
  EMBEDDING_MODEL: "all-MiniLM-L6-v2"
  KUBEFLOW_LLAMA_STACK_URL: "<your-llama-stack-url>"
  KUBEFLOW_PIPELINES_ENDPOINT: "<your-kfp-endpoint>"
  KUBEFLOW_NAMESPACE: "<your-namespace>"
  KUBEFLOW_BASE_IMAGE: "quay.io/diegosquayorg/my-ragas-provider-image:latest"
  KUBEFLOW_RESULTS_S3_PREFIX: "s3://my-bucket/ragas-results"
  KUBEFLOW_S3_CREDENTIALS_SECRET_NAME: "<secret-name>"
```
Lines changed: 116 additions & 0 deletions

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: lsd-ragas-example
spec:
  replicas: 1
  server:
    containerSpec:
      resources:
        requests:
          cpu: 4
          memory: "12Gi"
        limits:
          cpu: 6
          memory: "14Gi"
      env:
        - name: INFERENCE_MODEL
          valueFrom:
            secretKeyRef:
              key: INFERENCE_MODEL
              name: llama-stack-inference-model-secret
              optional: true
        - name: VLLM_MAX_TOKENS
          value: "4096"
        - name: VLLM_URL
          valueFrom:
            secretKeyRef:
              key: VLLM_URL
              name: llama-stack-inference-model-secret
              optional: true
        - name: VLLM_TLS_VERIFY
          valueFrom:
            secretKeyRef:
              key: VLLM_TLS_VERIFY
              name: llama-stack-inference-model-secret
              optional: true
        - name: VLLM_API_TOKEN
          valueFrom:
            secretKeyRef:
              key: VLLM_API_TOKEN
              name: llama-stack-inference-model-secret
              optional: true
        - name: MILVUS_DB_PATH
          value: ~/milvus.db
        - name: FMS_ORCHESTRATOR_URL
          value: "http://localhost"
        - name: KUBEFLOW_PIPELINES_ENDPOINT
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_PIPELINES_ENDPOINT
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_NAMESPACE
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_NAMESPACE
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_BASE_IMAGE
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_BASE_IMAGE
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_LLAMA_STACK_URL
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_LLAMA_STACK_URL
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_RESULTS_S3_PREFIX
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_RESULTS_S3_PREFIX
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_S3_CREDENTIALS_SECRET_NAME
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_S3_CREDENTIALS_SECRET_NAME
              name: kubeflow-ragas-config
              optional: true
        - name: EMBEDDING_MODEL
          valueFrom:
            configMapKeyRef:
              key: EMBEDDING_MODEL
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_PIPELINES_TOKEN
          valueFrom:
            secretKeyRef:
              key: KUBEFLOW_PIPELINES_TOKEN
              name: kubeflow-pipelines-token
              optional: true
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: aws-credentials
              optional: true
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: aws-credentials
              optional: true
        - name: AWS_DEFAULT_REGION
          valueFrom:
            secretKeyRef:
              key: AWS_DEFAULT_REGION
              name: aws-credentials
              optional: true
      name: llama-stack
      port: 8321
    distribution:
      name: rh-dev
```

pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "llama-stack-provider-ragas"
-version = "0.3.5"
+version = "0.3.6"
 description = "Ragas evaluation as an out-of-tree Llama Stack provider"
 readme = "README.md"
 requires-python = ">=3.12"
```

uv.lock

Lines changed: 1 addition & 1 deletion
