Commit 5ddea63

Merge pull request #4 from trustyai-explainability/main
[pull] main from trustyai-explainability:main
2 parents ce1679b + 0b1cbf3 commit 5ddea63

26 files changed: +981 additions, −1435 deletions

.vscode/launch.json

Lines changed: 2 additions & 13 deletions
@@ -4,23 +4,12 @@
     // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
     "version": "0.2.0",
     "configurations": [
-
         {
-            "name": "Debug Ragas Distribution -- Remote",
+            "name": "Debug Ragas Distribution",
             "type": "debugpy",
             "request": "launch",
             "module": "llama_stack.cli.llama",
-            "args": ["stack", "run", "distribution/run-remote.yaml"],
-            "cwd": "${workspaceFolder}",
-            "envFile": "${workspaceFolder}/.env",
-            "justMyCode": false
-        },
-        {
-            "name": "Debug Ragas Distribution -- Inline",
-            "type": "debugpy",
-            "request": "launch",
-            "module": "llama_stack.cli.llama",
-            "args": ["stack", "run", "distribution/run-inline.yaml"],
+            "args": ["stack", "run", "distribution/run.yaml"],
             "cwd": "${workspaceFolder}",
             "envFile": "${workspaceFolder}/.env",
             "justMyCode": false

README.md

Lines changed: 22 additions & 18 deletions
@@ -14,8 +14,8 @@ This repository implements [Ragas](https://github.com/explodinggradients/ragas)
 The goal is to provide all of Ragas' evaluation functionality over Llama Stack's eval API, while leveraging Llama Stack's built-in APIs for inference (LLMs and embeddings), datasets, and benchmarks.
 
 There are two versions of the provider:
-- `inline`: runs the Ragas evaluation in the same process as the Llama Stack server.
-- `remote`: runs the Ragas evaluation in a remote process, using Kubeflow Pipelines.
+- `inline`: runs the Ragas evaluation in the same process as the Llama Stack server. This is always available with the base installation.
+- `remote`: runs the Ragas evaluation in a remote process, using Kubeflow Pipelines. Only available when the remote dependencies are installed with `pip install llama-stack-provider-ragas[remote]`.
 
 ## Prerequisites
 - Python 3.12
@@ -41,12 +41,29 @@ There are two versions of the provider:
 ```
 - The sample LS distributions (one for the inline and one for the remote provider) are simple LS distributions that use Ollama for inference and embeddings. See the provider-specific sections below for setup and run commands.
 
-### Remote provider (default)
+### Inline provider (default with base installation)
+
+Create a `.env` file with the required environment variable:
+```bash
+EMBEDDING_MODEL=ollama/all-minilm:l6-v2
+```
+
+Run the server:
+```bash
+dotenv run uv run llama stack run distribution/run.yaml
+```
+
+### Remote provider (requires optional dependencies)
+
+First install the remote dependencies:
+```bash
+uv pip install -e ".[remote]"
+```
 
 Create a `.env` file with the following:
 ```bash
 # Required for both inline and remote
-EMBEDDING_MODEL=all-MiniLM-L6-v2
+EMBEDDING_MODEL=ollama/all-minilm:l6-v2
 
 # Required for remote provider
 KUBEFLOW_LLAMA_STACK_URL=<your-llama-stack-url>
@@ -75,22 +92,9 @@ Where:
 
 Run the server:
 ```bash
-dotenv run uv run llama stack run distribution/run-remote.yaml
-```
-
-### Inline provider (need to specify `.inline` in the module name)
-
-Create a `.env` file with the required environment variable:
-```bash
-EMBEDDING_MODEL=all-MiniLM-L6-v2
-```
-
-Run the server:
-```bash
-dotenv run uv run llama stack run distribution/run-inline.yaml
+dotenv run uv run llama stack run distribution/run.yaml
 ```
 
-You will notice that the `run-inline.yaml` file has the module name as `llama_stack_provider_ragas.inline`, in order to specify the inline provider.
 
 ## Usage
 See the demos in the `demos` directory.
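
For orientation, the following is a minimal Python sketch of what an evaluation call against the running server might look like. It assumes the standard `llama_stack_client` eval flow; the dataset rows, dataset and benchmark ids, metric name, and model id are placeholders, and the exact `benchmark_config` shape varies between Llama Stack versions, so treat the notebooks in `demos/` as the authoritative reference.

```python
# Illustrative only: placeholder ids and rows; exact request shapes depend on the
# installed llama-stack / llama-stack-client versions (see demos/ for the real flow).
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a small evaluation dataset (hypothetical id and rows).
client.datasets.register(
    dataset_id="ragas-demo-dataset",
    purpose="eval/question-answer",
    source={
        "type": "rows",
        "rows": [
            {
                "user_input": "What is Ragas?",
                "response": "Ragas is an evaluation toolkit for RAG pipelines.",
                "retrieved_contexts": ["Ragas provides metrics for RAG evaluation."],
            }
        ],
    },
)

# Register a benchmark backed by the Ragas eval provider (metric name is a placeholder).
client.benchmarks.register(
    benchmark_id="ragas-demo-benchmark",
    dataset_id="ragas-demo-dataset",
    scoring_functions=["answer_relevancy"],
)

# Kick off the evaluation; whether it runs inline or on Kubeflow is decided by run.yaml.
job = client.eval.run_eval(
    benchmark_id="ragas-demo-benchmark",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "ollama/llama3.2:3b",  # placeholder inference model
            "sampling_params": {"max_tokens": 512},
        }
    },
)
print(job)
```
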
Lines changed: 550 additions & 276 deletions
Large diffs are not rendered by default.

demos/inline_demo.ipynb

Lines changed: 0 additions & 880 deletions
This file was deleted.
Lines changed: 55 additions & 0 deletions
# Deploying Llama Stack on OpenShift AI with the remote Ragas eval provider

## Prerequisites
* OpenShift AI or Open Data Hub installed on your OpenShift cluster
* Data Science Pipeline Server configured
* Llama Stack Operator installed
* A vLLM-hosted model, served either through KServe or MaaS. You can follow these [docs](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_cloud_service/1/html/working_with_rag/deploying-a-rag-stack-in-a-data-science-project_rag#Deploying-a-llama-model-with-kserve_rag) up to step 3.4.

## Setup
Create a secret to store your model's information.
```
export INFERENCE_MODEL="llama-3-2-3b"
export VLLM_URL="https://llama-32-3b-instruct-predictor:8443/v1"
export VLLM_TLS_VERIFY="false" # Use "true" in production!
export VLLM_API_TOKEN="<token identifier>"

oc create secret generic llama-stack-inference-model-secret \
  --from-literal INFERENCE_MODEL="$INFERENCE_MODEL" \
  --from-literal VLLM_URL="$VLLM_URL" \
  --from-literal VLLM_TLS_VERIFY="$VLLM_TLS_VERIFY" \
  --from-literal VLLM_API_TOKEN="$VLLM_API_TOKEN"
```

## Set up the deployment files
### Configuring the `kubeflow-ragas-config` ConfigMap
Update the [kubeflow-ragas-config](deployment/kubeflow-ragas-config.yaml) ConfigMap with the following data:
```bash
# See the project README for more details
EMBEDDING_MODEL=all-MiniLM-L6-v2
KUBEFLOW_LLAMA_STACK_URL=<your-llama-stack-url>
KUBEFLOW_PIPELINES_ENDPOINT=<your-kfp-endpoint>
KUBEFLOW_NAMESPACE=<your-namespace>
KUBEFLOW_BASE_IMAGE=quay.io/diegosquayorg/my-ragas-provider-image:latest
KUBEFLOW_RESULTS_S3_PREFIX=s3://my-bucket/ragas-results
KUBEFLOW_S3_CREDENTIALS_SECRET_NAME=<secret-name>
```

> [!NOTE]
> The `KUBEFLOW_LLAMA_STACK_URL` must be an external route.

### Configuring the `pipelines_token` Secret
Unfortunately, the Llama Stack distribution's service account does not have privileges to create pipeline runs. To work around this, we must provide a user token as a secret to the Llama Stack distribution.

Create the secret with:
```bash
# Gather your token with `oc whoami -t`
kubectl create secret generic kubeflow-pipelines-token \
  --from-literal=KUBEFLOW_PIPELINES_TOKEN=<your-pipelines-token>
```

## Deploy Llama Stack on OpenShift
You can now deploy the configuration files and the Llama Stack distribution with `oc apply -f deployment/kubeflow-ragas-config.yaml` and `oc apply -f deployment/llama-stack-distribution.yaml`.

You should now have a Llama Stack server on OpenShift with the remote Ragas eval provider configured.
You can now follow the [remote_demo.ipynb](../../demos/remote_demo.ipynb) demo; make sure you run it in a Data Science workbench and use the `LLAMA_STACK_URL` defined earlier. Alternatively, you can run it locally if you create a Route.
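
As a quick sanity check before opening the notebook, the sketch below connects to the deployed distribution from a workbench. The base URL is a placeholder (use the Route or the in-cluster service address for `lsd-ragas-example`), and the `models.list()` / `providers.list()` calls are used only as smoke tests.

```python
# Hedged sketch: verify the deployed Llama Stack server is reachable from a workbench.
# The base_url is a placeholder; substitute the Route URL or in-cluster service address.
import os

from llama_stack_client import LlamaStackClient

base_url = os.environ.get("LLAMA_STACK_URL", "http://lsd-ragas-example-service:8321")
client = LlamaStackClient(base_url=base_url)

# The vLLM-served model registered via the inference secret should show up here.
print([m.identifier for m in client.models.list()])

# A trustyai_ragas remote eval provider should be listed if the Kubeflow env vars resolved.
print([(p.api, p.provider_id) for p in client.providers.list()])
```
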
Lines changed: 12 additions & 0 deletions
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeflow-ragas-config
data:
  EMBEDDING_MODEL: "all-MiniLM-L6-v2"
  KUBEFLOW_LLAMA_STACK_URL: "<your-llama-stack-url>"
  KUBEFLOW_PIPELINES_ENDPOINT: "<your-kfp-endpoint>"
  KUBEFLOW_NAMESPACE: "<your-namespace>"
  KUBEFLOW_BASE_IMAGE: "quay.io/diegosquayorg/my-ragas-provider-image:latest"
  KUBEFLOW_RESULTS_S3_PREFIX: "s3://my-bucket/ragas-results"
  KUBEFLOW_S3_CREDENTIALS_SECRET_NAME: "<secret-name>"
Lines changed: 116 additions & 0 deletions
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: lsd-ragas-example
spec:
  replicas: 1
  server:
    containerSpec:
      resources:
        requests:
          cpu: 4
          memory: "12Gi"
        limits:
          cpu: 6
          memory: "14Gi"
      env:
        - name: INFERENCE_MODEL
          valueFrom:
            secretKeyRef:
              key: INFERENCE_MODEL
              name: llama-stack-inference-model-secret
              optional: true
        - name: VLLM_MAX_TOKENS
          value: "4096"
        - name: VLLM_URL
          valueFrom:
            secretKeyRef:
              key: VLLM_URL
              name: llama-stack-inference-model-secret
              optional: true
        - name: VLLM_TLS_VERIFY
          valueFrom:
            secretKeyRef:
              key: VLLM_TLS_VERIFY
              name: llama-stack-inference-model-secret
              optional: true
        - name: VLLM_API_TOKEN
          valueFrom:
            secretKeyRef:
              key: VLLM_API_TOKEN
              name: llama-stack-inference-model-secret
              optional: true
        - name: MILVUS_DB_PATH
          value: ~/milvus.db
        - name: FMS_ORCHESTRATOR_URL
          value: "http://localhost"
        - name: KUBEFLOW_PIPELINES_ENDPOINT
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_PIPELINES_ENDPOINT
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_NAMESPACE
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_NAMESPACE
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_BASE_IMAGE
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_BASE_IMAGE
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_LLAMA_STACK_URL
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_LLAMA_STACK_URL
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_RESULTS_S3_PREFIX
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_RESULTS_S3_PREFIX
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_S3_CREDENTIALS_SECRET_NAME
          valueFrom:
            configMapKeyRef:
              key: KUBEFLOW_S3_CREDENTIALS_SECRET_NAME
              name: kubeflow-ragas-config
              optional: true
        - name: EMBEDDING_MODEL
          valueFrom:
            configMapKeyRef:
              key: EMBEDDING_MODEL
              name: kubeflow-ragas-config
              optional: true
        - name: KUBEFLOW_PIPELINES_TOKEN
          valueFrom:
            secretKeyRef:
              key: KUBEFLOW_PIPELINES_TOKEN
              name: kubeflow-pipelines-token
              optional: true
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: aws-credentials
              optional: true
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: aws-credentials
              optional: true
        - name: AWS_DEFAULT_REGION
          valueFrom:
            secretKeyRef:
              key: AWS_DEFAULT_REGION
              name: aws-credentials
              optional: true
      name: llama-stack
      port: 8321
  distribution:
    name: rh-dev
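
Since the distribution only wires configuration in as environment variables, a quick way to debug a misconfigured ConfigMap or secret is to check what actually reached the container. The sketch below is a hedged debugging aid to run inside the `llama-stack` container (for example via `oc exec`); the variable list simply mirrors the spec above.

```python
# Hedged debugging aid: report which expected variables are set inside the pod.
# Values are deliberately not printed so tokens and keys do not leak into logs.
import os

EXPECTED = [
    "INFERENCE_MODEL", "VLLM_URL", "VLLM_TLS_VERIFY", "VLLM_API_TOKEN",
    "EMBEDDING_MODEL", "KUBEFLOW_PIPELINES_ENDPOINT", "KUBEFLOW_NAMESPACE",
    "KUBEFLOW_BASE_IMAGE", "KUBEFLOW_LLAMA_STACK_URL", "KUBEFLOW_RESULTS_S3_PREFIX",
    "KUBEFLOW_S3_CREDENTIALS_SECRET_NAME", "KUBEFLOW_PIPELINES_TOKEN",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION",
]

for name in EXPECTED:
    status = "set" if os.environ.get(name) else "MISSING"
    print(f"{name:40s} {status}")
```
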

distribution/run-inline.yaml

Lines changed: 0 additions & 61 deletions
This file was deleted.
Lines changed: 8 additions & 3 deletions
@@ -9,9 +9,9 @@ apis:
 - datasetio
 providers:
   eval:
-  - provider_id: trustyai_ragas
+  - provider_id: ${env.KUBEFLOW_LLAMA_STACK_URL:+trustyai_ragas_remote}
     provider_type: remote::trustyai_ragas
-    module: llama_stack_provider_ragas
+    module: llama_stack_provider_ragas.remote
     config:
       embedding_model: ${env.EMBEDDING_MODEL}
       kubeflow_config:
@@ -21,7 +21,12 @@ providers:
         namespace: ${env.KUBEFLOW_NAMESPACE}
         llama_stack_url: ${env.KUBEFLOW_LLAMA_STACK_URL}
         base_image: ${env.KUBEFLOW_BASE_IMAGE}
-        pipelines_token: ${env.KUBEFLOW_PIPELINES_TOKEN:=}
+        pipelines_api_token: ${env.KUBEFLOW_PIPELINES_TOKEN:=}
+  - provider_id: ${env.EMBEDDING_MODEL:+trustyai_ragas_inline}
+    provider_type: inline::trustyai_ragas
+    module: llama_stack_provider_ragas.inline
+    config:
+      embedding_model: ${env.EMBEDDING_MODEL}
   datasetio:
   - provider_id: localfs
     provider_type: inline::localfs
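
The provider ids above lean on Llama Stack's environment-variable substitution: `${env.VAR:=default}` falls back to a default, while `${env.VAR:+value}` expands to `value` only when `VAR` is set, so a provider whose id collapses to an empty string is effectively skipped — which appears to be the intent here. The sketch below emulates that bash-style behaviour purely for illustration (only the two operator forms used in this file), assuming the substitution semantics match.

```python
# Illustration only: emulate the assumed bash-style semantics of the two operators
# used in run.yaml (":=" default, ":+" conditional). Not Llama Stack's actual parser.
import re


def substitute(template: str, env: dict[str, str]) -> str:
    def repl(match: re.Match) -> str:
        var, op, word = match.group(1), match.group(2), match.group(3)
        if op == ":+":                 # expands to `word` only when the variable is set
            return word if env.get(var) else ""
        return env.get(var) or word    # ":=" -> the variable's value, else the default

    return re.sub(r"\$\{env\.(\w+)(:[+=])([^}]*)\}", repl, template)


# With only EMBEDDING_MODEL set, the remote provider id collapses to an empty string
# (so that provider is skipped), while the inline provider id survives.
env = {"EMBEDDING_MODEL": "ollama/all-minilm:l6-v2"}
print(substitute("${env.KUBEFLOW_LLAMA_STACK_URL:+trustyai_ragas_remote}", env))  # ""
print(substitute("${env.EMBEDDING_MODEL:+trustyai_ragas_inline}", env))           # "trustyai_ragas_inline"
```
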

docs/modules/ROOT/pages/index.adoc

Lines changed: 2 additions & 2 deletions
@@ -15,8 +15,8 @@ The goal is to provide all of Ragas' evaluation functionality over Llama Stack's
 
 There are two versions of the provider:
 
-* `remote`: runs the Ragas evaluation in a remote process, using Kubeflow Pipelines. This is the *default* when using the module-based import.
-* `inline`: runs the Ragas evaluation in the same process as the Llama Stack server.
+* `inline`: runs the Ragas evaluation in the same process as the Llama Stack server. This is always available with the base installation.
+* `remote`: runs the Ragas evaluation in a remote process, using Kubeflow Pipelines. Only available when remote dependencies are installed with `pip install llama-stack-provider-ragas[remote]`.
 
 == Getting Started
 