
Commit 68c7ed6

docs: add zeno visualization integration (#359)
Hi Ragas team! Thanks for this awesome library! I'm part of the team working on [Zeno](https://zenoml.com), an AI evaluation tool. Some of our users working on RAG mentioned the ragas library, so I wrote up a short integration tutorial that I thought would be a good addition. Let me know if you need any clarification from our end!
1 parent 4d01af2 commit 68c7ed6

File tree

2 files changed: +229 −0 lines changed


docs/howtos/integrations/index.md

Lines changed: 1 addition & 0 deletions
@@ -10,4 +10,5 @@ llamaindex.ipynb
 langchain.ipynb
 langsmith.ipynb
 langfuse.ipynb
+zeno.ipynb
 :::
docs/howtos/integrations/zeno.ipynb

Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Visualizing Ragas Results with Zeno\n",
+    "\n",
+    "You can use the [Zeno](https://zenoml.com) evaluation platform to easily visualize and explore the results of your Ragas evaluation.\n",
+    "\n",
+    "> Check out what the result of this tutorial looks like [here](https://hub.zenoml.com/project/b35c83b8-0b22-4b9c-aedb-80964011d7a7/ragas%20FICA%20eval)\n",
+    "\n",
+    "First, install the `zeno-client` package:\n",
+    "\n",
+    "```bash\n",
+    "pip install zeno-client\n",
+    "```\n",
+    "\n",
+    "Next, create an account at [hub.zenoml.com](https://hub.zenoml.com) and generate an API key on your [account page](https://hub.zenoml.com/account).\n",
+    "\n",
+    "We can now pick up the evaluation where we left off in the [Getting Started](../../getstarted/evaluation.md) guide:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "import pandas as pd\n",
+    "from datasets import load_dataset\n",
+    "from ragas import evaluate\n",
+    "from ragas.metrics import (\n",
+    "    answer_relevancy,\n",
+    "    context_precision,\n",
+    "    context_recall,\n",
+    "    faithfulness,\n",
+    ")\n",
+    "from zeno_client import ZenoClient, ZenoMetric"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Set API keys\n",
+    "os.environ[\"OPENAI_API_KEY\"] = \"your-openai-api-key\"\n",
+    "os.environ[\"ZENO_API_KEY\"] = \"your-zeno-api-key\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
+    "result = evaluate(\n",
+    "    fiqa_eval[\"baseline\"],\n",
+    "    metrics=[\n",
+    "        context_precision,\n",
+    "        faithfulness,\n",
+    "        answer_relevancy,\n",
+    "        context_recall,\n",
+    "    ],\n",
+    ")\n",
+    "\n",
+    "df = result.to_pandas()\n",
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can now take the `df` with our data and results and upload it to Zeno.\n",
+    "\n",
+    "We first create a project with a custom RAG view specification and the metric columns we want to evaluate across:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client = ZenoClient(os.environ[\"ZENO_API_KEY\"])\n",
+    "\n",
+    "project = client.create_project(\n",
+    "    name=\"Ragas FICA eval\",\n",
+    "    description=\"Evaluation of RAG model using Ragas on the FICA dataset\",\n",
+    "    view={\n",
+    "        \"data\": {\n",
+    "            \"type\": \"vstack\",\n",
+    "            \"keys\": {\n",
+    "                \"question\": {\"type\": \"markdown\"},\n",
+    "                \"texts\": {\n",
+    "                    \"type\": \"list\",\n",
+    "                    \"elements\": {\"type\": \"markdown\"},\n",
+    "                    \"border\": True,\n",
+    "                    \"pad\": True,\n",
+    "                },\n",
+    "            },\n",
+    "        },\n",
+    "        \"label\": {\n",
+    "            \"type\": \"markdown\",\n",
+    "        },\n",
+    "        \"output\": {\n",
+    "            \"type\": \"vstack\",\n",
+    "            \"keys\": {\n",
+    "                \"answer\": {\"type\": \"markdown\"},\n",
+    "                \"ground_truths\": {\n",
+    "                    \"type\": \"list\",\n",
+    "                    \"elements\": {\"type\": \"markdown\"},\n",
+    "                    \"border\": True,\n",
+    "                    \"pad\": True,\n",
+    "                },\n",
+    "            },\n",
+    "        },\n",
+    "        \"size\": \"large\",\n",
+    "    },\n",
+    "    metrics=[\n",
+    "        ZenoMetric(\n",
+    "            name=\"context_precision\", type=\"mean\", columns=[\"context_precision\"]\n",
+    "        ),\n",
+    "        ZenoMetric(name=\"faithfulness\", type=\"mean\", columns=[\"faithfulness\"]),\n",
+    "        ZenoMetric(name=\"answer_relevancy\", type=\"mean\", columns=[\"answer_relevancy\"]),\n",
+    "        ZenoMetric(name=\"context_recall\", type=\"mean\", columns=[\"context_recall\"]),\n",
+    "    ],\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we upload the base dataset with the questions and ground truths:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_df = pd.DataFrame(\n",
+    "    {\n",
+    "        \"data\": df.apply(\n",
+    "            lambda x: {\"question\": x[\"question\"], \"texts\": list(x[\"contexts\"])}, axis=1\n",
+    "        ),\n",
+    "        \"label\": df[\"ground_truths\"].apply(lambda x: \"\\n\".join(x)),\n",
+    "    }\n",
+    ")\n",
+    "data_df[\"id\"] = data_df.index\n",
+    "\n",
+    "project.upload_dataset(\n",
+    "    data_df, id_column=\"id\", data_column=\"data\", label_column=\"label\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Lastly, we upload the RAG outputs and Ragas metrics.\n",
+    "\n",
+    "You can run this for any number of models when doing comparison and iteration:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output_df = df[\n",
+    "    [\n",
+    "        \"context_precision\",\n",
+    "        \"faithfulness\",\n",
+    "        \"answer_relevancy\",\n",
+    "        \"context_recall\",\n",
+    "    ]\n",
+    "].copy()\n",
+    "\n",
+    "output_df[\"output\"] = df.apply(\n",
+    "    lambda x: {\"answer\": x[\"answer\"], \"ground_truths\": list(x[\"ground_truths\"])}, axis=1\n",
+    ")\n",
+    "output_df[\"id\"] = output_df.index\n",
+    "\n",
+    "project.upload_system(\n",
+    "    output_df, name=\"Base System\", id_column=\"id\", output_column=\"output\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Reach out to the Zeno team on [Discord](https://discord.gg/km62pDKAkE) or at [[email protected]](mailto:[email protected]) if you have any questions!"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "zeno-build",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
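
The dataset-shaping step in the notebook (building `data_df` from the Ragas result frame before `upload_dataset`) can be checked offline without any API keys. A minimal sketch, using a hypothetical one-row frame with made-up values standing in for `result.to_pandas()` (the column names match the notebook; the data is illustrative only):

```python
import pandas as pd

# Toy stand-in for the frame returned by result.to_pandas();
# values are illustrative, not real evaluation output.
df = pd.DataFrame(
    {
        "question": ["What is FICA?"],
        "contexts": [["FICA is a US payroll tax.", "It funds Social Security."]],
        "answer": ["A payroll tax."],
        "ground_truths": [["FICA is a payroll tax."]],
    }
)

# Shape the dataset frame the same way the notebook does:
# "data" holds a dict per row matching the project's view spec,
# "label" joins the ground truths into one markdown string.
data_df = pd.DataFrame(
    {
        "data": df.apply(
            lambda x: {"question": x["question"], "texts": list(x["contexts"])}, axis=1
        ),
        "label": df["ground_truths"].apply(lambda x: "\n".join(x)),
    }
)
data_df["id"] = data_df.index

print(data_df.loc[0, "data"]["question"])
```

Inspecting `data_df` this way before uploading confirms that the `data` dicts have exactly the keys (`question`, `texts`) declared in the project's view specification, so mismatches surface locally rather than in the Zeno UI.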
