|
60 | 60 | "\n", |
61 | 61 | "Ragas performs a `ground_truth`-free evaluation of your RAG pipelines. This is because, for most people, building a gold-labeled dataset that represents the distribution they see in production is a very expensive process.\n", |
62 | 62 | "\n", |
| 63 | + "**Note:** *While ragas was originally aimed at `ground_truth`-free evaluations, some aspects of the RAG pipeline need `ground_truth` in order to be measured. We're in the process of building a testset generation feature that will make this easier. Check out [issue#136](https://github.com/explodinggradients/ragas/issues/136) for more details.*\n", |
| 64 | + "\n", |
63 | 65 | "Hence, to work with ragas, all you need is the following data:\n", |
64 | 66 | "- question: `list[str]` - These are the questions your RAG pipeline will be evaluated on.\n", |
65 | 67 | "- answer: `list[str]` - The answer generated from the RAG pipeline and given to the user.\n", |
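The row-aligned columns above can be assembled as a plain dict of parallel lists before handing them to ragas. A minimal sketch, assuming the 🤗 `datasets` library (which the `fiqa_eval` dataset used later comes from) would be used for the final conversion; the `contexts` column and the sample strings here are illustrative assumptions, not values from this notebook:

```python
# Sketch: the row-aligned data ragas expects, as plain Python lists.
# Only `question` and `answer` are listed in the text above; `contexts`
# (retrieved passages per question) is an assumed extra column needed
# by the context-based metrics used later in this notebook.
samples = {
    "question": ["How do I contest an incorrect charge on my card?"],
    "answer": ["Contact your card issuer and file a dispute in writing."],
    "contexts": [["Cardholders may dispute a charge by notifying the issuer ..."]],
}

# Basic shape check: every column must have one entry per sample.
n = len(samples["question"])
assert all(len(col) == n for col in samples.values())

# With the `datasets` library installed this becomes a Dataset:
# from datasets import Dataset
# ds = Dataset.from_dict(samples)
```

The parallel-list layout matters: ragas pairs up the i-th question, answer, and contexts, so all columns must be the same length.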
|
73 | 75 | }, |
74 | 76 | { |
75 | 77 | "cell_type": "code", |
76 | | - "execution_count": 8, |
| 78 | + "execution_count": 1, |
77 | 79 | "id": "b658e02f", |
78 | 80 | "metadata": {}, |
79 | 81 | "outputs": [ |
|
87 | 89 | { |
88 | 90 | "data": { |
89 | 91 | "application/vnd.jupyter.widget-view+json": { |
90 | | - "model_id": "e481f1b6ae824149aaf5afe96330fda3", |
| 92 | + "model_id": "a2dfebb012dd4b79b3a6ed951ce0d406", |
91 | 93 | "version_major": 2, |
92 | 94 | "version_minor": 0 |
93 | 95 | }, |
|
109 | 111 | "})" |
110 | 112 | ] |
111 | 113 | }, |
112 | | - "execution_count": 8, |
| 114 | + "execution_count": 1, |
113 | 115 | "metadata": {}, |
114 | 116 | "output_type": "execute_result" |
115 | 117 | } |
|
141 | 143 | }, |
142 | 144 | { |
143 | 145 | "cell_type": "code", |
144 | | - "execution_count": 9, |
| 146 | + "execution_count": 3, |
145 | 147 | "id": "f17bcf9d", |
146 | 148 | "metadata": {}, |
147 | 149 | "outputs": [], |
|
185 | 187 | }, |
186 | 188 | { |
187 | 189 | "cell_type": "code", |
188 | | - "execution_count": 10, |
| 190 | + "execution_count": null, |
189 | 191 | "id": "22eb6f97", |
190 | 192 | "metadata": {}, |
191 | 193 | "outputs": [ |
|
200 | 202 | "name": "stderr", |
201 | 203 | "output_type": "stream", |
202 | 204 | "text": [ |
203 | | - "100%|█████████████████████████████████████████████████████████████| 1/1 [00:06<00:00, 6.57s/it]\n" |
| 205 | + "100%|████████████████████████████████████████████████████████████| 2/2 [04:08<00:00, 124.31s/it]\n" |
204 | 206 | ] |
205 | 207 | }, |
206 | 208 | { |
|
214 | 216 | "name": "stderr", |
215 | 217 | "output_type": "stream", |
216 | 218 | "text": [ |
217 | | - "100%|█████████████████████████████████████████████████████████████| 1/1 [00:28<00:00, 28.82s/it]\n" |
| 219 | + "100%|████████████████████████████████████████████████████████████| 2/2 [06:29<00:00, 194.60s/it]\n" |
218 | 220 | ] |
219 | 221 | }, |
220 | 222 | { |
|
228 | 230 | "name": "stderr", |
229 | 231 | "output_type": "stream", |
230 | 232 | "text": [ |
231 | | - "100%|█████████████████████████████████████████████████████████████| 1/1 [00:07<00:00, 7.53s/it]\n" |
| 233 | + "100%|█████████████████████████████████████████████████████████████| 2/2 [01:16<00:00, 38.12s/it]\n" |
232 | 234 | ] |
233 | 235 | }, |
234 | 236 | { |
|
242 | 244 | "name": "stderr", |
243 | 245 | "output_type": "stream", |
244 | 246 | "text": [ |
245 | | - "100%|█████████████████████████████████████████████████████████████| 1/1 [00:24<00:00, 24.13s/it]\n" |
| 247 | + "100%|████████████████████████████████████████████████████████████| 2/2 [07:53<00:00, 236.95s/it]\n" |
246 | 248 | ] |
247 | 249 | }, |
248 | 250 | { |
|
256 | 258 | "name": "stderr", |
257 | 259 | "output_type": "stream", |
258 | 260 | "text": [ |
259 | | - "100%|█████████████████████████████████████████████████████████████| 1/1 [00:07<00:00, 7.31s/it]\n" |
| 261 | + " 50%|██████████████████████████████▌ | 1/2 [00:46<00:46, 46.32s/it]" |
260 | 262 | ] |
261 | | - }, |
262 | | - { |
263 | | - "data": { |
264 | | - "text/plain": [ |
265 | | - "{'ragas_score': 0.3482, 'context_relevancy': 0.1296, 'faithfulness': 0.8889, 'answer_relevancy': 0.9285, 'context_recall': 0.6370, 'harmfulness': 0.0000}" |
266 | | - ] |
267 | | - }, |
268 | | - "execution_count": 10, |
269 | | - "metadata": {}, |
270 | | - "output_type": "execute_result" |
271 | 263 | } |
272 | 264 | ], |
273 | 265 | "source": [ |
274 | 266 | "from ragas import evaluate\n", |
275 | 267 | "\n", |
276 | 268 | "result = evaluate(\n", |
277 | | - " fiqa_eval[\"baseline\"].select(range(3)),\n", |
| 269 | + " fiqa_eval[\"baseline\"],\n", |
278 | 270 | " metrics=[\n", |
279 | 271 | " context_relevancy,\n", |
280 | 272 | " faithfulness,\n", |
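The `execute_result` removed in this diff reported a composite `ragas_score` alongside the per-metric scores. In ragas versions of this era, that score appears to be the harmonic mean of the core metrics, with `harmfulness` (a binary aspect critique) left out of the aggregate; the sketch below checks that reading against the removed output's own numbers:

```python
from statistics import harmonic_mean

# Per-metric scores from the execute_result removed in this diff.
scores = {
    "context_relevancy": 0.1296,
    "faithfulness": 0.8889,
    "answer_relevancy": 0.9285,
    "context_recall": 0.6370,
}

# ragas_score as the harmonic mean of the core metrics; harmfulness
# is excluded (folding in its 0.0 would drive the harmonic mean to 0).
ragas_score = harmonic_mean(scores.values())
print(round(ragas_score, 4))  # 0.3482, matching the removed output
```

The harmonic mean punishes any single weak metric, which is why the low `context_relevancy` of 0.1296 drags the composite down to 0.3482 despite three strong component scores.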
|
454 | 446 | "source": [ |
455 | 447 | "And that's it!\n", |
456 | 448 | "\n", |
457 | | - "You can check out the [ragas in action] notebook to get a feel of what is like to use it while trying to improve your pipelines.\n", |
458 | | - "\n", |
459 | 449 | "If you have any suggestions, feedback, or things you're not happy about, please do share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We love hearing from you 😁" |
460 | 450 | ] |
461 | 451 | } |
|