Commit 9444617

docs: notebook guide for custom llm (#68)

1 parent b5770f0 commit 9444617

File tree

1 file changed

+202 -0 lines changed


docs/guides/llms.ipynb

Lines changed: 202 additions & 0 deletions
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0174eb96",
   "metadata": {},
   "source": [
    "# Bring your own LLMs\n",
    "\n",
    "Ragas uses LangChain under the hood to connect to LLMs for the metrics that require them. This means you can swap out the default LLM we use (`gpt-3.5-turbo-16k`) for any of the hundreds of LLM APIs supported out of the box by LangChain.\n",
    "\n",
    "- [Completion LLMs Supported](https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.llms)\n",
    "- [Chat based LLMs Supported](https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.chat_models)\n",
    "\n",
    "This guide shows you how to use a different LLM API for evaluation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55f0f9b9",
   "metadata": {},
   "source": [
    "## Evaluating with GPT-4\n",
    "\n",
    "Ragas uses GPT-3.5 by default, but evaluating with GPT-4 can improve the results, so let's use it for the `Faithfulness` metric.\n",
    "\n",
    "To start off, we initialise the GPT-4 `chat_model` from LangChain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a6d96660",
   "metadata": {},
   "outputs": [],
   "source": [
    "# make sure you have your OpenAI API key ready\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"your-openai-key\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6906a4d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "gpt4 = ChatOpenAI(model_name=\"gpt-4\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1fdb48b",
   "metadata": {},
   "source": [
    "Now initialise `Faithfulness` with `gpt4`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "307321ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ragas.metrics import Faithfulness\n",
    "\n",
    "faithfulness_gpt4 = Faithfulness(\n",
    "    name=\"faithfulness_gpt4\", llm=gpt4, batch_size=3\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1930dd49",
   "metadata": {},
   "source": [
    "That's it!\n",
    "\n",
    "Now let's run the evaluation using the example from the [quickstart](../quickstart.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "62c0eadb",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Found cached dataset fiqa (/home/jjmachan/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c55f09ffe1094e6190c255c09c0eb141",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    baseline: Dataset({\n",
       "        features: ['question', 'ground_truths', 'answer', 'contexts'],\n",
       "        num_rows: 30\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# data\n",
    "from datasets import load_dataset\n",
    "\n",
    "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
    "fiqa_eval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c4396f6e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "evaluating with [faithfulness_gpt4]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████████████████████████████████████████████████████| 10/10 [15:38<00:00, 93.84s/it]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'faithfulness_gpt4': 0.6594}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# evaluate\n",
    "from ragas import evaluate\n",
    "\n",
    "result = evaluate(\n",
    "    fiqa_eval[\"baseline\"], metrics=[faithfulness_gpt4]\n",
    ")\n",
    "\n",
    "result"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
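The pattern this notebook relies on (any LangChain-compatible model object can be handed to a metric through its `llm` parameter) can be sketched with plain-Python stand-ins. The `StubLLM` and `Metric` classes below are illustrative only, not the real Ragas or LangChain classes; they just show the dependency-injection idea behind the gpt-3.5 to gpt-4 swap:

```python
from dataclasses import dataclass


# Illustrative stand-ins -- NOT the real Ragas/LangChain classes.
@dataclass
class StubLLM:
    """Anything exposing a generate() method can play the LLM role."""
    model_name: str

    def generate(self, prompt: str) -> str:
        # A real LLM would call an API here; we just echo for illustration.
        return f"[{self.model_name}] {prompt}"


@dataclass
class Metric:
    """A metric holds a reference to the LLM it uses for judging."""
    name: str
    llm: StubLLM
    batch_size: int = 1

    def score(self, question: str) -> str:
        return self.llm.generate(question)


# Default metric vs. one with a swapped-in, stronger model,
# mirroring the gpt-3.5 -> gpt-4 swap in the notebook above.
default_metric = Metric(name="faithfulness", llm=StubLLM("gpt-3.5-turbo-16k"))
gpt4_metric = Metric(name="faithfulness_gpt4", llm=StubLLM("gpt-4"), batch_size=3)

print(gpt4_metric.score("Is the answer grounded in the retrieved context?"))
# -> [gpt-4] Is the answer grounded in the retrieved context?
```

Because the metric only depends on the object's interface, not its concrete class, any of the LangChain completion or chat models linked above can be injected the same way.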
