
Commit 5426bc7

docs: add cost tracker back (#1653)

1 parent 1d170d7 commit 5426bc7

File tree

2 files changed: +325 -0 lines changed
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# Understand Cost and Usage of Operations

When using LLMs for evaluation and test set generation, cost is an important factor. Ragas provides tools to help you track it.

## Understanding `TokenUsageParser`

By default, Ragas does not calculate token usage for `evaluate()`. This is because LangChain LLMs do not always return token usage information in a uniform way. To get usage data, you have to provide a `TokenUsageParser`.

A `TokenUsageParser` is a function that parses the `LLMResult` or `ChatResult` returned by a LangChain model's `generate_prompt()` call and outputs the `TokenUsage` object that Ragas expects.

As an example, here is how to parse an OpenAI response using a parser that Ragas provides.

```python
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
```

```python
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.prompt_values import StringPromptValue

gpt4o = ChatOpenAI(model="gpt-4o")
p = StringPromptValue(text="hai there")
llm_result = gpt4o.generate_prompt([p])

# let's import the parser for OpenAI
from ragas.cost import get_token_usage_for_openai

get_token_usage_for_openai(llm_result)
```

    /opt/homebrew/Caskroom/miniforge/base/envs/ragas/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
      from .autonotebook import tqdm as notebook_tqdm

    TokenUsage(input_tokens=9, output_tokens=9, model='')

You can define your own parser or import one if it is already defined. If you would like to suggest a parser for another LLM provider or contribute your own, please check out this [issue](https://github.com/explodinggradients/ragas/issues/1151) 🙂.
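
If your provider is not covered yet, a custom parser is just a function that maps an `LLMResult` to a `TokenUsage`. Below is a minimal sketch, assuming the provider reports usage in `llm_result.llm_output["token_usage"]` with OpenAI-style keys and that `TokenUsage` (importable from `ragas.cost`) accepts the two token counts shown in the repr above; adjust the keys to whatever your LLM actually returns.

```python
from langchain_core.outputs import LLMResult

from ragas.cost import TokenUsage


def my_token_usage_parser(llm_result: LLMResult) -> TokenUsage:
    # NOTE: the "token_usage" / "prompt_tokens" / "completion_tokens" keys are an
    # assumption based on OpenAI-style responses; other providers may differ.
    usage = (llm_result.llm_output or {}).get("token_usage", {})
    return TokenUsage(
        input_tokens=usage.get("prompt_tokens", 0),
        output_tokens=usage.get("completion_tokens", 0),
    )
```

You would then pass `my_token_usage_parser` as the `token_usage_parser` argument to `evaluate()`, exactly like the built-in OpenAI parser is used below.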

You can use it in evaluations like so, reusing the example from [get started](get-started-evaluation).

```python
from datasets import load_dataset
from ragas import EvaluationDataset
from ragas.metrics._aspect_critic import AspectCriticWithReference

dataset = load_dataset("explodinggradients/amnesty_qa", "english_v3")

eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])

metric = AspectCriticWithReference(
    name="answer_correctness",
    definition="is the response correct compared to reference",
)
```

    Repo card metadata block was not found. Setting CardData to empty.

```python
from ragas import evaluate
from ragas.cost import get_token_usage_for_openai

results = evaluate(eval_dataset[:5], metrics=[metric], llm=gpt4o,
    token_usage_parser=get_token_usage_for_openai,)
```

    Evaluating: 100%|██████████| 5/5 [00:01<00:00, 2.81it/s]

```python
results.total_tokens()
```

    TokenUsage(input_tokens=5463, output_tokens=355, model='')

You can compute the cost of each run by passing the cost per token to the `Result.total_cost()` function.

In this case, GPT-4o costs $5 per 1M input tokens and $15 per 1M output tokens.

```python
results.total_cost(cost_per_input_token=5 / 1e6, cost_per_output_token=15 / 1e6)
```

    0.03264
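
As a sanity check, the same figure can be reproduced by hand from the token counts reported by `results.total_tokens()` above:

```python
# 5463 input tokens and 355 output tokens, priced at the rates given above
input_cost = 5463 * 5 / 1e6    # ≈ 0.027315
output_cost = 355 * 15 / 1e6   # ≈ 0.005325
print(input_cost + output_cost)  # ≈ 0.03264, matching the result above
```
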
Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Understand Cost and Usage of Operations\n",
    "\n",
    "When using LLMs for evaluation and test set generation, cost is an important factor. Ragas provides tools to help you track it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding `TokenUsageParser`\n",
    "\n",
    "By default, Ragas does not calculate token usage for `evaluate()`. This is because LangChain LLMs do not always return token usage information in a uniform way. To get usage data, you have to provide a `TokenUsageParser`.\n",
    "\n",
    "A `TokenUsageParser` is a function that parses the `LLMResult` or `ChatResult` returned by a LangChain model's `generate_prompt()` call and outputs the `TokenUsage` object that Ragas expects.\n",
    "\n",
    "As an example, here is how to parse an OpenAI response using a parser that Ragas provides."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"your-api-key\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/homebrew/Caskroom/miniforge/base/envs/ragas/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      " from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "TokenUsage(input_tokens=9, output_tokens=9, model='')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_openai.chat_models import ChatOpenAI\n",
    "from langchain_core.prompt_values import StringPromptValue\n",
    "\n",
    "gpt4o = ChatOpenAI(model=\"gpt-4o\")\n",
    "p = StringPromptValue(text=\"hai there\")\n",
    "llm_result = gpt4o.generate_prompt([p])\n",
    "\n",
    "# let's import the parser for OpenAI\n",
    "from ragas.cost import get_token_usage_for_openai\n",
    "\n",
    "get_token_usage_for_openai(llm_result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can define your own parser or import one if it is already defined. If you would like to suggest a parser for another LLM provider or contribute your own, please check out this [issue](https://github.com/explodinggradients/ragas/issues/1151) 🙂.\n",
    "\n",
    "You can use it in evaluations like so, reusing the example from [get started](get-started-evaluation)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Repo card metadata block was not found. Setting CardData to empty.\n"
     ]
    }
   ],
   "source": [
    "from datasets import load_dataset\n",
    "from ragas import EvaluationDataset\n",
    "from ragas.metrics._aspect_critic import AspectCriticWithReference\n",
    "\n",
    "dataset = load_dataset(\"explodinggradients/amnesty_qa\", \"english_v3\")\n",
    "\n",
    "\n",
    "eval_dataset = EvaluationDataset.from_hf_dataset(dataset[\"eval\"])\n",
    "\n",
    "metric = AspectCriticWithReference(\n",
    "    name=\"answer_correctness\",\n",
    "    definition=\"is the response correct compared to reference\",\n",
    ")\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Evaluating: 100%|██████████| 5/5 [00:01<00:00, 2.81it/s]\n"
     ]
    }
   ],
   "source": [
    "from ragas import evaluate\n",
    "from ragas.cost import get_token_usage_for_openai\n",
    "\n",
    "results = evaluate(eval_dataset[:5], metrics=[metric], llm=gpt4o,\n",
    "    token_usage_parser=get_token_usage_for_openai,)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TokenUsage(input_tokens=5463, output_tokens=355, model='')"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results.total_tokens()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can compute the cost of each run by passing the cost per token to the `Result.total_cost()` function.\n",
    "\n",
    "In this case, GPT-4o costs $5 per 1M input tokens and $15 per 1M output tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.03264"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results.total_cost(cost_per_input_token=5 / 1e6, cost_per_output_token=15 / 1e6)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.20"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
