
Commit a288656

mrtj, shahules786, jokokojote, and smodlich (Sebastian Modlich) authored
More robust JSON prompting and parsing (#807)
This PR implements a langchain-style, hopefully more robust output parsing as discussed in #761.

---------

Co-authored-by: Shahules786 <[email protected]>
Co-authored-by: Felix Rothe <[email protected]>
Co-authored-by: Sebastian Modlich <[email protected]>
Co-authored-by: Sebastian Modlich <[email protected]>
Co-authored-by: Wang Jian <[email protected]>
Co-authored-by: Daniel Camejo <[email protected]>
1 parent 036ac97 commit a288656

25 files changed, +956 −732 lines changed
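To illustrate the general idea behind "langchain-style" output parsing, here is a minimal sketch of the pattern only — not the code merged in this commit. The helper name and its specific salvage steps are hypothetical.

```python
import json
import re
from typing import Any, Optional


def parse_json_markdown(text: str) -> Optional[Any]:
    """Hypothetical helper: salvage a JSON object from raw LLM output.

    Handles common failure modes: fenced ```json blocks, surrounding prose,
    and trailing commas. Returns None when nothing parseable is recovered.
    """
    # Prefer the contents of a fenced ``` block when one is present.
    fence = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    candidate = fence.group(1) if fence else text

    # Narrow down to the outermost {...} span.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        return None
    candidate = candidate[start : end + 1]

    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Last-ditch cleanup: drop trailing commas before } or ].
        cleaned = re.sub(r",\s*([}\]])", r"\1", candidate)
        try:
            return json.loads(cleaned)
        except json.JSONDecodeError:
            return None


raw = 'Sure, here is the JSON:\n```json\n{"question": "Where was the last Olympics held?",}\n```'
print(parse_json_markdown(raw))  # {'question': 'Where was the last Olympics held?'}
```

The point of the pattern is to treat model output as noisy text to be salvaged rather than as guaranteed-valid JSON.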
docs/concepts/prompts.md

Lines changed: 24 additions & 24 deletions

@@ -2,7 +2,7 @@
 
 Prompts play a crucial role in any language model-based framework and warrant more consideration than mere strings. A well-crafted prompt should include a clear task instruction, articulated in straightforward natural language, comprehensible to any language model. The objective is to compose prompts that are generalizable and do not overly specialize to a specific state of the language model. It's widely recognized that language models exhibit higher accuracy in few-shot scenarios as opposed to zero-shot contexts. To capitalize on this advantage, it is advisable to accompany each prompt with relevant examples.
 
-Prompts in ragas are defined using the `Prompt` class. Each prompt defined using this class will contain.
+Prompts in ragas are defined using the `Prompt` class. Each prompt defined using this class will contain.
 
 - `name`: a name given to the prompt. Used to save and identify the prompt.
 - `instruction`: The natural language description of the task to be carried out by the LLM
@@ -33,7 +33,7 @@ qa_prompt = Prompt(
     ],
     input_keys=["answer", "context"],
     output_key="output",
-    output_type="JSON",
+    output_type="json",
 )
 ```
 
@@ -46,64 +46,64 @@ This will create a Prompt class object with the given instruction, examples, and
 Prompt objects have the following methods that can be used when evaluating or formatting a prompt object.
 
 - `to_string(self)`
-
+
 This method will generate a prompt string from the given object. This string can be directly used as a formatted string with the metrics in the evaluation task.
-
+
 ```{code-block} python
 print(qa_prompt.to_string())
 ```
-
+
 ```
 Generate a question for the given answer
-
+
 answer: "The last Olympics was held in Tokyo, Japan."
 context: "The last Olympics was held in Tokyo, Japan. It is held every 4 years"
 output: {{"question": "Where was the last Olympics held?"}}
-
+
 answer: "It can change its skin color based on the temperature of its environment."
 context: "A recent scientific study has discovered a new species of frog in the Amazon rainforest that has the unique ability to change its skin color based on the temperature of its environment."
 output: {{"question": "What unique ability does the newly discovered species of frog have?"}}
-
+
 answer: {answer}
 context: {context}
 output:
 ```
-
+
 - `format(self, **kwargs)`
-
+
 This method will use the parameters passed as keyword arguments to format the prompt object and return a Langchain `PromptValue` object that can be directly used in the evaluation tasks.
-
+
 ```{code-block} python
 qa_prompt.format(answer="This is an answer", context="This is a context")
 ```
-
+
 ```{code-block} python
 PromptValue(prompt_str='Generate a question for the given answer\n\nanswer: "The last Olympics was held in Tokyo, Japan."\ncontext: "The last Olympics was held in Tokyo, Japan. It is held every 4 years"\noutput: {"question": "Where was the last Olympics held?"}\n\nanswer: "It can change its skin color based on the temperature of its environment."\ncontext: "A recent scientific study has discovered a new species of frog in the Amazon rainforest that has the unique ability to change its skin color based on the temperature of its environment."\noutput: {"question": "What unique ability does the newly discovered species of frog have?"}\n\nanswer: This is an answer\ncontext: This is a context\noutput: \n')
-
+
 ```
-
+
 
 - `save(self, cache_dir)`
-
-This method will save the prompt to the given cache_dir (default `~/.cache`) directory using the value in the `name` variable.
-
+
+This method will save the prompt to the given cache_dir (default `~/.cache`) directory using the value in the `name` variable.
+
 ```{code-block} python
 qa_prompt.save()
 ```
-
+
 The prompts are saved in JSON format to `~/.cache/ragas` by default. One can change this by setting the `RAGAS_CACHE_HOME` environment variable to the desired path. In this example, the prompt will be saved in `~/.cache/ragas/english/question_generation.json`
-
+
 - `_load(self, language, name, cache_dir)`
-
-This method will load the appropriate prompt from the saved directory.
-
+
+This method will load the appropriate prompt from the saved directory.
+
 ```{code-block} python
 from ragas.utils import RAGAS_CACHE_HOME
 Prompt._load(name="question_generation",language="english",cache_dir=RAGAS_CACHE_HOME)
 ```
-
+
 ```{code-block} python
 Prompt(name='question_generation', instruction='Generate a question for the given answer', examples=[{'answer': 'The last Olympics was held in Tokyo, Japan.', 'context': 'The last Olympics was held in Tokyo, Japan. It is held every 4 years', 'output': {'question': 'Where was the last Olympics held?'}}, {'answer': 'It can change its skin color based on the temperature of its environment.', 'context': 'A recent scientific study has discovered a new species of frog in the Amazon rainforest that has the unique ability to change its skin color based on the temperature of its environment.', 'output': {'question': 'What unique ability does the newly discovered species of frog have?'}}], input_keys=['answer', 'context'], output_key='output', output_type='JSON')
 ```
-
+
 The prompt was loaded from `.cache/ragas/english/question_generation.json`
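For reference, the `qa_prompt` whose tail appears in the second hunk above can be reconstructed from the `to_string()` and `_load()` outputs shown in this file; only the import path is an assumption.

```python
from ragas.llms.prompt import Prompt  # assumed import path

# reconstructed from the to_string()/_load() outputs shown in the diff above
qa_prompt = Prompt(
    name="question_generation",
    instruction="Generate a question for the given answer",
    examples=[
        {
            "answer": "The last Olympics was held in Tokyo, Japan.",
            "context": "The last Olympics was held in Tokyo, Japan. It is held every 4 years",
            "output": {"question": "Where was the last Olympics held?"},
        },
        {
            "answer": "It can change its skin color based on the temperature of its environment.",
            "context": "A recent scientific study has discovered a new species of frog in the Amazon rainforest that has the unique ability to change its skin color based on the temperature of its environment.",
            "output": {"question": "What unique ability does the newly discovered species of frog have?"},
        },
    ],
    input_keys=["answer", "context"],
    output_key="output",
    output_type="json",
)
print(qa_prompt.to_string())
```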

docs/getstarted/evaluation.md

Lines changed: 8 additions & 12 deletions

@@ -3,10 +3,6 @@
 
 Once your test set is ready (whether you've created your own or used the [synthetic test set generation module](get-started-testset-generation)), it's time to evaluate your RAG pipeline. This guide assists you in setting up Ragas as quickly as possible, enabling you to focus on enhancing your Retrieval Augmented Generation pipelines while this library ensures that your modifications are improving the entire pipeline.
 
-<p align="left">
-<img src="../_static/imgs/ragas_workflow_white.png" alt="test-outputs" width="800" height="600" />
-</p>
-
 This guide utilizes OpenAI for running some metrics, so ensure you have your OpenAI key ready and available in your environment.
 
 ```python
@@ -21,26 +17,26 @@ Let's begin with the data.
 
 ## The Data
 
-For this tutorial, we'll use an example dataset that we created using example in [data preparation](./prepare_data.ipynb). The dataset contains the following columns:
+For this tutorial, we'll use an example dataset from one of the baselines we created for the [Amnesty QA](https://huggingface.co/datasets/explodinggradients/amnesty_qa) dataset. The dataset contains the following columns:
 
 - question: `list[str]` - These are the questions your RAG pipeline will be evaluated on.
 - context: `list[list[str]]` - The contexts which were passed into the LLM to answer the question.
-- ground_truth: `str` - The ground truth answer to the questions.
-- answer: `str` - The answer generated by the RAG pipeline.
+- ground_truth: `list[str]` - The ground truth answer to the questions.
 
 An ideal test data set should contain samples that closely mirror your real-world use case.
 
 ```{code-block} python
 :caption: import sample dataset
 from datasets import load_dataset
 
-dataset = load_dataset("explodinggradients/prompt-engineering-guide-papers","test_data")
-dataset["test"]
+# loading the V2 dataset
+amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
+amnesty_qa
 ```
 
 :::{seealso}
-See [test set generation](./testset_generation.md) to learn how to generate your own `Question/Ground_Truth` pairs for evaluation.
-See [dataset preparation](./prepare_data.ipynb) to learn how to prepare your own dataset for evaluation.
+See [test set generation](./testset_generation.md) to learn how to generate your own `Question/Context/Ground_Truth` triplets for evaluation.
+See [preparing your own dataset](/docs/howtos/applications/data_preparation.md) to learn how to prepare your own dataset for evaluation.
 :::
 
 ## Metrics
@@ -81,7 +77,7 @@ Running the evaluation is as simple as calling `evaluate` on the `Dataset` with
 from ragas import evaluate
 
 result = evaluate(
-    dataset["eval"],
+    amnesty_qa["eval"],
     metrics=[
         context_precision,
         faithfulness,
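Pieced together from the snippets above, the updated quickstart flow reads roughly as follows. This is only a sketch: the `ragas.metrics` import path and the final `print(result)` are assumptions rather than lines from the diff, and an OpenAI key must be set in the environment for the call to run.

```python
from datasets import load_dataset

from ragas import evaluate
from ragas.metrics import context_precision, faithfulness  # assumed import path

# loading the V2 dataset
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

# run the two metrics shown in the diff over the evaluation split
result = evaluate(
    amnesty_qa["eval"],
    metrics=[
        context_precision,
        faithfulness,
    ],
)
print(result)  # assumed: a dict-like summary of the metric scores
```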

docs/getstarted/index.md

Lines changed: 1 addition & 10 deletions

@@ -6,7 +6,6 @@
 :hidden:
 install.md
 testset_generation.md
-prepare_data.md
 evaluation.md
 monitoring.md
 :::
@@ -27,15 +26,7 @@ Let's get started!
 :link: get-started-testset-generation
 :link-type: ref
 
-Learn how to generate high quality and diverse `Question/Ground_Truth` pairs to get started.
-:::
-
-:::{card} Prepare data for evaluation
-:link: dataset-preparation
-:link-type: ref
-
-Learn how to prepare a complete test dataset for evaluating using ragas metrics.
-
+Learn how to generate `Question/Context/Ground_Truth` triplets to get started.
 :::
 
 :::{card} Evaluate Using Your Testset

docs/getstarted/prepare_data.ipynb

Lines changed: 21 additions & 18 deletions

@@ -182,7 +182,7 @@
 ],
 "source": [
 "eval_dataset = load_dataset(\"explodinggradients/prompt-engineering-guide-papers\")\n",
-"eval_dataset = eval_dataset['test'].to_pandas()\n",
+"eval_dataset = eval_dataset[\"test\"].to_pandas()\n",
 "eval_dataset.head()"
 ]
 },
@@ -244,6 +244,7 @@
 "outputs": [],
 "source": [
 "import os\n",
+"\n",
 "PATH = \"./prompt-engineering-guide-papers\"\n",
 "os.environ[\"OPENAI_API_KEY\"] = \"your-open-ai-key\""
 ]
@@ -266,30 +267,32 @@
 "\n",
 "def build_query_engine(documents):\n",
 " vector_index = VectorStoreIndex.from_documents(\n",
-" documents, service_context=ServiceContext.from_defaults(chunk_size=512),\n",
+" documents,\n",
+" service_context=ServiceContext.from_defaults(chunk_size=512),\n",
 " )\n",
 "\n",
 " query_engine = vector_index.as_query_engine(similarity_top_k=3)\n",
 " return query_engine\n",
 "\n",
+"\n",
 "# Function to evaluate as Llama index does not support async evaluation for HFInference API\n",
 "def generate_responses(query_engine, test_questions, test_answers):\n",
-" responses = [query_engine.query(q) for q in test_questions]\n",
+" responses = [query_engine.query(q) for q in test_questions]\n",
 "\n",
-" answers = []\n",
-" contexts = []\n",
-" for r in responses:\n",
-" answers.append(r.response)\n",
-" contexts.append([c.node.get_content() for c in r.source_nodes])\n",
-" dataset_dict = {\n",
+" answers = []\n",
+" contexts = []\n",
+" for r in responses:\n",
+" answers.append(r.response)\n",
+" contexts.append([c.node.get_content() for c in r.source_nodes])\n",
+" dataset_dict = {\n",
 " \"question\": test_questions,\n",
 " \"answer\": answers,\n",
 " \"contexts\": contexts,\n",
-" }\n",
-" if test_answers is not None:\n",
-" dataset_dict[\"ground_truth\"] = test_answers\n",
-" ds = Dataset.from_dict(dataset_dict)\n",
-" return ds"
+" }\n",
+" if test_answers is not None:\n",
+" dataset_dict[\"ground_truth\"] = test_answers\n",
+" ds = Dataset.from_dict(dataset_dict)\n",
+" return ds"
 ]
 },
 {
@@ -299,8 +302,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"reader = SimpleDirectoryReader(PATH,num_files_limit=30, required_exts=[\".pdf\"])\n",
-"documents = reader.load_data()\n"
+"reader = SimpleDirectoryReader(PATH, num_files_limit=30, required_exts=[\".pdf\"])\n",
+"documents = reader.load_data()"
 ]
 },
 {
@@ -310,8 +313,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"test_questions = eval_dataset['question'].values.tolist()\n",
-"test_answers = eval_dataset['ground_truth'].values.tolist()"
+"test_questions = eval_dataset[\"question\"].values.tolist()\n",
+"test_answers = eval_dataset[\"ground_truth\"].values.tolist()"
 ]
 },
 {

docs/getstarted/testset_generation.md

Lines changed: 3 additions & 13 deletions

@@ -1,12 +1,7 @@
 (get-started-testset-generation)=
-# Generate synthetic test data
-The first roadblock to evaluating your RAG system is the lack of a test set. This tutorial guides you in creating a synthetic Question and Ground truth pairs for assessing your RAG pipeline. The key idea here that to synthesize a test set, we need to generate a set of questions and their corresponding ground truths. The ground truths are the expected answers to the questions.
-
-
-Once we have the Question/Ground truth pairs we can feed questions into your RAG to get the contexts and answers. We can then evaluate the RAG pipeline using any metrics of your choice.
-
-For this purpose, we will utilize OpenAI models. Ensure that your OpenAI API key is readily accessible within your environment.
+# Generate a Synthetic Test Set
 
+This tutorial guides you in creating a synthetic evaluation dataset for assessing your RAG pipeline. For this purpose, we will utilize OpenAI models. Ensure that your OpenAI API key is readily accessible within your environment.
 
 ```{code-block} python
 import os
@@ -71,9 +66,4 @@ testset.to_pandas()
 ```
 <p align="left">
 <img src="../_static/imgs/testset_output.png" alt="test-outputs" width="800" height="600" />
-</p>
-
-Now you have a synthetic test set ready for evaluation, which contains `question` and `ground_truth`.
-
-Next you can input these into your RAG to collect `contexts` and `answers` and evaluate your RAG pipeline using any metrics of your choice. Let's do a simple example using llama-index here. Check the [prepare your evaluation data](./prepare_data.ipynb) to know how.
-
+</p>
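For orientation, the test-set generation call this getting-started page leads into looks roughly like the one reformatted in the azure-openai notebook further down this diff. The sketch below assumes plain OpenAI models; the `TestsetGenerator` import path and the `ChatOpenAI`/`OpenAIEmbeddings` setup are assumptions rather than lines from this commit.

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings  # assumed model setup

from ragas.testset.evolutions import simple, reasoning, multi_context
from ragas.testset.generator import TestsetGenerator  # assumed import path

# load a folder of documents to generate questions from
loader = DirectoryLoader("./papers/", use_multithreading=True, silent_errors=True)
documents = loader.load()
for document in documents:
    document.metadata["filename"] = document.metadata["source"]

generator = TestsetGenerator.from_langchain(
    generator_llm=ChatOpenAI(model="gpt-3.5-turbo"),
    critic_llm=ChatOpenAI(model="gpt-4"),
    embeddings=OpenAIEmbeddings(),
)
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
testset.to_pandas()
```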

docs/howtos/applications/custom_prompts.md

Lines changed: 2 additions & 2 deletions

@@ -4,7 +4,7 @@ This is a tutorial notebook that shows how to create and use custom prompts with
 
 **Dataset**
 
-Here I’m using a dataset from HuggingFace.
+Here I’m using a dataset from HuggingFace.
 
 ```{code-block} python
@@ -53,7 +53,7 @@ long_form_answer_prompt_new = Prompt(
     ],
     input_keys=["question", "answer"],
     output_key="statements",
-    output_type="JSON",
+    output_type="json",
 )
 ```

docs/howtos/customisations/azure-openai.ipynb

Lines changed: 14 additions & 6 deletions

@@ -453,11 +453,13 @@
 "from ragas.testset.evolutions import simple, reasoning, multi_context\n",
 "\n",
 "\n",
-"loader = DirectoryLoader(\"./2023-llm-papers/\", use_multithreading=True, silent_errors=True,sample_size=1)\n",
+"loader = DirectoryLoader(\n",
+" \"./2023-llm-papers/\", use_multithreading=True, silent_errors=True, sample_size=1\n",
+")\n",
 "documents = loader.load()\n",
 "\n",
 "for document in documents:\n",
-" document.metadata['filename'] = document.metadata['source']"
+" document.metadata[\"filename\"] = document.metadata[\"source\"]"
 ]
 },
 {
@@ -475,11 +477,17 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"generator = TestsetGenerator.from_langchain(generator_llm=azure_model,critic_llm=azure_model,embeddings=azure_embeddings)\n",
+"generator = TestsetGenerator.from_langchain(\n",
+" generator_llm=azure_model, critic_llm=azure_model, embeddings=azure_embeddings\n",
+")\n",
 "\n",
-"testset = generator.generate_with_langchain_docs(documents, test_size=10, \n",
-" raise_exceptions=False, with_debugging_logs=False,\n",
-" distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}) "
+"testset = generator.generate_with_langchain_docs(\n",
+" documents,\n",
+" test_size=10,\n",
+" raise_exceptions=False,\n",
+" with_debugging_logs=False,\n",
+" distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},\n",
+")"
 ]
 },
 {

docs/howtos/integrations/index.md

Lines changed: 0 additions & 1 deletion

@@ -12,7 +12,6 @@ langsmith.ipynb
 ragas-arize.ipynb
 langfuse.ipynb
 athina.ipynb
-openlayer.ipynb
 zeno.ipynb
 tonic-validate.ipynb
 ragas_haystack.ipynb

0 commit comments
