Commit 402dc7e

Added testset generation for bedrock (#626)
Testset generation using Bedrock model and embeddings.

Co-authored-by: Shahules786 <[email protected]>
1 parent 2235ee9 commit 402dc7e

1 file changed: +149 -1 lines changed

docs/howtos/customisations/aws-bedrock.ipynb

Lines changed: 149 additions & 1 deletion
@@ -9,7 +9,10 @@
     "\n",
     "Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.\n",
     "\n",
-    "This tutorial will show you how to use Amazon Bedrock endpoints and LangChain."
+    "This tutorial will show you how to use Amazon Bedrock with Ragas.\n",
+    "\n",
+    "1. [Metrics](#load-sample-dataset)\n",
+    "2. [Testset generation](#test-data-generation)"
    ]
   },
   {
{
@@ -22,6 +25,14 @@
     ":::"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "f466494a",
+   "metadata": {},
+   "source": [
+    "## Metrics"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "e54b5e01",
@@ -330,6 +341,143 @@
     "df.head()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "b133aff0",
+   "metadata": {},
+   "source": [
+    "## Test Data Generation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4c7192f2",
+   "metadata": {},
+   "source": [
+    "Load the documents using your preferred document loader."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "529266ad",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.document_loaders import UnstructuredURLLoader\n",
+    "\n",
+    "urls = [\n",
+    "    \"https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023\",\n",
+    "    \"https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023\",\n",
+    "]\n",
+    "loader = UnstructuredURLLoader(urls=urls)\n",
+    "documents = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "87587749",
+   "metadata": {},
+   "source": [
+    "The documents are now loaded as LangChain `Document` objects.\n",
+    "The next step is to wrap the LLM and the embedding model in the Ragas wrapper classes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1d5eaed2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ragas.llms import LangchainLLMWrapper\n",
+    "from ragas.embeddings.base import LangchainEmbeddingsWrapper\n",
+    "\n",
+    "bedrock_model = LangchainLLMWrapper(bedrock_model)\n",
+    "bedrock_embeddings = LangchainEmbeddingsWrapper(bedrock_embeddings)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d7d17468",
+   "metadata": {},
+   "source": [
+    "The next step is to split the documents into chunks and store them in an `InMemoryDocumentStore`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4e717c13",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ragas.testset.extractor import KeyphraseExtractor\n",
+    "from langchain.text_splitter import TokenTextSplitter\n",
+    "from ragas.testset.docstore import InMemoryDocumentStore\n",
+    "\n",
+    "splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)\n",
+    "keyphrase_extractor = KeyphraseExtractor(llm=bedrock_model)\n",
+    "\n",
+    "docstore = InMemoryDocumentStore(\n",
+    "    splitter=splitter,\n",
+    "    embeddings=bedrock_embeddings,\n",
+    "    extractor=keyphrase_extractor,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7773f4b5",
+   "metadata": {},
+   "source": [
+    "Initialize `TestsetGenerator` with the required arguments and generate the test data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "495ff805",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ragas.testset import TestsetGenerator\n",
+    "from ragas.testset.evolutions import simple, reasoning, multi_context\n",
+    "\n",
+    "test_generator = TestsetGenerator(\n",
+    "    generator_llm=bedrock_model,\n",
+    "    critic_llm=bedrock_model,\n",
+    "    embeddings=bedrock_embeddings,\n",
+    "    docstore=docstore,\n",
+    ")\n",
+    "\n",
+    "distributions = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}\n",
+    "\n",
+    "# use test_generator.generate_with_llamaindex_docs if you load documents with llama-index\n",
+    "testset = test_generator.generate_with_langchain_docs(\n",
+    "    documents=documents, test_size=10, distributions=distributions\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a80046b",
+   "metadata": {},
+   "source": [
+    "Export the results to a pandas DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0b4633c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "test_df = testset.to_pandas()\n",
+    "test_df.head()"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "f668fce1",

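Note on the `distributions` argument added in this diff: the mapping `{simple: 0.5, reasoning: 0.25, multi_context: 0.25}` assigns each evolution type a share of `test_size`. With `test_size=10` the shares are 5, 2.5 and 2.5, so some rounding has to happen. The sketch below illustrates one way such weights can be turned into whole-number counts, using largest-remainder rounding. It is an illustration only: the helper name `allocate_counts` is hypothetical, plain strings stand in for the Ragas evolution objects, and this is not claimed to be how Ragas allocates samples internally.

```python
from math import floor


def allocate_counts(distributions: dict[str, float], test_size: int) -> dict[str, int]:
    """Split test_size across keys in proportion to their weights.

    Hypothetical helper for illustration; uses largest-remainder rounding,
    which is an assumption and not taken from the Ragas source.
    """
    total = sum(distributions.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Ideal (possibly fractional) share for each key.
    ideal = {key: test_size * weight / total for key, weight in distributions.items()}

    # Start from the floor of each share.
    counts = {key: floor(share) for key, share in ideal.items()}

    # Hand the leftover samples to the keys with the largest fractional parts.
    # sorted() is stable, so ties keep the mapping's insertion order.
    leftover = test_size - sum(counts.values())
    by_remainder = sorted(ideal, key=lambda key: ideal[key] - counts[key], reverse=True)
    for key in by_remainder[:leftover]:
        counts[key] += 1

    return counts


# Plain strings stand in for the simple / reasoning / multi_context evolutions.
counts = allocate_counts(
    {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}, test_size=10
)
print(counts)  # {'simple': 5, 'reasoning': 3, 'multi_context': 2}
```

Choosing a `test_size` that divides evenly by the weights (for example 8 or 12 with these values) avoids the tie altogether and makes the expected mix of question types easier to reason about.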