|
9 | 9 | "\n", |
10 | 10 | "Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.\n", |
11 | 11 | "\n", |
12 | | - "This tutorial will show you how to use Amazon Bedrock endpoints and LangChain." |
| 12 | + "This tutorial will show you how to use Amazon Bedrock with Ragas.\n", |
| 13 | + "\n", |
| 14 | + "1. [Metrics](#load-sample-dataset)\n", |
| 15 | + "2. [Testset generation](#test-data-generation)" |
13 | 16 | ] |
14 | 17 | }, |
15 | 18 | { |
|
22 | 25 | ":::" |
23 | 26 | ] |
24 | 27 | }, |
| 28 | + { |
| 29 | + "cell_type": "markdown", |
| 30 | + "id": "f466494a", |
| 31 | + "metadata": {}, |
| 32 | + "source": [ |
| 33 | + "## Metrics" |
| 34 | + ] |
| 35 | + }, |
25 | 36 | { |
26 | 37 | "cell_type": "markdown", |
27 | 38 | "id": "e54b5e01", |
|
330 | 341 | "df.head()" |
331 | 342 | ] |
332 | 343 | }, |
| 344 | + { |
| 345 | + "cell_type": "markdown", |
| 346 | + "id": "b133aff0", |
| 347 | + "metadata": {}, |
| 348 | + "source": [ |
| 349 | + "## Test Data Generation" |
| 350 | + ] |
| 351 | + }, |
| 352 | + { |
| 353 | + "cell_type": "markdown", |
| 354 | + "id": "4c7192f2", |
| 355 | + "metadata": {}, |
| 356 | + "source": [ |
| 357 | + "Load the documents using desired dataloader." |
| 358 | + ] |
| 359 | + }, |
| 360 | + { |
| 361 | + "cell_type": "code", |
| 362 | + "execution_count": null, |
| 363 | + "id": "529266ad", |
| 364 | + "metadata": {}, |
| 365 | + "outputs": [], |
| 366 | + "source": [ |
| 367 | + "from langchain_community.document_loaders import UnstructuredURLLoader\n", |
| 368 | + "\n", |
| 369 | + "urls = [\n", |
| 370 | + " \"https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023\",\n", |
| 371 | + " \"https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023\",\n", |
| 372 | + "]\n", |
| 373 | + "loader = UnstructuredURLLoader(urls=urls)\n", |
| 374 | + "documents = loader.load()" |
| 375 | + ] |
| 376 | + }, |
| 377 | + { |
| 378 | + "cell_type": "markdown", |
| 379 | + "id": "87587749", |
| 380 | + "metadata": {}, |
| 381 | + "source": [ |
| 382 | + "now we have documents created in the form of langchain `Document`\n", |
| 383 | + "Next step is to wrap the embedding and llm model into ragas schema." |
| 384 | + ] |
| 385 | + }, |
| 386 | + { |
| 387 | + "cell_type": "code", |
| 388 | + "execution_count": null, |
| 389 | + "id": "1d5eaed2", |
| 390 | + "metadata": {}, |
| 391 | + "outputs": [], |
| 392 | + "source": [ |
| 393 | + "from ragas.llms import LangchainLLMWrapper\n", |
| 394 | + "from ragas.embeddings.base import LangchainEmbeddingsWrapper\n", |
| 395 | + "\n", |
| 396 | + "bedrock_model = LangchainLLMWrapper(bedrock_model)\n", |
| 397 | + "bedrock_embeddings = LangchainEmbeddingsWrapper(bedrock_embeddings)" |
| 398 | + ] |
| 399 | + }, |
| 400 | + { |
| 401 | + "cell_type": "markdown", |
| 402 | + "id": "d7d17468", |
| 403 | + "metadata": {}, |
| 404 | + "source": [ |
| 405 | + "Next Step is to create chunks from the documents and store the chunks `InMemoryDocumentStore`" |
| 406 | + ] |
| 407 | + }, |
| 408 | + { |
| 409 | + "cell_type": "code", |
| 410 | + "execution_count": null, |
| 411 | + "id": "4e717c13", |
| 412 | + "metadata": {}, |
| 413 | + "outputs": [], |
| 414 | + "source": [ |
| 415 | + "from ragas.testset.extractor import KeyphraseExtractor\n", |
| 416 | + "from langchain.text_splitter import TokenTextSplitter\n", |
| 417 | + "from ragas.testset.docstore import InMemoryDocumentStore\n", |
| 418 | + "\n", |
| 419 | + "splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)\n", |
| 420 | + "keyphrase_extractor = KeyphraseExtractor(llm=bedrock_model)\n", |
| 421 | + "\n", |
| 422 | + "docstore = InMemoryDocumentStore(\n", |
| 423 | + " splitter=splitter,\n", |
| 424 | + " embeddings=bedrock_embeddings,\n", |
| 425 | + " extractor=keyphrase_extractor,\n", |
| 426 | + ")" |
| 427 | + ] |
| 428 | + }, |
| 429 | + { |
| 430 | + "cell_type": "markdown", |
| 431 | + "id": "7773f4b5", |
| 432 | + "metadata": {}, |
| 433 | + "source": [ |
| 434 | + "Initializing `TestsetGenerator` with required arguments and generating data" |
| 435 | + ] |
| 436 | + }, |
| 437 | + { |
| 438 | + "cell_type": "code", |
| 439 | + "execution_count": null, |
| 440 | + "id": "495ff805", |
| 441 | + "metadata": {}, |
| 442 | + "outputs": [], |
| 443 | + "source": [ |
| 444 | + "from ragas.testset import TestsetGenerator\n", |
| 445 | + "from ragas.testset.evolutions import simple, reasoning, multi_context\n", |
| 446 | + "\n", |
| 447 | + "test_generator = TestsetGenerator(\n", |
| 448 | + " generator_llm=bedrock_model,\n", |
| 449 | + " critic_llm=bedrock_model,\n", |
| 450 | + " embeddings=bedrock_embeddings,\n", |
| 451 | + " docstore=docstore,\n", |
| 452 | + ")\n", |
| 453 | + "\n", |
| 454 | + "distributions = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}\n", |
| 455 | + "\n", |
| 456 | + "# use generator.generate_with_llamaindex_docs if you use llama-index as document loader\n", |
| 457 | + "testset = test_generator.generate_with_langchain_docs(\n", |
| 458 | + " documents=documents, test_size=10, distributions=distributions\n", |
| 459 | + ")" |
| 460 | + ] |
| 461 | + }, |
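| | + {
| | + "cell_type": "markdown",
| | + "id": "3c62bba5",
| | + "metadata": {},
| | + "source": [
| | + "If your documents were loaded with llama-index rather than LangChain, the generator exposes `generate_with_llamaindex_docs` (see the comment in the cell above). A minimal sketch, assuming it mirrors the LangChain variant's signature:"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "id": "9f1d5c3a",
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "# sketch: run this only if `documents` are llama-index documents\n",
| | + "# testset = test_generator.generate_with_llamaindex_docs(\n",
| | + "#     documents=documents, test_size=10, distributions=distributions\n",
| | + "# )"
| | + ]
| | + },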
| 462 | + { |
| 463 | + "cell_type": "markdown", |
| 464 | + "id": "8a80046b", |
| 465 | + "metadata": {}, |
| 466 | + "source": [ |
| 467 | + "Export the results into pandas¶" |
| 468 | + ] |
| 469 | + }, |
| 470 | + { |
| 471 | + "cell_type": "code", |
| 472 | + "execution_count": null, |
| 473 | + "id": "0b4633c8", |
| 474 | + "metadata": {}, |
| 475 | + "outputs": [], |
| 476 | + "source": [ |
| 477 | + "test_df = testset.to_pandas()\n", |
| 478 | + "test_df.head()" |
| 479 | + ] |
| 480 | + }, |
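| | + {
| | + "cell_type": "markdown",
| | + "id": "7ab41e09",
| | + "metadata": {},
| | + "source": [
| | + "Optionally, persist the generated testset so it can be reused across evaluation runs. A minimal sketch using pandas (the filename is illustrative):"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "id": "5d8e2f41",
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "# save the generated questions, contexts and ground truths to disk\n",
| | + "test_df.to_csv(\"bedrock_testset.csv\", index=False)"
| | + ]
| | + },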
333 | 481 | { |
334 | 482 | "cell_type": "markdown", |
335 | 483 | "id": "f668fce1", |
|