# ✨ Introduction

Ragas is a library that helps you move from "vibe checks" to systematic evaluation loops for your AI applications. It provides tools to supercharge the evaluation of Large Language Model (LLM) applications, so you can evaluate them with ease and confidence.

## Why Ragas?

Traditional evaluation metrics don't capture what matters for LLM applications, and manual evaluation doesn't scale. Ragas solves this by combining **LLM-driven metrics** with **systematic experimentation** to create a continuous improvement loop.

### Key Features

- **Experiments-first approach**: Evaluate changes consistently with `experiments`. Make a change, run an evaluation, observe the results, and iterate to improve your LLM application.

- **Ragas Metrics**: Create custom metrics tailored to your use case with simple decorators, or use the library of [available metrics](./concepts/metrics/available_metrics/index.md). Learn more about [metrics in Ragas](./concepts/metrics/overview/index.md).

- **Easy to integrate**: Built-in dataset management, result tracking, and integrations with popular frameworks such as LangChain, LlamaIndex, and more.

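The metrics-plus-experiments loop described above can be sketched in plain Python. This is an illustrative sketch only, not the actual Ragas API: the `metric` decorator, `MetricResult` class, and `run_experiment` helper below are hypothetical stand-ins showing the general shape of a decorator-defined metric and an evaluation loop.

```python
# Illustrative sketch only: the names here (metric, MetricResult,
# run_experiment) are hypothetical stand-ins, not the actual Ragas API.
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class MetricResult:
    name: str
    score: float
    reason: str = ""


def metric(name: str):
    """Decorator that turns a plain scoring function into a named metric."""
    def wrap(fn: Callable):
        def scorer(**row) -> MetricResult:
            score, reason = fn(**row)
            return MetricResult(name=name, score=score, reason=reason)
        scorer.metric_name = name
        return scorer
    return wrap


@metric(name="exact_match")
def exact_match(response: str, reference: str):
    ok = response.strip().lower() == reference.strip().lower()
    return (1.0 if ok else 0.0), "matches reference" if ok else "differs from reference"


def run_experiment(dataset, app: Callable, metrics):
    """Run the app over each row, score every response, report mean scores."""
    scores = {m.metric_name: [] for m in metrics}
    for row in dataset:
        response = app(row["question"])
        for m in metrics:
            scores[m.metric_name].append(
                m(response=response, reference=row["reference"]).score
            )
    return {name: mean(vals) for name, vals in scores.items()}


dataset = [
    {"question": "Capital of France?", "reference": "Paris"},
    {"question": "2 + 2?", "reference": "4"},
]
results = run_experiment(dataset, app=lambda q: "Paris", metrics=[exact_match])
print(results)  # {'exact_match': 0.5}
```

The point is the loop, not the helper names: change your app, re-run the experiment over the same dataset, and compare scores across runs instead of eyeballing outputs.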

<div class="grid cards" markdown>

- 🚀 **Get Started**

    Start evaluating in 5 minutes with our quickstart guide.

    [:octicons-arrow-right-24: Get Started](getstarted/experiments_quickstart.md)

- 📚 **Core Concepts**

    Understand experiments, metrics, and datasets—the building blocks of effective evaluation.

    [:octicons-arrow-right-24: Core Concepts](./concepts/index.md)

- 🛠️ **How-to Guides**

    Integrate Ragas into your workflow with practical guides for specific use cases.

    [:octicons-arrow-right-24: How-to Guides](./howtos/index.md)

- 📖 **References**

    API documentation and technical details for diving deeper.

    [:octicons-arrow-right-24: References](./references/index.md)

</div>

## Want help improving your AI application using evals?

Over the past two years, we have seen and helped improve many AI applications using evals.

We are compressing this knowledge into a product that replaces vibe checks with eval loops, so you can focus on building great AI applications.

If you want help improving and scaling your AI application using evals, 🔗 book a [slot](https://bit.ly/3EBYq4J) or drop us a line: [[email protected]](mailto:[email protected]).