|
<a href="#fire-quickstart">Quickstart</a> |
<a href="#luggage-metrics">Metrics</a> |
<a href="#-community">Community</a> |
<a href="#-open-analytics">Open Analytics</a> |
<a href="#raising_hand_man-faq">FAQ</a> |
<a href="https://huggingface.co/explodinggradients">Hugging Face</a>
<p>
Ragas measures your pipeline's performance against two dimensions: faithfulness and relevancy.

Through repeated experiments, we have found that the quality of a RAG pipeline is highly dependent on these two dimensions. The final `ragas_score` is the harmonic mean of these two factors.
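For two component scores $a$ (faithfulness) and $b$ (relevancy), the harmonic mean works out to

$$\text{ragas\_score} = \frac{2ab}{a + b}$$

so a low score on either dimension drags the combined score down.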
To read more about our metrics, check out the [docs](/docs/metrics.md).
## 🫂 Community
If you want to get more involved with Ragas, check out our [Discord server](https://discord.gg/5djav8GGNZ). It's a fun community where we geek out about LLMs, retrieval, production issues, and more.
## 🔍 Open Analytics
We track very basic usage metrics to help us figure out what our users want, what is working, and what's not. As a young startup we have to be brutally honest about this, which is why we track these metrics. And as an Open Startup, we open-source all the data we collect. You can read more about this [here](https://github.com/explodinggradients/ragas/issues/49). If you want to see exactly what we track, feel free to check the [code](./src/ragas/_analytics.py).
You can disable usage tracking by setting the `RAGAS_DO_NOT_TRACK` flag to true.
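For example, assuming the flag is read from the environment at import time (check the tracking code linked above for the exact mechanism), you could opt out like this:

```python
import os

# Assumption: the flag is read when ragas is imported, so set it first.
os.environ["RAGAS_DO_NOT_TRACK"] = "true"

import ragas  # noqa: E402
```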
## :raising_hand_man: FAQ
1. Why harmonic mean?

Harmonic mean penalizes extreme values. For example, if your generated answer is fully factually consistent with the context (faithfulness = 1) but is not relevant to the question (relevancy = 0), a simple average would give you a score of 0.5, but the harmonic mean gives you 0.0.
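Plugging those numbers into the formula above confirms this: $\frac{2 \cdot 1 \cdot 0}{1 + 0} = 0$, while the simple average is $\frac{1 + 0}{2} = 0.5$.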
2. How to use Ragas to improve your pipeline?

*"Measurement is the first step that leads to control and eventually to improvement" - James Harrington*
Here we assume that you already have your RAG pipeline ready. When it comes to RAG pipelines, there are mainly two parts - the retriever and the generator. A change in either of these will also impact your pipeline's quality.

1. First, decide on one parameter that you're interested in adjusting, for example the number of retrieved documents, K.
2. Collect a set of sample prompts (min 20) to form your test set.
3. Run your pipeline using the test set before and after the change. Each time, record the prompts with context and generated output.
4. Run ragas evaluation for each of them to generate evaluation scores.
5. Compare the scores to see how much the change has affected your pipeline's performance (see the sketch below).
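Here is a minimal sketch of steps 3-5, assuming ragas' `evaluate` entry point and a Hugging Face `Dataset` with question/contexts/answer columns (the column names and schema are assumptions here; check the metrics docs above for your version). The `rag_pipeline` callable and the prompt lists are placeholders for your own code:

```python
from datasets import Dataset
from ragas import evaluate


def score_pipeline(rag_pipeline, prompts):
    """Run the pipeline on each prompt and score the run with ragas.

    `rag_pipeline(prompt)` is your own code and should return a tuple of
    (retrieved_contexts, generated_answer) for that prompt.
    """
    records = {"question": list(prompts), "contexts": [], "answer": []}
    for prompt in prompts:
        contexts, answer = rag_pipeline(prompt)
        records["contexts"].append(contexts)
        records["answer"].append(answer)
    return evaluate(Dataset.from_dict(records))


# Steps 3-5: score the same test set before and after the change, then compare.
# scores_before = score_pipeline(pipeline_with_k3, test_prompts)
# scores_after = score_pipeline(pipeline_with_k5, test_prompts)
```

Comparing the two results (the per-metric scores and the combined `ragas_score`) tells you whether the change helped or hurt.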