Ground truth data, or reference data, is important for evaluation as it can offer a comprehensive and consistent measurement of system performance. However, it is often costly and time-consuming to manually curate such a golden dataset.
We have created a synthetic data pipeline that can generate custom user interaction data for a variety of use cases such as RAG, agents, and copilots. These datasets can serve as a starting point for a golden evaluation dataset or for other training purposes.
To generate custom synthetic data, create a free account at [Relari](https://www.relari.ai/); you can then generate custom synthetic golden datasets through Relari Cloud.
## 💡 Contributing
Interested in contributing? See our [Contribution Guide](CONTRIBUTING.md) for more details.
- How to Make the Most Out of LLM Production Data: Simulated User Feedback [(link)](https://medium.com/towards-data-science/how-to-make-the-most-out-of-llm-production-data-simulated-user-feedback-843c444febc7)
- Generate Synthetic Data to Test LLM Applications [(link)](https://medium.com/relari/generate-synthetic-data-to-test-llm-applications-4bffeb51b80e)
- **Discord:** Join our community of LLM developers on [Discord](https://discord.gg/GJnM8SRsHr)
- **Reach out to founders:** [Email](mailto:[email protected]) or [Schedule a chat](https://cal.com/relari/demo)
## Token Count

Token Count calculates the number of tokens in the retrieved context.
A required input for the metric is `encoder_name`, which specifies the tiktoken encoder to use.
For example, for the most recent OpenAI models, use `cl100k_base` as the encoder. For other models, look up the specific tokenizer used; alternatively, you can pass `approx` to get an approximate token count that assumes one token for every four characters.
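
To see how the approximation compares to an exact count, here is a minimal standalone sketch. It uses the `tiktoken` library directly rather than continuous-eval, and the sample text is arbitrary:

```python
# Standalone sketch (not continuous-eval): exact vs. approximate token counts.
import tiktoken

text = "Paris is the capital of France and also the largest city in the country."

# Exact count using the cl100k_base encoding of recent OpenAI models
encoder = tiktoken.get_encoding("cl100k_base")
exact_tokens = len(encoder.encode(text))

# `approx`-style estimate: roughly 1 token for every 4 characters
approx_tokens = len(text) // 4

print(f"exact: {exact_tokens}, approx: {approx_tokens}")
```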
:::tip
**Tokens in `retrieved_context` often account for the majority of LLM token usage in a RAG application.**
Token count is useful to track if you are concerned about LLM cost, context window limits, or performance issues caused by low context precision (such as "needle-in-a-haystack" problems).
:::
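
As a rough illustration of the cost angle, a token count translates directly into a spend estimate. In the sketch below, the per-token price and query volume are made-up placeholders, not real rates:

```python
# Hypothetical cost estimate from a token count (all numbers are placeholders).
PRICE_PER_1K_INPUT_TOKENS_USD = 0.0005  # assumed rate, not a real quote

def estimate_input_cost(num_tokens: int) -> float:
    """Estimated cost of sending `num_tokens` input tokens to an LLM."""
    return num_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS_USD

# e.g. 20,000 retrieved-context tokens per query, 1,000 queries per day
daily_cost = estimate_input_cost(20_000) * 1_000
print(f"~${daily_cost:.2f} per day")
```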
Required data items: `retrieved_context`
```python
from continuous_eval.metrics.retrieval import TokenCount

datum = {
    "retrieved_context": [
        "Lyon is a major city in France.",
        "Paris is the capital of France and also the largest city in the country.",
    ],
    "ground_truth_context": ["Paris is the capital of France."],
}

# Instantiate the metric with the tiktoken encoder and evaluate the datum
# (the call pattern below mirrors the other continuous-eval metric examples).
metric = TokenCount(encoder_name="cl100k_base")
print(metric(**datum))
```
Below is the list of metrics available:

- **Definition:** Rank-aware metrics including Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG) of retrieved contexts.
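
For reference, these rank-aware scores can be computed from a ranked list of binary relevance judgments. The sketch below is a plain-Python illustration of the per-query formulas; it does not use the continuous-eval classes, and MAP and MRR are the means of these per-query values over a dataset:

```python
import math

def reciprocal_rank(relevant: list[bool]) -> float:
    """1 / rank of the first relevant item (0.0 if none is relevant)."""
    for i, rel in enumerate(relevant, start=1):
        if rel:
            return 1.0 / i
    return 0.0

def average_precision(relevant: list[bool]) -> float:
    """Mean of precision@k over the positions of the relevant items."""
    hits, precisions = 0, []
    for i, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

def ndcg(relevant: list[bool]) -> float:
    """DCG of the ranking divided by the DCG of the ideal ranking."""
    dcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(relevant, start=1))
    ideal = sorted(relevant, reverse=True)
    idcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Retrieved contexts marked relevant / not relevant, in ranked order
ranking = [False, True, False, True]
print(reciprocal_rank(ranking), average_precision(ranking), ndcg(ranking))
```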