Skip to content

Commit 23e8b4c

Browse files
authored
Docs improvements (#1805)
# PR Description Fixes some typos in the docs.
1 parent 2559eb0 commit 23e8b4c

File tree

6 files changed

+6
-6
lines changed

6 files changed

+6
-6
lines changed

docs/concepts/metrics/available_metrics/agents.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -101,7 +101,7 @@ scorer.llm = your_llm
101101
await scorer.multi_turn_ascore(sample)
102102
```
103103

104-
The tool call sequence specified in `reference_tool_calls` is used as the ideal outcome. If the tool calls made by the AI does not the the order or sequence of the `reference_tool_calls`, the metric will return a score of 0. This helps to ensure that the AI is able to identify and call the required tools in the correct order to complete a given task.
104+
The tool call sequence specified in `reference_tool_calls` is used as the ideal outcome. If the tool calls made by the AI does not match the order or sequence of the `reference_tool_calls`, the metric will return a score of 0. This helps to ensure that the AI is able to identify and call the required tools in the correct order to complete a given task.
105105

106106
By default the tool names and arguments are compared using exact string matching. But sometimes this might not be optimal, for example if the args are natural language strings. You can also use any ragas metrics (values between 0 and 1) as distance measure to identify if a retrieved context is relevant or not. For example,
107107

docs/concepts/metrics/available_metrics/answer_relevance.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ Where:
1818
* $E_o$ is the embedding of the original question.
1919
* $N$ is the number of generated questions, which is 3 default.
2020

21-
Please note, that eventhough in practice the score will range between 0 and 1 most of the time, this is not mathematically guaranteed, due to the nature of the cosine similarity ranging from -1 to 1.
21+
Please note, that even though in practice the score will range between 0 and 1 most of the time, this is not mathematically guaranteed, due to the nature of the cosine similarity ranging from -1 to 1.
2222

2323
An answer is deemed relevant when it directly and appropriately addresses the original question. Importantly, our assessment of answer relevance does not consider factuality but instead penalizes cases where the answer lacks completeness or contains redundant details. To calculate this score, the LLM is prompted to generate an appropriate question for the generated answer multiple times, and the mean cosine similarity between these generated questions and the original question is measured. The underlying idea is that if the generated answer accurately addresses the initial question, the LLM should be able to generate questions from the answer that align with the original question.
2424

docs/concepts/metrics/available_metrics/context_precision.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ The following metrics uses LLM to identify if a retrieved context is relevant or
1717

1818
### Context Precision without reference
1919

20-
`LLMContextPrecisionWithoutReference` metric is can be used when you have both retrieved contexts and also reference contexts associated with a `user_input`. To estimate if a retrieved contexts is relevant or not this method uses the LLM to compare each of the retrieved context or chunk present in `retrieved_contexts` with `response`.
20+
`LLMContextPrecisionWithoutReference` metric can be used when you have both retrieved contexts and also reference contexts associated with a `user_input`. To estimate if a retrieved contexts is relevant or not this method uses the LLM to compare each of the retrieved context or chunk present in `retrieved_contexts` with `response`.
2121

2222
#### Example
2323

docs/concepts/metrics/available_metrics/factual_correctness.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
## Factual Correctness
22

3-
`FactualCorrectness` is a metric that compares and evaluates the factual accuracy of the generated `response` with the `reference`. This metric is used to determine the extent to which the generated response aligns with the reference. The factual correctness score ranges from 0 to 1, with higher values indicating better performance. To measure the alignment between the response and the reference, the metric uses the LLM for first break down the response and reference into claims and then uses natural language inference to determine the factual overlap between the response and the reference. Factual overlap is quantified using precision, recall, and F1 score, which can be controlled using the `mode` parameter.
3+
`FactualCorrectness` is a metric that compares and evaluates the factual accuracy of the generated `response` with the `reference`. This metric is used to determine the extent to which the generated response aligns with the reference. The factual correctness score ranges from 0 to 1, with higher values indicating better performance. To measure the alignment between the response and the reference, the metric uses the LLM to first break down the response and reference into claims and then uses natural language inference to determine the factual overlap between the response and the reference. Factual overlap is quantified using precision, recall, and F1 score, which can be controlled using the `mode` parameter.
44

55
The formula for calculating True Positive (TP), False Positive (FP), and False Negative (FN) is as follows:
66

docs/concepts/metrics/available_metrics/noise_sensitivity.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ await scorer.single_turn_ascore(sample)
5353
- As the largest institutional investor in India, LIC manages a substantial life fund, contributing to the financial stability of the country.
5454
5555
Irrelevant Retrieval:
56-
- The Indian economy is one of the fastest-growing major economies in the world, thanks to the secors like finance, technology, manufacturing etc.
56+
- The Indian economy is one of the fastest-growing major economies in the world, thanks to the sectors like finance, technology, manufacturing etc.
5757

5858
Let's examine how noise sensitivity in relevant context was calculated:
5959

docs/howtos/customizations/metrics/_write_your_own_metric_advanced.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ $$
1313
\text{Refusal rate} = \frac{\text{Total number of refused requests}}{\text{Total number of human requests}}
1414
$$
1515

16-
**Step 2**: Decide how are you going to derive this information from the sample. Here I am going to use LLM to do it, ie to check weather the request was refused or answered. You may use Non LLM based methods too. Since I am using LLM based method, this would become an LLM based metric.
16+
**Step 2**: Decide how are you going to derive this information from the sample. Here I am going to use LLM to do it, ie to check whether the request was refused or answered. You may use Non LLM based methods too. Since I am using LLM based method, this would become an LLM based metric.
1717

1818
**Step 3**: Decide if your metric should work in Single Turn and or Multi Turn data.
1919

0 commit comments

Comments
 (0)