Docs improvements (#1805)

EricGustin · web-flow · commit 23e8b4c035e0 · 2025-01-06T09:50:49.000+05:30
# PR Description
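Fixes typos and small grammar errors in six docs pages: the agents, answer relevance, context precision, factual correctness, and noise sensitivity metric pages, plus the advanced custom-metric how-to.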
diff --git a/docs/concepts/metrics/available_metrics/agents.md b/docs/concepts/metrics/available_metrics/agents.md
@@ -101,7 +101,7 @@ scorer.llm = your_llm
 await scorer.multi_turn_ascore(sample)
 ```
 
-The tool call sequence specified in `reference_tool_calls` is used as the ideal outcome. If the tool calls made by the AI does not the the order or sequence of the `reference_tool_calls`, the metric will return a score of 0. This helps to ensure that the AI is able to identify and call the required tools in the correct order to complete a given task.
+The tool call sequence specified in `reference_tool_calls` is used as the ideal outcome. If the tool calls made by the AI do not match the order or sequence of the `reference_tool_calls`, the metric will return a score of 0. This helps to ensure that the AI is able to identify and call the required tools in the correct order to complete a given task.
 
 By default the tool names and arguments are compared using exact string matching. But sometimes this might not be optimal, for example if the args are natural language strings. You can also use any ragas metrics (values between 0 and 1) as distance measure to identify if a retrieved context is relevant or not. For example,
 
diff --git a/docs/concepts/metrics/available_metrics/answer_relevance.md b/docs/concepts/metrics/available_metrics/answer_relevance.md
@@ -18,7 +18,7 @@ Where:
 * $E_o$ is the embedding of the original question.
 * $N$ is the number of generated questions, which is 3 default.
 
-Please note, that eventhough in practice the score will range between 0 and 1 most of the time, this is not mathematically guaranteed, due to the nature of the cosine similarity ranging from -1 to 1.
+Please note that even though in practice the score will range between 0 and 1 most of the time, this is not mathematically guaranteed, because cosine similarity ranges from -1 to 1.
 
 An answer is deemed relevant when it directly and appropriately addresses the original question. Importantly, our assessment of answer relevance does not consider factuality but instead penalizes cases where the answer lacks completeness or contains redundant details. To calculate this score, the LLM is prompted to generate an appropriate question for the generated answer multiple times, and the mean cosine similarity between these generated questions and the original question is measured. The underlying idea is that if the generated answer accurately addresses the initial question, the LLM should be able to generate questions from the answer that align with the original question.
 
diff --git a/docs/concepts/metrics/available_metrics/context_precision.md b/docs/concepts/metrics/available_metrics/context_precision.md
@@ -17,7 +17,7 @@ The following metrics uses LLM to identify if a retrieved context is relevant or
 
 ### Context Precision without reference
 
-`LLMContextPrecisionWithoutReference` metric is can be used when you have both retrieved contexts and also reference contexts associated with a `user_input`. To estimate if a retrieved contexts is relevant or not this method uses the LLM to compare each of the retrieved context or chunk present in `retrieved_contexts` with `response`. 
+`LLMContextPrecisionWithoutReference` metric can be used when you have both retrieved contexts and also reference contexts associated with a `user_input`. To estimate if a retrieved context is relevant or not, this method uses the LLM to compare each retrieved context or chunk present in `retrieved_contexts` with `response`.
 
 #### Example
     
diff --git a/docs/concepts/metrics/available_metrics/factual_correctness.md b/docs/concepts/metrics/available_metrics/factual_correctness.md
@@ -1,6 +1,6 @@
 ## Factual Correctness
 
-`FactualCorrectness` is a metric that compares and evaluates the factual accuracy of the generated `response` with the `reference`. This metric is used to determine the extent to which the generated response aligns with the reference. The factual correctness score ranges from 0 to 1, with higher values indicating better performance. To measure the alignment between the response and the reference, the metric uses the LLM for first break down the response and reference into claims and then uses natural language inference to determine the factual overlap between the response and the reference. Factual overlap is quantified using precision, recall, and F1 score, which can be controlled using the `mode` parameter.
+`FactualCorrectness` is a metric that compares and evaluates the factual accuracy of the generated `response` with the `reference`. This metric is used to determine the extent to which the generated response aligns with the reference. The factual correctness score ranges from 0 to 1, with higher values indicating better performance. To measure the alignment between the response and the reference, the metric uses the LLM to first break down the response and reference into claims and then uses natural language inference to determine the factual overlap between the response and the reference. Factual overlap is quantified using precision, recall, and F1 score, which can be controlled using the `mode` parameter.
 
 The formula for calculating True Positive (TP), False Positive (FP), and False Negative (FN) is as follows:
 
diff --git a/docs/concepts/metrics/available_metrics/noise_sensitivity.md b/docs/concepts/metrics/available_metrics/noise_sensitivity.md
@@ -53,7 +53,7 @@ await scorer.single_turn_ascore(sample)
         - As the largest institutional investor in India, LIC manages a substantial life fund, contributing to the financial stability of the country.
         
     Irrelevant Retrieval: 
-        - The Indian economy is one of the fastest-growing major economies in the world, thanks to the secors like finance, technology, manufacturing etc.
+        - The Indian economy is one of the fastest-growing major economies in the world, thanks to the sectors like finance, technology, manufacturing etc.
 
 Let's examine how noise sensitivity in relevant context was calculated:
 
diff --git a/docs/howtos/customizations/metrics/_write_your_own_metric_advanced.md b/docs/howtos/customizations/metrics/_write_your_own_metric_advanced.md
@@ -13,7 +13,7 @@ $$
 \text{Refusal rate} = \frac{\text{Total number of refused requests}}{\text{Total number of human requests}}
 $$
 
-**Step 2**: Decide how are you going to derive this information from the sample. Here I am going to use LLM to do it, ie to check weather the request was refused or answered. You may use Non LLM based methods too. Since I am using LLM based method, this would become an LLM based metric. 
+**Step 2**: Decide how you are going to derive this information from the sample. Here I am going to use an LLM to do it, i.e., to check whether the request was refused or answered. You may use non-LLM-based methods too. Since I am using an LLM-based method, this would become an LLM-based metric.
 
 **Step 3**: Decide if your metric should work in Single Turn and or Multi Turn data.