Commit d586865

Evaluation function documentation improvements (#1965)
- Improvements in the clarity of the evaluation function documentation
- Grammar corrections
1 parent 65de11c commit d586865

File tree

1 file changed: +35 -40 lines changed


src/ragas/evaluation.py

Lines changed: 35 additions & 40 deletions
@@ -71,49 +71,44 @@ def evaluate(
     _pbar: t.Optional[tqdm] = None,
 ) -> EvaluationResult:
     """
-    Run the evaluation on the dataset with different metrics
+    Perform the evaluation on the dataset with different metrics
 
     Parameters
     ----------
-    dataset : Dataset, EvaluationDataset
-        The dataset in the format of ragas which the metrics will use to score the RAG
-        pipeline with
-    metrics : list[Metric] , optional
-        List of metrics to use for evaluation. If not provided then ragas will run the
-        evaluation on the best set of metrics to give a complete view.
-    llm: BaseRagasLLM, optional
-        The language model to use for the metrics. If not provided then ragas will use
-        the default language model for metrics which require an LLM. This can we overridden by the llm specified in
-        the metric level with `metric.llm`.
-    embeddings: BaseRagasEmbeddings, optional
-        The embeddings to use for the metrics. If not provided then ragas will use
-        the default embeddings for metrics which require embeddings. This can we overridden by the embeddings specified in
-        the metric level with `metric.embeddings`.
-    experiment_name: str, optional
-        The name of the experiment to track. This is used to track the evaluation in the tracing tools.
-    callbacks: Callbacks, optional
-        Lifecycle Langchain Callbacks to run during evaluation. Check the
-        [langchain documentation](https://python.langchain.com/docs/modules/callbacks/)
-        for more information.
-    run_config: RunConfig, optional
-        Configuration for runtime settings like timeout and retries. If not provided,
-        default values are used.
-    token_usage_parser: TokenUsageParser, optional
-        Parser to get the token usage from the LLM result. If not provided then the
-        the cost and total tokens will not be calculated. Default is None.
-    raise_exceptions: False
-        Whether to raise exceptions or not. If set to True then the evaluation will
-        raise an exception if any of the metrics fail. If set to False then the
-        evaluation will return `np.nan` for the row that failed. Default is False.
-    column_map : dict[str, str], optional
-        The column names of the dataset to use for evaluation. If the column names of
-        the dataset are different from the default ones then you can provide the
-        mapping as a dictionary here. Example: If the dataset column name is contexts_v1,
-        column_map can be given as {"contexts":"contexts_v1"}
-    show_progress: bool, optional
-        Whether to show the progress bar during evaluation. If set to False, the progress bar will be disabled. Default is True.
-    batch_size: int, optional
-        How large should batches be. If set to None (default), no batching is done.
+    dataset : Dataset, EvaluationDataset
+        The dataset used by the metrics to evaluate the RAG pipeline.
+    metrics : list[Metric], optional
+        List of metrics to use for evaluation. If not provided, ragas will run
+        the evaluation on the best set of metrics to give a complete view.
+    llm : BaseRagasLLM, optional
+        The language model (LLM) to use to generate the score for calculating the metrics.
+        If not provided, ragas will use the default
+        language model for metrics that require an LLM. This can be overridden by the LLM
+        specified in the metric level with `metric.llm`.
+    embeddings : BaseRagasEmbeddings, optional
+        The embeddings model to use for the metrics.
+        If not provided, ragas will use the default embeddings for metrics that require embeddings.
+        This can be overridden by the embeddings specified in the metric level with `metric.embeddings`.
+    experiment_name : str, optional
+        The name of the experiment to track. This is used to track the evaluation in the tracing tool.
+    callbacks : Callbacks, optional
+        Lifecycle Langchain Callbacks to run during evaluation.
+        Check the [Langchain documentation](https://python.langchain.com/docs/modules/callbacks/) for more information.
+    run_config : RunConfig, optional
+        Configuration for runtime settings like timeout and retries. If not provided, default values are used.
+    token_usage_parser : TokenUsageParser, optional
+        Parser to get the token usage from the LLM result.
+        If not provided, the cost and total token count will not be calculated. Default is None.
+    raise_exceptions : False
+        Whether to raise exceptions or not. If set to True, the evaluation will raise an exception
+        if any of the metrics fail. If set to False, the evaluation will return `np.nan` for the row that failed. Default is False.
+    column_map : dict[str, str], optional
+        The column names of the dataset to use for evaluation. If the column names of the dataset are different from the default ones,
+        it is possible to provide the mapping as a dictionary here. Example: If the dataset column name is `contexts_v1`, it is possible to pass column_map as `{"contexts": "contexts_v1"}`.
+    show_progress : bool, optional
+        Whether to show the progress bar during evaluation. If set to False, the progress bar will be disabled. The default is True.
+    batch_size : int, optional
+        How large the batches should be. If set to None (default), no batching is done.
 
     Returns
     -------
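
For reference, here is a minimal usage sketch of the `evaluate()` API that this docstring describes. It is not part of the commit: the sample record, the column names, and the choice of the `faithfulness` and `answer_relevancy` metrics are illustrative, and the sketch assumes a default LLM and embeddings are available to ragas (for example via an OpenAI API key), since the `llm` and `embeddings` arguments are left unset. The `column_map` argument follows the example given in the revised docstring.

from datasets import Dataset  # Hugging Face datasets

from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# Illustrative sample only: one RAG record whose contexts column carries a
# non-default name ("contexts_v1"), so column_map remaps it for evaluation.
data = {
    "question": ["When did the first crewed Moon landing take place?"],
    "answer": ["The first crewed Moon landing took place in July 1969."],
    "contexts_v1": [["Apollo 11 landed on the Moon on July 20, 1969."]],
}
dataset = Dataset.from_dict(data)

result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy],  # omit to let ragas choose its default metric set
    column_map={"contexts": "contexts_v1"},    # map the dataset's "contexts_v1" column to "contexts"
    raise_exceptions=False,                    # failed rows are reported as np.nan instead of raising
    show_progress=True,
)

print(result)            # aggregate score per metric
df = result.to_pandas()  # per-row scores for further inspection

As the docstring notes, an LLM or embeddings model set directly on a metric (`metric.llm`, `metric.embeddings`) takes precedence over the evaluation-level defaults.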
