Commit f6a932a
Made embeddings and LLMs dependent on metric in evaluate function (#628)
Since we already specify whether or not a metric requires an LLM or embeddings model via inheritance from `MetricWithLLM` and `MetricWithEmbeddings`, there is no need to force default LLMs/embeddings in `evaluate` when, for example, no metrics that need embeddings appear in `metrics`. Initializing the LLM/embeddings model at the metric level should clarify how to use `evaluate`, and will keep things simpler as more metrics are added, since it decouples `evaluate` from initializing LLM/embeddings models for metrics that may not need them. Both are already optional arguments to the function itself. A minimal sketch of the resulting pattern follows this description.

**Copilot Description**

This pull request mainly refactors the `evaluate` function in `src/ragas/evaluation.py`. The changes optimize the import and usage of `llm_factory` and `embedding_factory`, and clarify the function's comments. The main changes:

* `src/ragas/evaluation.py`: two new imports were added at the top of the file: `embedding_factory` from `ragas.embeddings.base` and `llm_factory` from `ragas.llms`. This avoids repetitive imports inside the `evaluate` function.

Changes within the `evaluate` function:

* The comments for the `llm` and `embeddings` parameters now specify that the default language model and embeddings are used only for metrics which require an LLM or embeddings. This provides more clarity on the function's behavior.
* The conditional logic for setting `llm` and `embeddings` was simplified: `llm_factory` and `embedding_factory` are now called only when `llm` or `embeddings` is `None` and a metric actually requires them. This removes the need to import the factories inside the function.
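Below is a minimal, self-contained sketch of the lazy-assignment pattern this commit introduces. The class and factory names mirror the real ragas ones, but the bodies here are simplified stand-ins, not the actual implementations:

```python
import typing as t


class MetricWithLLM:
    """Marker base class: metrics inheriting this require an LLM."""

    def __init__(self) -> None:
        self.llm: t.Optional[object] = None


class MetricWithEmbeddings:
    """Marker base class: metrics inheriting this require embeddings."""

    def __init__(self) -> None:
        self.embeddings: t.Optional[object] = None


def llm_factory() -> object:
    # Stand-in for ragas.llms.llm_factory(), which builds the default LLM.
    return "default-llm"


def embedding_factory() -> object:
    # Stand-in for ragas.embeddings.base.embedding_factory().
    return "default-embeddings"


def assign_models(
    metrics: t.Sequence[object],
    llm: t.Optional[object] = None,
    embeddings: t.Optional[object] = None,
) -> None:
    """Attach models to metrics, building defaults only on first demand."""
    for metric in metrics:
        if isinstance(metric, MetricWithLLM) and metric.llm is None:
            if llm is None:
                # Only reached if some metric needs an LLM and the caller
                # did not pass one; the factory then runs at most once.
                llm = llm_factory()
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings) and metric.embeddings is None:
            if embeddings is None:
                embeddings = embedding_factory()
            metric.embeddings = embeddings


class FaithfulnessLike(MetricWithLLM):
    """Hypothetical LLM-only metric; it never triggers embedding_factory."""


assign_models([FaithfulnessLike()])  # builds a default LLM, no embeddings
```

With an LLM-only metric, `embedding_factory` is never called, which is exactly the decoupling described above.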
1 parent: 402dc7e

File tree (1 file changed, +18 −21 lines):

src/ragas/evaluation.py

Lines changed: 18 additions & 21 deletions
```diff
@@ -10,7 +10,8 @@
 
 from ragas._analytics import EvaluationEvent, track
 from ragas.callbacks import new_group
-from ragas.embeddings.base import BaseRagasEmbeddings, LangchainEmbeddingsWrapper
+from ragas.embeddings.base import BaseRagasEmbeddings, LangchainEmbeddingsWrapper, embedding_factory
+from ragas.llms import llm_factory
 from ragas.exceptions import ExceptionInRunner
 from ragas.executor import Executor
 from ragas.llms.base import BaseRagasLLM, LangchainLLMWrapper
@@ -57,11 +58,11 @@ def evaluate(
         evaluation on the best set of metrics to give a complete view.
     llm: BaseRagasLLM, optional
         The language model to use for the metrics. If not provided then ragas will use
-        the default language model. This can we overridden by the llm specified in
+        the default language model for metrics which require an LLM. This can we overridden by the llm specified in
         the metric level with `metric.llm`.
     embeddings: BaseRagasEmbeddings, optional
         The embeddings to use for the metrics. If not provided then ragas will use
-        the default embeddings. This can we overridden by the embeddings specified in
+        the default embeddings for metrics which require embeddings. This can we overridden by the embeddings specified in
         the metric level with `metric.embeddings`.
     callbacks: Callbacks, optional
         Lifecycle Langchain Callbacks to run during evaluation. Check the
@@ -144,34 +145,30 @@ def evaluate(
     validate_column_dtypes(dataset)
 
     # set the llm and embeddings
-    if llm is None:
-        from ragas.llms import llm_factory
-
-        llm = llm_factory()
-    elif isinstance(llm, LangchainLLM):
+    if isinstance(llm, LangchainLLM):
         llm = LangchainLLMWrapper(llm, run_config=run_config)
-    if embeddings is None:
-        from ragas.embeddings.base import embedding_factory
-
-        embeddings = embedding_factory()
-    elif isinstance(embeddings, LangchainEmbeddings):
+    if isinstance(embeddings, LangchainEmbeddings):
         embeddings = LangchainEmbeddingsWrapper(embeddings)
+
     # init llms and embeddings
     binary_metrics = []
     llm_changed: t.List[int] = []
     embeddings_changed: t.List[int] = []
     answer_correctness_is_set = -1
+
     for i, metric in enumerate(metrics):
         if isinstance(metric, AspectCritique):
             binary_metrics.append(metric.name)
-        if isinstance(metric, MetricWithLLM):
-            if metric.llm is None:
-                metric.llm = llm
-                llm_changed.append(i)
-        if isinstance(metric, MetricWithEmbeddings):
-            if metric.embeddings is None:
-                metric.embeddings = embeddings
-                embeddings_changed.append(i)
+        if isinstance(metric, MetricWithLLM) and metric.llm is None:
+            if llm is None:
+                llm = llm_factory()
+            metric.llm = llm
+            llm_changed.append(i)
+        if isinstance(metric, MetricWithEmbeddings) and metric.embeddings is None:
+            if embeddings is None:
+                embeddings = embedding_factory()
+            metric.embeddings = embeddings
+            embeddings_changed.append(i)
         if isinstance(metric, AnswerCorrectness):
             if metric.answer_similarity is None:
                 answer_correctness_is_set = i
```
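Assuming ragas's usual column schema and public API at the time of this commit, the practical effect is that a call like the following no longer instantiates a default embeddings model, because `faithfulness` requires only an LLM (the dataset contents here are purely illustrative):

```python
from datasets import Dataset

from ragas import evaluate
from ragas.metrics import faithfulness  # an LLM-only metric (MetricWithLLM)

# Toy single-row dataset with the columns faithfulness evaluates on.
ds = Dataset.from_dict(
    {
        "question": ["What is the capital of France?"],
        "answer": ["Paris is the capital of France."],
        "contexts": [["Paris is the capital and largest city of France."]],
    }
)

# No `llm` or `embeddings` passed: after this commit, llm_factory() runs
# once (faithfulness needs an LLM), and embedding_factory() never runs.
result = evaluate(ds, metrics=[faithfulness])
```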
