Commit a69e557

ragas: bump ragas version, pass old rubric in RubricsScore

Before ragas v0.2.11, RubricsScore.rubrics was not being applied properly, so this commit sets v0.2.11 as the minimum version for this library. v0.2.11 also changed the prompt used for domain-specific knowledge evaluation with reference; that prompt is hardcoded here in case ragas changes its prompts again in the future.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
1 parent 03afb6c commit a69e557

File tree

2 files changed: +13 −5 lines changed


requirements.txt

Lines changed: 1 addition & 1 deletion

@@ -10,4 +10,4 @@ pandas
 pandas-stubs
 lm-eval>=0.4.4
 httpx
-ragas
+ragas>=0.2.11
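The pin exists because RubricsScore silently ignored its rubrics before v0.2.11. A minimal sketch of what a `>=` pin like this means, using a hypothetical `meets_pin` helper (naive numeric-tuple comparison, no ragas import; real resolvers such as pip handle pre-releases and epochs, which this sketch ignores):

```python
def meets_pin(installed: str, minimum: str = "0.2.11") -> bool:
    """Return True if `installed` satisfies a >=`minimum` version pin.

    Hypothetical helper for illustration: compares dotted version
    strings as integer tuples, e.g. "0.2.10" -> (0, 2, 10).
    """
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(minimum)

print(meets_pin("0.2.10"))  # False: rubrics were not applied properly
print(meets_pin("0.2.11"))  # True: the minimum pinned by this commit
```

Anything below the pin is rejected, so an environment with the broken RubricsScore behavior cannot satisfy the new requirements file.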

src/instructlab/eval/ragas.py

Lines changed: 12 additions & 4 deletions

@@ -11,8 +11,7 @@
 from pydantic import BaseModel, ConfigDict, Field
 from ragas.evaluation import EvaluationDataset, EvaluationResult, RunConfig, evaluate
 from ragas.metrics import Metric
-from ragas.metrics._domain_specific_rubrics import (  # the rubrics we must instantiate are located inside of a file marked as private
-    DEFAULT_WITH_REFERENCE_RUBRICS,
+from ragas.metrics._domain_specific_rubrics import (
     RubricsScore,
 )
 
@@ -22,6 +21,16 @@
 
 logger = setup_logger(__name__)
 
+# DEFAULT_WITH_REFERENCE_RUBRICS from ragas v0.2.11.
+# This rubric is hardcoded in case ragas makes any changes to their DEFAULT_WITH_REFERENCE_RUBRICS in the future
+SCORING_RUBRICS = {
+    "score1_description": "The response is entirely incorrect, irrelevant, or does not align with the reference in any meaningful way.",
+    "score2_description": "The response partially matches the reference but contains major errors, significant omissions, or irrelevant information.",
+    "score3_description": "The response aligns with the reference overall but lacks sufficient detail, clarity, or contains minor inaccuracies.",
+    "score4_description": "The response is mostly accurate, aligns closely with the reference, and contains only minor issues or omissions.",
+    "score5_description": "The response is fully accurate, completely aligns with the reference, and is clear, thorough, and detailed.",
+}
+
 
 class Sample(TypedDict):
     """
@@ -256,9 +265,8 @@ def _generate_answers_from_model(
 
     @staticmethod
     def _get_metrics() -> List[Metric]:
-        # default set of metrics
         return [
             RubricsScore(
-                rubrics=DEFAULT_WITH_REFERENCE_RUBRICS,
+                rubrics=SCORING_RUBRICS,
             )
         ]
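The hardcoded rubric is a plain dict: one description per score level, keyed `score<N>_description` for N in 1 through 5, which is the shape RubricsScore expects for its `rubrics` argument. A minimal sketch of that shape (descriptions abbreviated from the diff above; the comment shows how the commit wires it in):

```python
# Shape of the rubric dict hardcoded by this commit: one description
# per score level, keyed "score<N>_description" for N in 1..5.
# (Descriptions abbreviated here; the full text is in the diff above.)
SCORING_RUBRICS = {
    "score1_description": "The response is entirely incorrect or irrelevant.",
    "score2_description": "The response partially matches the reference with major errors.",
    "score3_description": "The response aligns overall but lacks detail or has minor inaccuracies.",
    "score4_description": "The response is mostly accurate with only minor issues.",
    "score5_description": "The response is fully accurate, clear, and thorough.",
}

# With ragas>=0.2.11 installed, the commit passes this dict as
#   RubricsScore(rubrics=SCORING_RUBRICS)
# so scoring uses the pinned rubric rather than whatever
# DEFAULT_WITH_REFERENCE_RUBRICS ships with a future ragas release.

expected_keys = {f"score{i}_description" for i in range(1, 6)}
assert set(SCORING_RUBRICS) == expected_keys
```

Freezing the dict in this repo decouples the evaluation prompt from upstream: a future ragas release can rewrite its default rubrics without silently changing this library's scores.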
