Releases · mts-ai/rurage · GitHub

15 Nov 09:28

Ivan-Sidorov

Minor changes (Latest)

Updated repository information


21 Oct 22:13

Ivan-Sidorov

Metrics Ensemble

1. Automatic Ensemble Creation: Implement functionality for automatic creation of evaluation ensembles

Train your own RAG evaluation model (e.g., a Gradient Boosting Classifier) on top of the RuRAGE metrics
Prepare training data for the RuRAGE ensemble model directly from RuRAGE reports
Train your own RuRAGE ensemble model for any of the evaluation tasks: Correctness, Faithfulness, or Relevance
Save the trained model (only basic functionality is implemented) and use it for inference
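A minimal sketch of the idea, assuming each row of a RuRAGE report has already been reduced to a dict of metric scores plus a 0/1 human label. The field names (`rouge`, `bleu`, `nli`, `label`) and all helper functions are hypothetical, not RuRAGE's report schema or API. To keep the example dependency-free, a small logistic regression trained by gradient descent stands in for the gradient boosting classifier mentioned above.

```python
import math

# Hypothetical report rows: per-sample metric scores plus a human 0/1 label.
# The column names are illustrative, not the actual RuRAGE report schema.
REPORT = [
    {"rouge": 0.82, "bleu": 0.61, "nli": 0.93, "label": 1},
    {"rouge": 0.15, "bleu": 0.05, "nli": 0.12, "label": 0},
    {"rouge": 0.74, "bleu": 0.48, "nli": 0.88, "label": 1},
    {"rouge": 0.22, "bleu": 0.10, "nli": 0.30, "label": 0},
    {"rouge": 0.65, "bleu": 0.40, "nli": 0.79, "label": 1},
    {"rouge": 0.30, "bleu": 0.12, "nli": 0.25, "label": 0},
]
FEATURES = ["rouge", "bleu", "nli"]


def to_xy(rows, features):
    """Split report rows into a feature matrix X and a label vector y."""
    X = [[float(row[f]) for f in features] for row in rows]
    y = [int(row["label"]) for row in rows]
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=5000):
    """Fit logistic-regression weights with full-batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(y=1 | x) - true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient of the log loss.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(model, x):
    """Probability that a sample with metric vector x is labeled 1."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


X, y = to_xy(REPORT, FEATURES)
model = train_logreg(X, y)
preds = [int(predict_proba(model, x) >= 0.5) for x in X]
```

In practice a library model such as scikit-learn's `GradientBoostingClassifier` would replace the hand-written regression, and the model would be scored on held-out rows rather than on its own training data.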

2. Auto-adaptive thresholds: Implement functionality for automatically creating thresholds for the ensemble features

Automatic ensemble model optimization by selecting the optimal classification threshold(s)
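As a rough illustration of threshold selection for a single score, the sketch below scans candidate cut-offs and keeps the one that maximizes F1. Using F1 as the objective and the observed scores as candidates are assumptions of this example; the release notes do not say which criterion RuRAGE optimizes, and the function names are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the rule 'predict 1 if score >= threshold' against 0/1 labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try every observed score as a cut-off and keep the one with the best F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Toy ensemble outputs and human labels.
scores = [0.91, 0.85, 0.40, 0.72, 0.30, 0.15, 0.66, 0.55]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
threshold = best_threshold(scores, labels)
```

Here the cut-off 0.66 separates the two classes exactly, so it is the one selected. Other objectives (accuracy, balanced accuracy, a precision floor) would plug into the same scan.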

3. Multiclass Labels: Extend support to multiclass labels

Train a RAG evaluation model and run inference with it to predict not only binary but also multiclass labels for Correctness/Faithfulness/Relevance estimation
Automatic threshold selection supports multiclass labels
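One way threshold selection could extend to ordered multiclass labels (for example grades 0/1/2) is to choose k - 1 increasing cut-offs so that a score falls into class i when exactly i cut-offs lie at or below it. This is a hypothetical sketch, not RuRAGE's implementation: it assumes the labels are ordinal, uses accuracy as the objective, and searches exhaustively, which is only practical for small sets of candidate scores.

```python
from bisect import bisect_right
from itertools import combinations


def assign_class(score, cutoffs):
    """Class index = number of (sorted) cut-offs that are <= score."""
    return bisect_right(cutoffs, score)


def accuracy(scores, labels, cutoffs):
    hits = sum(assign_class(s, cutoffs) == y for s, y in zip(scores, labels))
    return hits / len(labels)


def best_cutoffs(scores, labels, n_classes):
    """Exhaustively pick n_classes - 1 increasing cut-offs maximizing accuracy."""
    candidates = sorted(set(scores))
    # combinations() over a sorted list yields increasing tuples only.
    best = max(combinations(candidates, n_classes - 1),
               key=lambda cuts: accuracy(scores, labels, list(cuts)))
    return list(best)


scores = [0.1, 0.2, 0.45, 0.5, 0.8, 0.9]
labels = [0, 0, 1, 1, 2, 2]
cutoffs = best_cutoffs(scores, labels, n_classes=3)
```

With `n_classes=2` the search reduces to picking a single binary threshold.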

4. Detailed usage examples

Notebook for the basic RuRAGE functionality (preparing metric reports for the Correctness, Faithfulness, and Relevance tasks)
Notebook for training the RAG evaluation ensemble and running inference with it

5. Various bug fixes in metric calculation


10 Oct 10:43

Ivan-Sidorov

Basic functionality

Different types of RAG evaluation:

Correctness – how well the model's answer matches the reference (golden) answer
Faithfulness – how well the model's answer follows the context
Relevance – how well the model's answer relates to the question

Metrics used to build the evaluation:

a) Deterministic Metrics:

ROUGE
BLEU
Bigram overlap Precision
Bigram overlap Recall
Bigram overlap F1
Unigram overlap Precision
Unigram overlap Recall
Unigram overlap F1
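The unigram and bigram overlap metrics can be sketched with clipped n-gram counts: precision divides the shared n-grams by the candidate's n-gram count, recall by the reference's, and F1 is their harmonic mean. The lowercase whitespace tokenization and the function names below are assumptions for illustration; RuRAGE's own preprocessing may differ, and ROUGE and BLEU are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_scores(candidate, reference, n=1):
    """Clipped n-gram overlap precision, recall and F1 between two texts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    common = sum((cand & ref).values())
    if common == 0:
        return 0.0, 0.0, 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


uni = overlap_scores("the cat sat on the mat", "the cat is on the mat", n=1)
bi = overlap_scores("the cat sat on the mat", "the cat is on the mat", n=2)
```

For this pair, 5 of 6 unigrams are shared (precision, recall and F1 of 5/6), while only 3 of 5 bigrams are (0.6 each), which is why the bigram variants are stricter about word order.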

b) Model-based Metrics:

NLI Scores using Transformer models
Cosine Similarity using Transformer models
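Once the texts are embedded, the cosine similarity step itself is simple. The sketch below assumes both texts have already been encoded into vectors by some transformer model (which model RuRAGE uses is not stated here) and shows only the similarity computation; the toy vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as having no similarity rather than dividing by zero.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for transformer sentence embeddings of a model
# answer and a reference; real embeddings have hundreds of dimensions.
answer_emb = [0.2, 0.1, 0.7]
reference_emb = [0.25, 0.05, 0.65]
similarity = cosine_similarity(answer_emb, reference_emb)
```

Because the score depends only on direction, two embeddings that differ in length but point the same way score 1.0.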
