
Releases: mts-ai/rurage

Minor changes

15 Nov 09:28
51c6c76


Updated repository information

Metrics Ensemble

21 Oct 22:13
8376282


1. Automatic Ensemble Creation: Implemented functionality for automatically creating evaluation ensembles

  • Train your own RAG evaluation model (e.g. a Gradient Boosting Classifier) on top of the RuRAGE metrics
  • Prepare training data for the RuRAGE ensemble model directly from the RuRAGE reports
  • Train your own RuRAGE ensemble model for any of the evaluation tasks: Correctness, Faithfulness, Relevance
  • Save the trained model (only basic functionality is implemented) and use it for inference
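The ensemble idea above (metric scores from a report become features for a classifier) can be sketched roughly as follows. This is a minimal, dependency-free illustration, not RuRAGE's API: the column names `rouge`, `bleu`, `cosine`, `label` and the helpers `to_xy`, `train_logreg`, `predict_proba` are hypothetical, and a plain logistic regression is swapped in for the Gradient Boosting Classifier to keep the example self-contained.

```python
import math

# Hypothetical report rows: metric scores for one answer plus a human label
# (1 = correct, 0 = incorrect). Metric names are illustrative only.
REPORT = [
    {"rouge": 0.82, "bleu": 0.61, "cosine": 0.91, "label": 1},
    {"rouge": 0.75, "bleu": 0.55, "cosine": 0.88, "label": 1},
    {"rouge": 0.20, "bleu": 0.05, "cosine": 0.40, "label": 0},
    {"rouge": 0.15, "bleu": 0.02, "cosine": 0.35, "label": 0},
]
FEATURES = ["rouge", "bleu", "cosine"]


def to_xy(rows):
    """Split report rows into a feature matrix X and a label vector y."""
    X = [[row[name] for name in FEATURES] for row in rows]
    y = [row["label"] for row in rows]
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with batch gradient descent (stand-in ensemble)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(model, x):
    """Probability that an answer with feature vector x is labelled 1."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


X, y = to_xy(REPORT)
model = train_logreg(X, y)
new_answer = {"rouge": 0.78, "bleu": 0.50, "cosine": 0.90}
score = predict_proba(model, [new_answer[name] for name in FEATURES])
print(f"P(correct) = {score:.3f}")
```

In practice `train_logreg` could be replaced by a library classifier such as scikit-learn's `GradientBoostingClassifier`; the data-preparation step stays the same.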

2. Auto-adaptive thresholds: Implemented functionality for automatically creating thresholds for the features in the ensemble

  • Automatic ensemble model optimization by selecting an optimal classification threshold(s)
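Selecting a classification threshold can be illustrated by scanning candidate cut-offs and keeping the best one. This sketch assumes F1 as the optimization criterion and uses the observed scores as candidates; the release notes do not state which criterion RuRAGE actually optimizes, and `f1_at` and `best_threshold` are hypothetical helpers.

```python
def f1_at(threshold, scores, labels):
    """F1 score of the rule `score >= threshold -> positive`."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try every observed score as a cut-off and keep the one with the best F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0]
t = best_threshold(scores, labels)
print(t, round(f1_at(t, scores, labels), 3))  # prints: 0.4 0.889
```

Lowering the threshold to 0.40 accepts one false positive (0.55) but recovers the positive at 0.40, which gives the highest F1 on this toy data.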

3. Multiclass Labels: Extended support to multiclass labels.

  • Train the RAG evaluation model and run inference with it to predict not only binary but also multiclass labels for Correctness/Faithfulness/Relevance estimation
  • Automatic threshold selection supports multiclass labels
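One way threshold selection can extend to multiclass labels is to look for several ordered cut points on a single score. The sketch below assumes the classes are ordinal (e.g. 0 = incorrect, 1 = partially correct, 2 = correct), that the model outputs one score per answer, and that accuracy is the criterion; none of this is confirmed by the release notes, and `assign`, `accuracy`, `best_cuts` are hypothetical names.

```python
from itertools import combinations


def assign(score, cuts):
    """Map a score to an ordinal class: the number of cut points it reaches."""
    return sum(1 for c in cuts if score >= c)


def accuracy(cuts, scores, labels):
    """Share of samples whose assigned class equals the true label."""
    hits = sum(1 for s, l in zip(scores, labels) if assign(s, cuts) == l)
    return hits / len(labels)


def best_cuts(scores, labels, n_classes):
    """Brute-force search over increasing cut-point tuples taken from the data."""
    candidates = sorted(set(scores))
    # combinations() over a sorted list yields strictly increasing tuples.
    return max(
        combinations(candidates, n_classes - 1),
        key=lambda cuts: accuracy(cuts, scores, labels),
    )


scores = [0.05, 0.20, 0.35, 0.50, 0.60, 0.80, 0.90]
labels = [0, 0, 1, 1, 1, 2, 2]
cuts = best_cuts(scores, labels, n_classes=3)
print(cuts, accuracy(cuts, scores, labels))  # prints: (0.35, 0.8) 1.0
```

The exhaustive search grows as O(n^(k-1)) for k classes, which is fine for small validation sets but would need a smarter search for larger ones.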

4. Detailed usage examples

  • Notebook for the basic RuRAGE functionality (preparing metric reports for the Correctness, Faithfulness, Relevance tasks)
  • Notebook for training the RAG evaluation ensemble and running inference with it.

5. Various bug fixes in metric calculation

Basic functionality

10 Oct 10:43


1. Types of RAG evaluation:
  • Correctness – how well the model's answer matches the reference (golden) answer
  • Faithfulness – how well the model's answer follows the context
  • Relevance – how well the model's answer relates to the question
2. Metrics used to build the evaluation

a) Deterministic Metrics:

  • ROUGE
  • BLEU
  • Bigram overlap Precision
  • Bigram overlap Recall
  • Bigram overlap F1
  • Unigram overlap Precision
  • Unigram overlap Recall
  • Unigram overlap F1
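The unigram and bigram overlap metrics listed above can be sketched as follows. This assumes lowercase whitespace tokenization and clipped (multiset) overlap counts; RuRAGE's actual tokenization and normalization may differ, and `ngrams` and `overlap_scores` are hypothetical helpers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_scores(candidate, reference, n=1):
    """Precision, recall and F1 of n-gram overlap between two texts."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    common = sum((cand & ref).values())  # clipped multiset intersection
    if common == 0:
        return 0.0, 0.0, 0.0
    precision = common / sum(cand.values())  # share of answer n-grams found in reference
    recall = common / sum(ref.values())      # share of reference n-grams found in answer
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


answer = "the cat sat on the mat"
golden = "the cat is on the mat"
print(overlap_scores(answer, golden, n=1))  # 5 of 6 unigrams shared
print(overlap_scores(answer, golden, n=2))  # 3 of 5 bigrams shared
```

Clipping through `Counter.__and__` keeps a repeated word from being credited more often than it appears in the other text.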

b) Model-based Metrics:

  • NLI Scores using Transformer models
  • Cosine Similarity using Transformer models
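Cosine similarity compares two embedding vectors by the angle between them. In the sketch below the vectors are toy stand-ins; producing real sentence embeddings with a Transformer model is assumed and not shown, and `cosine_similarity` is a hypothetical helper rather than RuRAGE's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


# Toy vectors standing in for sentence embeddings of an answer and a reference.
answer_vec = [0.2, 0.8, 0.1]
golden_vec = [0.25, 0.75, 0.05]
print(round(cosine_similarity(answer_vec, golden_vec), 4))
```

Because the score depends only on direction, embeddings of different magnitude still compare as similar when they point the same way.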