Releases: mts-ai/rurage
Minor changes
Updated repository information
Metrics Ensemble
1. Automatic Ensemble Creation: Implement functionality for automatic creation of evaluation ensembles
- Train your own RAG evaluation model (e.g. a Gradient Boosting Classifier) based on the RuRAGE metrics
- Prepare data for the RuRAGE ensemble model training right from the RuRAGE reports
- Train your own RuRAGE ensemble model for any of the evaluation tasks: Correctness, Faithfulness, Relevance
- Save trained models (only basic functionality is implemented) and use them for inference
2. Auto-adaptive thresholds: Implement functionality for the automatic creation of thresholds for the features in the ensemble
- Automatic ensemble model optimization by selecting optimal classification threshold(s)
3. Multiclass Labels: Extend support to work with multiclass labels.
- Train a RAG evaluation model and run inference with it to predict multiclass labels, not only binary ones, for Correctness/Faithfulness/Relevance estimation
- Automatic threshold selection supports multiclass labels
4. Detailed usage examples
- Notebook for the basic RuRAGE functionality (preparing metric reports for the Correctness, Faithfulness, Relevance tasks)
- Notebook for RAG evaluation ensemble training and inference
5. Various bug fixes in the metric calculation
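The auto-adaptive threshold selection in items 2 and 3 can be illustrated with a small, dependency-free sketch. It assumes the trained ensemble already produces one score per sample (for instance a class probability from scikit-learn's `GradientBoostingClassifier.predict_proba`) and picks cut-offs by maximizing F1 (binary case) or macro-F1 (multiclass case) over the observed scores. The selection criterion, the exhaustive search, and all function names below are hypothetical; this is not RuRAGE's API or necessarily its algorithm.

```python
# Illustrative sketch only: function names and the F1 / macro-F1 criterion are
# assumptions, not RuRAGE's API. Scores are assumed to come from an already
# trained ensemble (one score per sample, higher score = higher class).
from bisect import bisect_right
from itertools import combinations


def class_f1(y_true, y_pred, c):
    """F1 score of class `c` (0.0 when the class is never correctly predicted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def to_class(score, thresholds):
    """Map a score to a class index using ascending cut-offs.

    Class k covers thresholds[k-1] <= score < thresholds[k].
    """
    return bisect_right(thresholds, score)


def best_binary_threshold(scores, labels):
    """Try every observed score as a cut-off; keep the one with the best F1 for class 1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = class_f1(labels, preds, 1)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def best_multiclass_thresholds(scores, labels, n_classes):
    """Exhaustively search n_classes - 1 ascending cut-offs that maximize macro-F1."""
    best_cuts, best_f1 = None, -1.0
    for cuts in combinations(sorted(set(scores)), n_classes - 1):
        preds = [to_class(s, cuts) for s in scores]
        macro = sum(class_f1(labels, preds, c) for c in range(n_classes)) / n_classes
        if macro > best_f1:
            best_cuts, best_f1 = list(cuts), macro
    return best_cuts, best_f1


if __name__ == "__main__":
    print(best_binary_threshold([0.1, 0.3, 0.35, 0.6, 0.8, 0.9], [0, 0, 1, 1, 1, 1]))
    # → (0.35, 1.0)
    print(best_multiclass_thresholds([0.1, 0.2, 0.5, 0.55, 0.8, 0.95], [0, 0, 1, 1, 2, 2], 3))
    # → ([0.5, 0.8], 1.0)
```

The exhaustive search grows combinatorially with the number of classes, so it only suits small validation sets; a real implementation would more likely use a coarser candidate grid or a per-threshold greedy search.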
Basic functionality
- Supported types of RAG evaluation:
- Correctness – how well the model's answer matches the golden answer
- Faithfulness – how well the model follows the context
- Relevance – how well the model's answer relates to the question
- Metrics used to build the evaluation:
a) Deterministic Metrics:
- ROUGE
- BLEU
- Bigram overlap Precision
- Bigram overlap Recall
- Bigram overlap F1
- Unigram overlap Precision
- Unigram overlap Recall
- Unigram overlap F1
b) Model-based Metrics:
- NLI Scores using Transformer models
- Cosine Similarity using Transformer models
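The unigram/bigram overlap metrics in a) and the cosine similarity in b) are simple enough to sketch directly. The code below computes overlap precision, recall and F1 with clipped counts, plus cosine similarity between two embedding vectors. Assumptions: lowercase whitespace tokenization, precision measured relative to the candidate (model) answer and recall relative to the reference text, and embeddings supplied from elsewhere (the Transformer model itself is out of scope). RuRAGE's own tokenization and exact definitions may differ; the function names are hypothetical.

```python
# Illustrative sketch: tokenization and metric definitions are assumptions and
# may differ from RuRAGE's implementation.
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_scores(candidate, reference, n=1):
    """(precision, recall, F1) of n-gram overlap; matches are clipped counts."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    common = sum((cand & ref).values())  # multiset intersection = clipped matches
    if common == 0:
        return 0.0, 0.0, 0.0
    precision = common / sum(cand.values())  # share of candidate n-grams found in reference
    recall = common / sum(ref.values())      # share of reference n-grams covered by candidate
    return precision, recall, 2 * precision * recall / (precision + recall)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    answer, gold = "the cat sat on the mat", "the cat is on the mat"
    print(overlap_scores(answer, gold, n=1))  # each value ≈ 0.833 (5 of 6 unigrams match)
    print(overlap_scores(answer, gold, n=2))  # each value ≈ 0.6 (3 of 5 bigrams match)
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.7071
```

Clipping via `Counter` intersection keeps a repeated word in the answer from being credited more times than it appears in the reference, which is the same idea BLEU uses for its modified n-gram precision.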