Commit 356d6bf
feat: ragas evals CLI (#2086)
```bash
❯ ragas evals test_app/evals/app_eval.py --dataset rag_dataset --metrics accuracy,fail_or_pass
Running evaluation: test_app/evals/app_eval.py
Dataset: rag_dataset
Getting dataset: rag_dataset
✓ Loaded dataset with 30 rows
✓ Completed experiments successfully
╭────────────────────────── Ragas Evaluation Results ──────────────────────────╮
│ Experiment: vibrant_naur │
│ Dataset: rag_dataset (30 rows) │
╰──────────────────────────────────────────────────────────────────────────────╯
Numerical Metrics
┏━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric ┃ Current ┃
┡━━━━━━━━━━╇━━━━━━━━━┩
│ accuracy │ 0.933 │
└──────────┴─────────┘
Categorical Metrics
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric ┃ Category ┃ Current ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ fail or pass │ fail │ 26 │
│ │ pass │ 4 │
└──────────────┴──────────┴─────────┘
✓ Experiment results displayed
✓ Evaluation completed successfully
```
```bash
❯ ragas evals test_app/evals/app_eval.py --dataset rag_dataset --metrics accuracy,fail_or_pass --baseline suspicious_babbage
Running evaluation: test_app/evals/app_eval.py
Dataset: rag_dataset
Baseline: suspicious_babbage
Getting dataset: rag_dataset
✓ Loaded dataset with 30 rows
✓ Completed experiments successfully
Comparing against baseline: suspicious_babbage
╭────────────────────────── Ragas Evaluation Results ──────────────────────────╮
│ Experiment: pedantic_mccarthy │
│ Dataset: rag_dataset (30 rows) │
│ Baseline: suspicious_babbage │
╰──────────────────────────────────────────────────────────────────────────────╯
Numerical Metrics
┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Metric ┃ Current ┃ Baseline ┃ Delta ┃ Gate ┃
┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ accuracy │ 0.900 │ 1.000 │ ▼0.100 │ fail │
└──────────┴─────────┴──────────┴────────┴──────┘
Categorical Metrics
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Metric ┃ Category ┃ Current ┃ Baseline ┃ Delta ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ fail or pass │ fail │ 26 │ 25 │ ▲1 │
│ │ pass │ 4 │ 5 │ ▼1 │
└──────────────┴──────────┴─────────┴──────────┴───────┘
✓ Comparison completed
✓ Evaluation completed successfully
```
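The `Delta` and `Gate` columns above can be illustrated with a minimal sketch. It is not the code added in this commit: the helper names `numeric_row` and `categorical_deltas` are hypothetical, and the gate rule (fail when the current value falls below the baseline) is an assumption inferred from the single row shown, as is the `=` marker for an unchanged value.

```python
# Hypothetical helpers; names and gate rule are assumptions, not the ragas implementation.

def _arrow(delta: float) -> str:
    """Return an up/down marker for a delta ("=" for no change is an assumption)."""
    if delta > 0:
        return "▲"
    if delta < 0:
        return "▼"
    return "="


def numeric_row(current: float, baseline: float) -> tuple[str, str]:
    """Format the delta of a numerical metric and derive a pass/fail gate.

    Assumed rule: the gate fails only when the current value is below the baseline.
    """
    delta = current - baseline
    gate = "pass" if delta >= 0 else "fail"
    return f"{_arrow(delta)}{abs(delta):.3f}", gate


def categorical_deltas(current: dict[str, int], baseline: dict[str, int]) -> dict[str, str]:
    """Format per-category count changes between the current run and the baseline."""
    deltas = {}
    for category in sorted(set(current) | set(baseline)):
        # A category missing from one run is treated as a count of zero.
        delta = current.get(category, 0) - baseline.get(category, 0)
        deltas[category] = f"{_arrow(delta)}{abs(delta)}"
    return deltas


if __name__ == "__main__":
    # Values taken from the baseline comparison output above.
    print(numeric_row(0.900, 1.000))
    print(categorical_deltas({"fail": 26, "pass": 4}, {"fail": 25, "pass": 5}))
```

With the values from the comparison run, this reproduces the `▼0.100` / `fail` row for `accuracy` and the `▲1` / `▼1` deltas for `fail or pass`.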
---------
Co-authored-by: jjmachan <[email protected]>
1 parent 8445350 commit 356d6bf
File tree
7 files changed (+492, -12 lines changed)
- experimental
- ragas_experimental
- metric
- project/backends