Commit e346349

fix readme
1 parent c562b2f commit e346349

2 files changed: 8 additions, 0 deletions

scripts/evaluation/README.md

Lines changed: 6 additions & 0 deletions
@@ -11,6 +11,7 @@ Currently we have 2 types of evaluations.
 - QnAs were generated from OCP docs by LLMs. It is possible that some of the questions/answers are not entirely correct. We are constantly trying to verify both Questions & Answers manually. If you find any QnA pair to be modified or removed, please create a PR.
 - OLS API should be ready/live with all the required provider+model configured.
 - It is possible that we want to run both consistency and model evaluation together. To avoid multiple API calls for same query, *model* evaluation first checks .csv file generated by *consistency* evaluation. If response is not present in csv file, then only we call API to get the response.
+- User needs to install python `matplotlib`, and `rouge_score` before running the evaluation.
 
 ### e2e test case
 
@@ -21,6 +22,11 @@ These evaluations are also part of **e2e test cases**. Currently *consistency* e
 python -m scripts.evaluation.driver
 ```
 
+### Sample run command
+```
+OPENAI_API_KEY=IGNORED python -m scripts.evaluation.driver --qna_pool_file ./scripts/evaluation/eval_data/aap-sample.parquet --eval_provider_model_id my_rhoai+granite3-8b --eval_metrics answer_relevancy answer_similarity_llm cos_score rougeL_precision --eval_modes vanilla --judge_model granite3-8b --judge_provider my_rhoai3 --eval_query_ids qna1
+```
+
 ### Input Data/QnA pool
 [Json file](eval_data/question_answer_pair.json)
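
Note on the README context lines in this diff: they describe *model* evaluation reusing responses from the .csv file written by *consistency* evaluation, and calling the API only when a response is missing. A minimal sketch of that lookup-then-fallback pattern is below. It is not the repository's implementation: the column names `query_id` and `response`, both function names, and the `call_api` callable are assumptions made for illustration.

```python
import csv
from pathlib import Path


def load_cached_responses(csv_path, query_col="query_id", response_col="response"):
    """Read previously generated responses into a dict keyed by query id.

    The column names are hypothetical; adjust them to the real CSV layout.
    """
    cached = {}
    path = Path(csv_path)
    if not path.exists():
        # No earlier run to reuse, so every query will go to the API.
        return cached
    with path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            query_id = row.get(query_col)
            response = row.get(response_col)
            # Skip rows with a missing id or an empty response.
            if query_id and response:
                cached[query_id] = response
    return cached


def get_response(query_id, question, cached, call_api):
    """Return the cached response if present, otherwise call the API once."""
    if query_id in cached:
        return cached[query_id]
    response = call_api(question)  # call_api is a caller-supplied stand-in
    cached[query_id] = response    # remember it to avoid a repeat call
    return response
```

Passing the API call in as a parameter keeps the sketch independent of any particular client and makes the fallback path easy to exercise with a stub.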

scripts/evaluation/utils/constants.py

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@
     "azure_openai+gpt-4o": ("azure_openai", "gpt-4o"),
     "ollama+llama3.1:latest": ("ollama", "llama3.1:latest"),
     "ollama+mistral": ("ollama", "mistral"),
+    "my_rhoai+granite3-8b": ("my_rhoai", "granite3-8b"),
+    "my_rhoai3+granite3-1-8b": ("my_rhoai3", "granite3-1-8b"),
 }
 
 NON_LLM_EVALS = {
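
Note on the mapping touched by this hunk: each key appears to follow a `provider+model` convention that mirrors its `(provider, model)` tuple. A small check like the one below could catch a typo when new entries are added. This is a sketch only: the dict name `PROVIDER_MODEL_IDS` and the helper `inconsistent_ids` are hypothetical (the real variable name is not visible in the diff), and the entries are copied from the lines shown above.

```python
# Entries copied from the diff; the dict name is a placeholder because the
# hunk does not show the real variable name.
PROVIDER_MODEL_IDS = {
    "azure_openai+gpt-4o": ("azure_openai", "gpt-4o"),
    "ollama+llama3.1:latest": ("ollama", "llama3.1:latest"),
    "ollama+mistral": ("ollama", "mistral"),
    "my_rhoai+granite3-8b": ("my_rhoai", "granite3-8b"),
    "my_rhoai3+granite3-1-8b": ("my_rhoai3", "granite3-1-8b"),
}


def inconsistent_ids(mapping):
    """Return keys whose 'provider+model' text disagrees with their tuple."""
    bad = []
    for key, (provider, model) in mapping.items():
        if key != f"{provider}+{model}":
            bad.append(key)
    return bad
```

Comparing the key against the rebuilt string, rather than splitting the key on `+`, avoids any ambiguity if a model name ever contains a `+` itself.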
