
Commit 09054db

ground truth generation adapted to langchain4j attributes metadata model. documentation update
1 parent 3ae4519 commit 09054db

File tree: 3 files changed (+59 −4 lines)


.vscode/launch.json

Lines changed: 17 additions & 1 deletion
@@ -38,6 +38,22 @@
             "console": "integratedTerminal",
             "justMyCode": false,
             "stopOnEntry": false
+        },
+        {
+            "name": "Debug generate_ground_truth.py",
+            "type": "debugpy",
+            "request": "launch",
+            "program": "${workspaceFolder}/evals/generate_ground_truth.py",
+            "python": "${workspaceFolder}/.evalenv/bin/python",
+            "cwd": "${workspaceFolder}",
+            "args": [
+                "--env-file-path", "./deploy/aca",
+                "--numquestions", "1",
+                "--numsearchdocs", "5"
+            ],
+            "console": "integratedTerminal",
+            "justMyCode": false,
+            "stopOnEntry": false
         }
     ]
-}
+}

docs/aca/evaluation.md

Lines changed: 18 additions & 1 deletion
@@ -60,7 +60,7 @@ pip install -r evals/requirements.txt
 Generate ground truth data by running the following command:
 
 ```bash
-python evals/generate_ground_truth.py --numquestions=200 --numsearchdocs=1000
+python evals/generate_ground_truth.py --numquestions=200 --numsearchdocs=1000 --env-file-path ./deploy/aca
 ```
 
 The options are:
@@ -74,6 +74,7 @@ The options are:
 
 Review the generated data in `evals/ground_truth.jsonl` after running that script, removing any question/answer pairs that don't seem like realistic user input.
 
+
 ## Run bulk evaluation
 
 Review the configuration in `evals/eval_config.json` to ensure that everything is correctly setup. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
@@ -94,10 +95,26 @@ The options are:
 For more details about how to run locally the chat api see [Local Development with IntelliJ](local-development-intellij.md#running-the-spring-boot-chat-api-locally).
 🕰️ This may take a long time, possibly several hours, depending on the number of ground truth questions, and the TPM capacity of the evaluation model, and the number of GPT metrics requested.
 
+> [!IMPORTANT]
+> Ground truth data is generated from a knowledge graph built out of the same search index used by the RAG flow. It is based on the [RAGAS evaluation framework](https://docs.ragas.io/en/stable/). If you want to learn more about the data generation approach, see [Testset Generation for RAG](https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/).
+
 ## Review the evaluation results
 
 The evaluation script will output a summary of the evaluation results, inside the `evals/results` directory.
 
+The evaluation uses the following default metrics (as configured in `evals/eval_config.json`), with results available in the `summary.json` file:
+
+* **gpt_groundedness**: Measures how well the answer is grounded in the retrieved context. Returns a pass rate and mean rating (1-5 scale).
+* **gpt_relevance**: Evaluates the relevance of the answer to the user's question. Returns a pass rate and mean rating (1-5 scale).
+* **answer_length**: Tracks the length of generated answers in characters (mean, max, min values).
+* **latency**: Measures response time in seconds for each question (mean, max, min values).
+* **citations_matched**: Counts how many answers include properly matched citations from the source documents.
+* **any_citation**: Tracks whether answers include any citations at all.
+
+> [!IMPORTANT]
+> **gpt_groundedness** and **gpt_relevance** are built-in metrics provided by the [Azure AI Evaluation SDK](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/evaluate-sdk).
+> **answer_length**, **latency**, **citations_matched** and **any_citation** are custom metrics defined in [evaluate.py](../../evals/evaluate.py) or in the [ai-rag-chat-evaluator project](https://github.com/Azure-Samples/ai-rag-chat-evaluator/blob/main/src/evaltools/eval/evaluate_metrics/code_metrics.py).
+
 You can see a summary of results across all evaluation runs by running the following command:
 
 ```bash
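The custom metrics mentioned in the added documentation are not shown in this commit. As a rough, hypothetical sketch of how a length-style metric could be computed per response and then aggregated into the mean/max/min figures reported in `summary.json` (the function names and the `answer` field are assumptions for illustration, not the project's actual `evaluate.py` code):

```python
from statistics import mean

def answer_length(response: dict) -> int:
    # Character length of one generated answer; the "answer" field name is assumed
    return len(response.get("answer", ""))

def summarize_answer_length(responses: list[dict]) -> dict:
    # Aggregate per-response lengths into the mean/max/min shape described above
    lengths = [answer_length(r) for r in responses]
    return {"mean": mean(lengths), "max": max(lengths), "min": min(lengths)}

# Two fabricated responses, just to show the aggregation
print(summarize_answer_length([{"answer": "Short."}, {"answer": "A somewhat longer answer."}]))
```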

evals/generate_ground_truth.py

Lines changed: 24 additions & 2 deletions
@@ -96,7 +96,22 @@ def generate_ground_truth_ragas(num_questions=200, num_search_documents=None, kg
     nodes = []
     for doc in search_docs:
         content = doc["content"]
-        citation = doc["sourcepage"]
+
+        # Extract citation from metadata attributes
+        if "metadata" in doc and "attributes" in doc["metadata"]:
+            attributes = doc["metadata"]["attributes"]
+            # Convert list of attributes to dictionary for easier lookup
+            attr_dict = {attr["key"]: attr["value"] for attr in attributes}
+
+            file_name = attr_dict.get("file_name")
+            index = attr_dict.get("page_number")
+
+            if file_name:
+                if index is not None:
+                    citation = f"{file_name}#page={index}"
+                else:
+                    citation = file_name
+
         node = Node(
             type=NodeType.DOCUMENT,
             properties={
@@ -147,15 +162,22 @@ def generate_ground_truth_ragas(num_questions=200, num_search_documents=None, kg
         level=logging.WARNING, format="%(message)s", datefmt="[%X]", handlers=[RichHandler(rich_tracebacks=True)]
     )
     logger.setLevel(logging.INFO)
-    load_azd_env()
+
 
     parser = argparse.ArgumentParser(description="Generate ground truth data using AI Search index and RAGAS.")
     parser.add_argument("--numsearchdocs", type=int, help="Specify the number of search results to fetch")
    parser.add_argument("--numquestions", type=int, help="Specify the number of questions to generate.", default=200)
     parser.add_argument("--kgfile", type=str, help="Specify the path to an existing knowledge graph file")
+    parser.add_argument("--env-file-path", type=str, help="Specify the path to the environment file.")
 
     args = parser.parse_args()
 
+    # Load environment variables from the specified file path or default
+    if args.env_file_path:
+        load_azd_env(args.env_file_path)
+    else:
+        load_azd_env()
+
     generate_ground_truth_ragas(
         num_search_documents=args.numsearchdocs, num_questions=args.numquestions, kg_file=args.kgfile
     )
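To illustrate the new citation logic outside the diff, here is a minimal, self-contained sketch of how a document shaped like the langchain4j attributes metadata model resolves to a citation; the sample content, file name, and page number are invented for the example:

```python
# Sample document following the attributes metadata shape used in the diff above;
# the file name, page number, and content are made up for illustration.
sample_doc = {
    "content": "Contoso benefit plans cover vision and dental ...",
    "metadata": {
        "attributes": [
            {"key": "file_name", "value": "Benefit_Options.pdf"},
            {"key": "page_number", "value": 2},
        ]
    },
}

citation = None
if "metadata" in sample_doc and "attributes" in sample_doc["metadata"]:
    # The list of {key, value} attribute pairs becomes a plain dict for lookups
    attr_dict = {attr["key"]: attr["value"] for attr in sample_doc["metadata"]["attributes"]}
    file_name = attr_dict.get("file_name")
    index = attr_dict.get("page_number")
    if file_name:
        citation = f"{file_name}#page={index}" if index is not None else file_name

print(citation)  # -> Benefit_Options.pdf#page=2
```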
