Commit 12010a6

Update README to remove old text
1 parent bf3cd53 commit 12010a6

File tree

1 file changed (+8, -58)


README.md

Lines changed: 8 additions & 58 deletions
````diff
@@ -6,17 +6,16 @@ This is a collection of reference implementations for Vector Institute's **Agent
 
 ## Reference Implementations
 
-This repository includes several modules, each showcasing a different aspect of agent-based RAG systems:
+This repository includes three modules, each demonstrating a different aspect of building and evaluating agent-based systems:
 
-**3. Evals: Automated Evaluation Pipelines**
-  Contains scripts and utilities for evaluating agent performance using LLM-as-a-judge and synthetic data generation. Includes tools for uploading datasets, running evaluations, and integrating with [Langfuse](https://langfuse.com/) for traceability.
+- **[Knowledge-Grounded QA Agent](implementations/knowledge_qa/README.md)**
+  A ReAct agent using Google ADK and Google Search to answer questions grounded in live web content. Evaluated on the DeepSearchQA benchmark using LLM-as-a-judge metrics.
 
-- **[3.1 LLM-as-a-Judge](src/3_evals/1_llm_judge/README.md)**
-  Automated evaluation pipelines using LLM-as-a-judge with Langfuse integration.
-
-- **[3.2 Evaluation on Synthetic Dataset](src/3_evals/2_synthetic_data/README.md)**
-  Showcases the generation of synthetic evaluation data for testing agents.
+- **[AML Investigation Agent](implementations/aml_investigation/README.md)**
+  An agent that investigates Anti-Money Laundering cases by querying a SQLite database of financial transactions via a read-only SQL tool. Produces structured analysis and supports batch evaluation.
 
+- **[Report Generation Agent](implementations/report_generation/README.md)**
+  An agent that accepts natural language queries and generates downloadable Excel reports from a relational database. Includes a Gradio demo UI and Langfuse-integrated evaluations.
 
 ## Getting Started
 
````
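The new AML bullet mentions a "read-only SQL tool" over a SQLite transactions database. A minimal sketch of that idea, assuming nothing about the repo's actual code: the helper name and the `tx` table used in the comments are hypothetical, but SQLite's `mode=ro` URI flag is real and enforces read-only access at the database level rather than by convention.

```python
# Hedged sketch (not the repo's implementation): give an agent a query
# tool that physically cannot write, by opening SQLite read-only.
import sqlite3


def run_readonly_query(db_path: str, sql: str) -> list[tuple]:
    # The `file:...?mode=ro` URI makes INSERT/UPDATE/DELETE fail with
    # sqlite3.OperationalError, e.g. on a hypothetical `tx` table.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

With this design a prompt-injected `DROP TABLE` simply raises an error the agent can report, instead of mutating the evidence.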

````diff
@@ -32,7 +31,7 @@ Run integration tests to validate that your API keys are set up correctly.
 uv run --env-file .env pytest -sv tests/tool_tests/test_integration.py
 ```
 
-## Reference Implementations
+## Running the Implementations
 
 For "Gradio App" reference implementations, running the script would print out a "public URL" ending in `gradio.live` (might take a few seconds to appear.) To access the gradio app with the full streaming capabilities, copy and paste this `gradio.live` URL into a new browser tab.
 
````
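The context above mentions integration tests that validate API keys loaded from `.env`. A sketch of the kind of check such a test performs; the key names below are assumptions for illustration, not necessarily the repo's real variable names.

```python
# Hedged sketch of an env-key smoke check. REQUIRED_KEYS is hypothetical;
# consult the repo's .env template for the actual names.
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_keys(env=None):
    # Treat unset and empty-string values the same way, so a blank
    # line in .env is caught before any network call is attempted.
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

Failing fast on missing keys gives a clearer error than the 401 tracing noise discussed below.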

````diff
@@ -53,55 +52,6 @@ ERROR:openai.agents:[non-fatal] Tracing client error 401: {
 
 These warnings can be safely ignored, as they are the result of a bug in the upstream libraries. Your agent traces will be uploaded to LangFuse as configured.
 
-### 3. Evals
-
-Synthetic data.
-
-```bash
-uv run --env-file .env \
-    -m src.3_evals.2_synthetic_data.synthesize_data \
-    --source_dataset hf://vector-institute/hotpotqa@d997ecf:train \
-    --langfuse_dataset_name search-dataset-synthetic-20250609 \
-    --limit 18
-```
-
-Quantify embedding diversity of synthetic data
-
-```bash
-# Baseline: "Real" dataset
-uv run \
-    --env-file .env \
-    -m src.3_evals.2_synthetic_data.annotate_diversity \
-    --langfuse_dataset_name search-dataset \
-    --run_name cosine_similarity_bge_m3
-
-# Synthetic dataset
-uv run \
-    --env-file .env \
-    -m src.3_evals.2_synthetic_data.annotate_diversity \
-    --langfuse_dataset_name search-dataset-synthetic-20250609 \
-    --run_name cosine_similarity_bge_m3
-```
-
-Visualize embedding diversity of synthetic data
-
-```bash
-uv run \
-    --env-file .env \
-    gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
-```
-
-Run LLM-as-a-judge Evaluation on synthetic data
-
-```bash
-uv run \
-    --env-file .env \
-    -m src.3_evals.1_llm_judge.run_eval \
-    --langfuse_dataset_name search-dataset-synthetic-20250609 \
-    --run_name enwiki_weaviate \
-    --limit 18
-```
-
 ## Requirements
 
 - Python 3.12+
````
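The removed `annotate_diversity` runs above tagged datasets with a `cosine_similarity_bge_m3` score. A sketch of the metric that run name suggests, under stated assumptions: the real pipeline embedded items with bge-m3, while the function names here and the tiny hand-written vectors in the test are illustrative stand-ins.

```python
# Hedged sketch of an embedding-diversity score: mean pairwise cosine
# similarity over a dataset's embeddings (lower mean = more diverse).
import math
from itertools import combinations


def cosine_similarity(a, b):
    # Dot product over the product of Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mean_pairwise_similarity(embeddings):
    # Average similarity across all unordered pairs of embeddings.
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

Comparing this score between the "real" and synthetic datasets, as the removed commands did, shows whether the generator collapses onto a few question templates.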
