Commit b3dd6fe: Update pyproject.toml
Parent commit: 7e03c84

3 files changed: 3,993 additions, 302 deletions


README.md (98 additions, 27 deletions)
@@ -1,42 +1,113 @@
The placeholder template README (repository structure, license, contribution, and contact sections) is replaced. The updated file:

# Agent Bootcamp

---

This is a collection of reference implementations for Vector Institute's **Agentic AI Evaluation Bootcamp**.

## Reference Implementations

This repository includes several modules, each showcasing a different aspect of agent-based RAG systems:

**3. Evals: Automated Evaluation Pipelines**
Contains scripts and utilities for evaluating agent performance using LLM-as-a-judge and synthetic data generation. Includes tools for uploading datasets, running evaluations, and integrating with [Langfuse](https://langfuse.com/) for traceability.

- **[3.1 LLM-as-a-Judge](src/3_evals/1_llm_judge/README.md)**
  Automated evaluation pipelines using LLM-as-a-judge with Langfuse integration.

- **[3.2 Evaluation on Synthetic Dataset](src/3_evals/2_synthetic_data/README.md)**
  Showcases the generation of synthetic evaluation data for testing agents.

## Getting Started

Set your API keys in `.env`, using `.env.example` as a template.

```bash
cp -v .env.example .env
```

Run the integration tests to validate that your API keys are set up correctly.

```bash
uv run --env-file .env pytest -sv tests/tool_tests/test_integration.py
```
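
The integration tests above fail when a key is missing or wrong. A tiny preflight check in the same spirit can be written in a few lines; note that the variable names below are illustrative, and the authoritative list is whatever `.env.example` declares:

```python
import os

# Illustrative variable names; the authoritative list is in .env.example.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")

def missing_keys(environ=os.environ):
    """Return the required variables that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not environ.get(key)]

print(missing_keys({"OPENAI_API_KEY": "sk-example"}))
```

Running this before the test suite gives a faster, clearer error than a failed API call.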

## Running the Reference Implementations

For "Gradio App" reference implementations, running the script prints a public URL ending in `gradio.live` (it might take a few seconds to appear). To use the Gradio app with full streaming capabilities, copy and paste this `gradio.live` URL into a new browser tab.

To exit any reference implementation, press Ctrl-C and wait up to ten seconds. If you are a Mac user, use "Control-C", not "Command-C". Note that by default the Gradio web app reloads automatically as you edit the Python script, so there is no need to stop and restart the program after each code change.

You might see warning messages like the following:

```text
ERROR:openai.agents:[non-fatal] Tracing client error 401: {
  "error": {
    "message": "Incorrect API key provided. You can find your API key at https://platform.openai.com/account/api-keys.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
```

These warnings can be safely ignored; they are the result of a bug in the upstream libraries. Your agent traces will still be uploaded to Langfuse as configured.
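
Since these are ordinary log records from the `openai.agents` logger (the name is taken from the `ERROR:openai.agents:` prefix above), a generic standard-library filter can silence just this message. This is a hypothetical workaround, not part of the repository or of any SDK configuration:

```python
import logging

class DropTracingNoise(logging.Filter):
    """Suppress the known non-fatal tracing-client errors."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Keep every record except the noisy tracing-client ones.
        return "Tracing client error" not in record.getMessage()

# Attach once at startup, before the agent runs.
logging.getLogger("openai.agents").addFilter(DropTracingNoise())
```

All other errors from the SDK still reach your handlers unchanged.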

### 3. Evals

Generate synthetic data:

```bash
uv run --env-file .env \
    -m src.3_evals.2_synthetic_data.synthesize_data \
    --source_dataset hf://vector-institute/hotpotqa@d997ecf:train \
    --langfuse_dataset_name search-dataset-synthetic-20250609 \
    --limit 18
```
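
The `--source_dataset` value packs a Hugging Face repo, a revision, and a split into one string. A hypothetical parser for that `hf://<repo>@<revision>:<split>` layout, inferred purely from the example above (the repository's actual loader may differ):

```python
import re

# Matches references like hf://vector-institute/hotpotqa@d997ecf:train
# (layout inferred from the example; not the repository's actual code).
_REF = re.compile(r"^hf://(?P<repo>[^@:]+)@(?P<revision>[^:]+):(?P<split>.+)$")

def parse_dataset_ref(ref: str) -> dict[str, str]:
    """Split an hf:// dataset reference into repo, revision, and split."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"unrecognized dataset reference: {ref!r}")
    return match.groupdict()

print(parse_dataset_ref("hf://vector-institute/hotpotqa@d997ecf:train"))
```

Pinning the revision (`@d997ecf`) keeps synthetic-data runs reproducible even if the upstream dataset changes.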

Quantify the embedding diversity of the synthetic data:

```bash
# Baseline: "real" dataset
uv run \
    --env-file .env \
    -m src.3_evals.2_synthetic_data.annotate_diversity \
    --langfuse_dataset_name search-dataset \
    --run_name cosine_similarity_bge_m3

# Synthetic dataset
uv run \
    --env-file .env \
    -m src.3_evals.2_synthetic_data.annotate_diversity \
    --langfuse_dataset_name search-dataset-synthetic-20250609 \
    --run_name cosine_similarity_bge_m3
```
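
The run name `cosine_similarity_bge_m3` suggests the diversity annotation is based on pairwise cosine similarity of embeddings (BGE-M3 here). The idea can be sketched without the embedding model: the lower the average pairwise similarity, the more diverse the dataset. This is illustrative only, and the repository's actual metric may differ:

```python
import math
from itertools import combinations

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_similarity(embeddings: list[list[float]]) -> float:
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

clustered = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.2]]  # near-duplicates
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]      # well separated
print(mean_pairwise_similarity(clustered), mean_pairwise_similarity(spread))
```

A synthetic dataset whose mean pairwise similarity is much higher than the real baseline is probably too repetitive.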

Visualize the embedding diversity of the synthetic data:

```bash
uv run \
    --env-file .env \
    gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
```

Run an LLM-as-a-judge evaluation on the synthetic data:

```bash
uv run \
    --env-file .env \
    -m src.3_evals.1_llm_judge.run_eval \
    --langfuse_dataset_name search-dataset-synthetic-20250609 \
    --run_name enwiki_weaviate \
    --limit 18
```
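
An LLM-as-a-judge run typically prompts a judge model for a structured verdict and then parses it from the reply. A minimal sketch of that pattern; the prompt wording and JSON schema below are invented for illustration, and the repository's actual judge lives in `src/3_evals/1_llm_judge`:

```python
import json

# Hypothetical judge prompt; doubled braces keep the JSON schema literal
# when .format() fills in the question/reference/answer fields.
JUDGE_PROMPT = (
    "Rate the answer for factual correctness against the reference.\n"
    'Respond with JSON: {{"score": <float in [0, 1]>, "reasoning": "<one sentence>"}}\n\n'
    "Question: {question}\nReference: {reference}\nAnswer: {answer}\n"
)

def parse_verdict(raw: str) -> tuple[float, str]:
    """Extract the score and reasoning from the judge's JSON reply."""
    start, end = raw.find("{"), raw.rfind("}")
    verdict = json.loads(raw[start : end + 1])
    return float(verdict["score"]), verdict["reasoning"]

prompt = JUDGE_PROMPT.format(
    question="Who wrote Hamlet?",
    reference="William Shakespeare",
    answer="Shakespeare",
)
reply = 'Sure, my verdict: {"score": 0.8, "reasoning": "Mostly correct."}'
print(parse_verdict(reply))
```

Taking the outermost brace pair tolerates judges that wrap the JSON in conversational filler.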

## Requirements

- Python 3.12+
- API keys as configured in `.env`.

### Tidbit

If you're curious about what "uv" stands for, it appears to have been more or less chosen [randomly](https://github.com/astral-sh/uv/issues/1349#issuecomment-1986451785).

pyproject.toml (64 additions, 38 deletions)
@@ -1,60 +1,76 @@
The old template metadata, the `aieng-topic-impl` uv workspace, and the strict [tool.mypy] configuration are dropped. The first hunk of the updated file reads:

```toml
[project]
name = "agentic-ai-evaluation-bootcamp-202602"
version = "0.1.0"
description = "Vector Institute Agentic AI Evaluation Bootcamp 202602"
readme = "README.md"
authors = [{ name = "Vector AI Engineering", email = "ai_engineering@vectorinstitute.ai" }]
license = "Apache-2.0"
requires-python = ">=3.12"
dependencies = [
    "aiohttp>=3.12.14",
    "beautifulsoup4>=4.13.4",
    "datasets>=3.6.0",
    "e2b-code-interpreter>=1.5.2",
    "gradio>=5.37.0",
    "langfuse>=3.1.3",
    "lxml>=6.0.0",
    "nest-asyncio>=1.6.0",
    "numpy<2.3.0",
    "openai>=1.93.1",
    "openai-agents>=0.1.0",
    "plotly>=6.2.0",
    "pydantic>=2.11.7",
    "pydantic-ai-slim[logfire]>=0.3.7",
    "pytest-asyncio>=0.25.2",
    "scikit-learn>=1.7.0",
    "weaviate-client>=4.15.4",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src"]

[dependency-groups]
dev = [
    "aieng-platform-onboard>=0.3.6",
    "codecov>=2.1.13",
    "ipykernel>=6.29.5",
    "ipython>=9.4.0",
    "ipywidgets>=8.1.7",
    "jupyter>=1.1.1",
    "jupyterlab>=4.4.8",
    "nbqa>=1.9.1",
    "pip-audit>=2.7.3",
    "pre-commit>=4.1.0",
    "pytest>=8.3.4",
    "pytest-asyncio>=0.25.2",
    "pytest-cov>=6.0.0",
    "pytest-mock>=3.14.0",
    "ruff>=0.12.2",
    "transformers>=4.54.1",
]
docs = [
    "jinja2>=3.1.6", # Pinned to address vulnerability GHSA-cpwx-vrp4-4pq7
    "mkdocs>=1.6.0",
    "mkdocs-material>=9.6.15",
    "mkdocstrings>=0.24.1",
    "mkdocstrings-python>=1.16.12",
    "ipykernel>=6.29.5",
    "ipython>=9.4.0",
]
web-search = [
    "google-cloud-firestore>=2.21.0",
    "fastapi[standard]>=0.116.1",
    "google-genai>=1.46.0",
    "simplejson>=3.20.2",
]

# Default dependency groups to be installed
[tool.uv]
default-groups = ["dev", "docs"]

[tool.ruff]
include = ["*.py", "pyproject.toml", "*.ipynb"]
```
@@ -94,11 +110,16 @@ ignore = [

```toml
    "PLR2004", # Replace magic number with named constant
    "PLR0913", # Too many arguments
    "COM812",  # Missing trailing comma
    "N999",    # Numbers in module names
    "ERA001",  # Commented-out code
]

# Ignore import violations and missing docstrings in all `__init__.py` files.
[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["E402", "F401", "F403", "F811", "D104"]

[tool.ruff.lint.pep8-naming]
ignore-names = ["X*", "setUp"]
```
@@ -116,3 +137,8 @@ max-doc-length = 88

```toml
markers = [
    "integration_test: marks tests as integration tests",
]

[tool.coverage]
[tool.coverage.run]
source = ["src/"]
omit = ["tests/*", "*__init__.py"]
```
