|
1 | | -# Implementation Template Repository |
| 1 | +# Agent Bootcamp |
2 | 2 |
|
3 | | -This repository serves as the template for implementations created by the Vector AI |
4 | | -Engineering team. It is designed to be used as a starting point for bootcamps, labs, |
5 | | -or workshops. |
| 3 | +---------------------------------------------------------------------------------------- |
6 | 4 |
|
7 | | -## About [Implementation Name] |
| 5 | +This is a collection of reference implementations for Vector Institute's **Agentic AI Evaluation Bootcamp**. |
8 | 6 |
|
9 | | -*Add info on the implementations.* |
| 7 | +## Reference Implementations |
10 | 8 |
|
11 | | -## Repository Structure |
| 9 | +This repository includes several modules, each showcasing a different aspect of agent-based RAG systems. The evaluation module is outlined below:
12 | 10 |
|
13 | | -- **docs/**: Contains detailed documentation, additional resources, installation guides, and setup instructions that are not covered in this README. |
14 | | -- **implementations/**: Implementations are organized by topics. Each topic has its own directory containing notebooks, and a README for guidance. |
15 | | -- **pyproject.toml**: The `pyproject.toml` file in this repository configures various build system requirements and dependencies, centralizing project settings in a standardized format. |
| 11 | +**3. Evals: Automated Evaluation Pipelines** |
| 12 | + Contains scripts and utilities for evaluating agent performance using LLM-as-a-judge and synthetic data generation. Includes tools for uploading datasets, running evaluations, and integrating with [Langfuse](https://langfuse.com/) for traceability. |
16 | 13 |
|
17 | | -### Implementations Directory |
| 14 | +- **[3.1 LLM-as-a-Judge](src/3_evals/1_llm_judge/README.md)** |
| 15 | + Automated evaluation pipelines using LLM-as-a-judge with Langfuse integration. |
18 | 16 |
|
19 | | -Each topic within the [choice of bootcamp/lab/workshop] has a dedicated directory in the `implementations/` directory. In each directory, there is a README file that provides an overview of the topic, prerequisites, and notebook descriptions. |
| 17 | +- **[3.2 Evaluation on Synthetic Dataset](src/3_evals/2_synthetic_data/README.md)** |
| 18 | + Showcases the generation of synthetic evaluation data for testing agents. |
20 | 19 |
|
21 | | -Here is the list of the covered topics: |
22 | | -- [Implementation 1] |
23 | | -- [Implementation 2] |
24 | 20 |
|
25 | 21 | ## Getting Started |
26 | 22 |
|
27 | | -To get started with this bootcamp (*Change or modify the following steps based your needs.*): |
28 | | -1. Clone this repository to your machine. |
29 | | -2. *Include setup and installation instructions here. For additional documentation, refer to the `docs/` directory.* |
30 | | -3. Begin with each topic in the `implementations/` directory, as guided by the README files. |
| 23 | +Set your API keys in `.env`. Use `.env.example` as a template. |
31 | 24 |
|
32 | | -## License |
33 | | -*Add appropriate LICENSE for this bootcamp in the main directory.* |
34 | | -This project is licensed under the terms of the [LICENSE](LICENSE.md) file located in the root directory of this repository. |
| 25 | +```bash |
| 26 | +cp -v .env.example .env |
| 27 | +``` |
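| | +
| | +If you are wondering what goes in `.env`, the authoritative list is `.env.example` itself. As a rough, hypothetical illustration (the variable names below are assumptions, not a guarantee of what the template contains), a filled-in `.env` might look like:
| | +
| | +```bash
| | +# Hypothetical .env -- check .env.example for the real variable names.
| | +OPENAI_API_KEY=sk-...                      # LLM provider key (assumed)
| | +LANGFUSE_PUBLIC_KEY=pk-lf-...              # Langfuse public key (assumed)
| | +LANGFUSE_SECRET_KEY=sk-lf-...              # Langfuse secret key (assumed)
| | +LANGFUSE_HOST=https://cloud.langfuse.com   # or a self-hosted instance
| | +```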
35 | 28 |
|
36 | | -## Contribution |
37 | | -*Add appropriate CONTRIBUTING.md for this bootcamp in the main directory.* |
38 | | -To get started with contributing to our project, please read our [CONTRIBUTING.md](CONTRIBUTING.md) guide. |
| 29 | +Run the integration tests to validate that your API keys are set up correctly.
39 | 30 |
|
40 | | -## Contact Information |
| 31 | +```bash |
| 32 | +uv run --env-file .env pytest -sv tests/tool_tests/test_integration.py |
| 33 | +``` |
41 | 34 |
|
42 | | -For more information or help with navigating this repository, please contact [Vector AI ENG Team/Individual] at [Contact Email]. |
| 35 | +## Running the Reference Implementations
| 36 | + |
| 37 | +For "Gradio App" reference implementations, running the script would print out a "public URL" ending in `gradio.live` (might take a few seconds to appear.) To access the gradio app with the full streaming capabilities, copy and paste this `gradio.live` URL into a new browser tab. |
| 38 | + |
| 39 | +To exit any reference implementation, press Ctrl-C and wait up to ten seconds. On a Mac, use Control-C, not Command-C. Note that by default the Gradio web app reloads automatically as you edit the Python script, so there is no need to stop and restart the program after each code change.
| 40 | + |
| 41 | +You might see warning messages like the following: |
| 42 | + |
| 43 | +```text
| 44 | +ERROR:openai.agents:[non-fatal] Tracing client error 401: { |
| 45 | + "error": { |
| 46 | + "message": "Incorrect API key provided. You can find your API key at https://platform.openai.com/account/api-keys.", |
| 47 | + "type": "invalid_request_error", |
| 48 | + "param": null, |
| 49 | + "code": "invalid_api_key" |
| 50 | + } |
| 51 | +} |
| 52 | +``` |
| 53 | + |
| 54 | +These warnings can be safely ignored; they are the result of a bug in the upstream libraries. Your agent traces will still be uploaded to Langfuse as configured.
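| | +
| | +If you would rather silence these messages than ignore them, the OpenAI Agents SDK exposes a global switch for its built-in tracing exporter. A minimal sketch, assuming the `openai-agents` package (Langfuse tracing is configured separately and is not affected):
| | +
| | +```python
| | +# Sketch: turn off the OpenAI Agents SDK's own trace exporter.
| | +# This silences the 401 tracing errors without touching Langfuse.
| | +from agents import set_tracing_disabled
| | +
| | +set_tracing_disabled(True)
| | +```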
| 55 | + |
| 56 | +### 3. Evals |
| 57 | + |
| 58 | +Generate a synthetic evaluation dataset and upload it to Langfuse:
| 59 | + |
| 60 | +```bash |
| 61 | +uv run --env-file .env \ |
| 62 | +-m src.3_evals.2_synthetic_data.synthesize_data \ |
| 63 | +--source_dataset hf://vector-institute/hotpotqa@d997ecf:train \ |
| 64 | +--langfuse_dataset_name search-dataset-synthetic-20250609 \ |
| 65 | +--limit 18 |
| 66 | +``` |
| 67 | + |
| 68 | +Quantify the embedding diversity of the synthetic data, using the real dataset as a baseline:
| 69 | + |
| 70 | +```bash |
| 71 | +# Baseline: "Real" dataset |
| 72 | +uv run \ |
| 73 | +--env-file .env \ |
| 74 | +-m src.3_evals.2_synthetic_data.annotate_diversity \ |
| 75 | +--langfuse_dataset_name search-dataset \ |
| 76 | +--run_name cosine_similarity_bge_m3 |
| 77 | + |
| 78 | +# Synthetic dataset |
| 79 | +uv run \ |
| 80 | +--env-file .env \ |
| 81 | +-m src.3_evals.2_synthetic_data.annotate_diversity \ |
| 82 | +--langfuse_dataset_name search-dataset-synthetic-20250609 \ |
| 83 | +--run_name cosine_similarity_bge_m3 |
| 84 | +``` |
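| | +
| | +Here, "diversity" is measured in embedding space: the closer together the questions' embeddings, the less diverse the dataset. Below is a rough sketch of one such metric, mean pairwise cosine similarity under a BGE-M3 encoder (an illustration of the idea, not the repo's exact implementation):
| | +
| | +```python
| | +# Sketch: mean pairwise cosine similarity as an (inverse) diversity score.
| | +# Lower mean similarity means more diverse questions. Illustrative only.
| | +import numpy as np
| | +from sentence_transformers import SentenceTransformer
| | +
| | +model = SentenceTransformer("BAAI/bge-m3")
| | +questions = [
| | +    "Who wrote The Master and Margarita?",
| | +    "What year did the Berlin Wall fall?",
| | +    "How does photosynthesis convert light into energy?",
| | +]
| | +embeddings = model.encode(questions, normalize_embeddings=True)
| | +
| | +# With unit-normalized embeddings, the dot product is cosine similarity.
| | +sims = embeddings @ embeddings.T
| | +pairs = sims[np.triu_indices(len(questions), k=1)]  # exclude self-pairs
| | +print(f"Mean pairwise cosine similarity: {pairs.mean():.3f}")
| | +```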
| 85 | + |
| 86 | +Visualize the embedding diversity of the synthetic data:
| 87 | + |
| 88 | +```bash |
| 89 | +uv run \ |
| 90 | +--env-file .env \ |
| 91 | +gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py |
| 92 | +``` |
| 93 | + |
| 94 | +Run the LLM-as-a-judge evaluation on the synthetic data:
| 95 | + |
| 96 | +```bash |
| 97 | +uv run \ |
| 98 | +--env-file .env \ |
| 99 | +-m src.3_evals.1_llm_judge.run_eval \ |
| 100 | +--langfuse_dataset_name search-dataset-synthetic-20250609 \ |
| 101 | +--run_name enwiki_weaviate \ |
| 102 | +--limit 18 |
| 103 | +``` |
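| | +
| | +At its core, LLM-as-a-judge means showing a strong model the question, a reference answer, and the agent's answer, then asking for a verdict. A bare-bones sketch follows (the judge prompt, model name, and score scale are illustrative assumptions; the actual pipeline lives in `src/3_evals/1_llm_judge` and logs scores to Langfuse):
| | +
| | +```python
| | +# Sketch: a minimal LLM-as-a-judge call. Illustrative only.
| | +from openai import OpenAI
| | +
| | +client = OpenAI()  # reads OPENAI_API_KEY from the environment
| | +
| | +
| | +def judge(question: str, reference: str, answer: str) -> str:
| | +    """Ask a judge model to grade `answer` against `reference`."""
| | +    prompt = (
| | +        "You are grading an agent's answer.\n"
| | +        f"Question: {question}\n"
| | +        f"Reference answer: {reference}\n"
| | +        f"Agent answer: {answer}\n"
| | +        "Reply with a score from 1 (wrong) to 5 (correct) and a one-line reason."
| | +    )
| | +    response = client.chat.completions.create(
| | +        model="gpt-4o-mini",  # assumed judge model
| | +        messages=[{"role": "user", "content": prompt}],
| | +    )
| | +    return response.choices[0].message.content
| | +
| | +
| | +print(judge("What is the capital of France?", "Paris", "It's Paris."))
| | +```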
| 104 | + |
| 105 | +## Requirements |
| 106 | + |
| 107 | +- Python 3.12+ |
| 108 | +- API keys as configured in `.env`. |
| 109 | + |
| 110 | +### Tidbit |
| 111 | + |
| 112 | +If you're curious about what "uv" stands for, it appears to have been more or |
| 113 | +less chosen [randomly](https://github.com/astral-sh/uv/issues/1349#issuecomment-1986451785). |