A comprehensive toolkit for generating synthetic test data and evaluating LLM applications with RAG capabilities.
Open Evals is a modular evaluation framework designed to help developers test and improve their AI applications. It provides tools for:
- Synthetic Data Generation: Create realistic test datasets using knowledge graphs, personas, and scenarios
- Evaluation Metrics: Pre-built and custom metrics for assessing LLM performance
- RAG Utilities: Text splitters for retrieval-augmented generation
- Evaluation Framework: Core abstractions for running comprehensive evaluations
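To make the shape of these pieces concrete, here is a minimal sketch of how a dataset, a metric, and an evaluation loop can fit together. The interfaces and names below are illustrative assumptions, not the actual `@open-evals/core` API:

```typescript
// Hypothetical shapes for illustration (not the real @open-evals/core API):
// a dataset is a list of samples, a metric scores one sample in [0, 1],
// and an evaluation averages each metric over the dataset.
interface Sample {
  input: string;
  output: string;
  reference: string;
}

interface Metric {
  name: string;
  score(sample: Sample): number;
}

function evaluate(dataset: Sample[], metrics: Metric[]): Record<string, number> {
  const results: Record<string, number> = {};
  for (const metric of metrics) {
    const total = dataset.reduce((sum, s) => sum + metric.score(s), 0);
    results[metric.name] = dataset.length ? total / dataset.length : 0;
  }
  return results;
}

// Toy metric: exact match between model output and reference.
const exactMatch: Metric = {
  name: "exact_match",
  score: (s) => (s.output.trim() === s.reference.trim() ? 1 : 0),
};

const report = evaluate(
  [
    { input: "2+2?", output: "4", reference: "4" },
    { input: "Capital of France?", output: "Lyon", reference: "Paris" },
  ],
  [exactMatch],
);
// report.exact_match === 0.5
```

Real metrics are usually more involved (often LLM-judged), but the pattern of mapping metrics over a dataset and aggregating scores is the same.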
This monorepo contains the following packages:
Core evaluation framework with abstractions for datasets, metrics, and evaluation pipelines.
```shell
pnpm add @open-evals/core
```

Synthetic test data generation using knowledge graphs, personas, and query synthesis.
```shell
pnpm add @open-evals/generator
```

RAG utilities including recursive character and markdown text splitters.
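The recursive character splitting technique mentioned above can be sketched generically. This is a simplified illustration of the algorithm, not the `@open-evals/rag` implementation: try coarse separators first, recurse into oversized pieces with finer separators, then pack adjacent pieces into chunks no longer than the limit:

```typescript
// Generic recursive character splitter sketch (illustrative, not the
// @open-evals/rag code). Splits on separators from coarse to fine,
// then packs pieces into chunks of at most `chunkSize` characters.
function splitRecursive(
  text: string,
  chunkSize: number,
  separators: string[] = ["\n\n", "\n", " ", ""],
): string[] {
  if (text.length <= chunkSize) return text ? [text] : [];
  const [sep, ...rest] = separators;
  const parts = sep === "" ? text.split("") : text.split(sep);
  const pieces: string[] = [];
  for (const part of parts) {
    if (part.length > chunkSize && rest.length > 0) {
      // Piece still too big: retry with the next, finer separator.
      pieces.push(...splitRecursive(part, chunkSize, rest));
    } else if (part) {
      pieces.push(part);
    }
  }
  // Pack adjacent pieces back together (re-inserting the separator)
  // without exceeding chunkSize.
  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    const joined = current ? current + sep + piece : piece;
    if (joined.length <= chunkSize) {
      current = joined;
    } else {
      if (current) chunks.push(current);
      current = piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A markdown-aware splitter follows the same idea but uses structural boundaries (headings, paragraphs, fences) as the separator hierarchy.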
```shell
pnpm add @open-evals/rag
```

Pre-built evaluation metrics including faithfulness, factual correctness, and more.
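As a rough intuition for what a correctness-style metric computes, here is a deliberately simplified stand-in: token-overlap F1 between an answer and a reference. Real faithfulness and factual-correctness metrics typically use an LLM judge to compare claims; this sketch is only meant to show the scoring shape:

```typescript
// Simplified illustration only: real faithfulness / factual-correctness
// metrics usually rely on an LLM judge. As a cheap stand-in, compute
// token-overlap F1 between the answer and the reference.
function tokenF1(answer: string, reference: string): number {
  const tokenize = (s: string) => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const a = tokenize(answer);
  const r = tokenize(reference);
  if (a.length === 0 || r.length === 0) return 0;
  // Count reference tokens, then consume matches from the answer.
  const refCounts = new Map<string, number>();
  for (const t of r) refCounts.set(t, (refCounts.get(t) ?? 0) + 1);
  let overlap = 0;
  for (const t of a) {
    const n = refCounts.get(t) ?? 0;
    if (n > 0) {
      overlap++;
      refCounts.set(t, n - 1);
    }
  }
  if (overlap === 0) return 0;
  const precision = overlap / a.length;
  const recall = overlap / r.length;
  return (2 * precision * recall) / (precision + recall);
}

// tokenF1("Paris is the capital", "Paris is the capital") === 1
```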
```shell
pnpm add @open-evals/metrics
```

This project uses pnpm workspaces for managing multiple packages.
```shell
# Install dependencies
pnpm install

# Build all packages
pnpm build

# Run tests
pnpm test
```

The `agents/` directory contains example implementations:
- doc-assistant: A RAG-based documentation assistant demonstrating the full stack
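Conceptually, a documentation assistant like this follows a retrieve-then-generate loop. The sketch below uses hypothetical names and a naive term-overlap retriever purely for illustration; the real agent lives in `agents/doc-assistant`, and a production setup would use embeddings for retrieval:

```typescript
// Conceptual RAG loop sketch (hypothetical names, not the doc-assistant
// code). Rank chunks by how many query terms they contain, keep the
// top k, and assemble them into a grounded prompt.
function retrieve(query: string, chunks: string[], k: number): string[] {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return chunks
    .map((chunk) => ({
      chunk,
      score: chunk
        .toLowerCase()
        .split(/\W+/)
        .filter((w) => terms.has(w)).length,
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.chunk);
}

function buildPrompt(query: string, context: string[]): string {
  return `Answer using only this context:\n${context.join("\n---\n")}\n\nQuestion: ${query}`;
}
```

The prompt would then be sent to an LLM, and the resulting answers can be scored with the metrics package against a synthetic test set from the generator.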