draft: batch/simulation fixups #890

mhordynski · 2025-12-20T06:38:45Z

No description provided.

Consolidate hotel-api fixtures into a single hotel module with an integrated service layer. Rename duet_cli.py to hotel_simulation.py and personalities.json to personas.json for clarity.

Introduce a flexible adapter system for chat response transformations. Includes ChatAdapter protocol, AdapterPipeline for chaining adapters, and built-in adapters for common transformations like JSON extraction and format detection.

Improve the core agent simulation module with better conversation handling, enhanced results tracking, scenario grouping support, and improved display utilities. Refactor simulation execution for better modularity and testability.

Introduce a comprehensive checkers system for validating agent simulation outputs. Includes LLM-based checkers for evaluating task completion, response quality, and conversation flow.

Restructure the metrics module with improved collectors, enhanced built-in metrics, and add DeepEval integration for comprehensive evaluation metrics including conversation coherence and task completion assessment.

Introduce MemoryTraceHandler and TraceAnalyzer for capturing and analyzing LLM calls, tool invocations, and token usage during simulation runs. Extract tool calls and usage from traces for more reliable data collection.

Introduce a comprehensive storage layer with key-value and SQL store abstractions. Add connection managers for SQLite and PostgreSQL databases, along with a memory-based trace handler for capturing audit traces.

Introduce a store abstraction for persisting evaluation data including scenarios, personas, simulation runs, and results. Includes file-based and key-value store implementations for flexible data persistence.

Introduce a FastAPI-based evaluation API for managing simulation scenarios, personas, and runs. Add ExecutionManager for orchestrating simulation execution with real-time progress updates via SSE.

Introduce a comprehensive evaluation dashboard built with React and TypeScript. Features include scenario management, persona configuration, simulation run execution with real-time progress, and detailed results visualization with conversation views and metrics display.

mhordynski added 11 commits December 19, 2025 09:13

refactor(examples): restructure hotel simulation example

b96388d

Consolidate hotel-api fixtures into a single hotel module with an integrated service layer. Rename duet_cli.py to hotel_simulation.py and personalities.json to personas.json for clarity.

feat(chat): add chat adapters system

ce34b23

Introduce a flexible adapter system for chat response transformations. Includes ChatAdapter protocol, AdapterPipeline for chaining adapters, and built-in adapters for common transformations like JSON extraction and format detection.

feat(evaluate): add agent simulation checkers module

4020991

Introduce a comprehensive checkers system for validating agent simulation outputs. Includes LLM-based checkers for evaluating task completion, response quality, and conversation flow.

feat(evaluate): refactor agent simulation metrics system

f1ec513

Restructure the metrics module with improved collectors, enhanced built-in metrics, and add DeepEval integration for comprehensive evaluation metrics including conversation coherence and task completion assessment.

feat(evaluate): add tracing support for agent simulation

9001bc0

Introduce MemoryTraceHandler and TraceAnalyzer for capturing and analyzing LLM calls, tool invocations, and token usage during simulation runs. Extract tool calls and usage from traces for more reliable data collection.

feat(core): add storage system with database connections

e62f480

Introduce a comprehensive storage layer with key-value and SQL store abstractions. Add connection managers for SQLite and PostgreSQL databases, along with a memory-based trace handler for capturing audit traces.

feat(evaluate): add evaluation data stores

9b8b063

Introduce a store abstraction for persisting evaluation data including scenarios, personas, simulation runs, and results. Includes file-based and key-value store implementations for flexible data persistence.

feat(evaluate): add evaluation API and execution manager

a1df1e1

Introduce a FastAPI-based evaluation API for managing simulation scenarios, personas, and runs. Add ExecutionManager for orchestrating simulation execution with real-time progress updates via SSE.

fix: small tweaks

05a41d4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

draft: batch/simulation fixups #890

draft: batch/simulation fixups #890

Uh oh!

mhordynski commented Dec 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

draft: batch/simulation fixups #890

Are you sure you want to change the base?

draft: batch/simulation fixups #890

Uh oh!

Conversation

mhordynski commented Dec 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants