
@mhordynski
Member

No description provided.

Consolidate hotel-api fixtures into a single hotel module with an
integrated service layer. Rename duet_cli.py to hotel_simulation.py
and personalities.json to personas.json for clarity.

Introduce a flexible adapter system for chat response transformations.
Includes ChatAdapter protocol, AdapterPipeline for chaining adapters,
and built-in adapters for common transformations like JSON extraction
and format detection.
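
The adapter chaining described in this commit could look roughly like the sketch below. Only the names `ChatAdapter` and `AdapterPipeline` come from the commit message; the `adapt` method, the two example adapters, and their behavior are assumptions made for illustration, not the code in this PR.

```python
import json
import re
from typing import Any, Protocol


class ChatAdapter(Protocol):
    """Hypothetical protocol: anything with an `adapt` method qualifies."""

    def adapt(self, response: Any) -> Any: ...


class StripWhitespaceAdapter:
    """Illustrative adapter: trims surrounding whitespace from text responses."""

    def adapt(self, response: Any) -> Any:
        return response.strip() if isinstance(response, str) else response


class JSONExtractionAdapter:
    """Illustrative adapter: parses the {...} span of a text response, if any."""

    # Greedy match from the first "{" to the last "}" in the text.
    _pattern = re.compile(r"\{.*\}", re.DOTALL)

    def adapt(self, response: Any) -> Any:
        if not isinstance(response, str):
            return response
        match = self._pattern.search(response)
        if match is None:
            return response
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            # Leave the response untouched when the span is not valid JSON.
            return response


class AdapterPipeline:
    """Applies adapters in order, feeding each output into the next adapter."""

    def __init__(self, adapters: list[ChatAdapter]) -> None:
        self.adapters = adapters

    def adapt(self, response: Any) -> Any:
        for adapter in self.adapters:
            response = adapter.adapt(response)
        return response


pipeline = AdapterPipeline([StripWhitespaceAdapter(), JSONExtractionAdapter()])
result = pipeline.adapt('  Sure! {"room": 101, "nights": 2}  ')
print(result)
```

Because the pipeline itself exposes `adapt`, it satisfies the same protocol and can be nested inside another pipeline.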

Improve the core agent simulation module with better conversation
handling, enhanced results tracking, scenario grouping support,
and improved display utilities. Refactor simulation execution for
better modularity and testability.

Introduce a checkers system for validating agent simulation outputs.
Includes LLM-based checkers for evaluating task completion, response
quality, and conversation flow.
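
As a rough sketch of what an LLM-based checker for task completion might involve, the example below sends a transcript to a judge callable and parses a PASS/FAIL verdict. Every name here (`CheckResult`, `TaskCompletionChecker`, `fake_judge`) and the prompt format are hypothetical; a real checker would call an actual model, for which `fake_judge` is only a deterministic stand-in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    passed: bool
    reason: str


class TaskCompletionChecker:
    """Hypothetical checker: asks a judge whether the task was completed."""

    def __init__(self, judge: Callable[[str], str]) -> None:
        # `judge` takes a prompt and returns the judge's raw text answer.
        self.judge = judge

    def check(self, task: str, conversation: list[str]) -> CheckResult:
        transcript = "\n".join(conversation)
        prompt = (
            f"Task: {task}\n"
            f"Conversation:\n{transcript}\n"
            "Answer PASS or FAIL on the first line, then a one-line reason."
        )
        # First line is the verdict, the remainder is the explanation.
        verdict, _, reason = self.judge(prompt).partition("\n")
        return CheckResult(
            passed=verdict.strip().upper() == "PASS",
            reason=reason.strip(),
        )


def fake_judge(prompt: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline."""
    if "confirmed" in prompt.lower():
        return "PASS\nBooking was confirmed."
    return "FAIL\nNo confirmation found."


checker = TaskCompletionChecker(fake_judge)
ok = checker.check(
    "Book a room for two nights",
    ["User: I need a room.", "Agent: Your booking is confirmed."],
)
bad = checker.check(
    "Book a room for two nights",
    ["User: I need a room.", "Agent: Which dates?"],
)
print(ok, bad)
```

Injecting the judge as a callable keeps the checker testable without network access, which fits the testability goal mentioned in the simulation refactor.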

Restructure the metrics module with improved collectors and enhanced
built-in metrics, and add DeepEval integration for evaluation metrics,
including conversation coherence and task completion assessment.

Introduce MemoryTraceHandler and TraceAnalyzer for capturing and
analyzing LLM calls, tool invocations, and token usage during
simulation runs. Extract tool calls and usage from traces for
more reliable data collection.
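
The split between capturing traces and analyzing them can be sketched as follows. `MemoryTraceHandler` and `TraceAnalyzer` are named in the commit message, but the `Span` type, the `record` method, and the attribute names `prompt_tokens` and `completion_tokens` are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    """Hypothetical trace record: one LLM call or one tool invocation."""

    kind: str  # "llm" or "tool"
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)


class MemoryTraceHandler:
    """Keeps every recorded span in an in-memory list."""

    def __init__(self) -> None:
        self.spans: list[Span] = []

    def record(self, kind: str, name: str, **attributes: Any) -> None:
        self.spans.append(Span(kind, name, attributes))


class TraceAnalyzer:
    """Derives tool calls and token usage from recorded spans."""

    def __init__(self, spans: list[Span]) -> None:
        self.spans = spans

    def tool_calls(self) -> list[str]:
        return [s.name for s in self.spans if s.kind == "tool"]

    def total_tokens(self) -> int:
        # Sum prompt and completion tokens over LLM spans only.
        return sum(
            s.attributes.get("prompt_tokens", 0)
            + s.attributes.get("completion_tokens", 0)
            for s in self.spans
            if s.kind == "llm"
        )


handler = MemoryTraceHandler()
handler.record("llm", "chat", prompt_tokens=120, completion_tokens=30)
handler.record("tool", "search_rooms", city="Krakow")
handler.record("llm", "chat", prompt_tokens=200, completion_tokens=45)

analyzer = TraceAnalyzer(handler.spans)
print(analyzer.tool_calls(), analyzer.total_tokens())
```

Reading tool calls and usage from the trace, rather than from the agent's own return values, means the numbers do not depend on each agent reporting them correctly.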

Introduce a storage layer with key-value and SQL store abstractions.
Add connection managers for SQLite and PostgreSQL databases, along
with a memory-based trace handler for capturing audit traces.
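
One way a key-value abstraction can sit on top of a SQL backend is shown below, using only the standard-library `sqlite3` module. The class name `SQLiteKeyValueStore`, the `kv` table, and the JSON encoding of values are illustrative assumptions, not the store interface added in this PR.

```python
import json
import sqlite3
from typing import Any, Optional


class SQLiteKeyValueStore:
    """Hypothetical key-value store backed by a single SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def set(self, key: str, value: Any) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key: str) -> Optional[Any]:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def delete(self, key: str) -> None:
        with self._conn:
            self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))


kv = SQLiteKeyValueStore()
kv.set("run:1", {"status": "running"})
kv.set("run:1", {"status": "done"})  # overwrites the earlier value
print(kv.get("run:1"), kv.get("run:2"))
```

Parameterized queries (`?` placeholders) keep keys and values out of the SQL text, and the same interface could in principle be implemented against PostgreSQL with a different connection manager.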

Introduce a store abstraction for persisting evaluation data including
scenarios, personas, simulation runs, and results. Includes file-based
and key-value store implementations for flexible data persistence.
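
A file-based implementation of such a store might keep one JSON file per item, grouped by kind, as in the sketch below. The class `FileEvalStore`, its method names, and the directory layout are assumptions for illustration; the PR's actual on-disk format is not described in the commit message.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


class FileEvalStore:
    """Hypothetical store: one JSON file per item at <root>/<kind>/<id>.json."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def save(self, kind: str, item_id: str, data: dict[str, Any]) -> Path:
        folder = self.root / kind
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{item_id}.json"
        path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        return path

    def load(self, kind: str, item_id: str) -> dict[str, Any]:
        path = self.root / kind / f"{item_id}.json"
        return json.loads(path.read_text(encoding="utf-8"))

    def list_ids(self, kind: str) -> list[str]:
        folder = self.root / kind
        if not folder.exists():
            return []
        return sorted(p.stem for p in folder.glob("*.json"))


# Use a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    store = FileEvalStore(Path(tmp))
    store.save("personas", "impatient_guest", {"tone": "curt"})
    store.save("personas", "first_time_visitor", {"tone": "curious"})
    persona_ids = store.list_ids("personas")
    loaded = store.load("personas", "impatient_guest")
    missing = store.list_ids("scenarios")

print(persona_ids, loaded, missing)
```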

Introduce a FastAPI-based evaluation API for managing simulation
scenarios, personas, and runs. Add ExecutionManager for orchestrating
simulation execution with real-time progress updates via SSE.
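
For readers unfamiliar with SSE, the sketch below shows the wire format of progress events produced by an async generator. It uses only the standard library so it runs without a web server; the function names, event names, and payload fields are hypothetical and do not reflect the ExecutionManager in this PR.

```python
import asyncio
import json
from typing import AsyncIterator


def format_sse(event: str, data: dict) -> str:
    """Encode one Server-Sent Events frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def progress_stream(total_turns: int) -> AsyncIterator[str]:
    """Yield one progress frame per simulated turn, then a completion frame."""
    for turn in range(1, total_turns + 1):
        await asyncio.sleep(0)  # stand-in for running one simulation turn
        yield format_sse("progress", {"turn": turn, "total": total_turns})
    yield format_sse("complete", {"status": "done"})


async def collect(total_turns: int) -> list[str]:
    return [frame async for frame in progress_stream(total_turns)]


frames = asyncio.run(collect(2))
for frame in frames:
    print(frame, end="")
```

In a FastAPI application, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"` so the browser's `EventSource` can consume it.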

Introduce an evaluation dashboard built with React and TypeScript.
Features include scenario management, persona configuration,
simulation run execution with real-time progress, and detailed results
visualization with conversation views and metrics display.

2 participants