All notable changes to the Rhesis project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
This is the main changelog for the entire Rhesis repository. For detailed component-specific changes, please refer to:
This release includes the following component versions:
- Backend 0.6.8
- Frontend 0.6.9
- SDK 0.6.9
- Polyphemus 0.2.8
Backend v0.6.8:
- Adds multi-target review annotations for test runs, allowing reviews of turns, metrics, and test results with override logic to mutate test data based on review verdicts.
- Introduces SDK-side metric evaluation via a new
@metricdecorator and connector protocol, enabling client-side metric execution and integration with the backend. - Enhances conversation evaluation with per-turn metadata, context, and tool_calls, providing richer information for conversational metrics and judges.
- Refactors backend logging to use the standard Python
logginglibrary, improving maintainability and configurability.
Frontend v0.6.9:
- Adds multi-target review annotations, including @mention support in review comments, resizable split panels, and review overrides reflected in test results.
- Enhances conversation evaluation with per-turn metadata, context, and tool_calls, displayed in the conversation history.
- Improves project management with pagination, search, and filters in the projects list, along with disambiguation of duplicate project names.
- Upgrades dependencies to address security vulnerabilities and adds comprehensive E2E test coverage for all overview and detail pages.
SDK v0.6.9:
- Fix: Resolved issues with the Polyphemus provider, including bad requests and updated default model name.
- Feature: Upgraded Garak to v0.14 with dynamic probe generation, centralized detector registry, and code quality improvements.
- Feature: Implemented async-first model generation for improved performance, including updates to LiteLLM, RhesisLLM, and VertexAILLM providers.
- Feature: Added per-turn metadata, context (RAG sources), and tool_calls to conversation evaluation, enhancing conversational metrics and providing more detailed information in the UI.
- Feature: Implemented batch processing and retry mechanisms in the synthesizer for more efficient test set generation.
- Fix: Resolved multiple security vulnerabilities by upgrading dependencies and adding npm overrides.
Polyphemus v0.2.8:
- Upgraded to Python 3.12 with corresponding dependency updates and conflict resolution.
- Added
generate_batchendpoint for handling multiple requests. - Implemented rolling model replacement for Vertex AI deployments to ensure zero-downtime updates.
- Resolved multiple security vulnerabilities in dependencies.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.6.7
- Frontend 0.6.8
- SDK 0.6.8
Backend v0.6.7:
- Added multi-file attachment support for tests, traces, and playground, including file upload/download, format filters, and UI elements.
- Enhanced chatbot functionality with file upload support and JSON output mode.
- Improved test run detail view with metadata, file sections, and trace drawer integration.
- Added LiteLLM Proxy, Azure AI, and Azure OpenAI provider support.
Frontend v0.6.8:
- Added support for file attachments to tests, test results, and chat functionality, including UI elements for upload, download, and display.
- Enhanced test run detail view with metadata, context, pretty-printed JSON, file attachments, and navigation improvements.
- Introduced LiteLLM Proxy, Azure AI, and Azure OpenAI provider support with optional API base and version configurations.
- Improved test coverage with new unit, integration, E2E, and accessibility tests, along with fixes for various bugs and UI issues.
SDK v0.6.8:
- Added multi-file attachment support for tests, traces, and playground, including file upload, download, and display in the UI.
- Added Azure AI Studio and Azure OpenAI providers.
- Added
connect()blocking API for connector-only scripts. - Enhanced SDK model factory with registry-driven creation for easier provider management.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.6.6
- Frontend 0.6.7
- SDK 0.6.7
- Polyphemus 0.2.7
Backend v0.6.6:
- Added explicit
min_turnsparameter to test configuration, allowing control over early stopping behavior. - Improved turn budget handling in Penelope, including turn-aware prompts and deepening strategies, and preventing premature stopping.
- Enhanced metric handling, including pagination on the frontend and passing
conversation_historyto conversational metrics. - Added methods to TestSet for bulk association/disassociation of tests.
Frontend v0.6.7:
- Enhanced multi-turn evaluation with accurate turn counting (user-assistant pairs),
min_turns/max_turnsconfiguration, and SDK improvements for metric handling. - Added explicit
min_turnsparameter for early stop control in conversational tests, configurable through a range slider in the frontend. - Improved metric handling in the SDK with create-or-update support, ID preservation, and fixes for null value overwrites.
- Added methods to associate/disassociate tests with TestSets without recreating them.
- Fixed metrics page pagination to display all backend type tabs.
SDK v0.6.7:
- Added explicit
min_turnsparameter to control early stopping in tests, replacing instruction-based regex parsing. - Improved turn budget handling in Penelope, including turn-aware prompts, explicit min/max turn labeling, and preventing spurious turn count criteria in goal judging.
- Enhanced metric handling in the SDK, including create-or-update support for metric push, preservation of IDs on pull, and fixes for conversational metric evaluation.
- Added methods to TestSet for bulk association/disassociation of tests, improving test set management.
Polyphemus v0.2.7: Initial release or no significant changes.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.6.5
- Frontend 0.6.6
- SDK 0.6.6
- Polyphemus 0.2.6
Backend v0.6.5:
- Security: Mitigated OAuth callback URL host header poisoning vulnerability. Addressed multiple Dependabot security alerts by updating vulnerable dependencies (cryptography, pillow, fastmcp, redis, langgraph-checkpoint, marshmallow, virtualenv, mammoth, langchain-core).
- Polyphemus Integration: Added core Polyphemus integration including service delegation tokens, access control system with request/grant workflow, and frontend UI for access requests.
- Tracing: Implemented conversation-based tracing across SDK, backend, and frontend, enabling the linking of multi-turn conversation interactions under a shared trace_id. Added UI improvements for conversation traces, including turn navigation, edge labels, and resizable trace detail drawer.
- Test Set Type Enforcement: Enforced the requirement of
test_set_typeon test set creation across the backend, frontend, and SDK, and enforced type-matching when assigning tests to test sets.
Frontend v0.6.6:
- Added Polyphemus access request UI, including access request modal, model card UI states, and Polyphemus provider icon/logo.
- Improved trace UI with conversation tracing support, including conversation icon in trace list, type filter buttons, Conversation View tab, turn labels, and turn navigation.
- Enhanced trace graph view with turn labels on edges, progressive agent invocation count, and improved edge routing.
- Fixed security vulnerabilities in frontend transitive dependencies.
SDK v0.6.6:
- Enhanced Polyphemus integration with access control, delegation tokens, and UI.
- Improved LLM error handling with retries, logging, and fallback mechanisms.
- Added conversation-based tracing across SDK, backend, and frontend for multi-turn interactions.
- Addressed multiple security vulnerabilities by updating dependencies and migrating to PyJWT.
Polyphemus v0.2.6:
- Added rate limiting to the Polyphemus service.
- Implemented access control and delegation tokens for Polyphemus authentication, including a request/grant workflow and frontend UI.
- Deployed vLLM to Vertex AI for Polyphemus, including caching GCP credentials and adding retry logic.
- Resolved multiple security vulnerabilities by updating dependencies, including migrating from python-jose to PyJWT.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.6.4
- Frontend 0.6.5
- SDK 0.6.5
Backend v0.6.4: Key changes include: Evaluate three-tier metrics during live multi-turn execution (#1366)
- feat: evaluate three-tier metrics during live multi-turn execution
Previously, only Penelope's Goal Achievement metric was evaluated during live multi-turn test execution. Additional metrics defined in the three-tier model (behavior > test set > execution) were ignored.
Backend: add exclude_class_names to evaluate_multi_turn_metrics() and call it after Penelope's evaluation to pick up additional metrics. Frontend: show metrics table for multi-turn tests with additional metrics and use all metrics for pass/fail determination.
- fix(test): update multi-turn runner tests for additional metrics evaluation
Update test expectations to reflect that evaluate_multi_turn_metrics is now called during live execution to pick up additional three-tier metrics. Add test for merging additional metrics with Penelope results., Add AI-powered auto-configure for endpoint mappings (#1364)
- feat(backend): add auto-configure endpoint service
Add AI-powered auto-configuration that analyzes user-provided reference material (curl commands, code snippets, API docs) and generates Rhesis request/response mappings. Includes LLM-driven analysis, endpoint probing with self-correction, and comprehensive prompt engineering for conversation mode detection and platform variable mapping.
- feat(frontend): add auto-configure modal and UI
Add AutoConfigureModal with two-step stepper (input + review), integrate auto-configure button in endpoint creation form, move auth token to basic info tab, and fix project-context redirect after creation.
- fix(frontend): fix redirect after endpoint creation
Remove /endpoints suffix from project-context redirect so it navigates to the project detail page instead of a non-existent route.
- feat(sdk): add auto_configure class method to Endpoint
Add Endpoint.auto_configure() for code-first auto-configuration using the backend service, with probe control and result inspection.
- docs: update endpoint docs for auto-configure and platform variables
Add auto-configure documentation page. Update platform variable tables to include all managed fields (conversation_id, context, metadata, tool_calls, system_prompt). Fix conversation_id scope to cover all conversational endpoints, not just stateful. Add response mappings for tool_calls and metadata in provider examples.
- refactor(backend): restructure auto-configure prompts with endpoint taxonomy
Rewrite both auto_configure.jinja2 and auto_configure_correct.jinja2 around a clear hierarchical taxonomy: single-turn vs multi-turn (stateless | stateful). This replaces the previous flat three-category approach with a two-step classification that improves LLM accuracy.
- feat(backend): add API key detection and redaction for auto-configure
Prevent real API keys from being sent to the LLM during auto-configure. Backend redacts secrets (OpenAI, AWS, Google, Bearer tokens) before prompt rendering. Frontend shows a warning and blocks submission when keys are detected. Environment variable placeholders are preserved.
-
chore: update chatbot uv.lock
-
style(frontend): use theme borderRadius in AutoConfigureModal
-
fix(frontend): resolve ambiguous label queries in auto-configure tests
Use getByRole('textbox') instead of getByLabelText to avoid matching the Tooltip aria-label on the disabled Auto-configure button. Also prefix unused FAILED_RESULT fixture with _ to fix eslint warning.
- ci: ignore empty uv cache in sdk test workflow
The uv cache clean step wipes the cache directory, causing the post-job save to fail. Add ignore-nothing-to-cache to prevent this.
- fix(frontend): resolve test timeouts in auto-configure tests
Use userEvent.setup({ delay: null }) to remove inter-keystroke delays that caused CI timeout exceeding the 5s limit.
-
fix: resolve lint errors and test timeouts
-
chore: formatting and lint fixes
-
fix(backend): resolve UnmappedClassError in tests
-
fix: address PR #1364 review comments
- Update schema: request_mapping and response_mapping now support nested JSON (Dict[str, Any])
- Add SSRF protection: block cloud metadata services (169.254.0.0/16) while allowing localhost
- Fix auth token substitution: support custom headers like x-api-key, not just Authorization
- Update LLM prompts: clarify nested JSON support and warn against mapping $.id to conversation_id
- Improve UX: remove auth_token requirement for auto-configure (support open APIs)
- Fix: pre-existing TypeScript error in BehaviorsClient.tsx
Addresses all 5 issues raised by @peqy in PR review.....
Frontend v0.6.5: Key changes include: Evaluate three-tier metrics during live multi-turn execution (#1366)
- feat: evaluate three-tier metrics during live multi-turn execution
Previously, only Penelope's Goal Achievement metric was evaluated during live multi-turn test execution. Additional metrics defined in the three-tier model (behavior > test set > execution) were ignored.
Backend: add exclude_class_names to evaluate_multi_turn_metrics() and call it after Penelope's evaluation to pick up additional metrics. Frontend: show metrics table for multi-turn tests with additional metrics and use all metrics for pass/fail determination.
- fix(test): update multi-turn runner tests for additional metrics evaluation
Update test expectations to reflect that evaluate_multi_turn_metrics is now called during live execution to pick up additional three-tier metrics. Add test for merging additional metrics with Penelope results., Fix e2e projects content test selectors (#1365)
- fix(e2e): fix projects content test selectors
Add networkidle wait, use .MuiCard-root instead of [class*="ProjectCard"], and broaden create button detection to include link role.
- fix(e2e): use resilient selectors in projects content test
Restore broader /create|new/i regex for create button matching and replace generic main element fallback with projects-specific heading.....
SDK v0.6.5:
- Added AI-powered auto-configuration for endpoints, simplifying setup with intelligent analysis and probing.
- Introduced Adaptive Testing features, including a Test Explorer UI, CRUD operations for tests and topics, and output generation capabilities.
- Unified model handling with a single
get_model()function, simplifying model selection and configuration. - Improved performance and concurrency in output generation using asynchronous HTTP requests.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.6.3
- Frontend 0.6.4
- SDK 0.6.4
- Polyphemus 0.2.5
Watch the overview of this release:
- Metrics per Run + Output re-runs: https://youtu.be/vdbWfqhQpZs
- Multi-Agent Workflow Testing & Observability: https://youtu.be/OH7e_7q7_oU
Backend v0.6.3:
- Added split-view playground with test creation from conversations.
- Implemented file import for test sets with support for CSV, JSON, JSONL, and Excel formats.
- Enhanced test set execution with rescoring, last run retrieval, and metric management.
- Introduced user-configurable embedding model settings and connection testing.
- Replaced Auth0 with a native authentication system including email verification, password reset, and magic link.
Frontend v0.6.4:
- Added split-view playground with test creation from conversations, including a drawer with LLM-extracted pre-filled fields for both single-turn and multi-turn tests.
- Implemented file import for test sets, supporting CSV, JSON, JSONL, and Excel formats with column mapping and user-friendly error handling.
- Added user-configurable embedding model settings, allowing users to select their preferred embedding model and test the connection.
- Introduced test output reuse and re-scoring capabilities, enabling users to re-evaluate metrics on stored outputs without invoking endpoints, and added a "Scoring Target" dropdown to execution drawers.
- Replaced Auth0 with a native authentication system, including email verification, password reset, magic link, and OAuth support, and added refresh token rotation for improved security.
SDK v0.6.4:
- Introduces a split-view playground with test creation from conversations, including file import for test sets and flat convenience fields for multi-turn test configurations.
- Adds rescore, last_run, and metric management capabilities to TestSet, along with user-configurable embedding model settings and improved model connection testing.
- Implements a native authentication system replacing Auth0, featuring email verification, password reset, magic link login, and refresh token rotation.
- Improves error messages for model configuration and worker availability, providing clearer guidance for users to resolve issues.
Polyphemus v0.2.5:
- Enabled BetterTransformer optimization in Polyphemus, resulting in a 1.5-2x inference speedup.
- Improved error messages for model configuration and worker availability issues, providing users with clear guidance on how to resolve them.
- Enhanced model connection testing, validating model configurations with actual API calls and displaying specific error messages in the UI.
- Updated dependencies to address security vulnerabilities (CVEs) in packages like cryptography, nbconvert, langsmith, protobuf, python-multipart, and others.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.6.2
- Frontend 0.6.3
- SDK 0.6.3
Backend v0.6.2:
- Added interactive "Playground" for testing endpoints with real-time WebSocket communication, including trace linking and message copy functionality.
- Implemented Jira ticket creation directly from tasks via MCP, with optional Jira space selection during tool connection setup.
- Enhanced WebSocket retry mechanism for improved robustness, including increased reconnect attempts and visibility detection.
- Added creation dates to tests and test sets in the frontend.
Frontend v0.6.3:
- feat: Added interactive playground for testing conversational endpoints with real-time WebSocket communication and trace linking.
- feat: Added Jira ticket creation from tasks via MCP integration.
- feat: Enhanced trace visualization with a new Graph View and improved agent tracing, including I/O capture and handoffs.
- feat: Added a
./rh devcommand for simplified local development setup. - fix: Improved UI/UX with various fixes, including validation improvements and dark mode adjustments.
SDK v0.6.3:
- Adds a new interactive playground for testing conversational endpoints with real-time WebSocket communication, including trace viewing and endpoint pre-selection.
- Enhances trace visualization with a new Graph View, agent tracing support, and improved token extraction.
- Introduces JSON and JSONL import/export methods for TestSets.
- Improves WebSocket robustness with enhanced retry mechanism and fixes several SDK test issues.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.6.1
- Frontend 0.6.2
- SDK 0.6.2
Backend v0.6.1:
- Added Garak LLM vulnerability scanner integration with Redis caching for probe enumeration.
- Implemented 3-level metrics hierarchy for test execution allowing execution-time metric overrides.
- Added MCP Jira/Confluence (Atlassian Stdio), GitHub repository retrieval, and observability integrations.
- Upgraded FastAPI/Starlette, security dependencies, and optimized Docker image with CPU-only PyTorch.
Frontend v0.6.2:
- Added 3-level metrics hierarchy UI for test execution with metric source selection.
- Integrated Garak LLM vulnerability scanner import UI for test sets.
- Added MCP Atlassian, GitHub, and observability support in the UI.
- Added context and expected response fields to test run detail view.
SDK v0.6.2:
- Added Model entity with provider auto-resolution and Project entity with integration tests.
- Implemented batch processing, embedders framework, and Vertex AI support.
- Added Garak detector metric integration and MCP observability with OpenTelemetry tracing.
- Implemented continuous slow retry mode for connector resilience.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.6.0
- Frontend 0.6.0
- SDK 0.6.0
- Polyphemus 0.2.4
Backend v0.6.0:
- Enhanced connectivity with MCP (Multi Cloud Provider) including Github support and multi-transport capabilities.
- Improved observability with comprehensive OpenTelemetry integration for tracing, visualization, and filtering.
- Chatbot intent recognition added.
- Streamlined organization onboarding with integrated test execution.
Frontend v0.6.0:
- Enhanced SDK tracing with improved visualization and filtering in the UI.
- Improved MCP connection stability and added a new GitHub MCP provider.
- Integrated test execution into the organization onboarding process.
SDK v0.6.0:
- Added dependency injection support with
bindparameter in endpoint decorators. - Enhanced SDK tracing with async support, smart serialization, and improved I/O display.
- Introduced OpenTelemetry integration for basic telemetry.
- Added Chatbot Intent Recognition functionality.
Polyphemus v0.2.4:
- Added
bindparameter to endpoint decorator for dependency injection. - Introduced Bucket Model for improved data management.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.5.4
- Frontend 0.5.4
- SDK 0.5.2
- Polyphemus 0.2.3
Backend v0.5.4:
- Added a new Polyphemus provider.
- Polyphemus provider now supports schema definitions.
Frontend v0.5.4:
- Added new Polyphemus provider with schema support.
- Documentation improvements including guides and SDK metrics.
- Dependency updates for Next.js and Nodemailer.
SDK v0.5.2:
- Added a new Polyphemus provider with schema support.
- Improved MCP (likely referring to a component within the SDK) error handling and usability.
- Enhanced metric creation with support for categories and threshold operators.
- Improved generation prompts using research-backed techniques.
Polyphemus v0.2.3: Initial release or no significant changes.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.5.3
- Frontend 0.5.3
- Polyphemus 0.2.2
Backend v0.5.3:
- Improved MCP authentication and error handling, enhancing usability.
- Added unique constraint to
nano_idcolumns in the database. - Enhanced testing capabilities with endpoint connection testing and multi-turn test support.
- Notifications now separate execution status from test results in emails.
Frontend v0.5.3:
- Improved trial drawer with multi-turn support and UX enhancements.
- Added multi-turn test support in manual test writer.
- Fixed authentication errors and improved UX for MCP (Model Comparison Platform).
- Resolved cursor focus loss in title and description fields.
Polyphemus v0.2.2:
- Fix: Rate limiting now occurs after authentication, preventing unintended rate limits on unauthenticated requests.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.5.2
- Frontend 0.5.2
Backend v0.5.2:
- Added a Test Connection Tool for easier configuration and troubleshooting.
- Removed permission restrictions from entity routes.
- Fixed: Activities API response now returns valid fields.
Frontend v0.5.2:
- Security: Updated React and Next.js to address CVE-2025-55182.
- Test Runs: Added support for tags.
- Metrics: Added support for categories and threshold operators.
- Added Test Connection Tool.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.5.1
- Frontend 0.5.1
- Polyphemus 0.2.1
Backend v0.5.1:
- Modernized dashboard with MUI X charts and activity timeline.
- Improved connector output mapping with message field support.
- Added support for OpenRouter provider.
- Increased exporter timeout from 10 to 30 seconds.
Frontend v0.5.1:
- Modernized dashboard with MUI X charts and activity timeline.
- Improved MCP import and tool selector dialogs.
- Added grid state persistence to localStorage.
- Backend execution RPC fixes and UI improvements.
Polyphemus v0.2.1:
- Added "Is Verified" status to user profiles.
- Users can now be marked as verified.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.5.0
- Frontend 0.5.0
- SDK 0.5.0
- Polyphemus 0.2.0
Backend v0.5.0:
- Added support for multi-turn conversations, including preview, generation, and testing.
- Implemented tool configuration frontend and database persistence for onboarding progress.
- Introduced bidirectional SDK connector with intelligent auto-mapping and in-place test execution.
- Improved test generation and fixed various bugs related to test creation, listing, and format.
Frontend v0.5.0:
- Redesigned test results and knowledge detail pages with improved filters and UX.
- Implemented client-side search filter for test results and improved behavior-metrics relation UI.
- Introduced interactive onboarding tour and multi-turn conversation preview in test generation.
- Added tool configuration frontend, tool source type, and bidirectional SDK connector.
SDK v0.5.0:
- Added bidirectional SDK connector with intelligent auto-mapping.
- Improved multi-turn test support, including format fixes and comprehensive functionality.
- Enhanced synthesizers and model listing for providers.
- Introduced database functionality for MCP Tool.
Polyphemus v0.2.0:
- Added authentication and Google Cloud deployment support.
- Implemented a new benchmarking framework with improved model handling and integration with SDK modules.
- Introduced new metrics including cost heuristic, context retention, and refusal detection; also added summaries and reports.
- Improved judge model memory management and optimized file access.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.4.3
- Frontend 0.4.3
- SDK 0.4.2
Backend v0.4.3:
- Added centralized conversation tracking for improved multi-turn conversation handling.
Frontend v0.4.3:
- Fixes a bug that caused the frontend Docker image to fail during local deployment.
- Resolves a file ownership issue (chown bug).
SDK v0.4.2: Initial release or no significant changes.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.4.2
- Frontend 0.4.2
- SDK 0.4.1
Backend v0.4.2:
- Added support for multi-turn tests, including configuration, execution, and conversational metrics.
- Improved local development setup with Docker Compose and enhanced command-line interface.
- Introduced generic MCP (Multi-Choice Prompting) integration endpoints and user model configuration.
- Added scenarios, tags and comments infrastructure for sources, and improved test set type handling.
Frontend v0.4.2:
- Implemented multi-turn test support with configuration UI, goal display, and metrics integration.
- Simplified MCP import workflow to a single-step process with Notion integration in the knowledge page.
- Enhanced test set management with test type display and filtering.
- Improved local development setup with Docker Compose and auto-login feature.
SDK v0.4.1:
- Added support for Langchain integration and Penelope language model.
- Introduced Conversational Metrics infrastructure with Goal Achievement Judge and DeepEval integration.
- Enhanced Multi-Chain Processing (MCP) Agent with autonomous ReAct loop, improved error handling, and provider configuration.
- Added structured output support for tool calling via Pydantic schemas and improved VertexAI provider reliability.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.4.1
- Frontend 0.4.1
- SDK 0.4.0
Backend v0.4.1:
- Added comprehensive OpenTelemetry telemetry system for enhanced monitoring and analytics.
- Enhanced test generation with iteration context support and replaced document uploads with source IDs for improved source tracking.
- Integrated SDK metrics, simplified metric evaluation, and migrated database to SDK format for improved metric handling.
- Introduced soft deletion and cascade-aware restoration for entities, enhancing data management and recovery capabilities.
Frontend v0.4.1:
- Enhanced test generation with improved UI, backend support, and source context display. Replaced "Documents" terminology with "Sources" throughout the application.
- Implemented OpenTelemetry for enhanced monitoring and improved telemetry data handling.
- Added support for additional file formats (.pptx, .xlsx, .html, .htm, .zip) for knowledge source uploads with drag-and-drop functionality.
- Improved the display of test results, including error status icons, execution time for failed runs, and quick search functionality in the test runs grid.
SDK v0.4.0:
- Added Cohere and Vertex AI LLM providers, and Ollama integration.
- Enhanced AI-based test generation with iteration context support and source ID tracking.
- Improved metrics integration with Ragas and DeepEval, including updated DeepEval to v3.6.7 and new metrics.
- Refactored and improved error handling and schema support for LLM providers, including OpenAI-wrapped schemas.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.4.0
- Frontend 0.4.0
- SDK 0.3.1
Backend v0.4.0:
- Added support for user-defined LLM providers and model configuration, including metric-specific models and a dedicated model connection test service.
- Implemented soft delete functionality for users and organizations, including recycle bin management and GDPR user anonymization.
- Enhanced source handling with dynamic source types, hybrid cloud/local storage, and improved document extraction.
- Added user settings API endpoints for managing default models and other user-specific configurations.
Frontend v0.4.0:
- Enhanced Knowledge section with source upload functionality, improved source preview, and OData filtering for sources grid.
- Redesigned Test Runs detail page with a modern dashboard interface, comprehensive comparison view, and human review integration.
- Improved Models (formerly LLM Providers) management with a new edit modal, connection testing, and API key visibility toggle.
- Added advanced filtering for test results and improved overall UI consistency by using theme values and standardizing styling across components.
SDK v0.3.1:
- Added support for user-defined LLM provider generation and execution.
- Enhanced DocumentExtractor with BytesIO support.
- Added
modelparameter support to synthesizer factory and updated ParaphrasingSynthesizer.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.3.0
- Frontend 0.3.0
- SDK 0.3.0
Backend v0.3.0:
- Added persistent storage for documents with new
StorageServiceand updated document endpoints. - Implemented robust organization-level data isolation and access control across all entities and CRUD operations.
- Enhanced comment and task management features, including email notifications and improved comment counting.
- Introduced a new endpoint for generating test configurations.
Frontend v0.3.0:
- Complete rebranding initiative: Introduced new Rhesis AI brand identity with updated color palette, logos, and visual design system.
- Implemented comprehensive frontend testing infrastructure.
- Enhanced task management features, including editable task titles, improved UI consistency, and navigation improvements.
- Improved UI/UX across various components, including dashboards, metrics pages, and data grids, with a focus on theme consistency and error handling.
SDK v0.3.0:
- Added functionality to push and pull metrics, including categorical and numeric prompt metrics.
- Introduced configuration options for metrics, including enum support and backend configuration.
- Refactored metric classes for improved structure and reusability.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.2.4
- Frontend 0.2.4
- SDK 0.2.4
Backend v0.2.4:
- Added task management functionality with statuses, priorities, assignments, and email notifications.
- Integrated DocumentSynthesizer for automated document-based test generation.
- Enhanced test set attributes with document sources and metadata tracking.
- Improved database session handling and refactored routes for better performance and maintainability.
Frontend v0.2.4:
- Added "Source Documents" section to individual test detail and Test Set Details pages.
- Test sets now display document name and description.
- Project title/description updates without requiring a page reload.
- Added a send button to the comment text box.
SDK v0.2.4:
- Rewritten benchmarking framework to integrate SDK modules and improve model handling.
- Introduced
Documentdataclass andDocumentSynthesizerfor document text extraction and chunking, replacing dictionary-based document handling. - Added new LLM providers (including Ollama) and improved error handling.
- Refactored metrics, including prompt metrics, and moved them from the backend to the SDK.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.2.3
- Frontend 0.2.3
- SDK 0.2.3
Backend v0.2.3:
- Added test run stats endpoint with performance improvements.
- Implemented comment support with CRUD operations, API endpoints, and emoji reactions.
- Introduced LLM service integration with schema support and provider modes.
- Improved environment variable handling for local development and deployment flexibility.
Frontend v0.2.3:
- Added comments feature for collaboration on tests, test sets, and test runs.
- Improved metrics creation and editing workflow with visual feedback, loading states, and optimized API calls.
- Enhanced test run details with dynamic charts and a test run stats endpoint.
- Fixed tooltip visibility issues and improved performance of the test run datagrid.
SDK v0.2.3:
- Renamed and reorganized LLM provider components for clarity and improved structure.
- Added support for JSON schemas in LLM requests, enabling structured responses.
- Introduced API key handling for LLM providers.
- Removed pip from SDK dependencies and updated uv.lock.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.2.2
- Frontend 0.2.2
- SDK 0.2.2
Backend v0.2.2:
- Added document content extraction endpoint and document support to the
/test-sets/generateendpoint, enabling processing of.docx,.pptx, and.xlsxformats. - Implemented Redis authentication and updated environment configuration for enhanced security and management.
- Improved Docker configuration and startup scripts for a more robust and streamlined deployment process.
- Enhanced error handling for foreign key violations and improved consistency across backend routes, particularly for UUID validation and demographic routers.
- Added unit tests for backend components.
Frontend v0.2.2:
- Improved document upload experience with automatic metadata generation and updated supported file extensions.
- Enhanced project creation and management, including fixes for project name truncation and automatic refreshing after creation.
- Refactored and improved form validation and UI elements across the application.
- Updated Docker configuration for production mode and improved startup scripts.
SDK v0.2.2:
- Migrated document extraction from docling to markitdown, adding support for docx, pptx, and xlsx formats.
- Removed support for .url and .youtube file extensions.
- Improved code style and consistency with automated linting and formatting.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.2.1
- Frontend 0.2.1
- SDK 0.2.1
- Polyphemus 0.1.0
Backend v0.2.1:
- Added support for filtering test sets related to runs and document upload functionality via
/documents/uploadendpoint. - Enhanced test generation with optional documents parameter and improved response models.
- Added test result statistics support and "last login" functionality.
- Fixed document validation, GUID import issues, and Auth0 user handling.
Frontend v0.2.1:
- Introduced Test Results functionality for viewing and analyzing test outcomes.
- Added interfaces for handling test results statistics.
- Fixed infinite loading issues for test sets and updated contributing guides.
SDK v0.2.1:
- Added
get_field_names_from_schemamethod toBaseEntityclass for dynamic property access. - Updated default base URL for API endpoint and improved documentation.
Polyphemus v0.1.0:
- Initial release of the LLM inference and benchmarking service.
- FastAPI-based REST API with Dolphin 3.0 Llama 3.1 8B model support.
- Modular benchmarking suite and OWASP-based security test sets.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.2.0
- Frontend 0.2.0
- SDK 0.2.0
Backend v0.2.0:
- Enhanced team invitation process with improved security, validation, rate limiting, and email uniqueness checks.
- Implemented email-based notification system for test execution results.
- Improved test execution framework with sequential execution, configuration options, and enhanced task orchestration using Redis.
- Fixed issues related to OData filtering validation, JWT expiration, test set downloads, and score calculation for metrics.
Frontend v0.2.0:
- Added version information display to the frontend.
- Introduced a new team invitation flow with enhanced security and validation, including email uniqueness checks, rate limiting, and max team size.
- Improved session management with server logout upon session expiration and redirection to the home page.
- Numerous bug fixes and UI improvements across various components, including test sets, test runs, endpoints, and dark mode contrast.
SDK v0.2.0:
- Added support for
.txtfiles to theDocumentExtractor. - Introduced
documentsparameter toPromptSynthesizerfor enhanced document handling. - Added functionality for custom behaviors informed by prompts to the
PromptSynthesizer.
See individual component changelogs for detailed changes:
This release includes the following component versions:
- Backend 0.2.0
- Frontend 0.2.0
- SDK 0.2.0
Backend v0.2.0: Version 0.2.0 introduces a new team invitation feature with improved security and validation, including email uniqueness checks, rate limiting, and team size restrictions. This release also includes significant refactoring and improvements to the test execution and evaluation process, leveraging Redis for worker infrastructure and adding email-based notifications for test completion.
Frontend v0.2.0: Version 0.2.0 introduces a new component for displaying version information and makes environment variables available to the client. This release also includes improvements to team invitation security and validation, as well as numerous bug fixes and UI enhancements across various components like test sets, test runs, and endpoints.
SDK v0.2.0:
Version 0.2.0 introduces enhanced document handling with .txt file support in the DocumentExtractor and a new documents parameter for the PromptSynthesizer. This release also adds custom behavior informed by prompts, allowing for more flexible and tailored content generation.
See individual component changelogs for detailed changes:
0.1.0 - 2025-05-15
First release of the Rhesis main repo, including all components. Note that the SDK was previously developed separately and is now at version 0.1.8 internally, but is included in this repository-wide v0.1.0 release.
-
Backend v0.1.0
- Core API for test management
- Database models and schemas
- Authentication system with JWT
- CRUD operations for main entities
- API documentation with Swagger/OpenAPI
- PostgreSQL integration
- Error handling and logging
-
Frontend v0.1.0
- Next.js 15 with App Router
- Material UI v6 component library
- Authentication with NextAuth.js
- Protected routes and middleware
- Dashboard and test management interface
- Test visualization and monitoring
- Dark/light theme support
- Responsive design
-
SDK v0.1.8 (see SDK Changelog for detailed history)
- Test set management and generation capabilities
- Prompt synthesizers for test case generation
- Paraphrasing capabilities
- LLM service integration
- CLI scaffolding
- Documentation with Sphinx
- Docker containerization for all services
- CI/CD pipeline setup
- Development environment configuration
- Migrated SDK from its standalone repository (https://github.com/rhesis-ai/rhesis-sdk) into the main repo
- Updated repository structure to accommodate all components
- The SDK was previously developed and released (up to v0.1.8) in a separate repository at https://github.com/rhesis-ai/rhesis-sdk
- While the SDK is at version 0.1.8 internally, it's included in this repository-wide v0.1.0 release tag
- After this initial release, each component will follow its own versioning lifecycle with component-specific tags