Releases: rhesis-ai/rhesis
Platform v0.6.8
Platform Release
This release includes the following component versions:
- Backend 0.6.7
- Frontend 0.6.8
- SDK 0.6.8
Summary of Changes
Backend v0.6.7:
- Added multi-file attachment support for tests, traces, and playground, including file upload/download, format filters, and UI elements.
- Enhanced chatbot functionality with file upload support and JSON output mode.
- Improved test run detail view with metadata, file sections, and trace drawer integration.
- Added LiteLLM Proxy, Azure AI, and Azure OpenAI provider support.
Frontend v0.6.8:
- Added support for file attachments to tests, test results, and chat functionality, including UI elements for upload, download, and display.
- Enhanced test run detail view with metadata, context, pretty-printed JSON, file attachments, and navigation improvements.
- Introduced LiteLLM Proxy, Azure AI, and Azure OpenAI provider support with optional API base and version configurations.
- Improved test coverage with new unit, integration, E2E, and accessibility tests, along with fixes for various bugs and UI issues.
SDK v0.6.8:
- Added multi-file attachment support for tests, traces, and playground, including file upload, download, and display in the UI.
- Added Azure AI Studio and Azure OpenAI providers.
- Added `connect()` blocking API for connector-only scripts.
- Enhanced SDK model factory with registry-driven creation for easier provider management.
See individual component changelogs for detailed changes:
SDK v0.6.8
Added
- Added multi-file attachment support for tests, traces, and playground. This includes the ability to upload, download, and delete files associated with tests and test results.
- Added file format filters (to_anthropic, to_openai, to_gemini) for transforming input files into provider-specific content formats.
- Added file upload support to the `/chat` endpoint, allowing users to include files in chatbot conversations.
- Added file attachment support to WebSocket communication, enabling file transfers in chat applications.
- Added file upload button and drag-and-drop support to Playground chat.
- Added file download functionality to FileAttachmentList and MessageBubble components.
- Added file attachment support to multi-turn tests in Penelope.
- Added File entity to the SDK with upload, download, and delete capabilities.
- Added Azure AI Studio and Azure OpenAI providers as new LLM providers.
- Added `connect()` blocking API for connector-only scripts.
- Added JSON and Excel file upload support to the Playground.
- Added metadata and context as collapsible sections in the Test Run detail view.
- Added trace drawer and file sections to the Test Run detail view.
- Added required field validation to the metric creation form.
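As an illustration of what provider-specific file formatting can look like, here is a minimal sketch. The function names mirror the filter names listed above, but the bodies are assumptions, not the SDK's implementation; the dictionary shapes follow the image content formats publicly documented by each provider, and only image files are covered.

```python
import base64
from typing import Any, Dict


def _b64(raw: bytes) -> str:
    """Base64-encode raw bytes into an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


def to_openai(raw: bytes, mime_type: str) -> Dict[str, Any]:
    """Image part for OpenAI-style chat messages (base64 data URL)."""
    url = f"data:{mime_type};base64,{_b64(raw)}"
    return {"type": "image_url", "image_url": {"url": url}}


def to_anthropic(raw: bytes, mime_type: str) -> Dict[str, Any]:
    """Image block for Anthropic-style messages (base64 source)."""
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": mime_type, "data": _b64(raw)},
    }


def to_gemini(raw: bytes, mime_type: str) -> Dict[str, Any]:
    """Inline data part for Gemini-style content."""
    return {"inline_data": {"mime_type": mime_type, "data": _b64(raw)}}
```

Each helper returns a plain Python object rather than a JSON string, so the result can be placed directly into a request body.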
Changed
- Renamed `run_connector` to `connect` in the SDK.
- Replaced test type magic strings with constants.
- Moved the file attachment button inside the text input in the Playground chat.
- Enhanced the constructors of OpenAILLM and OpenRouterLLM to accept additional keyword arguments.
- Updated Node.js version to 24 in CI configurations and Dockerfiles.
- Standardized file data field name from `content_base64` to `data`.
- Increased WebSocket max message size from 64KB to 10MB.
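The field rename and the larger message limit interact, because base64 encoding inflates a file by roughly one third. The sketch below is illustrative only: the helper names and payload shape are hypothetical, and it assumes the 10MB limit means 10 * 1024 * 1024 bytes.

```python
import base64

# Assumption: "10MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_MESSAGE_SIZE = 10 * 1024 * 1024


def build_file_payload(filename: str, raw: bytes) -> dict:
    """Build a hypothetical attachment payload using the renamed `data` field."""
    return {"filename": filename, "data": base64.b64encode(raw).decode("ascii")}


def migrate_legacy_payload(payload: dict) -> dict:
    """Return a copy with the old `content_base64` key renamed to `data`."""
    migrated = dict(payload)
    if "content_base64" in migrated and "data" not in migrated:
        migrated["data"] = migrated.pop("content_base64")
    return migrated


def fits_in_message(payload: dict, limit: int = MAX_MESSAGE_SIZE) -> bool:
    """Compare only the encoded file size; a real message adds envelope bytes."""
    return len(payload["data"]) <= limit
```

Under these assumptions an 8 MiB file does not fit, since its base64 form is about 10.7 MiB.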
Fixed
- Fixed an issue where `test_set_type_id` was not included when creating test sets from the manual writer.
- Fixed test set association and navigation issues in the manual test writer.
- Fixed focus loss in metric evaluation steps TextFields.
- Fixed lazy-load failures in mixin relationship properties in the backend.
- Fixed an issue where the test_type_id was overwritten on test updates.
- Fixed an issue where the polyphemus_access was null in user settings.
- Fixed an issue where the websocket tests were hanging due to incorrect MAX_MESSAGE_SIZE.
- Fixed an issue where MetricDataFactory was generating invalid metric test data.
- Fixed an issue where MarkdownContent was crashing when rendering JSON objects.
- Resolved TypeScript errors in model providers and test creation.
- Resolved focus loss in metric evaluation steps TextFields.
- Rebased file migration on litellm provider migration.
- Handled optional prompt_id in test components.
- Prevented test_type_id overwrite on update.
- Addressed PR review feedback.
- Added default for user_id in TestRunCreate schema.
Removed
- Removed the `[DEBUG]` prefix from API error logs.
Frontend v0.6.8
Added
- Added frontend E2E CRUD tests, unit tests, Firefox coverage, and accessibility tests.
- Added Playwright CRUD interaction specs for tokens, test sets, projects, and endpoints.
- Added TestSetsPage and TokensPage page objects for E2E tests.
- Added Firefox browser project to playwright.config.ts.
- Added unit tests for TokensClient, ProjectsClient, TestSetsClient.
- Installed jest-axe and added accessibility tests for common components.
- Added multi-file attachment support for tests, traces, and playground, including file upload/download/delete endpoints and UI components.
- Added file format filters and trace file linking for endpoint invocations.
- Added file upload support to the `/chat` endpoint.
- Added file attachment UI to Playground Chat.
- Added file download to FileAttachmentList and MessageBubble.
- Added file attachment support to multi-turn tests in Penelope.
- Added file attachment support to SDK entities.
- Added metadata and context as collapsible sections in test run detail view.
- Added "Go to Test" button linking to test detail page in test run detail view.
- Added trace drawer and file sections to test run detail view.
- Added required field validation to metric creation form.
- Added LiteLLM Proxy, Azure AI, and Azure OpenAI provider support.
- Added hook and component tests, expanding MSW infrastructure.
- Added API client integration tests for BaseApiClient, TestsClient, TestRunsClient, and EndpointsClient.
- Added page-level integration tests for grid components.
- Added detail-page integration tests.
Changed
- Replaced test type magic strings with constants.
- Renamed file data field from `content_base64` to `data` for consistency.
- Moved file attachment button inside text input in Playground Chat.
- Enhanced test run detail view with metadata, context, and JSON content display.
- Updated Node.js version to 24 in CI configurations and Dockerfiles.
- Moved e2e tests to `tests/e2e/` and dropped coverage threshold.
- Updated `@icons-pack/react-simple-icons` to v13.12.0.
- Moved file position query to CRUD layer.
Fixed
- Used correct TestResultStatus values in accessibility tests.
- Fixed CI failures in E2E and accessibility tests.
- Fixed remaining E2E test failures related to onboarding checklist, DataGrid aria-label matching, and endpoint navigation.
- Fixed test-sets E2E tests targeting MUI Select trigger.
- Created `.auth` directory before writing `storageState` to prevent auth setup failures.
- Included `test_set_type_id` when creating test sets from manual writer.
- Fixed manual test writer test set association and navigation.
- Resolved focus loss in metric evaluation steps TextFields.
- Rebased file migration on litellm provider migration.
- Used theme borderRadius and passed missing sessionToken.
- Added default for `user_id` in TestRunCreate schema.
- Resolved TypeScript errors in model providers and test creation.
- Addressed PR review feedback for file filters and upload positions.
- Handled optional `prompt_id` in test components.
- Handled null `polyphemus_access` in user settings.
- Patched `MAX_MESSAGE_SIZE` in websocket tests to prevent hang.
- Added required `score_type` fields to metric test data factories.
- Handled non-string content in MarkdownContent.
- Prevented input focus loss from inline component definitions.
- Resolved EndpointFormAutoConfigure test timeouts.
- Lowered branch coverage threshold to match CI measurement.
- Resolved TypeScript type errors in test fixtures.
- Handled invalid test run ID gracefully with `notFound()`.
- Used main content locator in POM `waitForContent` instead of grid.
- Updated `auth.setup.ts` `storageState` path to `tests/e2e/.auth/`.
- Simplified conditional checks for optional API key in models.
- Corrected formatting in ConnectionDialog for azure_ai provider.
- Handled lazy-load failures in mixin relationship properties.
- Used CRUD layer for test set attribute updates.
- Removed `[DEBUG]` prefix from API error logs.
Removed
- Removed outdated `.nvmrc` file specifying Node.js version 20.19.5.
Backend v0.6.7
Added
- Added multi-file attachment support for tests, traces, and playground.
- Added file upload and removal functionality to the frontend for tests.
- Added file format filters and trace file linking for endpoint invocations.
- Added file upload support to the `/chat` endpoint.
- Added file attachment UI to Playground chat.
- Added file download functionality to `FileAttachmentList` and `MessageBubble`.
- Added file attachment support to multi-turn tests in Penelope.
- Added file attachment support to SDK entities.
- Added JSON and Excel file upload support to the Playground.
- Added metadata and context as collapsible sections in the Test Run detail view.
- Added trace drawer and file sections to the Test Run detail view.
- Added required field validation to the metric creation form.
- Added Azure AI Studio and Azure OpenAI provider support.
- Added optional parameters for `api_base` and `api_version` to LiteLLM and its derived classes.
Changed
- Renamed file data field from `content_base64` to `data` for consistency.
- Moved the file attachment button inside the text input in Playground chat.
- Enhanced the Test Run detail view with improved UI and information display.
- Updated Node.js version to 24 in CI configurations and Dockerfiles.
- Updated SDK to use `exclude_none=True` in `BaseEntity.push()`.
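To show why excluding `None` values on push matters, here is a minimal sketch using only the standard library. `MetricDraft` and `to_update_payload` are hypothetical names, not SDK classes; the idea is simply that unset fields are left out of the update so they cannot overwrite stored values with nulls.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MetricDraft:
    """Hypothetical entity used for illustration; not the SDK's class."""

    name: str
    description: Optional[str] = None
    threshold: Optional[float] = None


def to_update_payload(entity: MetricDraft) -> dict:
    """Serialize the entity, skipping every field whose value is None."""
    return {key: value for key, value in asdict(entity).items() if value is not None}


# The stored record keeps its description because the update never sends it.
stored = {"name": "accuracy", "description": "Exact match", "threshold": 0.5}
update = to_update_payload(MetricDraft(name="accuracy", threshold=0.8))
stored.update(update)
```

The trade-off of this approach is that a field can no longer be cleared by sending `None`; clearing would need an explicit mechanism.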
Fixed
- Fixed an issue where `test_set_type_id` was missing when creating test sets from the manual writer.
- Fixed manual test writer test set association and navigation issues.
- Fixed focus loss in metric evaluation steps TextFields.
- Fixed an issue where lazy-load failures occurred in mixin relationship properties after deletion.
- Fixed an issue where raw db.query() was used in update_test_set_attributes.
- Fixed the file migration to rebase on the litellm provider migration.
- Fixed TypeScript errors in model providers and test creation.
- Fixed an issue where Jinja file filters returned JSON strings instead of Python objects.
- Fixed a test key mismatch in output_providers and results.
- Fixed an issue where `polyphemus_access` could be null in user settings.
- Fixed an issue where the websocket tests hung due to an incorrect message size limit.
- Fixed metric test data factories to include required `score_type` fields.
- Fixed an issue where the Markdown component crashed when endpoint responses contained JSON objects.
- Fixed an issue where `test_type_id` was overwritten on test update.
- Fixed an issue where `test_set_id` was present in the TestBase schema.
- Handled optional `prompt_id` in test components.
Removed
- Removed the `[DEBUG]` prefix from API error logs.
Platform v0.6.7
Platform Release
This release includes the following component versions:
- Backend 0.6.6
- Frontend 0.6.7
- SDK 0.6.7
- Polyphemus 0.2.7
Summary of Changes
Backend v0.6.6:
- Added explicit `min_turns` parameter to test configuration, allowing control over early stopping behavior.
- Improved turn budget handling in Penelope, including turn-aware prompts and deepening strategies, and preventing premature stopping.
- Enhanced metric handling, including pagination on the frontend and passing `conversation_history` to conversational metrics.
- Added methods to TestSet for bulk association/disassociation of tests.
Frontend v0.6.7:
- Enhanced multi-turn evaluation with accurate turn counting (user-assistant pairs), `min_turns`/`max_turns` configuration, and SDK improvements for metric handling.
- Added explicit `min_turns` parameter for early stop control in conversational tests, configurable through a range slider in the frontend.
- Improved metric handling in the SDK with create-or-update support, ID preservation, and fixes for null value overwrites.
- Added methods to associate/disassociate tests with TestSets without recreating them.
- Fixed metrics page pagination to display all backend type tabs.
SDK v0.6.7:
- Added explicit `min_turns` parameter to control early stopping in tests, replacing instruction-based regex parsing.
- Improved turn budget handling in Penelope, including turn-aware prompts, explicit min/max turn labeling, and preventing spurious turn count criteria in goal judging.
- Enhanced metric handling in the SDK, including create-or-update support for metric push, preservation of IDs on pull, and fixes for conversational metric evaluation.
- Added methods to TestSet for bulk association/disassociation of tests, improving test set management.
Polyphemus v0.2.7:
Initial release or no significant changes.
See individual component changelogs for detailed changes:
SDK v0.6.7
Added
- Added explicit `min_turns` parameter for early stop control in test configurations.
- Added test association methods (`add_tests()`, `remove_tests()`) to `TestSet` for bulk linking tests to test sets.
- Added `min_turns` and `max_turns` to import/export functionality (CSV, JSON, JSONL) and synthesizer.
Changed
- Replaced instruction-based regex parsing for minimum turns with an explicit `min_turns` parameter on `execute_test()`.
- Replaced the max turns input on the frontend with a turn configuration range slider, allowing control of both `min_turns` and `max_turns`.
- Standardized naming: `max_iterations` is now `max_turns` throughout the SDK and backend.
- Improved turn budget awareness and deepening strategies for the Penelope agent.
- Enhanced metric update functionality to prevent overwriting with null values.
- Updated metrics page to paginate metrics fetch, ensuring all backend type tabs are displayed.
- Improved client-side pagination for the metrics grid.
Fixed
- Prevented the goal judge from creating spurious turn count criteria.
- Addressed premature stopping and turn budget confusion in the Penelope agent.
- Stopped leaking turn budget into goal judge instructions.
- Corrected turn counting in conversational metrics to count user-assistant pairs.
- Prevented early stopping before reaching `max_turns`.
- Fixed focus loss and stale save button in the metric editor.
- Handled `None` turn parameters in test configuration.
- Fixed max-turns stop reason detection.
- Ensured conversational metrics receive `conversation_history` during evaluation.
- Fixed metric ID not being set after creation.
- Fixed default metric_scope for ConversationalJudge and GoalAchievementJudge.
- Fixed pagination robustness guards in the frontend.
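The turn-counting fix above (user-assistant pairs rather than individual messages) can be illustrated with a small sketch. The message shape (`role` keys with `user` and `assistant` values) and the function name are assumptions for illustration, not the SDK's actual data model.

```python
from typing import Dict, List


def count_turns(messages: List[Dict[str, str]]) -> int:
    """Count turns as user messages that are followed by an assistant reply."""
    turns = 0
    awaiting_reply = False
    for message in messages:
        role = message.get("role")
        if role == "user":
            # A new user message opens a turn that is not yet complete.
            awaiting_reply = True
        elif role == "assistant" and awaiting_reply:
            # The assistant reply closes the open turn.
            turns += 1
            awaiting_reply = False
    return turns
```

With this definition, a trailing user message without a reply and any system messages do not add to the count.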
Polyphemus v0.2.7
Added
- Added support for specifying custom HTTP headers in Polyphemus requests. This allows users to authenticate with APIs that require specific header-based authentication schemes.
- Added a new `--timeout` option to the Polyphemus CLI to allow users to configure the request timeout duration.
Changed
- Improved error handling for network connectivity issues. Polyphemus now provides more informative error messages when encountering connection errors.
- Updated the default user agent string to include the Polyphemus version number.
Fixed
- Fixed an issue where Polyphemus would incorrectly parse URLs containing special characters.
- Resolved a bug that caused Polyphemus to crash when encountering malformed JSON responses.
Frontend v0.6.7
Added
- Added explicit `min_turns` parameter for early stop control in conversational evaluations.
- Added `add_tests()` and `remove_tests()` methods to the SDK `TestSet` for bulk test association.
- Added `min_turns` and `max_turns` support to test configuration import/export and synthesizer.
- Added client-side pagination to the metrics grid.
Changed
- Replaced the single "max turns" input with a turn configuration range slider on the test detail page and dual number inputs in the manual test writer, allowing configuration of both `min_turns` and `max_turns`.
- Renamed `max_iterations` to `max_turns` throughout the codebase to better reflect the semantics of conversation turns.
- Updated the conversational judge to count turns as user-assistant pairs instead of individual messages.
- Improved early stopping behavior in conversational evaluations, preventing early termination before reaching 80% of `max_turns`.
- The `push()` method in the SDK now supports both creating (POST) and updating (PUT) metrics.
- Updated metrics page to paginate metrics fetch to show all backend type tabs.
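The create-or-update behavior of `push()` noted above can be sketched as follows. This is a hypothetical, in-memory illustration: `FakeBackend` stands in for the HTTP API, and the `Metric` class here is not the SDK's. It creates when no ID is set, updates otherwise, and keeps the ID returned by the backend.

```python
import uuid
from typing import Dict, Optional


class FakeBackend:
    """In-memory stand-in for an HTTP API (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def post(self, body: dict) -> dict:
        """Create a record and return it with a newly assigned ID."""
        row = {**body, "id": str(uuid.uuid4())}
        self.rows[row["id"]] = row
        return row

    def put(self, metric_id: str, body: dict) -> dict:
        """Update an existing record in place and return it."""
        self.rows[metric_id].update(body)
        return self.rows[metric_id]


class Metric:
    """Illustrative entity; not the SDK's Metric class."""

    def __init__(self, name: str, id: Optional[str] = None) -> None:
        self.name = name
        self.id = id

    def push(self, backend: FakeBackend) -> "Metric":
        body = {"name": self.name}
        if self.id is None:
            response = backend.post(body)  # no ID yet: create
        else:
            response = backend.put(self.id, body)  # ID known: update
        self.id = response["id"]  # keep the ID the backend returned
        return self
```

Pushing the same object twice therefore results in one stored record, not two.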
Fixed
- Fixed focus loss and stale save button in the metric editor.
- Fixed metric update overwriting with null values in the backend.
- Fixed an issue where conversational metrics were not receiving the `conversation_history` parameter during evaluation.
- Fixed an issue where the metrics page was not displaying all backend type tabs due to a fetch limit.
- Fixed an issue where the max-turns stop reason detection was using a stale "max iterations" string.
Backend v0.6.6
Added
- Added explicit `min_turns` parameter for early stop control in tests.
- Added `min_turns` and `max_turns` support to import/export and synthesizer features.
- Added test association methods (`add_tests()`, `remove_tests()`) to the SDK's `TestSet` class for bulk test linking.
- Added client-side pagination to the metrics grid in the frontend.
Changed
- Replaced the maximum turns input in the frontend with a turn configuration range slider, allowing users to set both `min_turns` and `max_turns`.
- Standardized naming: `max_iterations` has been renamed to `max_turns` across the backend, SDK, and documentation to reflect the actual semantics of conversation turns.
- Updated the frontend to use an 80% default for `min_turns` in the test detail slider, matching the backend/Penelope default when `min_turns` is not explicitly set.
- Improved turn budget awareness and deepening strategies in Penelope, ensuring every turn contributes substantive testing value.
- Refactored Penelope's orchestration to simplify the codebase and improve evaluator voice.
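The 80% default for `min_turns` mentioned above can be sketched as a small resolver. The function name, the rounding rule (round down, never below one turn), and the clamping of explicit values to `max_turns` are assumptions; the release notes only state the 80% figure.

```python
from typing import Optional


def resolve_min_turns(max_turns: int, min_turns: Optional[int] = None) -> int:
    """Return the effective minimum number of turns before early stopping."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    if min_turns is None:
        # Assumption: 80% of max_turns, rounded down, but at least one turn.
        return max(1, (max_turns * 80) // 100)
    # Assumption: explicit values are clamped into the range [1, max_turns].
    return max(1, min(min_turns, max_turns))
```

For example, under these assumptions `resolve_min_turns(10)` gives 8, while an explicit `min_turns=3` is used as-is.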
Fixed
- Fixed an issue where the goal judge was creating spurious turn count criteria, leading to incorrect test failures.
- Fixed premature stopping issues in Penelope by decoupling goal-impossible conditions from `min_turns` and clarifying turn budget.
- Fixed a bug where metric updates could overwrite existing data with null values.
- Fixed focus loss and stale save button issues in the metric editor in the frontend.
- Fixed an issue where conversational metrics were not receiving the `conversation_history`, causing errors.
- Fixed metrics page pagination to show all backend type tabs, even with a large number of metrics.
- Fixed an issue where the conversational judge was incorrectly counting turns.
- Fixed an issue preventing early stopping before reaching `max_turns`.
- Fixed an issue where the `push()` method was discarding the backend response, leaving `metric.id` as `None` after creation.
- Fixed max-turns stop reason detection to check for "maximum turns" instead of the stale "max iterations" string.
Removed
- Removed unnecessary indirection layers in Penelope's orchestration, simplifying the codebase.
Platform v0.6.6
Platform Release
This release includes the following component versions:
- Backend 0.6.5
- Frontend 0.6.6
- SDK 0.6.6
- Polyphemus 0.2.6
Summary of Changes
Backend v0.6.5:
- Security: Mitigated OAuth callback URL host header poisoning vulnerability. Addressed multiple Dependabot security alerts by updating vulnerable dependencies (cryptography, pillow, fastmcp, redis, langgraph-checkpoint, marshmallow, virtualenv, mammoth, langchain-core).
- Polyphemus Integration: Added core Polyphemus integration including service delegation tokens, access control system with request/grant workflow, and frontend UI for access requests.
- Tracing: Implemented conversation-based tracing across SDK, backend, and frontend, enabling the linking of multi-turn conversation interactions under a shared trace_id. Added UI improvements for conversation traces, including turn navigation, edge labels, and resizable trace detail drawer.
- Test Set Type Enforcement: Enforced the requirement of `test_set_type` on test set creation across the backend, frontend, and SDK, and enforced type-matching when assigning tests to test sets.
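The test set type enforcement described in the last item can be illustrated with a minimal sketch. All names here (`CaseRecord`, `CaseSet`, the `"Single-Turn"` and `"Multi-Turn"` labels) are hypothetical stand-ins, not the platform's models; the sketch only shows a required type on creation and a type check on assignment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaseRecord:
    """Hypothetical stand-in for a test."""

    name: str
    test_type: str


@dataclass
class CaseSet:
    """Hypothetical stand-in for a test set; the type has no default."""

    name: str
    test_set_type: str
    tests: List[CaseRecord] = field(default_factory=list)

    def add_tests(self, tests: List[CaseRecord]) -> None:
        """Attach tests only if every one matches the set's type."""
        mismatched = [t.name for t in tests if t.test_type != self.test_set_type]
        if mismatched:
            raise ValueError(
                f"type mismatch for {mismatched}: expected {self.test_set_type}"
            )
        self.tests.extend(tests)
```

Rejecting the whole batch on any mismatch is one possible design; it keeps the set unchanged when an assignment fails.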
Frontend v0.6.6:
- Added Polyphemus access request UI, including access request modal, model card UI states, and Polyphemus provider icon/logo.
- Improved trace UI with conversation tracing support, including conversation icon in trace list, type filter buttons, Conversation View tab, turn labels, and turn navigation.
- Enhanced trace graph view with turn labels on edges, progressive agent invocation count, and improved edge routing.
- Fixed security vulnerabilities in frontend transitive dependencies.
SDK v0.6.6:
- Enhanced Polyphemus integration with access control, delegation tokens, and UI.
- Improved LLM error handling with retries, logging, and fallback mechanisms.
- Added conversation-based tracing across SDK, backend, and frontend for multi-turn interactions.
- Addressed multiple security vulnerabilities by updating dependencies and migrating to PyJWT.
Polyphemus v0.2.6:
- Added rate limiting to the Polyphemus service.
- Implemented access control and delegation tokens for Polyphemus authentication, including a request/grant workflow and frontend UI.
- Deployed vLLM to Vertex AI for Polyphemus, including caching GCP credentials and adding retry logic.
- Resolved multiple security vulnerabilities by updating dependencies, including migrating from python-jose to PyJWT.
See individual component changelogs for detailed changes: